Behavioral recognition system

US 8,131,012 B2
Filed: 02/08/2008
Issued: 03/06/2012
Est. Priority Date: 02/08/2007
Status: Active Grant

Abstract

Embodiments of the present invention provide a method and a system for analyzing and learning behavior based on an acquired stream of video frames. Objects depicted in the stream are determined based on an analysis of the video frames. Each object may have a corresponding search model used to track an object's motion frame-to-frame. Classes of the objects are determined and semantic representations of the objects are generated. The semantic representations are used to determine objects' behaviors and to learn about behaviors occurring in an environment depicted by the acquired video streams. This way, the system learns rapidly and in real-time normal and abnormal behaviors for any environment by analyzing movements or activities or absence of such in the environment and identifies and predicts abnormal and suspicious behavior based on what has been learned.
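
As an illustration only, the idea of learning what is normal in a scene and flagging what is not can be sketched with a simple frequency count over semantic events. This is a toy stand-in, not the method disclosed in the patent: the class name `BehaviorLearner`, the `(object_class, event)` event format, and the `min_observations` and `rare_threshold` parameters are all assumptions made for the example.

```python
from collections import Counter


class BehaviorLearner:
    """Toy stand-in for a learning engine: counts how often each
    (object class, event) pair is observed and flags rarely seen pairs."""

    def __init__(self, min_observations=20, rare_threshold=0.05):
        self.counts = Counter()
        self.total = 0
        self.min_observations = min_observations  # hypothetical warm-up length
        self.rare_threshold = rare_threshold      # hypothetical rarity cutoff

    def observe(self, object_class, event):
        """Record one semantic event and report whether it looks unusual."""
        key = (object_class, event)
        # Score the event against what was learned before this observation.
        unusual = (
            self.total >= self.min_observations
            and self.counts[key] / self.total < self.rare_threshold
        )
        self.counts[key] += 1
        self.total += 1
        return unusual


learner = BehaviorLearner()
for _ in range(30):
    learner.observe("car", "drives through lot")
print(learner.observe("car", "drives through lot"))  # a frequently seen event
print(learner.observe("human", "climbs fence"))      # an event never seen before
```

Nothing is flagged during the warm-up period, so the sketch only reports an event as unusual once it has some history to compare against.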

30 Claims

1. A method for processing a stream of video frames recording events within a scene, the method comprising:
- receiving a first frame of the stream, wherein the first frame includes data for a plurality of pixels included in the frame;
  
  identifying one or more groups of pixels in the first frame, wherein each group depicts an object within the scene;
  
  generating a search model storing one or more features associated with each identified object;
  
  classifying each of the objects using a trained classifier;
  
  tracking, in a second frame, each of the objects identified in the first frame using the search model;
  
  supplying the first frame, the second frame, and the object classifications to a machine learning engine; and
  
  generating, by the machine learning engine, one or more semantic representations of behavior engaged in by the objects in the scene over a plurality of frames, wherein the machine learning engine is configured to learn patterns of behavior observed in the scene over the plurality of frames and to identify occurrences of the patterns of behavior engaged in by the classified objects.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1, further comprising issuing at least one alert indicating an occurrence of one of the identified patterns of behavior by one of the tracked objects.
  - 3. The method of claim 1, wherein each search model is generated as one of an appearance model and a feature-based model.
  - 4. The method of claim 1, wherein the step of tracking, in the second frame, each of the objects identified in the first frame using the search model comprises:
    - locating the identified objects within the second frame; and
      
      updating the respective search model for each identified object.
  - 5. The method of claim 1, wherein the trained classifier is configured to classify each object as one of a human, car, or other.
  - 6. The method of claim 5, wherein the trained classifier is further configured to estimate at least one of a pose, a location, and a motion for at least one of the classified objects, based on changes to the group of pixels depicting the object over a plurality of successive frames.
  - 7. The method of claim 1, wherein the step of identifying one or more groups of pixels in the first frame comprises:
    - identifying at least one group of pixels representing a foreground region of the first frame and at least one group of pixels representing a background region of the first frame;
      
      segmenting foreground regions into foreground blobs, wherein each foreground blob represents an object depicted in the first frame; and
      
      updating a background image of the scene based on the background regions identified in the first frame.
  - 8. The method of claim 7, further comprising updating an annotated map of the scene depicted by the video stream using the results of the steps of generating a search model storing one or more features associated with each identified object;
    - classifying each of the objects using a trained classifier; and
      
      tracking, in a second frame, each of the objects identified in the first frame using the search model.
  - 9. The method of claim 8, wherein the annotated map describes a three dimensional geometry of the scene including an estimated three-dimensional position of the identified objects and an estimated three-dimensional position of a plurality of objects depicted in the background image of the scene.
  - 10. The method of claim 8, wherein the step of building semantic representations further comprises analyzing the built semantic representations for recognizable behavior patterns using latent semantic analysis.
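
As an illustration only, the three steps recited in claim 7 (separating foreground from background, segmenting the foreground into blobs, and updating the background image) might be sketched as follows. The claim does not name a particular technique; this sketch assumes simple background differencing, a 4-connected flood fill, and a running-average background update. The function name `segment_foreground` and the `threshold` and `alpha` parameters are hypothetical.

```python
def segment_foreground(frame, background, threshold=30, alpha=0.1):
    """Return (blobs, updated_background) for one grayscale frame.

    frame and background are equal-sized lists of rows of intensities.
    threshold and alpha are hypothetical tuning values.
    """
    rows, cols = len(frame), len(frame[0])
    # A pixel is foreground when it differs enough from the background image.
    mask = [[abs(frame[r][c] - background[r][c]) > threshold
             for c in range(cols)] for r in range(rows)]

    # Group 4-connected foreground pixels into blobs with a flood fill.
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            stack, blob = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            blobs.append(blob)

    # Blend only background pixels into the background image (running average).
    updated = [[background[r][c] if mask[r][c]
                else (1 - alpha) * background[r][c] + alpha * frame[r][c]
                for c in range(cols)] for r in range(rows)]
    return blobs, updated
```

Each returned blob is a list of `(row, column)` pixel coordinates. Foreground pixels are left out of the background update so that a moving object is not absorbed into the background image.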

11. A non-transitory computer-readable storage medium containing a program, which, when executed on a processor is configured to perform an operation, comprising:
- receiving a first frame of the stream, wherein the first frame includes data for a plurality of pixels included in the frame;
  
  identifying one or more groups of pixels in the first frame, wherein each group depicts an object within the scene;
  
  generating a search model storing one or more features associated with each identified object;
  
  classifying each of the objects using a trained classifier;
  
  tracking, in a second frame, each of the objects identified in the first frame using the search model;
  
  supplying the first frame, the second frame, and the object classifications to a machine learning engine; and
  
  generating, by the machine learning engine, one or more semantic representations of behavior engaged in by the objects in the scene over a plurality of frames, wherein the machine learning engine is configured to learn patterns of behavior observed in the scene over the plurality of frames and to identify occurrences of the patterns of behavior engaged in by the classified objects.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20):
  - 12. The non-transitory computer-readable storage medium of claim 11, wherein the operation further comprises issuing at least one alert indicating an occurrence of one of the identified patterns of behavior by one of the tracked objects.
  - 13. The non-transitory computer-readable storage medium of claim 11, wherein each search model is generated as one of an appearance model and a feature-based model.
  - 14. The non-transitory computer-readable storage medium of claim 11, wherein the step of tracking, in the second frame, each of the objects identified in the first frame using the search model comprises:
    - locating the identified objects within the second frame; and
      
      updating the respective search model for each identified object.
  - 15. The non-transitory computer-readable storage medium of claim 11, wherein the trained classifier is configured to classify each object as one of a human, car, or other.
  - 16. The non-transitory computer-readable storage medium of claim 15, wherein the trained classifier is further configured to estimate at least one of a pose, a location, and a motion for at least one of the classified objects, based on changes to the group of pixels depicting the object over a plurality of successive frames.
  - 17. The non-transitory computer-readable storage medium of claim 11, wherein the step of identifying one or more groups of pixels in the first frame comprises:
    - identifying at least one group of pixels representing a foreground region of the first frame and at least one group of pixels representing a background region of the first frame;
      
      segmenting foreground regions into foreground blobs, wherein each foreground blob represents an object depicted in the first frame; and
      
      updating a background image of the scene based on the background regions identified in the first frame.
  - 18. The non-transitory computer-readable storage medium of claim 17, wherein the operation further comprises updating an annotated map of the scene depicted by the video stream using the results of the steps of generating a search model storing one or more features associated with each identified object;
    - classifying each of the objects using a trained classifier; and
      
      tracking, in a second frame, each of the objects identified in the first frame using the search model.
  - 19. The non-transitory computer-readable storage medium of claim 18, wherein the annotated map describes a three dimensional geometry of the scene including an estimated three-dimensional position of the identified objects and an estimated three-dimensional position of a plurality of objects depicted in the background image of the scene.
  - 20. The non-transitory computer-readable storage medium of claim 18, wherein the step of building semantic representations further comprises analyzing the built semantic representations for recognizable behavior patterns using latent semantic analysis.
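
As an illustration only, the locate-and-update tracking step recited in claims 4 and 14 might be sketched with a small feature-based search model. The claims do not specify which features are stored or how matching is done; this sketch assumes a centroid plus a mean intensity, nearest-neighbor matching, and an exponential blend for the update. `SearchModel`, `track`, `max_distance`, and `blend` are hypothetical names and values.

```python
import math
from dataclasses import dataclass


@dataclass
class SearchModel:
    """Hypothetical feature-based search model for one tracked object."""
    object_id: int
    centroid: tuple          # (x, y) position in pixels
    mean_intensity: float    # stand-in appearance feature


def track(models, detections, max_distance=50.0, blend=0.3):
    """Match each model to its nearest unclaimed detection, then update it.

    detections is a list of (centroid, mean_intensity) pairs from the second
    frame. max_distance and blend are hypothetical tuning values.
    Returns the ids of the objects located in the second frame.
    """
    located, unclaimed = [], list(detections)
    for model in models:
        if not unclaimed:
            break
        # Locate: pick the closest remaining detection by centroid distance.
        best = min(unclaimed, key=lambda d: math.dist(model.centroid, d[0]))
        if math.dist(model.centroid, best[0]) > max_distance:
            continue  # object not found in this frame
        unclaimed.remove(best)
        # Update: move the centroid and blend the appearance feature.
        model.centroid = best[0]
        model.mean_intensity = (
            (1 - blend) * model.mean_intensity + blend * best[1]
        )
        located.append(model.object_id)
    return located
```

The greedy matching used here is the simplest choice and can mis-assign objects that cross paths; a production tracker would typically solve the assignment jointly.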

21. A system, comprising:
- a video input source;
  
  a processor; and
  
  a memory storing:
  
  a computer vision engine, wherein the computer vision engine is configured to:
  
  receive, from the video input source, a first frame of the stream, wherein the first frame includes data for a plurality of pixels included in the frame,
  
  identify one or more groups of pixels in the first frame, wherein each group depicts an object within the scene,
  
  generate a search model storing one or more features associated with each identified object,
  
  classify each of the objects using a trained classifier,
  
  track, in a second frame, each of the objects identified in the first frame using the search model, and
  
  supply the first frame, the second frame, and the object classifications to a machine learning engine; and
  
  the machine learning engine, wherein the machine learning engine is configured to generate one or more semantic representations of behavior engaged in by the objects in the scene over a plurality of frames and further configured to learn patterns of behavior observed in the scene over the plurality of frames and to identify occurrences of the patterns of behavior engaged in by the classified objects.
- Dependent claims (22, 23, 24, 25, 26, 27, 28, 29, 30):
  - 22. The system of claim 21, wherein the machine learning engine is further configured to issue at least one alert indicating an occurrence of one of the identified patterns of behavior by one of the tracked objects.
  - 23. The system of claim 21, wherein each search model is generated as one of an appearance model and a feature-based model.
  - 24. The system of claim 21, wherein the step of tracking, in the second frame, each of the objects identified in the first frame using the search model comprises:
    - locating the identified objects within the second frame; and
      
      updating the respective search model for each identified object.
  - 25. The system of claim 21, wherein the trained classifier is configured to classify each object as one of a human, car, or other.
  - 26. The system of claim 25, wherein the trained classifier is further configured to estimate at least one of a pose, a location, and a motion for at least one of the classified objects, based on changes to the group of pixels depicting the object over a plurality of successive frames.
  - 27. The system of claim 21, wherein the computer vision engine is configured to identify the one or more groups of pixels in the first frame by performing the steps of:
    - identifying at least one group of pixels representing a foreground region of the first frame and at least one group of pixels representing a background region of the first frame;
      
      segmenting foreground regions into foreground blobs, wherein each foreground blob represents an object depicted in the first frame; and
      
      updating a background image of the scene based on the background regions identified in the first frame.
  - 28. The system of claim 27, wherein the computer vision engine is further configured to update an annotated map of the scene depicted by the video stream using the generated search model storing one or more features associated with each identified object.
  - 29. The system of claim 28, wherein the annotated map describes a three dimensional geometry of the scene including an estimated three-dimensional position of the identified objects and an estimated three-dimensional position of a plurality of objects depicted in the background image of the scene.
  - 30. The system of claim 28, wherein the step of building semantic representations further comprises analyzing the built semantic representations for recognizable behavior patterns using latent semantic analysis.
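
As an illustration only, a trained classifier that labels each object as human, car, or other (claims 5, 15, and 25) might be sketched as a nearest-centroid classifier. The claims do not say which classifier or features are used; the two features here (height-to-width ratio and pixel area), the function names, and the `area_scale` and `max_distance` values are assumptions made for the example.

```python
import math


def train_centroids(samples):
    """samples: list of (label, (height/width ratio, area in pixels))."""
    sums = {}
    for label, (ratio, area) in samples:
        total = sums.setdefault(label, [0.0, 0.0, 0])
        total[0] += ratio
        total[1] += area
        total[2] += 1
    # Average each feature per label to get one centroid per class.
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def classify(centroids, features, area_scale=1000.0, max_distance=1.5):
    """Return the nearest trained label, or "other" when nothing is close."""
    def distance(centroid):
        # Scale area so both features contribute comparably (assumed scale).
        return math.hypot(features[0] - centroid[0],
                          (features[1] - centroid[1]) / area_scale)

    label = min(centroids, key=lambda name: distance(centroids[name]))
    return label if distance(centroids[label]) <= max_distance else "other"


centroids = train_centroids([
    ("human", (2.5, 800.0)), ("human", (2.7, 1000.0)),
    ("car", (0.5, 4000.0)), ("car", (0.6, 4400.0)),
])
print(classify(centroids, (2.6, 950.0)))
```

Treating "other" as a distance cutoff rather than a trained class keeps the sketch small; it means anything far from both learned classes falls through to "other".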

Current Assignee: Motorola Solutions, Inc.
Original Assignee: Behavioral Recognition Systems Incorporated
Inventors: Eaton, John Eric; Cobb, Wesley Kenneth; Urech, Dennis Gene; Blythe, Bobby Ernest; Friedlander, David Samuel; Gottumukkal, Rajkiran Kumar; Risinger, Lon William; Saitwal, Kishor Adinath; Seow, Ming-Jung; Solum, David Marvin; Xu, Gang; Yang, Tao
Primary Examiner(s): Johns, Andrew W
Application Number: US12/028,484
Publication Number: US 20080193010A1
Time in Patent Office: 1,488 Days
Field of Search: 382/100, 382/103, 382/107, 382/155, 382/173, 382/224, 348/143, 340/573.1
US Class Current: 382/103
CPC Class Codes:
G06V 20/52   Surveillance or monitoring ...
G08B 13/19608   Tracking movement of a targ...
G08B 13/19613   Recognition of a predetermi...
