Background model for complex and dynamic scenes

US 9,959,630 B2
Filed: 02/09/2016
Issued: 05/01/2018
Est. Priority Date: 08/18/2009
Status: Active Grant

Abstract

Systems and methods are described for viewing a scene depicted in a sequence of video frames and for identifying and tracking objects between separate frames of the sequence. Each tracked object is classified based on known categories, and a stream of context events associated with the object is generated. A sequence of primitive events is generated from the stream of context events, and both are stored along with detailed data and generalized data related to an event. All of the stored data is then evaluated to learn patterns of behavior that occur within the scene.

57 Citations

20 Claims

1. A computer-implemented method, comprising:
- receiving a sequence of video frames from a video camera;
  
  receiving a request to view a scene depicted in the sequence of video frames;
  
  identifying and tracking at least one object between separate frames of the sequence of video frames;
  
  classifying each tracked object based on a known category of objects;
  
  generating a stream of context events associated with each tracked object;
  
  generating a sequence of primitive events based on the stream of context events;
  
  storing the stream of context events and the sequence of primitive events in one or more adaptive resonance theory (ART) networks;
  
  storing detailed data in the one or more ART networks related to an event based on the stream of context events and the sequence of primitive events;
  
  storing generalized data in the one or more ART networks related to an event based on the stream of context events and the sequence of primitive events; and
  
  evaluating the stream of context events, the sequence of primitive events, the detailed data, and the generalized data with the one or more ART networks to learn patterns of behavior that occur within the scene.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The computer-implemented method of claim 1, wherein the stream of context events includes a collection of kinematic information related to the at least one object.
  - 3. The computer-implemented method of claim 2, wherein the kinematic information includes one or more of:
    - position, current trajectory, projected trajectory, direction, orientation, velocity, acceleration, size and color.
  - 4. The computer-implemented method of claim 1, wherein classifying further includes identifying features of the object.
  - 5. The computer-implemented method of claim 4, wherein the identifying features include one or more of:
    - height/width in pixels, average color values, shape and area.
  - 6. The computer-implemented method of claim 1, wherein the object is a person and wherein the identifying features include one or more of:
    - prediction of a gender, an estimation of a pose, and an indication of whether the person is carrying an object.
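
To make the relationship between context events and primitive events in claims 1 to 3 more concrete, the following is a minimal Python sketch, not the patented implementation. The `ContextEvent` fields, the `primitive_events` function, the event labels ("appears", "starts_moving", "stops") and the `moving_threshold` parameter are all hypothetical names chosen for illustration; the claims do not define them.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ContextEvent:
    """Per-frame kinematic snapshot of one tracked object (hypothetical fields)."""
    frame: int
    object_id: int
    category: str                   # e.g. "person" or "vehicle"
    position: Tuple[float, float]   # pixel coordinates (x, y)


def speed(prev: ContextEvent, curr: ContextEvent) -> float:
    """Pixels travelled per frame between two snapshots of the same object."""
    dx = curr.position[0] - prev.position[0]
    dy = curr.position[1] - prev.position[1]
    dt = max(curr.frame - prev.frame, 1)  # guard against repeated frame numbers
    return math.hypot(dx, dy) / dt


def primitive_events(stream: Iterable[ContextEvent],
                     moving_threshold: float = 1.0) -> List[Tuple[int, str]]:
    """Collapse one object's context events into (frame, label) primitive events.

    Emits "appears" for the first snapshot, then "starts_moving" or "stops"
    whenever the object's speed crosses the (assumed) threshold.
    """
    events: List[Tuple[int, str]] = []
    prev = None
    moving = False
    for ev in stream:
        if prev is None:
            events.append((ev.frame, "appears"))
        else:
            now_moving = speed(prev, ev) >= moving_threshold
            if now_moving != moving:
                events.append((ev.frame, "starts_moving" if now_moving else "stops"))
                moving = now_moving
        prev = ev
    return events


# Usage: an object that appears, stays still, moves for two frames, then stops.
track = [
    ContextEvent(0, 7, "person", (0.0, 0.0)),
    ContextEvent(1, 7, "person", (0.0, 0.0)),
    ContextEvent(2, 7, "person", (3.0, 4.0)),
    ContextEvent(3, 7, "person", (6.0, 8.0)),
    ContextEvent(4, 7, "person", (6.0, 8.0)),
]
result = primitive_events(track)
```

In this sketch the per-frame stream stays dense and numeric, while the primitive events form a much shorter symbolic sequence, which is one plausible reason to keep both representations.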

7. A system, comprising:
- a processor; and
  
  one or more adaptive resonance theory (ART) networks in communication with the processor when the system is in operation, the one or more ART networks having stored thereon instructions that upon execution by the processor at least cause the system to:
  
  receive a sequence of video frames from a video camera;
  
  receive a request to view a scene depicted in the sequence of video frames;
  
  identify and track at least one object between separate frames of the sequence of video frames;
  
  classify each tracked object based on a known category of objects;
  
  generate a stream of context events associated with each tracked object;
  
  generate a sequence of primitive events based on the stream of context events;
  
  store the stream of context events and the sequence of primitive events in the one or more ART networks;
  
  store detailed data in the one or more ART networks related to an event based on the stream of context events and the sequence of primitive events;
  
  store generalized data in the one or more ART networks related to an event based on the stream of context events and the sequence of primitive events; and
  
  evaluate the stream of context events, the sequence of primitive events, the detailed data, and the generalized data with the one or more ART networks to learn patterns of behavior that occur within the scene.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The system of claim 7, wherein the stream of context events includes a collection of kinematic information related to the at least one object.
  - 9. The system of claim 8, wherein the kinematic information includes one or more of:
    - position, current trajectory, projected trajectory, direction, orientation, velocity, acceleration, size and color.
  - 10. The system of claim 7, wherein classifying further includes identifying features of the object.
  - 11. The system of claim 10, wherein the identifying features include one or more of:
    - height/width in pixels, average color values, shape and area.
  - 12. The system of claim 7, wherein the object is a person and wherein the identifying features include one or more of:
    - prediction of a gender, an estimation of a pose, and an indication of whether the person is carrying an object.
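
The claims recite ART networks without naming a variant. As an illustration only, the sketch below implements Fuzzy ART, one well-known member of the ART family, chosen here as an assumption. The class name `FuzzyART`, its parameters and the two-dimensional feature vectors are hypothetical; inputs are assumed to be normalized to the range [0, 1].

```python
from typing import List, Sequence


class FuzzyART:
    """Minimal Fuzzy ART clusterer (illustrative; not the patented networks)."""

    def __init__(self, vigilance: float = 0.75, alpha: float = 0.001,
                 beta: float = 1.0) -> None:
        self.vigilance = vigilance   # higher values yield narrower categories
        self.alpha = alpha           # small choice parameter
        self.beta = beta             # learning rate (1.0 = fast learning)
        self.weights: List[List[float]] = []

    @staticmethod
    def _complement_code(x: Sequence[float]) -> List[float]:
        """Append 1 - x so that every coded input has the same L1 norm."""
        return list(x) + [1.0 - v for v in x]

    def learn(self, x: Sequence[float]) -> int:
        """Present one input in [0, 1]^n and return the index of its category."""
        i = self._complement_code(x)
        norm_i = sum(i)
        scored = []
        for j, w in enumerate(self.weights):
            overlap = sum(min(a, b) for a, b in zip(i, w))   # |I AND w_j|
            scored.append((overlap / (self.alpha + sum(w)), overlap, j))
        # Try categories in order of decreasing choice value.
        for _, overlap, j in sorted(scored, key=lambda s: -s[0]):
            if overlap / norm_i >= self.vigilance:           # vigilance test
                w = self.weights[j]
                self.weights[j] = [
                    self.beta * min(a, b) + (1.0 - self.beta) * b
                    for a, b in zip(i, w)
                ]
                return j
        # No existing category resonates: commit a new one.
        self.weights.append(i)
        return len(self.weights) - 1


# Usage: two nearby inputs share a category, a distant one creates a new category.
art = FuzzyART(vigilance=0.8)
labels = [art.learn(v) for v in ([0.1, 0.1], [0.12, 0.1], [0.9, 0.9], [0.88, 0.9])]
```

The vigilance parameter controls granularity: a low value merges many inputs into broad categories, while a high value keeps categories narrow. One possible reading of the "generalized data" and "detailed data" in the claims is a pair of networks run at different vigilance levels, though the claims themselves do not say so.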

13. A computer-implemented method, comprising:
- receiving a sequence of video frames from a video camera;
  
  receiving a request to view a scene depicted in the sequence of video frames;
  
  retrieving a background image and one or more foreground images associated with the scene;
  
  identifying and tracking at least some of the one or more foreground images between separate frames of the sequence of video frames;
  
  classifying each tracked foreground image based on a known category of objects;
  
  generating a stream of context events associated with each tracked foreground image;
  
  generating a sequence of primitive events based on the stream of context events;
  
  storing the stream of context events and the sequence of primitive events in an adaptive resonance theory (ART) network;
  
  storing detailed data in the adaptive resonance theory (ART) network related to an event based on the stream of context events and the sequence of primitive events;
  
  storing generalized data in the adaptive resonance theory (ART) network related to an event based on the stream of context events and the sequence of primitive events; and
  
  evaluating the stream of context events, the sequence of primitive events, the detailed data, and the generalized data with the one or more ART networks to learn patterns of behavior that occur within the scene.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The computer-implemented method of claim 13, wherein the stream of context events includes a collection of kinematic information related to the at least one object.
  - 15. The computer-implemented method of claim 14, wherein the kinematic information includes one or more of:
    - position, current trajectory, projected trajectory, direction, orientation, velocity, acceleration, size and color.
  - 16. The computer-implemented method of claim 13, wherein classifying further includes identifying features of the object.
  - 17. The computer-implemented method of claim 16, wherein the identifying features include one or more of:
    - height/width in pixels, average color values, shape and area.
  - 18. The computer-implemented method of claim 13, wherein the object is a person and wherein the identifying features include one or more of:
    - prediction of a gender, an estimation of a pose, and an indication of whether the person is carrying an object.
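
Claim 13 refers to a background image and foreground images but does not say how they are obtained. A common approach, assumed here purely for illustration, is background subtraction with a running-average background model. The function names, the `rate` and `threshold` parameters, and the use of nested lists of grayscale values are all hypothetical choices for this sketch.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def update_background(background: Frame, frame: Frame, rate: float = 0.05) -> Frame:
    """Exponential running average: the background drifts toward each new frame."""
    return [
        [(1.0 - rate) * b + rate * f for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


def foreground_mask(background: Frame, frame: Frame,
                    threshold: float = 25.0) -> List[List[bool]]:
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


def bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Smallest (top, left, bottom, right) box around all foreground pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


# Usage: a uniform 4x4 background with a bright two-pixel object in column 2.
background = [[10.0] * 4 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][2] = 200.0
frame[2][2] = 200.0

mask = foreground_mask(background, frame)
box = bounding_box(mask)
blended = update_background(background, frame, rate=0.5)
```

The bounding box gives the height and width in pixels of a foreground patch, which is the kind of identifying feature listed in claim 17.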

19. A system, comprising:
- a processor; and
  
  one or more adaptive resonance theory (ART) networks in communication with the processor when the system is in operation, the one or more ART networks having stored thereon instructions that upon execution by the processor at least cause the system to:
  
  receive a sequence of video frames from a video camera;
  
  receive a request to view a scene depicted in the sequence of video frames;
  
  retrieve a background image and one or more foreground images associated with the scene;
  
  identify and track at least some of the one or more foreground images between separate frames of the sequence of video frames;
  
  classify each tracked foreground image based on a known category of objects;
  
  generate a stream of context events associated with each tracked foreground image;
  
  generate a sequence of primitive events based on the stream of context events;
  
  store the stream of context events and the sequence of primitive events in the one or more ART networks;
  
  store detailed data in the one or more ART networks related to an event based on the stream of context events and the sequence of primitive events;
  
  store generalized data in the one or more ART networks related to an event based on the stream of context events and the sequence of primitive events; and
  
  evaluate the stream of context events, the sequence of primitive events, the detailed data, and the generalized data with the one or more ART networks to learn patterns of behavior that occur within the scene.
- Dependent claim (20):
  - 20. The system of claim 19, wherein the stream of context events includes a collection of kinematic information related to the at least one object.

Current Assignee: Motorola Solutions, Inc.
Original Assignee: Avigilon Patent Holding 1 Corporation
Inventors: Cobb, Wesley Kenneth; Seow, Ming-Jung; Yang, Tao
Primary Examiner(s): Bali, Vikkram
Application Number: US15/019,759
Publication Number: US 20160163065A1
Time in Patent Office: 812 Days
Field of Search: None
US Class Current
CPC Class Codes

G06F 18/23   Clustering techniques

G06F 18/23211   with adaptive number of clu...

G06T 2207/10016   Video; Image sequence

G06T 2207/20084   Artificial neural networks ...

G06T 2207/30196   Human being; Person

G06T 2207/30232   Surveillance

G06T 7/20   Analysis of motion motion e...

G06T 7/254   involving subtraction of im...

G06V 10/763   Non-hierarchical techniques...

G06V 20/41   Higher-level, semantic clus...

G06V 20/44   Event detection

G06V 20/52   Surveillance or monitoring ...

G06V 40/20   Movements or behaviour, e.g...
