Mapper component for multiple art networks in a video analysis system

US 8,416,296 B2
Filed: 04/14/2009
Issued: 04/09/2013
Est. Priority Date: 04/14/2009
Status: Expired due to Fees

Abstract

Techniques are disclosed for detecting the occurrence of unusual events in a sequence of video frames. Importantly, what is determined as unusual need not be defined in advance, but can be determined over time by observing a stream of primitive events and a stream of context events. A mapper component may be configured to parse the event streams and supply input data sets to multiple adaptive resonance theory (ART) networks. Each individual ART network may generate clusters from the set of input data supplied to that ART network. Each cluster represents an observed statistical distribution of a particular thing or event being observed by that ART network.
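
The routing role of the mapper component described in the abstract can be illustrated with a minimal sketch. It assumes events arrive as dictionaries and that a network's input layer is characterized by a tuple of field names; `ArtNetworkStub`, `Mapper`, `dispatch` and the field names are hypothetical names chosen for illustration and do not come from the patent. The stub only records what it receives and performs no clustering.

```python
from dataclasses import dataclass, field


@dataclass
class ArtNetworkStub:
    """Stand-in for an ART network; it only records the inputs it receives."""
    name: str
    input_fields: tuple  # assumed: field names expected by this network's input layer
    received: list = field(default_factory=list)

    def accept(self, vector):
        self.received.append(vector)


class Mapper:
    """Routes fields parsed from events to every network with a matching input layer."""

    def __init__(self, networks):
        self.networks = list(networks)

    def dispatch(self, event):
        """Pass the event to each network whose input fields are all present."""
        matched = []
        for net in self.networks:
            if all(f in event for f in net.input_fields):
                net.accept(tuple(event[f] for f in net.input_fields))
                matched.append(net.name)
        return matched


position_net = ArtNetworkStub("position", ("x", "y"))
velocity_net = ArtNetworkStub("velocity", ("x", "y", "vx", "vy"))
mapper = Mapper([position_net, velocity_net])

# An event carrying position and velocity matches both input layers.
routed_full = mapper.dispatch({"x": 10.0, "y": 20.0, "vx": 1.5, "vy": -0.5})
# An event carrying only position matches the position network alone.
routed_partial = mapper.dispatch({"x": 3.0, "y": 4.0})
```

Matching on field names is only one possible reading of "matching an input layer"; a real system might instead match on event type or on the dimensionality of the input vector.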

21 Claims

1. A computer-implemented method for analyzing a sequence of video frames depicting a scene captured by a video camera, the method comprising:
- receiving one or more data streams generated from the sequence of video frames, wherein a first one of the data streams provides a stream of context events generated by a computer vision engine, and wherein each context event provides kinematic data related to a foreground object observed by the computer vision engine in the sequence of video frames;
  
  parsing the data streams to identify data inputs matching an input layer of one of a plurality of adaptive resonance theory (ART) networks, wherein each ART network is configured to generate clusters from the data inputs matching the input layer of a respective ART network, and wherein each cluster provides a statistical distribution of a characteristic of the scene derived from the data streams that has been observed to occur at a location in the scene corresponding to a location of the cluster;
  
  passing the data inputs to the ART network with the matching input layer;
  
  updating the generated clusters in the ART network with the matching input layer;
  
  evaluating the clusters of the ART network with the matching input layer to determine whether the data inputs passed to the ART network are indicative of an occurrence of a statistically relevant event, relative to the clusters in the ART network with the matching input layer; and
  
  upon determining that a first one of the clusters in a first one of the plurality of ART networks has not been updated for a specified period of time, removing the first cluster from the first ART network.
- - 2. The computer-implemented method of claim 1, further comprising, in response to determining that the data inputs passed to the ART network are indicative of the occurrence of the statistically relevant event, publishing an alert message.
  - 3. The computer-implemented method of claim 1 wherein the statistically relevant event is one of the creation of a new cluster in response to passing the data inputs to the ART network with the matching input layer and a mapping, by the ART network with the matching input layer, of the data inputs to a cluster of low significance, relative to other clusters in the ART network.
  - 4. The computer-implemented method of claim 1, wherein one of the data streams is a stream of primitive events generated by a machine learning engine, and wherein each primitive event provides a semantic description of a group of one or more context events.
  - 5. The computer-implemented method of claim 1, wherein one or more of the context events provide a classification of what is depicted by a foreground object detected in the scene by the computer vision engine.
  - 6. The computer-implemented method of claim 5, wherein the classification classifies the detected foreground object as depicting one of a person, a vehicle, an unknown, or another class of foreground object.
  - 7. The computer-implemented method of claim 1, wherein the kinematic data includes at least one of a coordinate position in a frame of video where the characteristic is observed to occur, and wherein the characteristic is one of an appearance of a foreground object, a disappearance of a foreground object, a height of a foreground object, a width of a foreground object, a velocity in a horizontal dimension of the foreground object, a velocity of a foreground object in a vertical dimension, a rate of acceleration of a foreground object in a horizontal dimension and a rate of acceleration of a foreground object in a vertical dimension.
  - 8. The computer-implemented method of claim 1, further comprising, merging two or more overlapping clusters in the ART network with the matching input layer.
  - 9. The computer-implemented method of claim 1, wherein updating the generated clusters in the ART network with the matching input layer comprises one of:
    - (i) generating a new cluster at an initial position determined from the passed data inputs, wherein the new cluster includes an initial mean and a variance, and wherein the new cluster is bounded by a specified distance from the initial position for each dimension of data passed to the input layer;
      
      (ii) updating a previously generated cluster by updating the position, mean and variance of the previously generated cluster.
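
The cluster lifecycle recited in claims 1 and 9 (create a bounded cluster, update its position, mean and variance, and remove it after a period without updates) can be sketched as follows. This is a simplified illustration and not a full ART network: the vigilance test is replaced by a per-dimension distance check, the cluster position is taken to be its running mean, and the statistics use Welford's online update. The names `Cluster`, `ClusterNetwork`, `bound` and `max_idle` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    mean: list          # running mean per dimension (also used as the position)
    m2: list            # running sum of squared deviations per dimension
    count: int
    last_update: float  # timestamp of the most recent update

    def variance(self):
        """Sample variance per dimension; zero until two inputs have been seen."""
        if self.count < 2:
            return [0.0] * len(self.mean)
        return [s / (self.count - 1) for s in self.m2]

    def update(self, x, now):
        # Welford's online update of mean and sum of squared deviations.
        self.count += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (xi - self.mean[i])
        self.last_update = now


class ClusterNetwork:
    def __init__(self, bound, max_idle):
        self.bound = bound        # assumed per-dimension distance bound
        self.max_idle = max_idle  # assumed idle time before a cluster is removed
        self.clusters = []

    def observe(self, x, now):
        """Update a matching cluster or create one; return True if one was created."""
        for c in self.clusters:
            if all(abs(xi - mi) <= self.bound for xi, mi in zip(x, c.mean)):
                c.update(x, now)
                return False
        self.clusters.append(Cluster(list(x), [0.0] * len(x), 1, now))
        return True

    def prune(self, now):
        """Remove clusters that have not been updated within max_idle."""
        self.clusters = [c for c in self.clusters
                         if now - c.last_update <= self.max_idle]


net = ClusterNetwork(bound=5.0, max_idle=60.0)
first = net.observe((10.0, 10.0), now=0.0)    # no cluster yet, so one is created
second = net.observe((12.0, 10.0), now=1.0)   # within the bound, updates cluster 0
variance_after_two = net.clusters[0].variance()
far = net.observe((100.0, 100.0), now=50.0)   # outside the bound, new cluster
net.prune(now=70.0)                           # cluster 0 has been idle for 69 units
```

In this sketch the return value of `observe` doubles as a novelty signal, which corresponds to one of the two kinds of statistically relevant event named in claim 3 (the creation of a new cluster).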

10. A non-transitory computer-readable medium containing a program which, when executed by a processor, performs an operation for analyzing a sequence of video frames depicting a scene captured by a video camera, the operation comprising:
- receiving one or more data streams generated from the sequence of video frames, wherein a first one of the data streams provides a stream of context events generated by a computer vision engine, and wherein each context event provides kinematic data related to a foreground object observed by the computer vision engine in the sequence of video frames;
  
  parsing the data streams to identify data inputs matching an input layer of one of a plurality of adaptive resonance theory (ART) networks, wherein each ART network is configured to generate clusters from the data inputs matching the input layer of a respective ART network, and wherein each cluster provides a statistical distribution of a characteristic of the scene derived from the data streams that has been observed to occur at a location in the scene corresponding to a location of the cluster;
  
  passing the data inputs to the ART network with the matching input layer;
  
  updating the generated clusters in the ART network with the matching input layer;
  
  evaluating the clusters of the ART network with the matching input layer to determine whether the data inputs passed to the ART network are indicative of an occurrence of a statistically relevant event, relative to the clusters in the ART network with the matching input layer; and
  
  upon determining that a first one of the clusters in a first one of the plurality of ART networks has not been updated for a specified period of time, removing the first cluster from the first ART network.
- - 11. The non-transitory computer-readable medium of claim 10, wherein the operation further comprises, in response to determining that the data inputs passed to the ART network are indicative of the occurrence of the statistically relevant event, publishing an alert message.
  - 12. The non-transitory computer-readable medium of claim 10, wherein the statistically relevant event is one of the creation of a new cluster in response to passing the data inputs to the ART network with the matching input layer and a mapping, by the ART network with the matching input layer, of the data inputs to a cluster of low significance, relative to other clusters in the ART network.
  - 13. The non-transitory computer-readable medium of claim 10, wherein one of the data streams is a stream of primitive events generated by a machine learning engine, and wherein each primitive event provides a semantic description of a group of one or more context events.
  - 14. The non-transitory computer-readable medium of claim 10, wherein one or more of the context events provide a classification of what is depicted by a foreground object detected in the scene by the computer vision engine.
  - 15. The non-transitory computer-readable medium of claim 10, wherein the kinematic data includes at least one of a coordinate position in a frame of video where the characteristic is observed to occur, and wherein the characteristic is one of an appearance of a foreground object, a disappearance of a foreground object, a height of a foreground object, a width of a foreground object, a velocity in a horizontal dimension of the foreground object, a velocity of a foreground object in a vertical dimension, a rate of acceleration of a foreground object in a horizontal dimension and a rate of acceleration of a foreground object in a vertical dimension.
  - 16. The non-transitory computer-readable medium of claim 10, wherein the operation further comprises, merging two or more overlapping clusters in the ART network with the matching input layer.
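
Merging two overlapping clusters, as recited in claims 8 and 16, requires combining their statistics. The claims do not say how the merge is computed; the sketch below assumes each cluster stores a count, a per-dimension mean and a per-dimension sum of squared deviations, and combines them with the standard pairwise formula for pooling means and variances. The overlap test (`overlaps`, with a hypothetical `radius`) is likewise an assumption.

```python
def merge_stats(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Pool count, mean and sum of squared deviations for one dimension."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def overlaps(center_a, center_b, radius):
    """Assumed overlap test: centers lie within 2 * radius in every dimension."""
    return all(abs(a - b) <= 2 * radius for a, b in zip(center_a, center_b))


def merge_clusters(a, b):
    """Merge two clusters given as dicts with 'n', 'mean' and 'm2' (lists per dimension)."""
    pooled = [merge_stats(a["n"], ma, sa, b["n"], mb, sb)
              for ma, sa, mb, sb in zip(a["mean"], a["m2"], b["mean"], b["m2"])]
    return {
        "n": a["n"] + b["n"],
        "mean": [m for _, m, _ in pooled],
        "m2": [s for _, _, s in pooled],
    }


# Cluster a summarizes x values 0 and 2; cluster b summarizes x values 2 and 4.
a = {"n": 2, "mean": [1.0, 0.0], "m2": [2.0, 0.0]}
b = {"n": 2, "mean": [3.0, 0.0], "m2": [2.0, 0.0]}

should_merge = overlaps(a["mean"], b["mean"], radius=1.0)
merged = merge_clusters(a, b) if should_merge else None
```

The merged statistics equal those obtained by processing all four underlying points as a single cluster, so no raw observations need to be retained for the merge.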

17. A system, comprising:
- a video input source configured to provide a sequence of video frames, each depicting a scene;
  
  a processor; and
  
  a memory containing a program which, when executed on the processor, is configured to perform an operation for analyzing the scene, as depicted by the sequence of video frames captured by the video input source, the operation comprising:
  
  receiving one or more data streams generated from the sequence of video frames, wherein a first one of the data streams provides a stream of context events generated by a computer vision engine, and wherein each context event provides kinematic data related to a foreground object observed by the computer vision engine in the sequence of video frames,
  
  parsing the data streams to identify data inputs matching an input layer of one of a plurality of adaptive resonance theory (ART) networks, wherein each ART network is configured to generate clusters from the data inputs matching the input layer of a respective ART network, and wherein each cluster provides a statistical distribution of a characteristic of the scene derived from the data streams that has been observed to occur at a location in the scene corresponding to a location of the cluster,
  
  passing the data inputs to the ART network with the matching input layer,
  
  updating the generated clusters in the ART network with the matching input layer,
  
  evaluating the clusters of the ART network with the matching input layer to determine whether the data inputs passed to the ART network are indicative of an occurrence of a statistically relevant event, relative to the clusters in the ART network with the matching input layer, and
  
  upon determining that a first one of the clusters in a first one of the plurality of ART networks has not been updated for a specified period of time, removing the first cluster from the first ART network.
- - 18. The system of claim 17, wherein the operation further comprises, in response to determining that the data inputs passed to the ART network are indicative of the occurrence of the statistically relevant event, publishing an alert message.
  - 19. The system of claim 17, wherein the statistically relevant event is one of the creation of a new cluster in response to passing the data inputs to the ART network with the matching input layer and a mapping, by the ART network with the matching input layer, of the data inputs to a cluster of low significance, relative to other clusters in the ART network.
  - 20. The system of claim 17, wherein one of the data streams is a stream of primitive events generated by a machine learning engine, and wherein each primitive event provides a semantic description of a group of one or more context events.
  - 21. The system of claim 17, wherein the kinematic data includes at least one of a coordinate position in a frame of video where the characteristic is observed to occur, and wherein the characteristic is one of an appearance of a foreground object, a disappearance of a foreground object, a height of a foreground object, a width of a foreground object, a velocity in a horizontal dimension of the foreground object, a velocity of a foreground object in a vertical dimension, a rate of acceleration of a foreground object in a horizontal dimension and a rate of acceleration of a foreground object in a vertical dimension.
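
The kinematic data enumerated in claims 7, 15 and 21, together with the alert publication of claims 2, 11 and 18, can be sketched as a small data structure and a decision function. Everything named here is hypothetical: `ContextEvent` and its field names, the `min_share` threshold, and the idea of measuring "low significance" as a cluster's share of all observations are assumptions made for illustration, since the claims do not define a significance measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextEvent:
    """Assumed container for the kinematic data listed in the claims."""
    object_id: int
    x: float       # coordinate position in the frame
    y: float
    height: float
    width: float
    vx: float      # velocity in the horizontal and vertical dimensions
    vy: float
    ax: float      # acceleration in the horizontal and vertical dimensions
    ay: float
    classification: str = "unknown"


def is_relevant(created_new, cluster_count, total_count, min_share=0.05):
    """Flag a new cluster, or a mapping to a cluster holding a small share of inputs."""
    if created_new:
        return True
    return total_count > 0 and cluster_count / total_count < min_share


def publish_if_relevant(event, created_new, cluster_count, total_count, publish):
    """Call publish(message) when the event is flagged; return whether it was."""
    if is_relevant(created_new, cluster_count, total_count):
        publish(f"unusual event: object {event.object_id} "
                f"({event.classification}) at ({event.x}, {event.y})")
        return True
    return False


alerts = []
event = ContextEvent(7, 120.0, 45.0, 30.0, 12.0, 2.0, 0.0, 0.0, 0.0, "person")

# Mapped to a cluster holding half of all observations: not flagged.
flag_common = publish_if_relevant(event, False, 500, 1000, alerts.append)
# Mapped to a cluster holding 1% of all observations: flagged and published.
flag_rare = publish_if_relevant(event, False, 10, 1000, alerts.append)
```

Passing `publish` as a callable keeps the decision logic independent of the transport; here it is a list's `append` method, whereas a deployed system might write to a message queue or a user interface.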

Current Assignee: Avigilon Patent Holding 1 Corporation
Original Assignee: Behavioral Recognition Systems Incorporated
Inventors: Cobb, Wesley Kenneth; Seow, Ming-Jung
Primary Examiner(s): Burgess, Glenton B
Assistant Examiner(s): Martinez, Michael T
Application Number: US12/423,650
Publication Number: US 20100260376A1
Time in Patent Office: 1,456 Days
Field of Search: 382/100, 382/103, 382/133, 382/158, 382/187, 382/190, 382/240, 382/275, 382/104, 382/291, 708/801, 706/12, 706/15, 706/20, 709/226, 348/135-145, 348/148, 348/154-155
US Class Current: 348/143
CPC Class Codes

G06V 20/52 Surveillance or monitoring ...

G06V 20/54 of traffic, e.g. cars on th...
