Object retrieval in video data using complementary detectors
First Claim
1. A method for automatic object retrieval from input video based on learned detectors, the method comprising:
- in response to a video stream input received from a fixed-camera surveillance video for analysis, a processing unit iteratively running different detectors of a plurality of pairs of complementary detectors in one each of subsequent frames of the surveillance video stream input;
- collecting firings data for each of the run detectors per image frame location, until a threshold number of firings is reached by at least one of the run detectors; and
- analyzing the frames from the surveillance video stream input to extract image attributes of vehicle objects by applying a subset of the run detectors that each reach the threshold number of firings in collecting the firings data for the image frame locations; and
- wherein the detectors fire if an underlying vehicle image patch extracted from the motion blobs in a field of view of scene image data corresponds to image patches of the applied detectors.
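The runtime behavior the claim recites can be illustrated with a short sketch. This is not the patented implementation; `collect_firings`, the callable detectors, and the round-robin scheduling are assumptions introduced here purely to make the claimed steps concrete: one detector runs per subsequent frame, firings are counted per image-frame location, and collection stops once any detector reaches the threshold.

```python
from collections import defaultdict
from itertools import cycle

def collect_firings(frames, detector_pairs, threshold):
    """Run a different detector on each subsequent frame (round-robin over
    all detectors in the complementary pairs), counting firings per
    image-frame location, until at least one detector reaches the firing
    threshold. Returns the subset of detectors that reached it."""
    detectors = [d for pair in detector_pairs for d in pair]
    firings = defaultdict(lambda: defaultdict(int))  # detector index -> location -> count
    order = cycle(range(len(detectors)))
    for frame in frames:
        idx = next(order)
        for location in detectors[idx](frame):  # detector yields firing locations
            firings[idx][location] += 1
        if any(count >= threshold
               for counts in firings.values() for count in counts.values()):
            break  # threshold reached by at least one detector; stop collecting
    return [detectors[i] for i, counts in firings.items()
            if any(count >= threshold for count in counts.values())]
```

In this sketch a detector is any callable that maps a frame to a list of firing locations; the claim itself leaves the detector internals (the image-patch comparison against motion blobs) unspecified at this level.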
Abstract
Automatic object retrieval from input video is based on learned, complementary detectors created for each of a plurality of different motionlet clusters. The motionlet clusters are partitioned from a dataset of training vehicle images as a function of determining that vehicles within each of the scenes of the images in each cluster share similar two-dimensional motion direction attributes within their scenes. To train the complementary detectors, a first detector is trained on motion blobs of vehicle objects detected and collected within each of the training dataset vehicle images within the motionlet cluster via a background modeling process; a second detector is trained on each of the training dataset vehicle images within the motionlet cluster that have motion blobs of the vehicle objects but are misclassified by the first detector; and the training repeats until all of the training dataset vehicle images have been eliminated as false positives or correctly classified.
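The iterative training procedure in the abstract (train a detector, retrain on what it misclassifies, repeat until every training image is correctly classified or eliminated) can be sketched as a simple loop. This is a hedged illustration only: `train` and `classify` are hypothetical stand-ins for whatever detector learner and classification test an implementation would use, and the no-progress guard is an addition not stated in the abstract.

```python
def train_complementary_detectors(images, train, classify):
    """Train a chain of complementary detectors for one motionlet cluster:
    each new detector is trained on the images the previous detectors
    misclassified, until no misclassified images remain."""
    detectors = []
    remaining = list(images)
    while remaining:
        detector = train(remaining)          # fit a detector to the hard cases
        detectors.append(detector)
        still_wrong = [img for img in remaining
                       if not classify(detector, img)]
        if len(still_wrong) == len(remaining):
            break  # no progress on any image; avoid looping forever
        remaining = still_wrong
    return detectors
```

Under this reading, the first iteration corresponds to the detector trained on the background-modeled motion blobs, and each later iteration to a complementary detector trained on the previous detector's misclassifications.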
20 Claims
1. A method for automatic object retrieval from input video based on learned detectors, the method comprising:

- in response to a video stream input received from a fixed-camera surveillance video for analysis, a processing unit iteratively running different detectors of a plurality of pairs of complementary detectors in one each of subsequent frames of the surveillance video stream input;
- collecting firings data for each of the run detectors per image frame location, until a threshold number of firings is reached by at least one of the run detectors; and
- analyzing the frames from the surveillance video stream input to extract image attributes of vehicle objects by applying a subset of the run detectors that each reach the threshold number of firings in collecting the firings data for the image frame locations; and
- wherein the detectors fire if an underlying vehicle image patch extracted from the motion blobs in a field of view of scene image data corresponds to image patches of the applied detectors.

View Dependent Claims (2, 3, 4, 5, 6, 7)
8. A system, comprising:

- a processing unit;
- a computer readable memory in circuit communication with the processing unit; and
- a computer-readable storage medium in circuit communication with the processing unit;
- wherein the processing unit executes program instructions stored on the computer-readable storage medium via the computer readable memory and thereby:
  - iteratively runs different detectors of a plurality of pairs of complementary detectors in one each of subsequent frames of a surveillance video stream input received from a fixed-camera surveillance video for analysis;
  - collects firings data for each of the run detectors per image frame location until a threshold number of firings is reached by at least one of the run detectors; and
  - analyzes the frames from the surveillance video stream input to extract image attributes of vehicle objects by applying a subset of the run detectors that each reach the threshold number of firings in collecting the firings data for the image frame locations; and
- wherein the detectors fire if an underlying vehicle image patch extracted from the motion blobs in a field of view of scene image data corresponds to image patches of the applied detectors.

View Dependent Claims (9, 10, 11, 12, 13, 14)
15. A computer program product, comprising:

- a computer readable hardware storage device having computer readable program code embodied therewith, the computer readable program code comprising instructions for execution by a computer processing unit that cause the computer processing unit to:
  - iteratively run different detectors of a plurality of pairs of complementary detectors in one each of subsequent frames of a surveillance video stream input received from a fixed-camera surveillance video for analysis;
  - collect firings data for each of the run detectors per image frame location until a threshold number of firings is reached by at least one of the run detectors; and
  - analyze the frames from the surveillance video stream input to extract image attributes of vehicle objects by applying a subset of the run detectors that each reach the threshold number of firings in collecting the firings data for the image frame locations; and
- wherein the detectors fire if an underlying vehicle image patch extracted from the motion blobs in a field of view of scene image data corresponds to image patches of the applied detectors.

View Dependent Claims (16, 17, 18, 19, 20)
Specification