Video-based detection of multiple object types under varying poses

US 8,620,026 B2
Filed: 04/13/2011
Issued: 12/31/2013
Est. Priority Date: 04/13/2011
Status: Expired due to Fees

Assignments: 2

Petitions: 0

Abstract

Training data object images are clustered as a function of motion direction attributes and resized from their respective original aspect ratios into a same aspect ratio. Motionlet detectors are learned for each of the resulting sets from features extracted from the resized object blobs. A deformable sliding window is applied to detect an object blob in input video by varying the window size, shape or aspect ratio to conform to the shape of the detected input video object blob. A motion direction of an underlying image patch of the detected blob is extracted, and motionlet detectors that have similar motion directions are selected and applied. If a motionlet detector fires, an object is detected within the detected blob and semantic attributes of the underlying image patch are extracted; the extracted semantic attributes are then available for searching for the detected object.
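The clustering and detector-selection steps summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that each training blob's motion direction is a single angle in degrees, swaps in a simple circular k-means for the clustering method (which the claims do not specify), and uses a fixed angular `tolerance` (a hypothetical parameter) to decide which motionlet detectors count as having a "similar" direction. All function names are hypothetical.

```python
import math


def angular_distance(a, b):
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def circular_mean(angles):
    """Mean direction of angles in degrees, computed through unit vectors."""
    sx = sum(math.cos(math.radians(a)) for a in angles)
    sy = sum(math.sin(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(sy, sx)) % 360.0


def nearest_center(direction, centers):
    """Index of the center closest to direction on the circle."""
    return min(range(len(centers)), key=lambda i: angular_distance(direction, centers[i]))


def cluster_motionlets(directions, k, iterations=10):
    """Hypothetical helper: circular k-means over training motion directions.

    Centers start evenly spaced around the circle. Returns (centers, labels),
    where labels[j] is the motionlet set assigned to directions[j].
    """
    centers = [i * 360.0 / k for i in range(k)]
    for _ in range(iterations):
        labels = [nearest_center(d, centers) for d in directions]
        for i in range(k):
            members = [d for d, label in zip(directions, labels) if label == i]
            if members:  # an empty set keeps its previous center
                centers[i] = circular_mean(members)
    labels = [nearest_center(d, centers) for d in directions]
    return centers, labels


def select_motionlets(direction, centers, tolerance=30.0):
    """Indices of detectors whose center lies within tolerance degrees of direction.

    Falls back to the single nearest detector when none is within tolerance.
    """
    chosen = [i for i, c in enumerate(centers) if angular_distance(direction, c) <= tolerance]
    return chosen or [nearest_center(direction, centers)]
```

For instance, clustering the directions 0, 5, 355, 90, 95, 85, 180, 185, 175, 270, 265 and 275 with `k=4` yields four sets centered near 0, 90, 180 and 270 degrees, and a blob moving at 92 degrees selects only the detector of the set near 90. Distances are measured on the circle because 355 and 5 degrees describe nearly the same direction.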

Citations: 27

20 Claims

1. A method for object detection as a function of a motion direction attribute, the method comprising:
- clustering training data set object images corresponding to object motion blobs into each of a plurality of motionlet sets as a function of similarity of their associated motion direction attributes, each of the motionlet sets comprising object images associated with similar motion direction attributes that are distinguished from the motion direction attributes of the object image blobs in others of the motionlet sets;
  
  resizing the clustered motionlet pluralities of object images from their respective original aspect ratios into a same aspect ratio, wherein the motionlet object images may have different original respective aspect ratios;
  
  learning motionlet detectors for each of the motionlet sets from features extracted from the resized training blobs and from sets of negative images of non-object image patches of the same aspect ratio obtained from background images;
  
  applying a deformable sliding window to detect an object blob in an input video obtained by background modeling by varying at least one of a size, a shape and an aspect ratio of the sliding window to conform to a shape of the detected input video object blob;
  
  extracting a motion direction of an underlying image patch of the detected input video object blob;
  
  selecting at least one of the motionlet detectors that has a motion direction similar to the motion direction extracted from an underlying image patch of the input video object blob;
  
  applying the selected at least one motionlet detector to the detected input video object blob;
  
  determining that an object has been detected within the detected input video object blob and extracting semantic attributes of the underlying image patch of the input video object blob if a one of the selected and applied at least one motionlet detectors fires; and
  
  storing the extracted semantic attributes of the underlying image patch of the input video object blob in a database for searching for the detected object as a function of its extracted semantic attributes.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein the applying the deformable sliding window to detect the object blob in the input video obtained by background modeling comprises:
    - varying the shape of the deformable sliding window to a first non-rectangular curved shape to conform to a first curved shape of the detected input video object blob; and
      
      varying the shape of the deformable sliding window to a second curved shape to conform to another curved shape of another detected input video object blob that is different from the first non-rectangular curved shape of the detected input video object blob.
  - 3. The method of claim 1, wherein the applying the deformable sliding window to detect the object blob in the input video obtained by background modeling comprises:
    - varying the aspect ratio of the deformable sliding window to a first aspect ratio to conform to a shape of the detected input video object blob; and
      
      varying the aspect ratio of the deformable sliding window to a second aspect ratio to conform to another shape of another detected input video object blob that is different from the shape of the detected input video object blob, the second aspect ratio different from the first aspect ratio.
  - 4. The method of claim 3, wherein the resizing the motionlet object images from their respective original aspect ratios into the same aspect ratio comprises resizing a first plurality of motionlet object images of different object types, each of the different object type images having different original respective aspect ratios;
    - wherein the learning the motionlet detectors for each of the motionlet sets comprises learning a first single motionlet detector for the first plurality of motionlet object images of different object types; and
      
      wherein the applying the selected at least one motionlet detector to the detected input video object blob comprises applying the first single motionlet detector to detected input video object blobs of each of the different object types to determine that any of the different object types has been detected within the detected input video object blob and extracting semantic attributes of the underlying image patch of the input video object blob if the first motionlet detector fires.
  - 5. The method of claim 4, wherein the extracted semantic attributes of the detected input video object blob further comprise a width and a height dimension in pixels for the underlying image patch of the detected input video object blob;
    - and wherein the method further comprises:
      
      calibrating a scene in an image field of view of the detected input video object blob; and
      
      estimating a width, height, and length of the detected object vehicles in world coordinates as a function of the extracted width and height dimensions in pixels and of the scene calibrating.
  - 6. The method of claim 4, wherein the extracting the motion direction of the underlying image patch of the detected input video object blob is an optical flow process.
  - 7. The method of claim 4, further comprising:
    - selecting the sets of negative images for learning the motionlet detectors from patches for which a current motionlet detector fails.
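Claim 6 recites extracting the blob's motion direction with an optical flow process. As a rough illustration, assume a dense flow field is already available; OpenCV's `cv2.calcOpticalFlowFarneback` is one routine that produces per-pixel (dx, dy) displacements. The blob's dominant direction can then be taken as the direction of the summed flow vectors inside the blob mask. The noise threshold `min_magnitude`, the angle convention and the helper name are assumptions for this sketch, not details from the claims.

```python
import math


def dominant_direction(flow, mask, min_magnitude=0.5):
    """Hypothetical helper: dominant motion direction of a blob in degrees [0, 360), or None.

    flow: 2-D list of (dx, dy) optical-flow displacements, one per pixel, in
          image coordinates (y grows downward).
    mask: 2-D list of booleans marking which pixels belong to the blob.
    Vectors shorter than min_magnitude (an assumed noise threshold) are ignored.
    The result uses the math convention (0 = right, 90 = up on screen), so dy
    is negated before taking the angle.
    """
    sx = sy = 0.0
    for flow_row, mask_row in zip(flow, mask):
        for (dx, dy), inside in zip(flow_row, mask_row):
            if inside and math.hypot(dx, dy) >= min_magnitude:
                sx += dx
                sy -= dy
    if sx == 0.0 and sy == 0.0:
        return None  # no reliable motion inside the blob
    return math.degrees(math.atan2(sy, sx)) % 360.0
```

Summing the raw vectors weights each pixel by its flow magnitude, so fast-moving pixels dominate. Averaging unit vectors instead would give every pixel an equal vote.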

8. A system, comprising:
- a processing unit, computer readable memory and a computer readable storage medium;
  
  first program instructions to cluster training data set object images corresponding to object motion blobs into each of a plurality of motionlet sets as a function of similarity of their associated motion direction attributes, each of the motionlet sets comprising object images associated with similar motion direction attributes that are distinguished from the motion direction attributes of the object image blobs in others of the motionlet sets, the pluralities of the motionlet object images resized from their respective original aspect ratios into a same aspect ratio, wherein the motionlet object images may have different original respective aspect ratios;
  
  second program instructions to learn motionlet detectors for each of the motionlet sets from features extracted from the resized training blobs and from sets of negative images of non-object image patches of the same aspect ratio obtained from background images;
  
  third program instructions to apply a deformable sliding window to detect an object blob in an input video obtained by background modeling by varying at least one of a size, a shape and an aspect ratio of the sliding window to conform to a shape of the detected input video object blob, and to extract a motion direction of an underlying image patch of the detected input video object blob; and
  
  fourth program instructions to select at least one of the motionlet detectors that has a motion direction similar to the motion direction extracted from an underlying image patch of the input video object blob, apply the selected at least one motionlet detector to the detected input video object blob and determine that an object has been detected within the detected input video object blob and extract semantic attributes of the underlying image patch of the input video object blob if a one of the selected and applied at least one motionlet detectors fires, and to store the extracted semantic attributes of the underlying image patch of the input video object blob in a database for searching for the detected object as a function of its extracted semantic attributes;
  
  wherein the first, second, third and fourth program instructions are stored on the computer readable storage medium for execution by the processing unit via the computer readable memory.
- Dependent Claims (9, 10, 11, 12)
  - 9. The system of claim 8, wherein the third program instructions are further to apply the deformable sliding window to detect the object blob in the input video obtained by background modeling by:
    - varying the shape of the deformable sliding window to a first non-rectangular curved shape to conform to a first curved shape of the detected input video object blob; and
      
      varying the shape of the deformable sliding window to a second curved shape to conform to another curved shape of another detected input video object blob that is different from the first non-rectangular curved shape of the detected input video object blob.
  - 10. The system of claim 8, wherein the third program instructions are further to apply the deformable sliding window to detect the object blob in the input video obtained by background modeling by:
    - varying the aspect ratio of the deformable sliding window to a first aspect ratio to conform to a shape of the detected input video object blob; and
      
      varying the aspect ratio of the deformable sliding window to a second aspect ratio to conform to another shape of another detected input video object blob that is different from the shape of the detected input video object blob, the second aspect ratio different from the first aspect ratio.
  - 11. The system of claim 10, wherein the first program instructions are further to resize the plurality of the motionlet object images from their respective original aspect ratios into the same aspect ratio by resizing a first plurality of motionlet object images of different object types, each of the different object type images having different original respective aspect ratios;
    - wherein the second program instructions are further to learn the motionlet detectors for each of the motionlet sets by learning a first single motionlet detector for the first plurality of motionlet object images of different object types; and
      
      wherein the third program instructions are further to apply the selected at least one motionlet detector to the detected input video object blob by applying the first single motionlet detector to detected input video object blobs of each of the different object types to determine that any of the different object types has been detected within the detected input video object blob and extracting semantic attributes of the underlying image patch of the input video object blob if the first motionlet detector fires.
  - 12. The system of claim 10, wherein the second program instructions are further to select the sets of negative images for learning the motionlet detectors from patches for which a current motionlet detector fails.
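Claims 4 and 11 resize motionlet images of different object types, each with its own original aspect ratio, into one common aspect ratio so that a single motionlet detector can be learned for all of them. A minimal sketch with nearest-neighbour sampling on plain nested lists follows. The 64 x 48 canonical size and the helper names are hypothetical, and a production system would more likely call a library resampler with interpolation, such as OpenCV's `cv2.resize(src, (width, height))`.

```python
def resize_nearest(image, out_w, out_h):
    """Hypothetical helper: resize a 2-D list of pixels to out_h rows by out_w columns.

    Width and height are scaled independently, so the original aspect ratio
    is intentionally not preserved: a tall crop and a wide crop both end up
    with the same canonical shape.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        # Map each output pixel back to the nearest source pixel (floor).
        [image[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize_motionlet_set(images, out_w=64, out_h=48):
    """Hypothetical helper: bring every crop in one motionlet set to out_w x out_h."""
    return [resize_nearest(img, out_w, out_h) for img in images]
```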

13. An article of manufacture, comprising:
- a computer readable storage hardware device having computer readable program code embodied therewith, the computer readable program code comprising instructions that, when executed by a computer processor, cause the computer processor to:
  
  cluster training data set object images corresponding to object motion blobs into each of a plurality of motionlet sets as a function of similarity of their associated motion direction attributes, each of the motionlet sets comprising object images associated with similar motion direction attributes that are distinguished from the motion direction attributes of the object image blobs in others of the motionlet sets, the pluralities of the motionlet object images resized from their respective original aspect ratios into a same aspect ratio, wherein the motionlet object images may have different original respective aspect ratios;
  
  learn motionlet detectors for each of the motionlet sets from features extracted from the resized training blobs and from sets of negative images of non-object image patches of the same aspect ratio obtained from background images;
  
  apply a deformable sliding window to detect an object blob in an input video obtained by background modeling by varying at least one of a size, a shape and an aspect ratio of the sliding window to conform to a shape of the detected input video object blob, and to extract a motion direction of an underlying image patch of the detected input video object blob; and
  
  select at least one of the motionlet detectors that has a motion direction similar to the motion direction extracted from an underlying image patch of the input video object blob, apply the selected at least one motionlet detector to the detected input video object blob;
  
  determine that an object has been detected within the detected input video object blob and extract semantic attributes of the underlying image patch of the input video object blob if a one of the selected and applied at least one motionlet detectors fires; and
  
  store the extracted semantic attributes of the underlying image patch of the input video object blob in a database for searching for the detected object as a function of its extracted semantic attributes.
- Dependent Claims (14, 15, 16)
  - 14. The article of manufacture of claim 13, wherein the computer readable program code instructions, when executed by the computer processor, further cause the computer processor to apply the deformable sliding window to detect the object blob in the input video obtained by background modeling by:
    - varying the shape of the deformable sliding window to a first non-rectangular curved shape to conform to a first curved shape of the detected input video object blob; and
      
      varying the shape of the deformable sliding window to a second curved shape to conform to another curved shape of another detected input video object blob that is different from the first non-rectangular curved shape of the detected input video object blob.
  - 15. The article of manufacture of claim 13, wherein the computer readable program code instructions, when executed by the computer processor, further cause the computer processor to apply the deformable sliding window to detect the object blob in the input video obtained by background modeling by:
    - varying the aspect ratio of the deformable sliding window to a first aspect ratio to conform to a shape of the detected input video object blob; and
      
      varying the aspect ratio of the deformable sliding window to a second aspect ratio to conform to another shape of another detected input video object blob that is different from the shape of the detected input video object blob, the second aspect ratio different from the first aspect ratio.
  - 16. The article of manufacture of claim 15, wherein the computer readable program code instructions, when executed by the computer processor, further cause the computer processor to select the sets of negative images for learning the motionlet detectors from patches for which a current motionlet detector fails.
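Claims 7, 12, 16 and 20 select the negative training images from patches on which the current motionlet detector fails, a procedure commonly called hard negative mining. The sketch below shows the loop on toy feature vectors. Assumptions: a plain perceptron stands in for whatever learner the actual system uses (the claims do not name one), features are short numeric tuples rather than real image descriptors, and "fails" is read as the detector firing on a background patch. All names are hypothetical.

```python
def score(model, x):
    """Linear score w . x + b."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fires(model, x):
    """The detector fires when the linear score is positive."""
    return score(model, x) > 0


def train_perceptron(positives, negatives, epochs=100, lr=0.1):
    """Stand-in learner: perceptron on +1 (object) / -1 (background) examples."""
    model = ([0.0] * len(positives[0]), 0.0)
    data = [(x, 1) for x in positives] + [(x, -1) for x in negatives]
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if y * score(model, x) <= 0:
                w, b = model
                model = ([wi + lr * y * xi for wi, xi in zip(w, x)], b + lr * y)
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return model


def mine_hard_negatives(positives, negatives, background_pool, rounds=5):
    """Retrain repeatedly, adding background patches the current detector wrongly fires on."""
    negatives = list(negatives)
    model = train_perceptron(positives, negatives)
    for _ in range(rounds):
        hard = [p for p in background_pool if fires(model, p) and p not in negatives]
        if not hard:
            break
        negatives.extend(hard)
        model = train_perceptron(positives, negatives)
    return model, negatives
```

Each round can only add patches from the finite background pool, so the loop stops once the detector no longer fires on any pooled background patch or `rounds` is exhausted.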

17. A method of providing a service for object detection as a function of a motion direction attribute, the method comprising providing:
- a motionlet splitter that clusters training data set object images corresponding to object motion blobs into each of a plurality of motionlet sets as a function of similarity of their associated motion direction attributes, each of the motionlet sets comprising object images associated with similar motion direction attributes that are distinguished from the motion direction attributes of the object image blobs in others of the motionlet sets;
  
  an aspect ratio resizer that resizes the clustered motionlet pluralities of object images from their respective original aspect ratios into a same aspect ratio, wherein the motionlet object images may have different original respective aspect ratios;
  
  a motionlet detector builder that builds motionlet detectors for each of the motionlet sets from features extracted from the resized training blobs and from sets of negative images of non-object image patches of the same aspect ratio obtained from background images;
  
  a sliding window applicator that detects an image blob in an input video and deforms a sliding window to frame about the detected blob in response to a shape of the detected blob by varying at least one of a size, a shape and an aspect ratio of the sliding window to conform to the shape of the detected blob; and
  
  a feature extractor that extracts a motion direction of an underlying image patch of the detected input video object blob, selects at least one of the motionlet detectors that has a motion direction similar to the motion direction extracted from an underlying image patch of the input video object blob, applies the selected at least one motionlet detector to the detected input video object blob, determines that an object has been detected within the detected input video object blob and extracts semantic attributes of the underlying image patch of the input video object blob if a one of the selected and applied at least one motionlet detectors fires, and stores the extracted semantic attributes of the underlying image patch of the input video object blob in a database for searching for the detected object as a function of its extracted semantic attributes.
- Dependent Claims (18, 19, 20)
  - 18. The method of claim 17, wherein the sliding window applicator:
    - varies the shape of the deformable sliding window to a first non-rectangular curved shape to conform to a first curved shape of the detected input video object blob; and
      
      varies the shape of the deformable sliding window to a second curved shape to conform to another curved shape of another detected input video object blob that is different from the first non-rectangular curved shape of the detected input video object blob.
  - 19. The method of claim 17, wherein the sliding window applicator:
    - varies the aspect ratio of the deformable sliding window to a first aspect ratio to conform to a shape of the detected input video object blob; and
      
      varies the aspect ratio of the deformable sliding window to a second aspect ratio to conform to another shape of another detected input video object blob that is different from the shape of the detected input video object blob, the second aspect ratio different from the first aspect ratio.
  - 20. The method of claim 19, wherein the motionlet detector builder selects the sets of negative images for learning the motionlet detectors from patches for which a current motionlet detector fails.
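Claims 18 and 19 have the sliding window applicator change the window's shape or aspect ratio so that it conforms to each detected blob. Assuming a binary foreground mask from background modeling is already available (for example from OpenCV's `cv2.createBackgroundSubtractorMOG2`), one simple reading is to group foreground pixels into connected blobs and derive, per blob, a tight rectangular window with its own aspect ratio, plus the blob's pixel set as a non-rectangular window. This is an illustrative sketch; 4-connectivity, the `min_area` filter and the function name are assumptions.

```python
from collections import deque


def foreground_blobs(mask, min_area=1):
    """Hypothetical helper: group foreground pixels into 4-connected blobs.

    Returns a list of dicts with the tight bounding box (x, y, w, h), its
    aspect ratio w / h, and the set of (row, col) pixels, which can serve as
    a non-rectangular window that follows the blob outline.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill from the first unvisited foreground pixel.
            pixels = set()
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                pixels.add((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(pixels) < min_area:
                continue  # drop speckle noise left by background subtraction
            ys = [r for r, _ in pixels]
            xs = [c for _, c in pixels]
            x, y = min(xs), min(ys)
            w, h = max(xs) - x + 1, max(ys) - y + 1
            # Each blob gets its own window size and aspect ratio.
            blobs.append({"box": (x, y, w, h), "aspect": w / h, "pixels": pixels})
    return blobs
```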

Current Assignee: Kyndryl Incorporated
Original Assignee: International Business Machines Corporation
Inventors: Datta, Ankur; Feris, Rogerio S.; Pankanti, Sharathchandra U.; Siddiquie, Behjat; Zhai, Yun
Primary Examiner(s): LE, BRIAN Q
Application Number: US13/085,547
Publication Number: US 20120263346A1
Time in Patent Office: 993 Days
Field of Search: 382/103, 382/104, 382/105, 382/106, 382/107
US Class Current: 382/103
CPC Class Codes:
G06V 10/44 Local feature extraction by...
G06V 20/47 Detecting features for summ...
