Real-time object analysis with occlusion handling

US 9,582,895 B2
Filed: 05/22/2015
Issued: 02/28/2017
Est. Priority Date: 05/22/2015
Status: Expired due to Fees

Abstract

A method includes the following steps. A video sequence including detection results from one or more detectors is received, the detection results identifying one or more objects. A clustering framework is applied to the detection results to identify one or more clusters associated with the one or more objects. The clustering framework is applied to the video sequence on a frame-by-frame basis. Spatial and temporal information for each of the one or more clusters is determined. The one or more clusters are associated to the detection results based on the spatial and temporal information in consecutive frames of the video sequence to generate tracking information. One or more target tracks are generated based on the tracking information for the one or more clusters. The one or more target tracks are consolidated to generate refined tracks for the one or more objects.
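
The association step described in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: each cluster is summarized only by its most recent bounding box, intersection-over-union (IoU) stands in for the "spatial and temporal information", and matching is greedy. The names `iou`, `associate`, and `min_iou` are hypothetical and do not appear in the patent.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    clusters: Dict[int, Box],
    detections: List[Box],
    min_iou: float = 0.3,  # assumed threshold, chosen for illustration
) -> Tuple[Dict[int, int], List[int]]:
    """Greedily match each cluster's last box to at most one new detection.

    Returns (matches, unassigned): matches maps cluster id to detection
    index; unassigned lists the detection indices left without a cluster.
    """
    pairs = sorted(
        ((iou(box, det), cid, j)
         for cid, box in clusters.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, cid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be matched
        if cid in matches or j in used:
            continue
        matches[cid] = j
        used.add(j)
    unassigned = [j for j in range(len(detections)) if j not in used]
    return matches, unassigned


clusters = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
detections = [(51, 50, 61, 60), (1, 0, 11, 10), (100, 100, 110, 110)]
matches, unassigned = associate(clusters, detections)
```

In this sketch, the detections left in `unassigned` play the role of the "non-associated detections" that the claims pass on to non-maximum suppression and confidence scoring.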

20 Claims

1. A method, comprising the steps of:
- receiving a video sequence comprising detection results from one or more detectors, the detection results identifying one or more objects;
- applying a clustering process to the detection results to identify one or more clusters associated with the one or more objects, wherein the clustering process is applied to the video sequence on a frame-by-frame basis, and wherein applying the clustering process comprises:
  - detecting one or more non-associated detections, wherein the one or more non-associated detections have not been assigned to an existing cluster;
  - applying a non-maximum suppression process to the one or more non-associated detections to generate one or more results;
  - evaluating the one or more results with a confidence score for the one or more non-associated detections, wherein the confidence score comprises a depth-height map and a detection frequency; and
  - instantiating a new cluster or eliminating one or more existing clusters based on the confidence score;
- determining spatial and temporal information for each of the one or more clusters;
- associating the one or more clusters to the detection results based on the spatial and temporal information in consecutive frames of the video sequence to generate tracking information;
- generating one or more target tracks based on the tracking information for the one or more clusters; and
- consolidating the one or more target tracks to generate refined tracks for the one or more objects;

wherein the steps are performed by at least one processor device coupled to a memory.
- Dependent claims:
  - 2. The method of claim 1, wherein the depth-height map represents a probability of a correct detection according to a relevancy of detection size and position in the frame.
  - 3. The method of claim 1, wherein the new cluster is instantiated when the confidence score exceeds a threshold.
  - 4. The method of claim 1, wherein one or more clusters are eliminated when the one or more clusters are void of objects for a period of time.
  - 5. The method of claim 1, wherein the detection results comprise one or more bounding boxes identifying one or more of the objects.
  - 6. The method of claim 1, wherein at least one of the one or more objects is at least partially occluded by another object.
  - 7. The method of claim 6, wherein the at least one partially occluded object is associated to at least one cluster based on the spatial and temporal information.
  - 8. The method of claim 1, wherein consolidating the one or more target tracks comprises calculating a distinct confidence score for each of the one or more target tracks.
  - 9. The method of claim 8, wherein consolidating the one or more target tracks comprises removing one or more target tracks based on the calculated distinct confidence score.
  - 10. The method of claim 8, wherein consolidating the one or more target tracks comprises completing one or more incomplete target tracks based on the calculated distinct confidence score.
  - 11. The method of claim 1, wherein consolidating the one or more target tracks comprises joining two or more target tracks based on a correspondence loss function.
  - 12. The method of claim 1, wherein associating the one or more clusters to the detection results further comprises associating one or more previous bounding boxes to one or more new bounding boxes.
  - 13. The method of claim 12, further comprising updating a previous status associated with the one or more clusters by associating the one or more new bounding boxes to the cluster.
  - 14. The method of claim 13, wherein the previous status takes into account at least one of a previous motion and a previous scale of the cluster.
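
The cluster-instantiation steps of claims 1 to 3 could be illustrated along the following lines. This is a hedged sketch, not the claimed implementation: the patent does not give formulas here, so the linear height model in `depth_height_prob`, the product used to combine it with detection frequency, and every parameter value (`slope`, `offset`, `sigma`, `threshold`, `iou_thresh`) are assumptions made purely for illustration.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_thresh: float = 0.5) -> List[int]:
    """Standard greedy non-maximum suppression; returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


def depth_height_prob(box: Box, slope: float = 0.5, offset: float = 10.0,
                      sigma: float = 0.3) -> float:
    """Hypothetical depth-height map: plausibility of a box's height
    given its position in the frame (foot line y2). Assumes the expected
    height grows linearly with y2 and stays positive."""
    height = box[3] - box[1]
    expected = slope * box[3] + offset
    rel_err = (height - expected) / expected
    return math.exp(-0.5 * (rel_err / sigma) ** 2)


def confidence(box: Box, hits: int, window: int) -> float:
    """Assumed combination: size/position plausibility multiplied by the
    fraction of the last `window` frames in which the box was detected."""
    return depth_height_prob(box) * (hits / window)


def should_instantiate(box: Box, hits: int, window: int,
                       threshold: float = 0.5) -> bool:
    """Claim 3 style rule: start a new cluster when the score exceeds
    a threshold (threshold value assumed)."""
    return confidence(box, hits, window) > threshold


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
kept = nms(boxes, [0.9, 0.8, 0.7])
```

A caller would run `nms` on the non-associated detections of a frame and then apply `should_instantiate` to each surviving box, using a detection count accumulated over recent frames.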

15. An apparatus, comprising:
- a memory; and
- a processor operatively coupled to the memory and configured to:
  - receive a video sequence comprising detection results from one or more detectors, the detection results identifying one or more objects;
  - apply a clustering process to the detection results to identify one or more clusters associated with the one or more objects, wherein the clustering process is applied to the video sequence on a frame-by-frame basis, and wherein the application of the clustering process comprises:
    - a detection of one or more non-associated detections, wherein the one or more non-associated detections have not been assigned to an existing cluster;
    - an application of a non-maximum suppression process to the one or more non-associated detections to generate one or more results;
    - an evaluation of the one or more results with a confidence score for the one or more non-associated detections, wherein the confidence score comprises a depth-height map and a detection frequency; and
    - instantiate a new cluster or eliminate one or more existing clusters based on the confidence score;
  - determine spatial and temporal information for each of the one or more clusters;
  - associate the one or more clusters to the detection results based on the spatial and temporal information in consecutive frames of the video sequence to generate tracking information;
  - generate one or more target tracks based on the tracking information for the one or more clusters; and
  - consolidate the one or more target tracks to generate refined tracks for the one or more objects.
- Dependent claims:
  - 16. The apparatus of claim 15, wherein the new cluster is instantiated when the confidence score exceeds a threshold.
  - 17. The apparatus of claim 15, wherein at least one of the one or more objects is at least partially occluded by another object.
  - 18. The apparatus of claim 17, wherein the at least one partially occluded object is associated to at least one cluster based on the spatial and temporal information.
  - 19. The apparatus of claim 15, wherein consolidating the one or more target tracks comprises calculating a distinct confidence score for each of the one or more target tracks.
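
The track consolidation of claims 8, 9, and 19 (a distinct confidence score per target track, with low-scoring tracks removed) could be sketched as below. The scoring rule is an assumption: the claims do not define how the score is computed, so this example simply multiplies temporal coverage by the mean detection score. `Track`, `track_confidence`, `prune_tracks`, and `min_conf` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Track:
    track_id: int
    frames: List[int]    # distinct frame indices where the track has a box
    scores: List[float]  # detection score for each of those frames


def track_confidence(t: Track) -> float:
    """Assumed per-track score: fraction of frames covered within the
    track's time span, multiplied by the mean detection score."""
    if not t.frames:
        return 0.0
    span = max(t.frames) - min(t.frames) + 1
    coverage = len(t.frames) / span
    mean_score = sum(t.scores) / len(t.scores)
    return coverage * mean_score


def prune_tracks(tracks: List[Track], min_conf: float = 0.4) -> List[Track]:
    """Drop tracks whose confidence falls below an assumed threshold."""
    return [t for t in tracks if track_confidence(t) >= min_conf]


steady = Track(1, [0, 1, 2, 3], [0.9, 0.9, 0.9, 0.9])
sparse = Track(2, [0, 9], [0.5, 0.5])
refined = prune_tracks([steady, sparse])
```

Here the steady track scores about 0.9 and survives, while the sparse track covers 2 of 10 frames at a mean score of 0.5, scores about 0.1, and is removed.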

20. An article of manufacture comprising a computer readable storage medium for storing computer readable program code which, when executed, causes a computer to:
- receive a video sequence comprising detection results from one or more detectors, the detection results identifying one or more objects;
- apply a clustering process to the detection results to identify one or more clusters associated with the one or more objects, wherein the clustering process is applied to the video sequence on a frame-by-frame basis, and wherein the application of the clustering process comprises:
  - a detection of one or more non-associated detections, wherein the one or more non-associated detections have not been assigned to an existing cluster;
  - an application of a non-maximum suppression process to the one or more non-associated detections to generate one or more results;
  - an evaluation of the one or more results with a confidence score for the one or more non-associated detections, wherein the confidence score comprises a depth-height map and a detection frequency; and
  - instantiate a new cluster or eliminate one or more existing clusters based on the confidence score; and
  - eliminate one or more existing clusters;
- determine spatial and temporal information for each of the one or more clusters;
- associate the one or more clusters to the detection results based on the spatial and temporal information in consecutive frames of the video sequence to generate tracking information;
- generate one or more target tracks based on the tracking information for the one or more clusters; and
- consolidate the one or more target tracks to generate refined tracks for the one or more objects.

Current Assignee: International Business Machines Corporation, The University of Queensland
Original Assignee: International Business Machines Corporation, The University of Queensland
Inventors: Brown, Lisa M.; Emami, Sayed Ali; Harandi, Mehrtash; Pankanti, Sharathchandra U.
Primary Examiner(s): TSAI, TSUNG YIN
Application Number: US14/719,875
Publication Number: US 20160343146A1
Time in Patent Office: 648 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
- G06F 18/23   Clustering techniques
- G06T 2207/10016   Video; Image sequence
- G06T 2207/20076   Probabilistic image processing
- G06T 2207/30196   Human being; Person
- G06T 7/246   using feature-based methods...
- G06V 20/53   Recognition of crowd images...
- G06V 40/10   Human or animal bodies, e.g...
