Automatic scene calibration method for video analytics

US 10,372,970 B2
Filed: 09/15/2016
Issued: 08/06/2019
Est. Priority Date: 09/15/2016
Status: Active Grant

Abstract

To determine real-world information about objects moving in a scene, the camera capturing the scene is typically calibrated to the scene. Automatic scene calibration can be accomplished using people that are found moving about in the scene. During a calibration period, a video content analysis system processing video frames from a camera can identify blobs that are associated with people. Using an estimated height of a typical person, the video content analysis system can use the location of the person's head and feet to determine a mapping between the person's location in the 2-D video frame and the person's location in the 3-D real world. This mapping can be used to determine a cost for estimated extrinsic parameters for the camera. Using a hierarchical global estimation mechanism, the video content analysis system can determine the estimated extrinsic parameters with the lowest cost.
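
The abstract names a hierarchical global estimation mechanism but does not describe it here. As one possible reading, the sketch below assumes a coarse-to-fine grid search over a single parameter (a camera tilt angle), narrowing the search window around the lowest-cost grid point at each level. The names `coarse_to_fine_search` and `toy_cost`, the 27.3 degree minimum, and the grid sizes are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Tuple


def coarse_to_fine_search(
    cost: Callable[[float], float],
    low: float,
    high: float,
    steps: int = 10,
    levels: int = 4,
) -> Tuple[float, float]:
    """Return (best_value, best_cost) from a grid search refined level by level."""
    best_value = low
    best_cost = cost(low)
    for _ in range(levels):
        step = (high - low) / steps
        # Evaluate the cost at every grid point of the current level.
        for i in range(steps + 1):
            value = low + i * step
            c = cost(value)
            if c < best_cost:
                best_value, best_cost = value, c
        # Narrow the window to one grid step on either side of the best point.
        low, high = best_value - step, best_value + step
    return best_value, best_cost


def toy_cost(tilt_deg: float) -> float:
    """Stand-in cost with a single minimum at 27.3 degrees (made-up value)."""
    return (tilt_deg - 27.3) ** 2


best_tilt, best_cost = coarse_to_fine_search(toy_cost, 0.0, 90.0)
```

A real cost surface over several extrinsic parameters may have local minima, which is presumably why a global, multi-level search is used rather than a purely local one; this one-dimensional toy does not capture that.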

30 Claims

1. A method for automated scene calibration, comprising:
- determining a blob from a current video frame;
  
  identifying the blob as associated with an object, the blob including pixels that represent at least a portion of the object;
  
  determining, using the blob, a ground plane for the current video frame, wherein the ground plane represents a surface upon which the object is positioned;
  
  selecting approximate three-dimensional points on the ground plane;
  
  estimating extrinsic parameters for a camera model;
  
  determining, using the camera model and the estimated extrinsic parameters, two-dimensional coordinates within the current video frame that correspond to the approximate three-dimensional points; and
  
  determining, using the two-dimensional coordinates and the ground plane, values for a homographic matrix, wherein a homographic transformation using the values for the homographic matrix provides a mapping from the two-dimensional coordinates in the current video frame to three-dimensional real-world points.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 29, 30):
  - 2. The method of claim 1, wherein determining the two-dimensional coordinates includes using virtual intrinsic parameters, wherein the virtual intrinsic parameters include at least a focal length and an optical center.
  - 3. The method of claim 1, wherein at least the portion of the object is less than a whole of the object, wherein the object includes a person, wherein the pixels included in the blob include at least an upper body of the person, and wherein determining the ground plane includes using an estimated height of the person to locate an approximate position of one or both feet of the person.
  - 4. The method of claim 1, wherein at least the portion of the object is less than a whole of the object, wherein the object includes a person, wherein the pixels included in the blob include at least a face of the person, and wherein determining the ground plane includes using an estimated distance between eyes of the person and an estimated height of the person to locate an approximate position of one or both feet of the person.
  - 5. The method of claim 1, further comprising:
    - using random sample consensus to modify the estimated extrinsic parameters.
  - 6. The method of claim 1, further comprising:
    - determining, using a cost function, a cost value for the estimated extrinsic parameters, wherein determining the cost value includes:
      
      determining an estimated height of an object in the current video frame using the estimated extrinsic parameters;
      
      determining a detected height of the object using coordinates of the object within the current video frame; and
      
      comparing the estimated height and the detected height using the cost function.
  - 7. The method of claim 6, wherein determining the estimated height includes:
    - determining, using the homographic matrix, a three-dimensional point for two-dimensional coordinates of a bottom of the object, wherein the two-dimensional coordinates are within the current video frame; and
      
      determining two-dimensional coordinates of a top of the object using the camera model and an estimated real-world height of the object.
  - 8. The method of claim 6, further comprising:
    - determining a plurality of cost values for a plurality of extrinsic parameters, the plurality of cost values including the cost value; and
      
      identifying from the plurality of cost values a set of extrinsic parameters with a lowest cost value.
  - 9. The method of claim 6, wherein the cost function is a size-pose-based cost function.
  - 29. The method of claim 1, wherein the extrinsic parameters include one or more of a translation and a rotation of a camera.
  - 30. The method of claim 1, wherein the homographic matrix is used for testing accuracies of one or more sets of extrinsic parameters.
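
The steps of claims 1 and 6 to 8 can be sketched as follows, under strong simplifying assumptions that are not taken from the patent: the extrinsic parameters are reduced to a single downward tilt and a camera height, the virtual intrinsic parameters are fixed made-up values, the detections are synthetic, and the cost is a plain mean squared difference rather than the size-pose-based cost of claim 9. `Camera`, `invert3`, `pixel_to_ground`, and `height_cost` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Detection = Tuple[float, float, float]  # (u_feet, v_feet, v_head) in pixels


@dataclass
class Camera:
    """Hypothetical minimal camera: one tilt angle and a height above the ground."""
    tilt: float            # downward tilt, radians
    height: float          # meters above the ground plane
    focal: float = 1000.0  # assumed virtual focal length, pixels
    cx: float = 640.0      # assumed optical center
    cy: float = 360.0

    def rotation(self) -> Matrix:
        # World axes (X right, Y forward, Z up) to camera axes (x right, y down, z forward).
        s, c = math.sin(self.tilt), math.cos(self.tilt)
        return [[1.0, 0.0, 0.0], [0.0, -s, -c], [0.0, c, -s]]

    def project(self, p: Sequence[float]) -> Tuple[float, float]:
        """Camera model: 3-D world point to 2-D pixel coordinates."""
        r = self.rotation()
        d = (p[0], p[1], p[2] - self.height)  # offset from the camera center
        xc, yc, zc = (sum(r[i][j] * d[j] for j in range(3)) for i in range(3))
        return (self.focal * xc / zc + self.cx, self.focal * yc / zc + self.cy)

    def ground_homography(self) -> Matrix:
        """3x3 matrix taking ground points (X, Y, 1) to homogeneous pixels."""
        r = self.rotation()
        # Columns: r1, r2 and the translation t = -R C, with C = (0, 0, height).
        m = [[r[i][0], r[i][1], -self.height * r[i][2]] for i in range(3)]
        k = [[self.focal, 0.0, self.cx], [0.0, self.focal, self.cy], [0.0, 0.0, 1.0]]
        return [[sum(k[i][n] * m[n][j] for n in range(3)) for j in range(3)]
                for i in range(3)]


def invert3(a: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix via cofactors (cyclic indices carry the signs)."""
    det = sum(a[0][j] * (a[1][(j + 1) % 3] * a[2][(j + 2) % 3]
                         - a[1][(j + 2) % 3] * a[2][(j + 1) % 3]) for j in range(3))
    return [[(a[(j + 1) % 3][(i + 1) % 3] * a[(j + 2) % 3][(i + 2) % 3]
              - a[(j + 1) % 3][(i + 2) % 3] * a[(j + 2) % 3][(i + 1) % 3]) / det
             for j in range(3)] for i in range(3)]


def pixel_to_ground(h_inv: Matrix, u: float, v: float) -> Tuple[float, float]:
    """Homographic transformation: pixel (u, v) to ground-plane point (X, Y)."""
    x, y, w = (row[0] * u + row[1] * v + row[2] for row in h_inv)
    return (x / w, y / w)


def height_cost(cam: Camera, detections: List[Detection],
                person_height: float = 1.7) -> float:
    """Mean squared gap between estimated and detected pixel heights."""
    h_inv = invert3(cam.ground_homography())
    total = 0.0
    for u, v_feet, v_head in detections:
        gx, gy = pixel_to_ground(h_inv, u, v_feet)       # bottom of the object in 3-D
        _, v_top = cam.project((gx, gy, person_height))  # top via the camera model
        estimated, detected = v_feet - v_top, v_feet - v_head
        total += (estimated - detected) ** 2
    return total / len(detections)


# Synthetic detections from a "true" camera (made-up numbers), then score candidates.
true_cam = Camera(tilt=math.radians(30), height=3.0)
detections: List[Detection] = []
for gx, gy in [(-2.0, 5.0), (0.0, 8.0), (3.0, 12.0)]:
    u, v_feet = true_cam.project((gx, gy, 0.0))
    _, v_head = true_cam.project((gx, gy, 1.7))
    detections.append((u, v_feet, v_head))

costs: Dict[int, float] = {
    deg: height_cost(Camera(math.radians(deg), 3.0), detections)
    for deg in (20, 25, 30, 35, 40)
}
best_deg = min(costs, key=costs.get)  # candidate tilt with the lowest cost
```

Because the detections were generated with a 30 degree tilt, that candidate reproduces the detected heights and receives the lowest cost. In this sketch the homography is derived directly from the candidate extrinsics; claim 1 instead recites determining its values from projected ground-plane points, which would give the same matrix for an ideal pinhole camera.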

10. An apparatus, comprising:
- a memory configured to store video data; and
  
  a processor configured to:
  
  determine a blob from a current video frame;
  
  identify the blob as associated with an object, the blob including pixels that represent at least a portion of the object;
  
  determine, using the blob, a ground plane for the current video frame, wherein the ground plane represents a surface upon which the object is positioned;
  
  select approximate three-dimensional points on the ground plane;
  
  estimate extrinsic parameters for a camera model;
  
  determine, using the camera model and the estimated extrinsic parameters, two-dimensional coordinates within the current video frame that correspond to the approximate three-dimensional points; and
  
  determine, using the two-dimensional coordinates and the ground plane, values for a homographic matrix, wherein a homographic transformation using the values for the homographic matrix provides a mapping from the two-dimensional coordinates in the current video frame to three-dimensional real-world points.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22):
  - 11. The apparatus of claim 10, wherein the camera model provides a mapping from three-dimensional real-world points to two-dimensional coordinates in the current video frame.
  - 12. The apparatus of claim 10, wherein homographic transformation provides a mapping from one coordinate system to another coordinate system.
  - 13. The apparatus of claim 10, wherein extrinsic parameters include at least three rotational parameters and two translational parameters.
  - 14. The apparatus of claim 10, wherein the processor is configured to determine the two-dimensional coordinates using virtual intrinsic parameters, wherein the virtual intrinsic parameters include at least a focal length and an optical center.
  - 15. The apparatus of claim 10, wherein at least the portion of the object is less than a whole of the object, wherein the object includes a person, wherein the pixels included in the blob include at least an upper body of the person, and wherein the processor is configured to determine the ground plane using an estimated height of the person to locate an approximate position of one or both feet of the person.
  - 16. The apparatus of claim 10, wherein at least the portion of the object is less than a whole of the object, wherein the object includes a person, wherein the pixels included in the blob include at least a face of the person, and wherein the processor is configured to determine the ground plane using an estimated distance between eyes of the person and an estimated height of the person to locate an approximate position of one or both feet of the person.
  - 17. The apparatus of claim 10, wherein the processor is further configured to:
    - use random sample consensus to modify the estimated extrinsic parameters.
  - 18. The apparatus of claim 10, wherein the processor is further configured to:
    - determine, using a cost function, a cost value for the estimated extrinsic parameters, wherein determining the cost value includes:
      
      determining an estimated height of an object in the current video frame using the estimated extrinsic parameters;
      
      determining a detected height of the object using coordinates of the object within the current video frame; and
      
      comparing the estimated height and the detected height using the cost function.
  - 19. The apparatus of claim 18, wherein the processor is configured to determine the estimated height by:
    - determining, using the homographic matrix, a three-dimensional point for two-dimensional coordinates of a bottom of the object, wherein the two-dimensional coordinates are within the current video frame; and
      
      determining two-dimensional coordinates of a top of the object using the camera model and an estimated real-world height of the object.
  - 20. The apparatus of claim 18, wherein the processor is further configured to:
    - determine a plurality of cost values for a plurality of extrinsic parameters, the plurality of cost values including the cost value; and
      
      identify from the plurality of cost values a set of extrinsic parameters with a lowest cost value.
  - 21. The apparatus of claim 18, wherein the cost function is a size-pose-based cost function.
  - 22. The apparatus of claim 10, wherein the processor is further configured to:
    - use the estimated extrinsic parameters for tracking objects in a video.
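
Claims 5 and 17 recite random sample consensus (RANSAC) without saying how it is applied to the extrinsic parameters. The sketch below therefore shows only the generic consensus loop on a toy problem: fitting a line to (foot row, pixel height) observations, some of which are outliers such as partially occluded people. The linear relation, the data, the tolerance, and the name `ransac_line` are assumptions made for illustration.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (foot row in pixels, person height in pixels)


def ransac_line(points: List[Point], iterations: int = 200, tol: float = 2.0,
                seed: int = 0) -> Tuple[float, float, List[Point]]:
    """Fit y = a*x + b by random sample consensus; returns (a, b, inliers)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    best: Tuple[float, float, List[Point]] = (0.0, 0.0, [])
    for _ in range(iterations):
        # Draw a minimal sample: two points define a candidate line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical pair cannot define y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # The consensus set is every point within tol of the candidate line.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best


# Made-up observations: eight points on y = 0.5*x - 50 plus three outliers.
observations: List[Point] = [(float(x), 0.5 * x - 50.0) for x in range(300, 700, 50)]
observations += [(320.0, 260.0), (480.0, 40.0), (610.0, 90.0)]

slope, intercept, consensus = ransac_line(observations)
```

In a calibration setting the two-point line model would be replaced by a candidate set of extrinsic parameters and the residual by a height-based cost, so that a few bad detections do not dominate the estimate.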

23. A non-transitory computer-readable medium having stored thereon instructions that, when executed by one or more processors, cause the one or more processors to:
- determine a blob from a current video frame;
  
  identify the blob as associated with an object, the blob including pixels that represent at least a portion of the object;
  
  determine, using the blob, a ground plane for the current video frame, wherein the ground plane represents a surface upon which the object is positioned;
  
  select approximate three-dimensional points on the ground plane;
  
  estimate extrinsic parameters for a camera model;
  
  determine, using the camera model and the estimated extrinsic parameters, two-dimensional coordinates within the current video frame that correspond to the approximate three-dimensional points; and
  
  determine, using the two-dimensional coordinates and the ground plane, values for a homographic matrix, wherein a homographic transformation using the values for the homographic matrix provides a mapping from the two-dimensional coordinates in the current video frame to three-dimensional real-world points.
- Dependent claims (24, 25, 26, 27, 28):
  - 24. The non-transitory computer-readable medium of claim 23, wherein determining the two-dimensional coordinates includes using virtual intrinsic parameters, wherein the virtual intrinsic parameters include at least a focal length and an optical center.
  - 25. The non-transitory computer-readable medium of claim 23, wherein at least the portion of the object is less than a whole of the object, wherein the object includes a person, wherein the pixels included in the blob include at least an upper body of the person, and wherein determining the ground plane includes using an estimated height of the person to locate an approximate position of one or both feet of the person.
  - 26. The non-transitory computer-readable medium of claim 23, wherein at least the portion of the object is less than a whole of the object, wherein the object includes a person, wherein the pixels included in the blob include at least a face of the person, and wherein determining the ground plane includes using an estimated distance between eyes of the person and an estimated height of the person to locate an approximate position of one or both feet of the person.
  - 27. The non-transitory computer-readable medium of claim 23, further comprising instructions that, when executed by the one or more processors, cause the one or more processors to:
    - determine, using a cost function, a cost value for the estimated extrinsic parameters, wherein determining the cost value includes:
      
      determine an estimated height of an object in the current video frame using the estimated extrinsic parameters;
      
      determine a detected height of the object using coordinates of the object within the current video frame; and
      
      compare the estimated height and the detected height using the cost function.
  - 28. The non-transitory computer-readable medium of claim 27, wherein determining the estimated height includes:
    - determining, using the homographic matrix, a three-dimensional point for two-dimensional coordinates of a bottom of the object, wherein the two-dimensional coordinates are within the current video frame; and
      
      determining two-dimensional coordinates of a top of the object using the camera model and an estimated real-world height of the object.
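
Claims 4, 16, and 26 locate the feet from a detected face using an estimated distance between the eyes and an estimated height. A minimal sketch of that arithmetic is given below. It assumes an upright person facing the camera, image rows that increase downward, and no perspective foreshortening along the body; the default values (0.063 m between the eyes, 1.7 m height, eyes at about 93% of standing height) and the name `estimate_feet_from_face` are illustrative assumptions, not values from the patent.

```python
import math
from typing import Tuple

Pixel = Tuple[float, float]  # (column u, row v)


def estimate_feet_from_face(
    left_eye: Pixel,
    right_eye: Pixel,
    eye_distance_m: float = 0.063,  # assumed typical distance between the eyes
    person_height_m: float = 1.7,   # assumed typical standing height
    eye_level_ratio: float = 0.93,  # assumed eye level as a fraction of height
) -> Pixel:
    """Approximate pixel position of the feet directly below the face."""
    # Image scale at the person's depth, from the known real eye spacing.
    eye_px = math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])
    pixels_per_meter = eye_px / eye_distance_m
    # Midpoint between the eyes marks the eye level in the image.
    mid_u = (left_eye[0] + right_eye[0]) / 2.0
    mid_v = (left_eye[1] + right_eye[1]) / 2.0
    # Drop from eye level to the ground, converted to pixels.
    drop_px = person_height_m * eye_level_ratio * pixels_per_meter
    return (mid_u, mid_v + drop_px)


# Example with made-up eye positions 12.6 pixels apart (200 pixels per meter).
feet_u, feet_v = estimate_feet_from_face((300.0, 100.0), (312.6, 100.0))
```

The estimated foot position may fall outside the frame when only the upper body is visible, which is still usable as a ground-plane sample as long as downstream code does not clamp it to the image bounds.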

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Wang, Lei; Gao, Dashan; Ma, Lei; Chiu, Chinchuan
Primary Examiner(s): Truong, Nguyen T
Application Number: US15/266,747
Publication Number: US 20180075593A1
Time in Patent Office: 1,055 Days
Field of Search: 348 46
US Class Current
CPC Class Codes

G06T 2207/30196   Human being; Person

G06T 2207/30201   Face

G06T 2207/30232   Surveillance

G06T 2207/30244   Camera pose

G06T 7/246   using feature-based methods...

G06T 7/70   Determining position or ori...

G06T 7/73   using feature-based methods

G06T 7/80   Analysis of captured images...

G06T 7/85   Stereo camera calibration

G06V 20/41   Higher-level, semantic clus...

G06V 20/52   Surveillance or monitoring ...

G06V 40/103   Static body considered as a...

G06V 40/165   using facial parts and geom...

G06V 40/171   Local features and componen...

H04N 13/261   with monoscopic-to-stereosc...
