Object detection and tracking
First Claim
1. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause a computing device to:
generate first right image data during a first period of time with a right camera;
generate first left image data during the first period of time with a left camera, the right camera and the left camera having at least partially overlapping fields of view;
detect, using a face detection algorithm, a first right representation of a face in the first right image data, the face detection algorithm returning a first right bounding box for the face in the first right image data;
detect, using the face detection algorithm, a first left representation of the face in the first left image data, the face detection algorithm returning a first left bounding box for the face in the first left image data;
detect, using a feature extraction algorithm, a set of features of the face in the first right image data by analyzing the first right image data within the first right bounding box;
detect, using the feature extraction algorithm, the set of features of the face in the first left image data by analyzing the first left image data within the first left bounding box;
determine a first right position for a point relative to the set of features in the first right image data;
determine a first left position for the point relative to the set of features in the first left image data;
generate second right image data during a second period of time with the right camera;
generate second left image data during the second period of time with the left camera;
detect, using the face detection algorithm, a second right representation of the face in the second right image data, the face detection algorithm returning a second right bounding box for the face in the second right image data;
detect, using the face detection algorithm, a second left representation of the face in the second left image data, the face detection algorithm returning a second left bounding box for the face in the second left image data;
detect, using the feature extraction algorithm, the set of features of the face in the second right image data by analyzing the second right image data within the second right bounding box;
detect, using the feature extraction algorithm, the set of features of the face in the second left image data by analyzing the second left image data within the second left bounding box;
determine a second right position for the point relative to the set of features in the second right image data;
determine a second left position for the point relative to the set of features in the second left image data;
determine a right two-dimensional (2D) change in position of the point relative to the set of features between the first right image data and the second right image data;
determine a left 2D change in position of the point relative to the set of features between the first left image data and the second left image data;
determine, using a feature tracking algorithm and based at least in part on the right 2D change in position, a right 2D output for the point relative to the set of features;
determine, using the feature tracking algorithm and based at least in part on the left 2D change in position, a left 2D output for the point relative to the set of features;
determine stereo disparity of the point relative to the set of features between the right 2D output and the left 2D output in the at least partially overlapping fields of view of the right camera and the left camera;
determine a z-depth for the point relative to the set of features of the face using the stereo disparity and calibration information for the right camera and the left camera to determine a three-dimensional (3D) position for the point relative to the set of features;
generate third right image data during a third period of time with the right camera;
generate third left image data during the third period of time with the left camera;
determine that a third right representation of the face is detected in the third right image data;
determine that a third left representation of the face is not detected in the third left image data;
generate a template of the face using information for the face from the third right image data; and
use the template to detect the face in the third left image data.
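For a rectified and calibrated stereo pair, the z-depth limitation recited above reduces to the standard pinhole relation z = f·B/d, where f is the focal length in pixels, B is the camera baseline, and d is the horizontal disparity between the left and right positions of the point. A minimal Python sketch under those assumptions (the function and parameter names are illustrative, not taken from the patent):

    def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
        """Z-depth (metres) of a point from its horizontal positions in a rectified stereo pair."""
        # With rectified, calibrated cameras the disparity is purely horizontal.
        disparity = x_left - x_right
        if disparity <= 0:
            raise ValueError("point must lie in front of both cameras (positive disparity)")
        return focal_px * baseline_m / disparity

    # Example: a facial feature point at x = 652 px (left) and x = 610 px (right),
    # with a 700 px focal length and a 6 cm baseline, sits roughly 1.0 m away.
    print(depth_from_disparity(652.0, 610.0, focal_px=700.0, baseline_m=0.06))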
Abstract
Various embodiments enable a primary user to be identified and tracked using stereo association and multiple tracking algorithms. For example, a face detection algorithm can be run on each image captured by a respective camera independently. Stereo association can be performed to match faces between cameras. If the faces are matched and a primary user is determined, a face pair is created and used as the first data point in memory for initializing object tracking. Further, features of a user's face can be extracted, and the change in position of these features between images can determine which tracking method will be used for that particular frame.
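One way to picture the stereo-association step is to pair detections from the two cameras by how well their bounding boxes line up once the images are rectified (nearly the same rows, similar size). The greedy pairing below is only an illustrative sketch under that assumption; the box format, scoring function, and threshold are not taken from the patent.

    def pair_score(left_box, right_box):
        # Boxes are (x, y, w, h) in rectified image coordinates.
        # Lower is better: penalise vertical offset and size mismatch, since the
        # same face should sit on nearly the same rows in both rectified views.
        lx, ly, lw, lh = left_box
        rx, ry, rw, rh = right_box
        return abs(ly - ry) + abs(lw - rw) + abs(lh - rh)

    def associate_faces(left_boxes, right_boxes, max_score=40):
        """Greedily match each left detection to its best unused right detection."""
        pairs, used = [], set()
        for li, lbox in enumerate(left_boxes):
            candidates = [(pair_score(lbox, rbox), ri)
                          for ri, rbox in enumerate(right_boxes) if ri not in used]
            if not candidates:
                break
            score, ri = min(candidates)
            if score <= max_score:
                pairs.append((li, ri))
                used.add(ri)
        return pairs  # list of (left_index, right_index) face pairs

    # Two faces per view; the association recovers both face pairs.
    left = [(100, 200, 80, 80), (400, 210, 60, 60)]
    right = [(360, 212, 62, 61), (60, 198, 82, 79)]
    print(associate_faces(left, right))  # [(0, 1), (1, 0)]

A matched pair of this kind is the sort of "face pair" the abstract describes storing as the first data point for initializing object tracking.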
19 Claims
1. A non-transitory computer-readable storage medium (independent claim; recited in full above as the First Claim). Dependent claims: 2, 3, 4.
5. A computer-implemented method, comprising:
detecting an object in first stereo image data generated during a first period of time by two or more image capturing elements with overlapping fields of view;
associating a bounding box with the object;
analyzing the first stereo image data within the bounding box to determine one or more points relative to features of the object;
tracking the one or more points in the first stereo image data to determine a two-dimensional (2D) position of the one or more points;
determining stereo disparity for the 2D position of at least one of the one or more points in the first stereo image data;
determining a three-dimensional (3D) position of the one or more points relative to the features of the object based at least in part on the stereo disparity and information associated with the two or more image capturing elements;
generating third image data during a second period of time;
generating fourth image data during the second period of time, the third image data and the fourth image data being parts of a stereo image pair;
determining that the object was detected in the third image data;
determining that the object was not detected in the fourth image data;
generating a template of the object using image information for the object from the third image data; and
using the template to detect the object in the fourth image data.
Dependent claims: 6, 7, 8, 9, 10, 11, 12.
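For the template fallback in the final limitations of claim 5 (the object is found in the third image data but not in the fourth), normalised cross-correlation template matching is one plausible realisation. The OpenCV-based sketch below is an assumption about how such a template could be built and searched, not the claim's required method; the 0.7 threshold is likewise illustrative.

    import cv2

    def find_with_template(detected_img, detected_box, other_img, threshold=0.7):
        """Crop a template around the object in the image where detection succeeded,
        then search for it in the other image of the stereo pair.
        detected_box is (x, y, w, h); returns a box in other_img, or None."""
        x, y, w, h = detected_box
        template = detected_img[y:y + h, x:x + w]
        # Normalised correlation tolerates moderate exposure differences
        # between the two cameras.
        result = cv2.matchTemplate(other_img, template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        if max_val < threshold:
            return None  # object not confidently present in the other view
        return (max_loc[0], max_loc[1], w, h)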
13. A computing device, comprising:
a processor;
a first camera having a first field of view;
a second camera having a second field of view at least partially overlapping the first field of view, the first camera and the second camera being parts of a camera assembly configured to capture three-dimensional image data;
memory including instructions that, when executed by the processor, cause the computing device to:
detect a representation of a face in first stereo image data generated during a first period of time by the first camera and the second camera;
associate a bounding box with the representation of the face;
analyze, using a feature extraction algorithm, the first stereo image data within the bounding box to determine one or more points of the representation of the face to track;
track a two-dimensional (2D) position of the one or more points in the first stereo image data;
determine stereo disparity for the 2D position of at least one of the one or more points in the first stereo image data;
determine a z-depth for the one or more points using the stereo disparity and calibration information for the first camera and the second camera to determine a three-dimensional (3D) position for the one or more points;
generate second stereo image data during a second period of time by the first camera and the second camera, the second stereo image data including at least first image data generated by the first camera, and second image data generated by the second camera;
determine that the representation of the face is detected in the first image data;
determine that the representation of the face is not detected in the second image data;
generate a template of the face using image information for the face in the first image data; and
use the template to detect the representation of the face in the second image data.
Dependent claims: 14, 15, 16, 17, 18, 19.
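The 2D tracking of the extracted points between frames could, for example, use pyramidal Lucas-Kanade optical flow; the claim does not name a particular tracker, so the OpenCV call below is a stand-in, and the grayscale-frame and point-array formats are assumptions.

    import cv2

    def track_points(prev_gray, next_gray, prev_points):
        """Track feature points between two grayscale frames.
        prev_points: float32 array of shape (N, 1, 2) holding (x, y) positions.
        Returns the surviving points and their per-point 2D change in position."""
        next_points, status, _err = cv2.calcOpticalFlowPyrLK(
            prev_gray, next_gray, prev_points, None,
            winSize=(21, 21), maxLevel=3)
        found = status.ravel() == 1
        motion = next_points[found] - prev_points[found]
        return next_points[found], motion

The magnitude of this per-frame motion is the kind of signal the abstract describes using to decide which tracking method to apply for a given frame.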
Specification