Object detection and tracking
First Claim
1. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause a computing device to:
generate first right image data during a first period of time with a right camera;
generate first left image data during the first period of time with a left camera, the right camera and the left camera having at least partially overlapping fields of view;
detect, using a face detection algorithm, a first right representation of a face in the first right image data, the face detection algorithm returning a first right bounding box for the face in the first right image data;
detect, using the face detection algorithm, a first left representation of the face in the first left image data, the face detection algorithm returning a first left bounding box for the face in the first left image data;
detect, using a feature extraction algorithm, a set of features of the face in the first right image data by analyzing the first right image data within the first right bounding box;
detect, using the feature extraction algorithm, the set of features of the face in the first left image data by analyzing the first left image data within the first left bounding box;
determine a first right position for a point relative to the set of features in the first right image data;
determine a first left position for the point relative to the set of features in the first left image data;
generate second right image data during a second period of time with the right camera;
generate second left image data during the second period of time with the left camera;
detect, using the face detection algorithm, a second right representation of the face in the second right image data, the face detection algorithm returning a second right bounding box for the face in the second right image data;
detect, using the face detection algorithm, a second left representation of the face in the second left image data, the face detection algorithm returning a second left bounding box for the face in the second left image data;
detect, using the feature extraction algorithm, the set of features of the face in the second right image data by analyzing the second right image data within the second right bounding box;
detect, using the feature extraction algorithm, the set of features of the face in the second left image data by analyzing the second left image data within the second left bounding box;
determine a second right position for the point relative to the set of features in the second right image data;
determine a second left position for the point relative to the set of features in the second left image data;
determine a right two-dimensional (2D) change in position of the point relative to the set of features between the first right image data and the second right image data;
determine a left 2D change in position of the point relative to the set of features between the first left image data and the second left image data;
determine, using a feature tracking algorithm and based at least in part on the right 2D change in position, a right 2D output for the point relative to the set of features;
determine, using the feature tracking algorithm and based at least in part on the left 2D change in position, a left 2D output for the point relative to the set of features;
determine stereo disparity of the point relative to the set of features between the right 2D output and the left 2D output in the at least partially overlapping fields of view of the right camera and the left camera;
determine a z-depth for the point relative to the set of features of the face using the stereo disparity and calibration information for the right camera and the left camera to determine a three-dimensional (3D) position for the point relative to the set of features;
generate third right image data during a third period of time with the right camera;
generate third left image data during the third period of time with the left camera;
determine that a third right representation of the face is detected in the third right image data;
determine that a third left representation of the face is not detected in the third left image data;
generate a template of the face using information for the face from the third right image data; and
use the template to detect the face in the third left image data.
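For a rectified and calibrated stereo pair, the z-depth limitation recited above reduces to the standard pinhole relation z = f·B/d, where f is the focal length in pixels, B is the camera baseline, and d is the horizontal disparity between the left and right positions of the point. A minimal Python sketch under those assumptions (the function and parameter names are illustrative, not taken from the patent):

    def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
        """Z-depth (metres) of a point from its horizontal positions in a rectified stereo pair."""
        # With rectified, calibrated cameras the disparity is purely horizontal.
        disparity = x_left - x_right
        if disparity <= 0:
            raise ValueError("point must lie in front of both cameras (positive disparity)")
        return focal_px * baseline_m / disparity

    # Example: a facial feature point at x = 652 px (left) and x = 610 px (right),
    # with a 700 px focal length and a 6 cm baseline, sits roughly 1.0 m away.
    print(depth_from_disparity(652.0, 610.0, focal_px=700.0, baseline_m=0.06))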
Abstract
Various embodiments enable a primary user to be identified and tracked using stereo association and multiple tracking algorithms. For example, a face detection algorithm can be run on each image captured by a respective camera independently. Stereo association can be performed to match faces between cameras. If the faces are matched and a primary user is determined, a face pair is created and used as the first data point in memory for initializing object tracking. Further, features of a user's face can be extracted, and the change in position of these features between images can determine which tracking method will be used for that particular frame.
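One way to picture the stereo-association step is to pair detections from the two cameras by how well their bounding boxes line up once the images are rectified (nearly the same rows, similar size). The greedy pairing below is only an illustrative sketch under that assumption; the box format, scoring function, and threshold are not taken from the patent.

    def pair_score(left_box, right_box):
        # Boxes are (x, y, w, h) in rectified image coordinates.
        # Lower is better: penalise vertical offset and size mismatch, since the
        # same face should sit on nearly the same rows in both rectified views.
        lx, ly, lw, lh = left_box
        rx, ry, rw, rh = right_box
        return abs(ly - ry) + abs(lw - rw) + abs(lh - rh)

    def associate_faces(left_boxes, right_boxes, max_score=40):
        """Greedily match each left detection to its best unused right detection."""
        pairs, used = [], set()
        for li, lbox in enumerate(left_boxes):
            candidates = [(pair_score(lbox, rbox), ri)
                          for ri, rbox in enumerate(right_boxes) if ri not in used]
            if not candidates:
                break
            score, ri = min(candidates)
            if score <= max_score:
                pairs.append((li, ri))
                used.add(ri)
        return pairs  # list of (left_index, right_index) face pairs

    # Two faces per view; the association recovers both face pairs.
    left = [(100, 200, 80, 80), (400, 210, 60, 60)]
    right = [(360, 212, 62, 61), (60, 198, 82, 79)]
    print(associate_faces(left, right))  # [(0, 1), (1, 0)]

A matched pair of this kind is the sort of "face pair" the abstract describes storing as the first data point for initializing object tracking.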
19 Claims
1. A non-transitory computer-readable storage medium (independent claim; recited in full above as the First Claim). Dependent claims: 2, 3, 4.
5. A computer-implemented method, comprising:
detecting an object in first stereo image data generated during a first period of time by two or more image capturing elements with overlapping fields of view;
associating a bounding box with the object;
analyzing the first stereo image data within the bounding box to determine one or more points relative to features of the object;
tracking the one or more points in the first stereo image data to determine a two-dimensional (2D) position of the one or more points;
determining stereo disparity for the 2D position of at least one of the one or more points in the first stereo image data;
determining a three-dimensional (3D) position of the one or more points relative to the features of the object based at least in part on the stereo disparity and information associated with the two or more image capturing elements;
generating third image data during a second period of time;
generating fourth image data during the second period of time, the third image data and the fourth image data being parts of a stereo image pair;
determining that the object was detected in the third image data;
determining that the object was not detected in the fourth image data;
generating a template of the object using image information for the object from the third image data; and
using the template to detect the object in the fourth image data.
Dependent claims: 6, 7, 8, 9, 10, 11, 12.
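For the template fallback in the final limitations of claim 5 (the object is found in the third image data but not in the fourth), normalised cross-correlation template matching is one plausible realisation. The OpenCV-based sketch below is an assumption about how such a template could be built and searched, not the claim's required method; the 0.7 threshold is likewise illustrative.

    import cv2

    def find_with_template(detected_img, detected_box, other_img, threshold=0.7):
        """Crop a template around the object in the image where detection succeeded,
        then search for it in the other image of the stereo pair.
        detected_box is (x, y, w, h); returns a box in other_img, or None."""
        x, y, w, h = detected_box
        template = detected_img[y:y + h, x:x + w]
        # Normalised correlation tolerates moderate exposure differences
        # between the two cameras.
        result = cv2.matchTemplate(other_img, template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        if max_val < threshold:
            return None  # object not confidently present in the other view
        return (max_loc[0], max_loc[1], w, h)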
13. A computing device, comprising:
a processor;
a first camera having a first field of view;
a second camera having a second field of view at least partially overlapping the first field of view, the first camera and the second camera being parts of a camera assembly configured to capture three-dimensional image data;
memory including instructions that, when executed by the processor, cause the computing device to:
detect a representation of a face in first stereo image data generated during a first period of time by the first camera and the second camera;
associate a bounding box with the representation of the face;
analyze, using a feature extraction algorithm, the first stereo image data within the bounding box to determine one or more points of the representation of the face to track;
track a two-dimensional (2D) position of the one or more points in the first stereo image data;
determine stereo disparity for the 2D position of at least one of the one or more points in the first stereo image data;
determine a z-depth for the one or more points using the stereo disparity and calibration information for the first camera and the second camera to determine a three-dimensional (3D) position for the one or more points;
generate second stereo image data during a second period of time by the first camera and the second camera, the second stereo image data including at least first image data generated by the first camera, and second image data generated by the second camera;
determine that the representation of the face is detected in the first image data;
determine that the representation of the face is not detected in the second image data;
generate a template of the face using image information for the face in the first image data; and
use the template to detect the representation of the face in the second image data.
Dependent claims: 14, 15, 16, 17, 18, 19.
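The 2D tracking of the extracted points between frames could, for example, use pyramidal Lucas-Kanade optical flow; the claim does not name a particular tracker, so the OpenCV call below is a stand-in, and the grayscale-frame and point-array formats are assumptions.

    import cv2

    def track_points(prev_gray, next_gray, prev_points):
        """Track feature points between two grayscale frames.
        prev_points: float32 array of shape (N, 1, 2) holding (x, y) positions.
        Returns the surviving points and their per-point 2D change in position."""
        next_points, status, _err = cv2.calcOpticalFlowPyrLK(
            prev_gray, next_gray, prev_points, None,
            winSize=(21, 21), maxLevel=3)
        found = status.ravel() == 1
        motion = next_points[found] - prev_points[found]
        return next_points[found], motion

The magnitude of this per-frame motion is the kind of signal the abstract describes using to decide which tracking method to apply for a given frame.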
Specification