Method and apparatus for real-time face-tracking and face-pose-selection on embedded vision systems

US 10,510,157 B2
Filed: 10/28/2017
Issued: 12/17/2019
Est. Priority Date: 10/28/2017
Status: Active Grant


Abstract

Embodiments described herein provide various examples of a real-time face-detection, face-tracking, and face-pose-selection subsystem within an embedded video system. In one aspect, a process for performing real-time face-pose-estimation and best-pose selection for a detected person captured in a video is disclosed. This process includes the steps of: receiving a video image among a sequence of video frames of a video; performing a face detection operation on the video image to detect a set of faces in the video image; detecting a new person appears in the video based on the set of detected faces; tracking the new person through subsequent video images in the video by detecting a sequence of face images of the new person in the subsequent video images; and for each of the subsequent video images which contains a detected face of the new person being tracked: estimating a pose associated with the detected face and updating a best pose for the new person based on the estimated pose. Upon detecting that the new person has disappeared from the video, the process then transmits a detected face of the new person corresponding to the current best pose to a server, wherein transmitting the detected face having the best pose among the sequence of detected face images reduces network bandwidth and improves storage efficiency.
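The process summarized above can be sketched in a few lines. This is a minimal illustration, not the patented implementation. It assumes each frame has already been reduced to a mapping from a person ID to a face crop and a pose given as three angles in degrees (yaw, pitch, roll). It scores a pose by the sum of absolute angles, which is one possible reading of "smallest overall rotation from a frontal orientation". It treats any tracked person missing from a frame as having disappeared, which ignores the stationary-person check described in claim 4. The names `Track`, `rotation_score`, and `select_best_poses` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed pose representation: (yaw, pitch, roll) in degrees.
Pose = Tuple[float, float, float]


def rotation_score(pose: Pose) -> float:
    """Overall rotation away from frontal; smaller is better (assumed metric)."""
    return sum(abs(angle) for angle in pose)


@dataclass
class Track:
    """Best face seen so far for one tracked person."""
    best_face: Optional[bytes] = None
    best_score: float = float("inf")

    def update(self, face: bytes, pose: Pose) -> None:
        score = rotation_score(pose)
        if score < self.best_score:
            self.best_face, self.best_score = face, score


def select_best_poses(
    frames: Iterable[Dict[str, Tuple[bytes, Pose]]]
) -> List[Tuple[str, bytes]]:
    """Return the (person ID, face) pairs that would be sent to the server,
    in the order in which the people disappear from the video."""
    tracks: Dict[str, Track] = {}
    sent: List[Tuple[str, bytes]] = []
    for detections in frames:
        for person_id, (face, pose) in detections.items():
            tracks.setdefault(person_id, Track()).update(face, pose)
        # Simplification: a tracked person with no detection is treated as gone.
        for person_id in [p for p in tracks if p not in detections]:
            sent.append((person_id, tracks.pop(person_id).best_face))
    # End of video: flush anyone still being tracked.
    for person_id, track in tracks.items():
        sent.append((person_id, track.best_face))
    return sent
```

Only one face crop per person leaves the device in this sketch, which is where the claimed bandwidth and storage savings would come from.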

20 Claims

1. A method for performing real-time face-pose-estimation and best-pose selection for a detected person captured in a video, the method comprising:
- receiving a video image among a sequence of video frames of a video;
  
  performing a face detection operation on the video image to detect a set of faces in the video image;
  
  detecting a new person appears in the video based on the set of detected faces;
  
  tracking the new person through subsequent video images in the video by detecting a sequence of face images of the new person in the subsequent video images;
  
  for each of the subsequent video images which contains a detected face of the new person being tracked:
  
  estimating a pose associated with the detected face; and
  
  updating a best pose for the new person based on the estimated pose; and
  
  upon detecting that the new person has disappeared from the video, transmitting a detected face of the new person corresponding to the current best pose to a server, wherein transmitting the detected face having the best pose among the sequence of detected face images reduces network bandwidth and improves storage efficiency.
- - 2. The method of claim 1, wherein after updating the best pose, the method further comprises:
    - comparing the updated best pose to a threshold value which represents a face pose sufficiently good for face recognition; and
      
      if the updated best pose meets the threshold value, transmitting the detected face of the new person corresponding to the updated best pose to the server without waiting for the new person to disappear from the video.
  - 3. The method of claim 2, wherein prior to transmitting the detected face of the new person corresponding to the current best pose to the server, the method further comprises:
    - determining if a detected face of the new person determined to be sufficiently good has been previously sent to the server; and
      
      if so, avoiding transmitting the detected face of the new person corresponding to the current best pose to the server.
  - 4. The method of claim 1, wherein detecting that the new person has disappeared from the video further comprises:
    - determining that the new person does not have a corresponding detected face image in a newly processed video image;
      
      detecting if the new person has a corresponding face image at a location in the newly processed video image which is the same as the location of a detected face image of the new person in a preceding video frame; and
      
      if so, determining that the new person has become stationary;
      
      otherwise, determining that the new person has disappeared from the video.
  - 5. The method of claim 1, wherein upon determining that the new person has become stationary, the method further comprises continuing monitoring the new person through subsequent video images until the new person starts moving again.
  - 6. The method of claim 1, wherein performing the face detection operation on the video image to detect a set of faces in the video image includes:
    - identifying a set of moving areas within the video image; and
      
      for each moving area in the set of identified moving areas, applying a neural network based face detection technique to the moving area to detect one or more human faces within the moving area.
  - 7. The method of claim 1, wherein detecting the new person appears in the video based on the set of detected faces includes:
    - performing a face association operation between a set of labeled detected faces in a first processed video image and a set of unlabeled detected faces in a second processed video image immediately succeeding the first processed video image; and
      
      identifying each of the set of unlabeled detected faces not associated with any of the set of labeled detected faces as a new person.
  - 8. The method of claim 1, wherein tracking the new person through the subsequent video images includes performing a direct face association operation between a labeled detected face of the new person in a first processed video image and an unlabeled detected face of the new person in a second processed video image following the first processed video image.
  - 9. The method of claim 8, wherein a first location of the labeled detected face of the new person in the first processed video image is different from a second location of the unlabeled detected face of the new person in the second processed video image due to a movement of the new person.
  - 10. The method of claim 8, wherein the bounding box of the labeled detected face of the new person in the first processed video image and the bounding box of the unlabeled detected face of the new person in the second processed video image overlap each other.
  - 11. The method of claim 1, wherein tracking the new person through subsequent video images in the video involves:
    - locating the bounding box of the detected face of the new person in the processed video image and using the bounding box as a reference box and the detected face image within the bounding box as a search block;
      
      placing a search window of a predetermined size around the same location as the location of the bounding box in an unprocessed video frame succeeding the processed video frame, wherein the search window contains a plurality of search locations; and
      
      at each of the plurality of search locations within the search window, placing the reference box at the search location; and
      
      comparing the search block with the image patch within the placed reference box.
  - 12. The method of claim 1, wherein tracking the new person through subsequent video images in the video involves:
    - locating the bounding box of the detected face of the new person in the processed video image and using the bounding box as a reference box and the detected face image within the bounding box as a search block;
      
      predicting a location for the face of the new person in an unprocessed video frame succeeding the processed video frame based on the location of the bounding box of the detected face in the processed video image and a predicted movement of the new person;
      
      placing a search window of a predetermined size around the predicted location in the unprocessed video frame, wherein the search window contains a plurality of search locations; and
      
      at each of the plurality of search locations within the search window, placing the reference box at the search location; and
      
      comparing the search block with the image patch within the placed reference box.
  - 13. The method of claim 12, wherein the predicted movement of the new person is determined based on two or more detected locations of two or more detected faces of the new person from two or more processed video frames preceding the unprocessed video frame.
  - 14. The method of claim 12, wherein the method further comprises predicting the movement of the new person using either a linear prediction or a non-linear prediction based on a Kalman filter.
  - 15. The method of claim 1, wherein estimating the pose associated with the detected face includes performing a joint face-detection and pose-estimation on each of the subsequent video images based on using a convolutional neural network (CNN).
  - 16. The method of claim 1, wherein the pose estimation associated with the detected face includes three head-pose angles associated with the detected face.
  - 17. The method of claim 1, wherein the best pose for the new person is a head-pose associated with the smallest overall rotation from a frontal orientation.
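The search-window tracking described in claims 11 to 13 can be illustrated with a small block-matching sketch. It is only one possible reading of those claims. It assumes grayscale frames stored as nested lists, uses the sum of absolute differences as the comparison (the claims do not name a metric), and predicts the next location by linear extrapolation from the last two detected locations. The Kalman-filter option in claim 14 is not shown. The names `predict_location`, `block_match`, and the `radius` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel intensities
Box = Tuple[int, int, int, int]  # (x, y, width, height), top-left origin


def crop(image: Image, box: Box) -> Image:
    """Cut the patch covered by `box` out of `image`."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def sad(a: Image, b: Image) -> int:
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def predict_location(history: Sequence[Tuple[int, int]]) -> Tuple[int, int]:
    """Linear prediction: last location plus the last observed displacement."""
    if len(history) < 2:
        return history[-1]
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def block_match(search_block: Image, frame: Image,
                center: Tuple[int, int], radius: int) -> Tuple[int, int]:
    """Return the top-left (x, y) in `frame` whose patch best matches
    `search_block`, searching within `radius` pixels of `center`."""
    h, w = len(search_block), len(search_block[0])
    frame_h, frame_w = len(frame), len(frame[0])
    cx, cy = center
    best_cost, best_xy = None, center
    # Clamp the search window so the reference box stays inside the frame.
    for y in range(max(0, cy - radius), min(frame_h - h, cy + radius) + 1):
        for x in range(max(0, cx - radius), min(frame_w - w, cx + radius) + 1):
            cost = sad(search_block, crop(frame, (x, y, w, h)))
            if best_cost is None or cost < best_cost:
                best_cost, best_xy = cost, (x, y)
    return best_xy
```

Passing the previous bounding-box location as `center` corresponds to claim 11, while passing the output of `predict_location` corresponds to claim 12.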

18. A system for performing real-time face-pose-estimation and best-pose selection for a detected person captured in a video, the system comprising:
- a receiving module configured to receive a video image among a sequence of video frames of a video;
  
  a face detection module configured to:
  
  perform a face detection operation on the video image to detect a set of faces in the video image; and
  
  detect a new person appears in the video based on the set of detected faces;
  
  a face tracking module configured to track the new person through subsequent video images in the video by detecting a sequence of face images of the new person in the subsequent video images; and
  
  a face-pose-selection module configured to, for each of the subsequent video images which contains a detected face of the new person being tracked:
  
  estimate a pose associated with the detected face;
  
  update a best pose for the new person based on the estimated pose; and
  
  upon detecting that the new person has disappeared from the video, transmit a detected face of the new person corresponding to the current best pose to a server, wherein transmitting the detected face having the best pose among the sequence of detected face images reduces network bandwidth and improves storage efficiency.
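The face detection module above decides when a new person has appeared. Claims 7 and 10 describe this as an association between labeled faces in one frame and unlabeled faces in the next, using overlapping bounding boxes. The sketch below is one way to do that, under stated assumptions: boxes are corner coordinates, overlap is measured by intersection over union, matching is greedy, and the `min_iou` threshold of 0.3 is an arbitrary illustrative value. None of these choices come from the patent text, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corners


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def associate(labeled: Dict[str, Box], unlabeled: List[Box],
              min_iou: float = 0.3) -> Tuple[Dict[str, int], List[int]]:
    """Greedily pair labeled faces with unlabeled detections.

    Returns (matches, new_faces): `matches` maps a label to the index of its
    detection in `unlabeled`; `new_faces` lists indices left unmatched, which
    would be treated as newly appeared people.
    """
    pairs = sorted(
        ((iou(box, cand), label, idx)
         for label, box in labeled.items()
         for idx, cand in enumerate(unlabeled)),
        reverse=True,
    )
    matches: Dict[str, int] = {}
    used = set()
    for score, label, idx in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same face
        if label in matches or idx in used:
            continue
        matches[label] = idx
        used.add(idx)
    new_faces = [i for i in range(len(unlabeled)) if i not in used]
    return matches, new_faces
```

A labeled face left without a match would be a candidate for the disappeared-or-stationary check in claim 4.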

19. An embedded system capable of performing real-time face-pose-estimation and best-pose selection for a detected person captured in a video, the embedded system comprising:
- a processor;
  
  a memory coupled to the processor;
  
  an image capturing device coupled to the processor and the memory and configured to capture a video;
  
  a receiving module configured to receive a video image among a sequence of video frames of a video;
  
  a face detection module configured to:
  
  perform a face detection operation on the video image to detect a set of faces in the video image; and
  
  detect a new person appears in the video based on the set of detected faces;
  
  a face tracking module configured to track the new person through subsequent video images in the video by detecting a sequence of face images of the new person in the subsequent video images; and
  
  a face-pose-selection module configured to, for each of the subsequent video images which contains a detected face of the new person being tracked:
  
  estimate a pose associated with the detected face;
  
  update a best pose for the new person based on the estimated pose; and
  
  upon detecting that the new person has disappeared from the video, transmit a detected face of the new person corresponding to the current best pose to a server, wherein transmitting the detected face having the best pose among the sequence of detected face images reduces network bandwidth and improves storage efficiency.
- - 20. The embedded system of claim 19, wherein the embedded system is one of a surveillance camera system, a machine vision system, a drone system, a robotic system, a self-driving vehicle, or a mobile device.


Current Assignee
AltumView Systems Inc.
Original Assignee
AltumView Systems Inc.
Inventors
Seyfi, Mehdi; Wang, Xing; Chen, Minghua; Wang, Kaichao; Wang, Weiming; Ng, Him Wai; Zheng, Jiannan; Liang, Jie
Primary Examiner(s)
Fujita, Katrina R

Application Number
US15/796,798
Publication Number
US 20190130594A1
Time in Patent Office
780 Days
CPC Class Codes

G06T 2207/10016   Video; Image sequence

G06T 2207/20084   Artificial neural networks ...

G06T 2207/30201   Face

G06T 2210/12   Bounding box

G06T 7/223   using block-matching

G06T 7/248   involving reference images ...

G06T 7/277   involving stochastic approa...

G06T 7/70   Determining position or ori...

G06V 10/454   Integrating the filters int...

G06V 10/764   using classification, e.g. ...

G06V 20/30   in albums, collections or s...

G06V 20/46   Extracting features or char...

G06V 40/161   Detection; Localisation; No...

G06V 40/167   using comparisons between t...

G06V 40/172   Classification, e.g. identi...
