VIDEO RETRIEVAL SYSTEM FOR HUMAN FACE CONTENT

US 20110170749A1
Filed: 12/20/2010
Published: 07/14/2011
Est. Priority Date: 09/29/2006
Status: Active Grant

Abstract

A method and apparatus for video retrieval and cueing that automatically detects human faces in the video and identifies face-specific video frames so as to allow retrieval and viewing of person-specific video segments. In one embodiment, the method locates human faces in the video, stores the time stamps associated with each face, displays a single image associated with each face, matches each face against a database, computes face locations with respect to a common 3D coordinate system, and provides a means of displaying: 1) information retrieved from the database associated with a selected person or people, 2) path of travel associated with a selected person or people, 3) interaction graph of people in video, 4) video segments associated with each person and/or face. The method may also provide the ability to input and store text annotations associated with each person, face, and video segment, and the ability to enroll and remove people from database. The videos of non-human objects may be processed in a similar manner. Because of the rules governing abstracts, this abstract should not be used to construe the claims.

38 Claims

1. A method for processing video data, comprising:
- detecting human faces in a plurality of video frames in said video data using a processor;
  
  for at least one detected human face, identifying a face-specific set of video frames using said processor, irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
  
  grouping video frames in said face-specific set of video frames into a plurality of face tracks using said processor, wherein each face track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
  
  using said processor, merging two or more of said plurality of face tracks that are disjoint in time using a face recognition method based on a Bayesian Network based classifier; and
  
  enabling a user to view on an electronic display face-specific video segments of said at least one detected human face in said video data based on said merging of temporally disjoint face tracks.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
  - 2. The method of claim 1, wherein said grouping is carried out in a temporally sequential manner based on respective time stamps associated with said video frames in each said face-specific set of video frames.
  - 3. The method of claim 1, further comprising:
    - displaying a representative image for said grouped video frames on said electronic display.
  - 4. The method of claim 1, further comprising:
    - allowing said user to manually associate respective grouped video frames in said face-specific set of video frames with an image entry stored in a database using said processor.
  - 5. The method of claim 1, further comprising:
    - allowing said user to manually override a match between respective grouped video frames in said face-specific set of video frames and an image entry stored in a database using said processor.
  - 6. The method of claim 1, further comprising:
    - matching grouped video frames with image entries stored in a database using said processor; and
      
      using said processor, enrolling unmatched grouped video frames into said database through corresponding image entries.
  - 7. The method of claim 1, further comprising:
    - using said processor, indicating one or more unmatched human faces in said detected human faces based on a comparison of said detected human faces against a plurality of human face images stored in a database; and
      
      enabling said user to view on said electronic display those face-specific video segments wherein said one or more unmatched human faces are present.
  - 8. The method of claim 1, further comprising:
    - displaying on said electronic display a representative image for at least one video frame in said face-specific set of video frames for said at least one detected human face.
  - 9. The method of claim 8, further comprising:
    - enabling said user to view said face-specific video segments on said electronic display using said representative image as a link therefor.
  - 10. The method of claim 8, further comprising:
    - retrieving a textual description for said face-specific video segments from a database using said processor; and
      
      displaying said textual description along with said representative image on said electronic display.
  - 11. The method of claim 1, further comprising:
    - enabling said user to input a textual description of said face-specific video segments associated with said at least one detected human face using said processor.
  - 12. The method of claim 1, wherein said identifying includes using face recognition to identify said face-specific set of video frames for said at least one detected human face.
  - 13. The method of claim 1, further comprising:
    - automatically displaying said face-specific video segments on said electronic display upon identification of said face-specific set of video frames for said at least one detected human face.
  - 14. The method of claim 1, further comprising:
    - using said processor, determining movement of said at least one detected human face in said face-specific video segments associated therewith using a three-dimensional coordinate system.
  - 15. The method of claim 14, further comprising:
    - displaying said movement of said at least one detected human face with respect to a map on said electronic display.
  - 16. The method of claim 1, further comprising:
    - displaying a co-occurrence of two human faces in said plurality of video frames as a link graph on said electronic display, wherein said link graph includes a plurality of nodes, and wherein each node in said link graph represents a different detected human face in said plurality of video frames regardless of identification status of said detected human face.
  - 17. The method of claim 16, wherein said link graph includes a plurality of dimensionally-weighted links, wherein each link connects a pair of nodes from said plurality of nodes, and wherein weighting of each said link is proportional to the amount of interaction between two humans represented as nodes connected by said link.
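
Illustrative sketch (not part of the claims): the link graph of claims 16 and 17 can be pictured as a co-occurrence counter over frames. The function name `build_link_graph`, the face identifiers, and the choice of "number of shared frames" as the measure of interaction are all assumptions made for this example; the claims only require that link weight be proportional to the amount of interaction.

```python
from collections import Counter
from itertools import combinations


def build_link_graph(frames):
    """Build a co-occurrence graph from per-frame face detections.

    frames: iterable of sets, each holding the IDs of faces seen in one frame.
    Returns (nodes, links): every detected face becomes a node, whether or not
    it has been identified, and links maps a sorted ID pair to the number of
    frames in which both faces appear (assumed stand-in for interaction).
    """
    nodes = set()
    links = Counter()
    for faces in frames:
        nodes.update(faces)
        for a, b in combinations(sorted(faces), 2):
            links[(a, b)] += 1
    return nodes, links


# Hypothetical detections for five frames.
frames = [{"A", "B"}, {"A", "B", "C"}, {"C"}, set(), {"A", "B"}]
nodes, links = build_link_graph(frames)
```

A display layer could then draw each link with a thickness scaled by its count, so that faces appearing together more often are joined by heavier links.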

18. A method for processing video data, comprising:
- detecting human faces in a plurality of video frames in said video data using a processor;
  
  indicating, using said processor, one or more unmatched human faces in said detected human faces based on a comparison of said detected human faces against a plurality of human face images stored in a database; and
  
  using said processor, tracking at least one unmatched human face across said video data by locating a face-specific set of video frames therefor using a face recognition method based on a Bayesian Network based classifier, irrespective of whether said unmatched human face is present in said face-specific set of video frames in a substantially temporally continuous manner.
- Dependent claims (19, 20, 21)
  - 19. The method of claim 18, wherein said tracking is performed in real time.
  - 20. The method of claim 18, further comprising:
    - using said processor, automatically displaying face-specific video segments associated with said at least one unmatched human face based on said face-specific set of video frames located therefor.
  - 21. The method of claim 18, further comprising:
    - grouping all video frames in said face-specific set of video frames located for said at least one unmatched human face using said processor; and
      
      displaying a representative image for at least one video frame in said face-specific set of video frames using said processor.
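
Illustrative sketch (not part of the claims): the comparison step of claim 18, which flags detected faces that match nothing in a database, might look as follows. The feature vectors, the cosine-similarity score, the `threshold` value, and the name `split_matched` are assumptions for this example; in particular, cosine similarity is a simple stand-in and is not the Bayesian Network based classifier recited in the claim.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def split_matched(detected, database, threshold=0.9):
    """Compare detected faces against enrolled faces.

    detected, database: dicts mapping an ID or name to a feature vector.
    Returns (matched, unmatched): matched maps a detected face ID to the
    best-scoring enrolled name; unmatched lists IDs scoring below threshold.
    """
    matched, unmatched = {}, []
    for face_id, vec in detected.items():
        best_name, best_score = None, threshold
        for name, ref in database.items():
            score = cosine(vec, ref)
            if score >= best_score:
                best_name, best_score = name, score
        if best_name is None:
            unmatched.append(face_id)
        else:
            matched[face_id] = best_name
    return matched, unmatched


# Hypothetical two-dimensional feature vectors.
database = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
detected = {"face-1": [0.9, 0.1], "face-2": [0.7, 0.7]}
matched, unmatched = split_matched(detected, database)
```

Here `face-1` is matched to `alice`, while `face-2` falls below the threshold for every enrolled entry and is reported as unmatched, making it a candidate for tracking across the video.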

22. A method for processing video data, comprising:
- detecting objects in a plurality of video frames in said video data using a processor;
  
  for at least one detected object, identifying an object-specific set of video frames using said processor, irrespective of whether said detected object is present in said object-specific set of video frames in a substantially temporally continuous manner;
  
  grouping video frames in said object-specific set of video frames into a plurality of object tracks using said processor, wherein each object track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
  
  using said processor, merging two or more of said plurality of object tracks that are disjoint in time using an object recognition method based on a Bayesian Network based classifier; and
  
  enabling a user to view on an electronic display for said processor object-specific video segments of said at least one detected object in said video data based on said merging of temporally disjoint object tracks.
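
Illustrative sketch (not part of the claims): the grouping step, in which an object-specific (or face-specific) set of frames is split into tracks having substantial temporal continuity, can be approximated by cutting the sorted time stamps wherever the gap exceeds a limit. The name `group_into_tracks` and the `max_gap` value of 0.5 seconds are assumptions for this example; the claims do not fix a numeric threshold.

```python
def group_into_tracks(timestamps, max_gap=0.5):
    """Split time stamps (seconds) into temporally continuous tracks.

    A new track starts whenever the gap to the previous time stamp
    exceeds max_gap (an assumed continuity threshold).
    """
    tracks = []
    for t in sorted(timestamps):
        if tracks and t - tracks[-1][-1] <= max_gap:
            tracks[-1].append(t)
        else:
            tracks.append([t])
    return tracks


# Hypothetical time stamps at which one object was detected.
stamps = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
tracks = group_into_tracks(stamps)
```

With these inputs the object yields three tracks that are disjoint in time, which is the situation the subsequent merging step is meant to handle.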

23. A method, comprising:
- receiving video data from a user over a data communication network using a processor;
  
  detecting human faces in a plurality of video frames in said video data using said processor;
  
  for at least one detected human face, identifying a face-specific set of video frames using said processor, irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
  
  configuring said processor to use a face recognition method based on a Bayesian Network based classifier to identify those portions of said video data corresponding to said face-specific set of video frames wherein said at least one detected human face is present; and
  
  using said processor, sending cueing information for said portions of said video data to said user over said data communication network so as to enable said user to selectively view face-specific video segments in said video data associated with said at least one detected human face without a need to search said video data for said video segments.
- Dependent claims (24, 25, 26, 27)
  - 24. The method of claim 23, further comprising:
    - indicating, using said processor, one or more unmatched human faces in said detected human faces;
      
      using said processor, identifying only those portions of said video data wherein said one or more unmatched human faces are present; and
      
      sending cueing information for only said video portions associated with said one or more unmatched human faces to said user over said data communication network using said processor.
  - 25. The method of claim 23, wherein said cueing information includes said face-specific video segments associated with only those of said detected human faces that are unmatched based on a database query.
  - 26. The method of claim 23, further comprising:
    - charging a fee to said user for sending said cueing information.
  - 27. The method of claim 23, wherein said data communication network is the Internet.
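
Illustrative sketch (not part of the claims): the cueing information of claim 23 could be as simple as a list of start and end times per face, serialized for transmission over a network. The JSON layout, the field names `face`, `segments`, `start`, and `end`, and the function name `cueing_info` are assumptions for this example; the claim does not prescribe a format.

```python
import json


def cueing_info(face_id, tracks):
    """Serialize face-specific segments as JSON cueing information.

    tracks: list of non-empty, sorted lists of time stamps (seconds).
    Each track becomes one segment bounded by its first and last time stamp.
    """
    segments = [{"start": track[0], "end": track[-1]} for track in tracks if track]
    return json.dumps({"face": face_id, "segments": segments})


# Hypothetical tracks for one detected face.
payload = cueing_info("face-7", [[0.0, 0.1, 0.2], [5.0, 5.1]])
```

A client receiving this payload could seek directly to each segment, which corresponds to viewing the face-specific video segments without searching the video.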

28. A data storage medium containing a program code, which, when executed by a processor, causes said processor to perform the following:
- receive video data;
  
  detect human faces in a plurality of video frames in said video data;
  
  for at least one detected human face, identify a face-specific set of video frames irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
  
  group all video frames in said face-specific set of video frames into a plurality of face tracks, wherein each face track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
  
  merge two or more of said plurality of face tracks that are disjoint in time using a face recognition method based on a Bayesian Network based classifier; and
  
  enable a user to view face-specific video segments of said at least one detected human face in said video data based on said merger of temporally disjoint face tracks.
- Dependent claims (29, 30, 31)
  - 29. The data storage medium of claim 28, wherein said program code, upon execution by said processor, causes said processor to further perform the following:
    - indicate one or more unmatched human faces in said detected human faces based on a comparison of said detected human faces against a plurality of human face images stored in a database; and
      
      track at least one unmatched human face across said video data in substantially real time through said face-specific set of video frames therefor.
  - 30. The data storage medium of claim 29, wherein said program code, upon execution by said processor, causes said processor to further perform the following:
    - automatically display face-specific video segments associated with said at least one unmatched human face based on said face-specific set of video frames therefor.
  - 31. The data storage medium of claim 29, wherein said program code, upon execution by said processor, causes said processor to further perform the following:
    - display a cueing link for said face-specific set of video frames associated with said at least one unmatched human face so as to enable said user to view only those face-specific video segments in said video data wherein said at least one unmatched human face appears without requiring said user to search said video data for said video segments of said at least one unmatched human face.

32. A system for processing video data, comprising:
- means for detecting human faces in a plurality of video frames in said video data;
  
  for at least one detected human face, means for identifying a face-specific set of video frames irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
  
  means for grouping all video frames in said face-specific set of video frames into a plurality of face tracks, wherein each face track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
  
  means for merging two or more of said plurality of face tracks that are disjoint in time using a face recognition method based on a Bayesian Network based classifier;
  
  and means for displaying face-specific video segments of said at least one detected human face in said video data based on said merger of temporally disjoint face tracks.
- Dependent claims (33)
  - 33. The system of claim 32, further comprising:
    - means for indicating one or more unmatched human faces in said detected human faces;
      
      means for identifying those portions of said video data wherein said one or more unmatched human faces are present; and
      
      means for automatically displaying face-specific video segments in said video data associated with said one or more unmatched human faces based on said video data portions identified for said one or more unmatched human faces.

34. A computer system, which, upon being programmed, is configured to perform the following:
- receive video data;
  
  detect human faces in a plurality of video frames in said video data;
  
  for at least one detected human face, identify a face-specific set of video frames irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
  
  group all video frames in said face-specific set of video frames into a plurality of face tracks, wherein each face track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
  
  merge two or more of said plurality of face tracks that are disjoint in time using a face recognition method based on a Bayesian Network based classifier; and
  
  enable a user to view face-specific video segments of said at least one detected human face in said video data based on said merger of temporally disjoint face tracks.
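
Illustrative sketch (not part of the claims): merging face tracks that are disjoint in time amounts to clustering tracks that a recognizer judges to show the same person. The example below uses a union-find structure and takes the recognizer as a caller-supplied predicate `same_identity`; that predicate, the `(time stamp, signature)` frame representation, and the name `merge_tracks` are assumptions. The Bayesian Network based classifier recited in the claims is not implemented here.

```python
def merge_tracks(tracks, same_identity):
    """Merge tracks that the supplied predicate judges to be the same person.

    tracks: list of tracks, each a list of (timestamp, signature) tuples.
    same_identity(track_a, track_b) -> bool is a stand-in for a recognizer.
    Returns merged tracks, each sorted by time stamp.
    """
    parent = list(range(len(tracks)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(tracks)):
        for j in range(i + 1, len(tracks)):
            if same_identity(tracks[i], tracks[j]):
                parent[find(j)] = find(i)

    merged = {}
    for i, track in enumerate(tracks):
        merged.setdefault(find(i), []).extend(track)
    return [sorted(frames) for frames in merged.values()]


# Hypothetical tracks; the second tuple element is a toy face signature.
tracks = [[(0.0, "p"), (0.1, "p")], [(5.0, "q")], [(9.0, "p")]]
merged = merge_tracks(tracks, lambda a, b: a[0][1] == b[0][1])
```

The first and third tracks share a signature and are merged even though they are separated in time, while the second track stays on its own.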

35. A system for processing video data, comprising:
- a computing unit; and
  
  a data storage medium containing a program code, which, when executed by said computing unit, causes said computing unit to perform the following:
  
  receive video data;
  
  detect human faces in a plurality of video frames in said video data;
  
  for at least one detected human face, identify a face-specific set of video frames irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner; and
  
  use a face recognition method based on a Bayesian Network based classifier to enable a user to view face-specific video segments in said video data based on said face-specific set of video frames identified.
- Dependent claims (36)
  - 36. The system of claim 35, further comprising:
    - a video data source to provide said video data, wherein said video data source is one of the following:
      
      a portion of said computing unit configured to record said video data; and
      
      a video camera coupled to said computing unit.

37. A system for processing video data, comprising:
- a video data source connected to a communication network, wherein said video data source is configured to transmit video data over said communication network; and
  
  a computing unit in communication with said video data source and connected to said communication network, wherein said computing unit is configured to perform the following:
  
  receive said video data from said video data source transmitted over said communication network;
  
  detect human faces in a plurality of video frames in said video data;
  
  for at least one detected human face, identify a face-specific set of video frames irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner; and
  
  use a face recognition method based on a Bayesian Network based classifier to send cueing information for said face-specific set of video frames to said user over said data communication network so as to enable said user to selectively view face-specific video segments in said video data associated with said at least one detected human face without a need to search said video data for said video segments.
- Dependent claims (38)
  - 38. The system of claim 37, wherein said video data source is at least one of the following:
    - a computing unit having a built-in means to record said video data;
      
      a video camera; and
      
      a computing unit having said video data stored therein prior to transmission over said communication network.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Pittsburgh Pattern Recognition, Inc. (Alphabet Inc.)
Inventors: Rodriguez, Uriel G.; Brandy, Louis D.; Nechyba, Michael C.; Schneiderman, Henry
Granted Patent: US 8,401,252 B2
US Class Current: 382/118

CPC Class Codes
G06F 16/784   the detected or recognised ...
G06V 40/173   face re-identification, e.g...
G08B 13/196   using television cameras
G11B 27/105   of operating discs
G11B 27/28   by using information signal...
G11B 27/3027   used signal is digitally coded
G11B 27/34   Indicating arrangements in...
