VIDEO RETRIEVAL SYSTEM FOR HUMAN FACE CONTENT
Abstract
A method and apparatus for video retrieval and cueing that automatically detects human faces in the video and identifies face-specific video frames so as to allow retrieval and viewing of person-specific video segments. In one embodiment, the method locates human faces in the video, stores the time stamps associated with each face, displays a single image associated with each face, matches each face against a database, computes face locations with respect to a common 3D coordinate system, and provides a means of displaying: 1) information retrieved from the database associated with a selected person or people, 2) path of travel associated with a selected person or people, 3) interaction graph of people in video, 4) video segments associated with each person and/or face. The method may also provide the ability to input and store text annotations associated with each person, face, and video segment, and the ability to enroll and remove people from database. The videos of non-human objects may be processed in a similar manner. Because of the rules governing abstracts, this abstract should not be used to construe the claims.
38 Claims
1. A method for processing video data, comprising:
detecting human faces in a plurality of video frames in said video data using a processor;
for at least one detected human face, identifying a face-specific set of video frames using said processor, irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
grouping video frames in said face-specific set of video frames into a plurality of face tracks using said processor, wherein each face track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
using said processor, merging two or more of said plurality of face tracks that are disjoint in time using a face recognition method based on a Bayesian Network based classifier; and
enabling a user to view on an electronic display face-specific video segments of said at least one detected human face in said video data based on said merging of temporally disjoint face tracks.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
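The grouping and merging steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation: the names `FaceTrack`, `group_into_tracks`, and `merge_disjoint_tracks` and the `max_gap` parameter are hypothetical, and the `same_face` callback is only a placeholder for the Bayesian Network based classifier, whose details the claim does not give.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FaceTrack:
    """A run of frame indices in which one face appears (nearly) continuously."""
    frames: List[int]


def group_into_tracks(frame_indices: List[int], max_gap: int = 1) -> List[FaceTrack]:
    """Split a face-specific set of frame indices into temporally continuous tracks.

    Assumption: two frames belong to the same track when their indices differ
    by at most `max_gap`; the claim only requires "substantial" continuity.
    """
    tracks: List[FaceTrack] = []
    for idx in sorted(set(frame_indices)):
        if tracks and idx - tracks[-1].frames[-1] <= max_gap:
            tracks[-1].frames.append(idx)
        else:
            tracks.append(FaceTrack([idx]))
    return tracks


def merge_disjoint_tracks(
    tracks: List[FaceTrack],
    same_face: Callable[[FaceTrack, FaceTrack], bool],
) -> List[List[FaceTrack]]:
    """Group temporally disjoint tracks that the recognizer assigns to one face.

    `same_face` is a stand-in for the recognition step; each new track is
    compared against the first track of every existing group.
    """
    groups: List[List[FaceTrack]] = []
    for track in tracks:
        for group in groups:
            if same_face(group[0], track):
                group.append(track)
                break
        else:
            groups.append([track])
    return groups


# Example: one face seen in three separate stretches of the video.
tracks = group_into_tracks([1, 2, 3, 10, 11, 30])
merged = merge_disjoint_tracks(tracks, same_face=lambda a, b: True)
```

With the placeholder recognizer always answering "same face", the three disjoint tracks end up in a single group, which is the set of segments a viewer would be offered for that person.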
18. A method for processing video data, comprising:
detecting human faces in a plurality of video frames in said video data using a processor;
indicating, using said processor, one or more unmatched human faces in said detected human faces based on a comparison of said detected human faces against a plurality of human face images stored in a database; and
using said processor, tracking at least one unmatched human face across said video data by locating a face-specific set of video frames therefor using a face recognition method based on a Bayesian Network based classifier, irrespective of whether said unmatched human face is present in said face-specific set of video frames in a substantially temporally continuous manner.
Dependent claims: 19, 20, 21
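The "indicating unmatched faces" step of claim 18 can be sketched as a comparison of each detected face against the enrolled database. The sketch below is an assumption-laden illustration only: it represents faces as numeric descriptor vectors and uses a Euclidean distance threshold, neither of which is stated in the claim, and the names `find_unmatched` and `threshold` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def find_unmatched(
    detected: Dict[str, Sequence[float]],
    database: Dict[str, Sequence[float]],
    threshold: float = 0.5,
) -> List[str]:
    """Return the ids of detected faces with no sufficiently close database entry.

    Assumption: a face "matches" when its descriptor lies within `threshold`
    (Euclidean distance) of at least one enrolled descriptor.
    """
    unmatched: List[str] = []
    for face_id, descriptor in detected.items():
        best = min(
            (math.dist(descriptor, ref) for ref in database.values()),
            default=math.inf,  # an empty database matches nothing
        )
        if best > threshold:
            unmatched.append(face_id)
    return unmatched


# Example: face "a" is close to an enrolled person, face "b" is not.
detected = {"a": (0.0, 0.0), "b": (5.0, 5.0)}
database = {"alice": (0.1, 0.0)}
unknown_faces = find_unmatched(detected, database)
```

The faces returned here are the ones that would then be tracked across the video by the recognition step, even though no identity is available for them.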
22. A method for processing video data, comprising:
detecting objects in a plurality of video frames in said video data using a processor;
for at least one detected object, identifying an object-specific set of video frames using said processor, irrespective of whether said detected object is present in said object-specific set of video frames in a substantially temporally continuous manner;
grouping video frames in said object-specific set of video frames into a plurality of object tracks using said processor, wherein each object track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
using said processor, merging two or more of said plurality of object tracks that are disjoint in time using an object recognition method based on a Bayesian Network based classifier; and
enabling a user to view on an electronic display for said processor object-specific video segments of said at least one detected object in said video data based on said merging of temporally disjoint object tracks.
23. A method, comprising:
receiving video data from a user over a data communication network using a processor;
detecting human faces in a plurality of video frames in said video data using said processor;
for at least one detected human face, identifying a face-specific set of video frames using said processor, irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
configuring said processor to use a face recognition method based on a Bayesian Network based classifier to identify those portions of said video data corresponding to said face-specific set of video frames wherein said at least one detected human face is present; and
using said processor, sending cueing information for said portions of said video data to said user over said data communication network so as to enable said user to selectively view face-specific video segments in said video data associated with said at least one detected human face without a need to search said video data for said video segments.
Dependent claims: 24, 25, 26, 27
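One plausible form for the "cueing information" of claim 23 is a list of start and end times for each stretch of video in which the face appears. The sketch below assumes that representation; the claim does not define the format, and the names `cue_segments`, `fps`, and `max_gap` are hypothetical.

```python
from typing import List, Tuple


def cue_segments(
    frame_indices: List[int], fps: float, max_gap: int = 1
) -> List[Tuple[float, float]]:
    """Convert a face-specific set of frame indices into (start, end) times in seconds.

    Assumption: frames whose indices differ by at most `max_gap` belong to the
    same segment, and a segment ends one frame period after its last frame.
    """
    segments: List[Tuple[float, float]] = []
    start = prev = None
    for idx in sorted(set(frame_indices)):
        if start is None:
            start = prev = idx
        elif idx - prev <= max_gap:
            prev = idx
        else:
            segments.append((start / fps, (prev + 1) / fps))
            start = prev = idx
    if start is not None:
        segments.append((start / fps, (prev + 1) / fps))
    return segments


# Example: a face visible in frames 0-3 and 10-11 of a 2 frames-per-second video.
cues = cue_segments([0, 1, 2, 3, 10, 11], fps=2.0)
```

A client receiving such a list could seek directly to each start time, which is the sense in which the user need not search the video for the segments.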
28. A data storage medium containing a program code, which, when executed by a processor, causes said processor to perform the following:
receive video data;
detect human faces in a plurality of video frames in said video data;
for at least one detected human face, identify a face-specific set of video frames irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
group all video frames in said face-specific set of video frames into a plurality of face tracks, wherein each face track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
merge two or more of said plurality of face tracks that are disjoint in time using a face recognition method based on a Bayesian Network based classifier; and
enable a user to view face-specific video segments of said at least one detected human face in said video data based on said merger of temporally disjoint face tracks.
Dependent claims: 29, 30, 31
32. A system for processing video data, comprising:
means for detecting human faces in a plurality of video frames in said video data;
for at least one detected human face, means for identifying a face-specific set of video frames irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
means for grouping all video frames in said face-specific set of video frames into a plurality of face tracks, wherein each face track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
means for merging two or more of said plurality of face tracks that are disjoint in time using a face recognition method based on a Bayesian Network based classifier; and
means for displaying face-specific video segments of said at least one detected human face in said video data based on said merger of temporally disjoint face tracks.
Dependent claims: 33
34. A computer system, which, upon being programmed, is configured to perform the following:
receive video data;
detect human faces in a plurality of video frames in said video data;
for at least one detected human face, identify a face-specific set of video frames irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
group all video frames in said face-specific set of video frames into a plurality of face tracks, wherein each face track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
merge two or more of said plurality of face tracks that are disjoint in time using a face recognition method based on a Bayesian Network based classifier; and
enable a user to view face-specific video segments of said at least one detected human face in said video data based on said merger of temporally disjoint face tracks.
35. A system for processing video data, comprising:
a computing unit; and
a data storage medium containing a program code, which, when executed by said computing unit, causes said computing unit to perform the following:
  receive video data;
  detect human faces in a plurality of video frames in said video data;
  for at least one detected human face, identify a face-specific set of video frames irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner; and
  use a face recognition method based on a Bayesian Network based classifier to enable a user to view face-specific video segments in said video data based on said face-specific set of video frames identified.
Dependent claims: 36
37. A system for processing video data, comprising:
a video data source connected to a communication network, wherein said video data source is configured to transmit video data over said communication network; and
a computing unit in communication with said video data source and connected to said communication network, wherein said computing unit is configured to perform the following:
  receive said video data from said video data source transmitted over said communication network;
  detect human faces in a plurality of video frames in said video data;
  for at least one detected human face, identify a face-specific set of video frames irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner; and
  use a face recognition method based on a Bayesian Network based classifier to send cueing information for said face-specific set of video frames to said user over said data communication network so as to enable said user to selectively view face-specific video segments in said video data associated with said at least one detected human face without a need to search said video data for said video segments.
Dependent claims: 38
Specification