Video retrieval system for human face content
First Claim
1. A method for processing video data, comprising:
detecting human faces in a plurality of video frames in said video data using a processor;
for at least one detected human face, identifying a face-specific set of video frames using said processor irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
grouping all video frames in said face-specific set of video frames into a plurality of face tracks using said processor, wherein each face track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
segmenting pixels associated with said at least one detected human face in each video frame in said face-specific set of video frames using said processor so as to extract color signature of said at least one detected human face in each said face-specific video frame;
using said processor, merging two or more of said plurality of face tracks that are disjoint in time based on a comparison of the color signatures of said at least one detected human face appearing in video frames constituting said two or more of said plurality of face tracks; and
enabling a user to view on an electronic display for said processor face-specific video segments of said at least one detected human face in said video data based on said merging of temporally disjoint face tracks.
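The claimed pipeline can be illustrated with a minimal sketch. This is an illustration only, not the patented implementation: the grouping gap, the normalized RGB histogram used as the color signature, and the histogram-intersection merge threshold are all assumed choices, and the function names are hypothetical.

```python
import numpy as np

def build_tracks(detections, max_gap=5):
    """Group the frame numbers of one detected face into temporally
    continuous tracks (frames within max_gap of each other stay together)."""
    tracks, current = [], []
    for frame in sorted(detections):
        if current and frame - current[-1] > max_gap:
            tracks.append(current)
            current = []
        current.append(frame)
    if current:
        tracks.append(current)
    return tracks

def color_signature(pixels, bins=8):
    """Normalized RGB histogram over the segmented face pixels
    (an (N, 3) array of 0-255 color values)."""
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def merge_tracks(tracks, signatures, threshold=0.7):
    """Greedily merge temporally disjoint tracks whose color signatures
    match (histogram intersection at or above threshold)."""
    merged, used = [], set()
    for i in range(len(tracks)):
        if i in used:
            continue
        group = list(tracks[i])
        for j in range(i + 1, len(tracks)):
            if j in used:
                continue
            similarity = np.minimum(signatures[i], signatures[j]).sum()
            if similarity >= threshold:
                group.extend(tracks[j])
                used.add(j)
        merged.append(sorted(group))
    return merged
```

Merging on appearance rather than on temporal continuity is what lets two tracks of the same face, separated by an absence from the video, be presented to the user as one person-specific segment set.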
3 Assignments
0 Petitions
Abstract
A method and apparatus for video retrieval and cueing that automatically detects human faces in the video and identifies face-specific video frames so as to allow retrieval and viewing of person-specific video segments. In one embodiment, the method locates human faces in the video, stores the time stamps associated with each face, displays a single image associated with each face, matches each face against a database, computes face locations with respect to a common 3D coordinate system, and provides a means of displaying: 1) information retrieved from the database associated with a selected person or people, 2) the path of travel associated with a selected person or people, 3) an interaction graph of people in the video, and 4) video segments associated with each person and/or face. The method may also provide the ability to input and store text annotations associated with each person, face, and video segment, and the ability to enroll and remove people from the database. Videos of non-human objects may be processed in a similar manner. Because of the rules governing abstracts, this abstract should not be used to construe the claims.
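The retrieval side described in the abstract can be sketched as a simple index from each identity to the time stamps of its merged tracks, so a viewer can be cued straight to person-specific segments. This is a hedged illustration under assumed structures (face ids mapped to lists of frame-number tracks, a fixed frame rate); the names are hypothetical and not taken from the patent.

```python
from collections import defaultdict

def build_segment_index(face_tracks, fps=30.0):
    """Map each face id to sorted (start_sec, end_sec) segments derived
    from the frame numbers of its merged tracks."""
    index = defaultdict(list)
    for face_id, tracks in face_tracks.items():
        for frames in tracks:
            index[face_id].append((min(frames) / fps, max(frames) / fps))
        index[face_id].sort()
    return index

def segments_for(index, face_id):
    """Return the person-specific segments a viewer would be cued to."""
    return index.get(face_id, [])
```

Storing per-face time stamps this way is what makes "show me every segment containing this person" a direct lookup rather than a rescan of the video.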
38 Claims
1. A method for processing video data, comprising:
detecting human faces in a plurality of video frames in said video data using a processor;
for at least one detected human face, identifying a face-specific set of video frames using said processor irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
grouping all video frames in said face-specific set of video frames into a plurality of face tracks using said processor, wherein each face track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
segmenting pixels associated with said at least one detected human face in each video frame in said face-specific set of video frames using said processor so as to extract color signature of said at least one detected human face in each said face-specific video frame;
using said processor, merging two or more of said plurality of face tracks that are disjoint in time based on a comparison of the color signatures of said at least one detected human face appearing in video frames constituting said two or more of said plurality of face tracks; and
enabling a user to view on an electronic display for said processor face-specific video segments of said at least one detected human face in said video data based on said merging of temporally disjoint face tracks.
Dependent claims: 2–17, 26–28.
18. A method for processing video data, comprising:
detecting objects in a plurality of video frames in said video data using a processor;
for at least one detected object, identifying an object-specific set of video frames using said processor irrespective of whether said detected object is present in said object-specific set of video frames in a substantially temporally continuous manner;
grouping all video frames in said object-specific set of video frames into a plurality of object tracks using said processor, wherein each object track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
segmenting pixels associated with said at least one detected object in each video frame in said object-specific set of video frames using said processor so as to extract color signature of said at least one detected object in each said object-specific video frame;
using said processor, merging two or more of said plurality of object tracks that are disjoint in time based on a comparison of the color signatures of said at least one detected object appearing in video frames constituting said two or more of said plurality of object tracks; and
enabling a user to view on an electronic display for said processor object-specific video segments of said at least one detected object in said video data based on said merging of temporally disjoint object tracks.
Dependent claims: 29.
19. A data storage medium containing a program code, which, when executed by a processor, causes said processor to perform the following:
receive video data;
detect human faces in a plurality of video frames in said video data;
for at least one detected human face, identify a face-specific set of video frames irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
group all video frames in said face-specific set of video frames into a plurality of face tracks, wherein each face track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
segment pixels associated with said at least one detected human face in each video frame in said face-specific set of video frames so as to extract color signature of said at least one detected human face in each said face-specific video frame;
merge two or more of said plurality of face tracks that are disjoint in time based on a comparison of the color signatures of said at least one detected human face appearing in video frames constituting said two or more of said plurality of face tracks; and
enable a user to view face-specific video segments of said at least one detected human face in said video data based on said merger of temporally disjoint face tracks.
Dependent claims: 20–22, 30–32.
23. A system for processing video data, comprising:
means for detecting human faces in a plurality of video frames in said video data;
for at least one detected human face, means for identifying a face-specific set of video frames irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
means for grouping all video frames in said face-specific set of video frames into a plurality of face tracks, wherein each face track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
means for segmenting pixels associated with said at least one detected human face in each video frame in said face-specific set of video frames so as to extract color signature of said at least one detected human face in each said face-specific video frame;
means for merging two or more of said plurality of face tracks that are disjoint in time based on a comparison of the color signatures of said at least one detected human face appearing in video frames constituting said two or more of said plurality of face tracks; and
means for displaying face-specific video segments of said at least one detected human face in said video data based on said merger of temporally disjoint face tracks.
Dependent claims: 24, 33–35.
25. A computer system, which, upon being programmed, is configured to perform the following:
receive video data;
detect human faces in a plurality of video frames in said video data;
for at least one detected human face, identify a face-specific set of video frames irrespective of whether said detected human face is present in said face-specific set of video frames in a substantially temporally continuous manner;
group all video frames in said face-specific set of video frames into a plurality of face tracks, wherein each face track contains corresponding one or more video frames having at least a substantial temporal continuity therebetween;
segment pixels associated with said at least one detected human face in each video frame in said face-specific set of video frames so as to extract color signature of said at least one detected human face in each said face-specific video frame;
merge two or more of said plurality of face tracks that are disjoint in time based on a comparison of the color signatures of said at least one detected human face appearing in video frames constituting said two or more of said plurality of face tracks; and
enable a user to view face-specific video segments of said at least one detected human face in said video data based on said merger of temporally disjoint face tracks.
Dependent claims: 36–38.
Specification