Automatic face extraction for use in recorded meetings timelines
Abstract
Faces of speakers in a meeting or conference are automatically detected and facial images corresponding to each speaker are stored in a faces database. A timeline is created to graphically identify when each speaker is speaking during playback of a recording of the meeting. Instead of generically identifying each speaker in the timeline, a facial image is shown to identify each speaker associated with the timeline.
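Read as a data model, the abstract pairs a per-speaker timeline with at least one stored facial image, linked by a speaker identifier. A minimal sketch in Python follows; the names (`SpeakerTimeline`, `FacesDatabase`, `TimelineEntry`) are invented for illustration and do not appear in the patent.

```python
from dataclasses import dataclass, field

@dataclass
class TimelineEntry:
    """One sampled point: when and where a speaker is speaking."""
    time_s: float
    location: tuple  # (x, y) position in the video frame

@dataclass
class SpeakerTimeline:
    speaker_id: str
    entries: list = field(default_factory=list)

    def is_speaking_at(self, t: float, tolerance: float = 0.5) -> bool:
        return any(abs(e.time_s - t) <= tolerance for e in self.entries)

class FacesDatabase:
    """Maps each speaker identifier to at least one stored facial image."""
    def __init__(self):
        self._faces = {}

    def store(self, speaker_id: str, image_bytes: bytes):
        self._faces.setdefault(speaker_id, []).append(image_bytes)

    def face_for(self, speaker_id: str) -> bytes:
        return self._faces[speaker_id][0]

# Associating a timeline and a facial image with one detected speaker:
timeline = SpeakerTimeline("spk-1", [TimelineEntry(3.2, (120, 80))])
faces = FacesDatabase()
faces.store("spk-1", b"<jpeg bytes>")  # placeholder image data
```

The shared `speaker_id` string is what ties a timeline to its face during playback.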
97 Citations
32 Claims
1. A method, comprising:
detecting two or more facial images in a video sample;
detecting two or more speakers in an audio sample that corresponds to the video sample;
detecting a primary speaker of the two or more speakers;
clustering the two or more speakers temporally and spatially;
storing a speaker timeline for each detected speaker that identifies the speaker by a speaker identifier and a speaker location at various times along the speaker timeline;
storing at least one facial image for each detected speaker in a faces database; and
associating a speaker timeline and a facial image with each detected speaker.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
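The clustering and timeline-storage steps of claim 1 can be sketched as code. The greedy centroid clustering, the 50-pixel distance threshold, and the most-frequent-speaker rule for picking the primary speaker are illustrative assumptions, not the claimed algorithm.

```python
# Hypothetical sketch: cluster raw sound-source detections spatially over
# time, assign a speaker identifier per cluster, pick a primary speaker.

def cluster_speakers(detections, max_dist=50.0):
    """detections: list of (time_s, (x, y)) sound-source estimates.
    A detection joins the first cluster whose centroid lies within
    max_dist; otherwise it starts a new speaker cluster."""
    clusters = []  # each: {"id": str, "points": [(t, (x, y)), ...]}
    for t, (x, y) in sorted(detections):
        for c in clusters:
            cx = sum(p[1][0] for p in c["points"]) / len(c["points"])
            cy = sum(p[1][1] for p in c["points"]) / len(c["points"])
            if ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 <= max_dist:
                c["points"].append((t, (x, y)))
                break
        else:
            clusters.append({"id": f"spk-{len(clusters) + 1}",
                             "points": [(t, (x, y))]})
    return clusters

def primary_speaker(clusters):
    """Assumed rule: the primary speaker is the one detected most often."""
    return max(clusters, key=lambda c: len(c["points"]))["id"]

dets = [(0.0, (100, 50)), (1.0, (102, 51)), (2.0, (300, 40)), (3.0, (99, 49))]
clusters = cluster_speakers(dets)
# two spatial clusters; "spk-1" has three detections and is primary
```

Each cluster's `points` list is, in effect, the claimed speaker timeline: a speaker identifier plus a location at various times.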
10. A method, comprising:
displaying an audio/visual (A/V) sample having two or more speakers included therein;
detecting a primary speaker of the two or more speakers;
clustering the two or more speakers temporally and spatially;
displaying a speaker timeline corresponding to each speaker of the two or more speakers, the speaker timeline indicating at what points along a temporal continuum the speaker corresponding to the speaker timeline is speaking;
associating a speaker facial image with each speaker timeline, the speaker facial image corresponding to the speaker associated with the speaker timeline; and
displaying the facial image with the corresponding speaker timeline.
View Dependent Claims (11, 12)
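One way to picture the display step of claim 10 is to render each speaker timeline along a temporal continuum, with a label standing in for the facial image shown beside it. A hypothetical text rendering, in which the names and the one-second sampling grid are assumptions:

```python
# Illustrative rendering: one line per speaker, '#' where that speaker is
# speaking, with "[face:...]" standing in for the displayed facial image.

def render_timelines(speaking, duration_s):
    """speaking: {speaker_id: set of whole seconds at which that
    speaker is speaking}. Returns one timeline line per speaker."""
    lines = []
    for speaker_id, seconds in speaking.items():
        bar = "".join("#" if t in seconds else "." for t in range(duration_s))
        lines.append(f"[face:{speaker_id}] {bar}")
    return "\n".join(lines)

print(render_timelines({"alice": {0, 1, 2, 5}, "bob": {3, 4}}, 7))
# [face:alice] ###..#.
# [face:bob] ...##..
```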
13. Computer storage media containing executable instructions that, when executed, implement the following method:
identifying each speaker in an Audio/Video (“A/V”) sample by a speaker identifier;
identifying a location for each speaker in the A/V sample;
detecting a primary speaker;
clustering each identified speaker temporally and spatially;
extracting at least one facial image for each speaker identified in the A/V sample;
creating a speaker timeline for each speaker identified in the A/V sample, each speaker timeline indicating a time, a speaker identifier and a speaker location; and
associating the facial image for a speaker with a speaker timeline that corresponds to the same speaker.
View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22)
23. Computer storage media, comprising:
a speaker timeline database that includes a speaker timeline for each speaker in an A/V sample, each speaker timeline identifying a speaker and a speaker location for multiple times along a time continuum, wherein a primary speaker has been determined and wherein each identified speaker has been clustered temporally and spatially; and
a faces database that includes at least one facial image for each speaker identified in a speaker timeline and a speaker identifier that links each facial image with the appropriate speaker timeline in the speaker timeline database.
View Dependent Claims (24)
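The two stores of claim 23, a speaker timeline database and a faces database linked by a speaker identifier, could be modeled as two SQLite tables. The schema below is an assumption made for illustration, not the claimed storage layout.

```python
# Sketch: a speaker_timeline table keyed by speaker and time, and a faces
# table whose speaker_id column links each facial image to its timeline.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE speaker_timeline (
    speaker_id TEXT NOT NULL,
    time_s     REAL NOT NULL,
    loc_x      REAL NOT NULL,   -- speaker location in the frame
    loc_y      REAL NOT NULL
);
CREATE TABLE faces (
    speaker_id TEXT NOT NULL,   -- links a face to its speaker timeline
    image      BLOB NOT NULL
);
""")
db.executemany("INSERT INTO speaker_timeline VALUES (?, ?, ?, ?)",
               [("spk-1", 0.0, 100, 50), ("spk-1", 1.0, 101, 50),
                ("spk-2", 2.0, 300, 40)])
db.executemany("INSERT INTO faces VALUES (?, ?)",
               [("spk-1", b"face-1"), ("spk-2", b"face-2")])

# Join on speaker_id: the facial image for every timeline point of spk-1.
rows = db.execute("""
    SELECT t.time_s, f.image FROM speaker_timeline t
    JOIN faces f ON f.speaker_id = t.speaker_id
    WHERE t.speaker_id = 'spk-1' ORDER BY t.time_s
""").fetchall()
```

The `speaker_id` column plays the role the claim assigns it: the identifier that links each facial image with the appropriate speaker timeline.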
25. A system, comprising:
an Audio/Video (“A/V”) sample;
means for identifying each speaker appearing in the A/V sample;
means for identifying a facial image for each speaker identified in the A/V sample;
means for detecting a primary speaker;
means for clustering each speaker temporally and spatially;
means for creating a speaker timeline for each speaker identified in the A/V sample; and
means for associating a facial image with an appropriate speaker timeline.
View Dependent Claims (26, 27, 28, 29, 30, 31, 32)
Specification