Speaker segmentation and recognition based on list of speakers
Abstract
A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.
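The abstract does not say how the approximate list is obtained or how segments are labeled. The following is a minimal Python sketch under stated assumptions, not the patented implementation. Each application (for example, a meeting invite) is modeled as a plain list of names. Each candidate has a hypothetical enrolled voice model given as a small vector. The audio is assumed to be already reduced to one feature vector per frame. Frames are labeled by cosine similarity against the models of listed candidates only, and consecutive frames with the same label are merged into segments. The names `estimate_candidates`, `segment_and_recognize`, and `Segment` are illustrative and do not come from the patent.

```python
from dataclasses import dataclass
import math


@dataclass
class Segment:
    start: int    # index of the first frame in the segment
    end: int      # index one past the last frame in the segment
    speaker: str  # candidate assigned to every frame in the segment


def estimate_candidates(app_sources):
    """Union of the names reported by each (hypothetical) application source."""
    names = set()
    for source in app_sources:
        names.update(source)
    return sorted(names)


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms if norms else 0.0


def segment_and_recognize(frames, voice_models, candidates):
    """Label each frame with the closest voice model among the candidates only,
    then merge runs of identical labels into segments."""
    allowed = [n for n in candidates if n in voice_models]
    if not allowed:
        raise ValueError("no candidate has an enrolled voice model")
    labels = [max(allowed, key=lambda n: cosine(f, voice_models[n])) for f in frames]
    segments = []
    for i, label in enumerate(labels):
        if segments and segments[-1].speaker == label:
            segments[-1].end = i + 1  # extend the current segment
        else:
            segments.append(Segment(i, i + 1, label))
    return segments


# Hypothetical data: two application sources and toy 2-D voice models.
models = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [0.7, 0.7]}
candidates = estimate_candidates([["alice", "bob"], ["bob"]])
frames = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.7, 0.72]]
print(segment_and_recognize(frames, models, candidates))
# [Segment(start=0, end=2, speaker='alice'), Segment(start=2, end=4, speaker='bob')]
```

Because `carol` is not on the estimated list, the last frame is assigned to `bob` even though it lies closest to `carol`'s model. Restricting recognition to the list narrows the search, but it also shows how an inaccurate list propagates into wrong labels.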
20 Claims
1. A method, comprising:
estimating, by a processor, an approximate list of potential speakers in a video file from one or more applications, wherein the video file includes an audio recording of a plurality of speakers;
segmenting, by the processor, the video file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker;
recognizing, by the processor, particular speakers in the video file based on the approximate list of potential speakers;
generating, by the processor, social graphs of the potential speakers from speaker information comprising previous meetings attended by the potential speakers, and information about other speakers and attendees at the previous meetings; and
revising, by the processor, the approximate list of potential speakers by iteratively running a speaker segmentation and recognition (SSR) algorithm using information from the social graphs of the potential speakers.

11. A non-transitory computer readable medium storing program instructions for execution by a processor to perform operations comprising:
estimating an approximate list of potential speakers in a video file from one or more applications, wherein the video file includes an audio recording of a plurality of speakers;
segmenting the video file according to the approximate list of potential speakers, such that each segment corresponds to at least one speaker;
recognizing particular speakers in the video file based on the approximate list of potential speakers;
generating social graphs of the potential speakers from speaker information comprising previous meetings attended by the potential speakers, and information about other speakers and attendees at the previous meetings; and
revising the approximate list of potential speakers by iteratively running a speaker segmentation and recognition (SSR) algorithm using information from the social graphs of the potential speakers.

16. An apparatus, comprising:
a memory element for storing data; and
a processor that executes instructions associated with the data, wherein the processor and the memory element cooperate such that the apparatus is configured to:
estimate an approximate list of potential speakers in a video file from one or more applications, wherein the video file includes a recording of a plurality of speakers;
segment the video file according to the approximate list of potential speakers, such that each segment corresponds to at least one speaker;
recognize particular speakers in the video file based on the approximate list of potential speakers;
generate social graphs of the potential speakers from speaker information comprising previous meetings attended by the potential speakers, and information about other speakers and attendees at the previous meetings; and
revise the approximate list of potential speakers by iteratively running a speaker segmentation and recognition (SSR) algorithm using information from the social graphs of the potential speakers.
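
The claims require social graphs built from previous meetings and their attendees, and a list revised by iteratively rerunning SSR, but they leave the graph construction and the revision rule open. The following Python sketch assumes one possible reading. The social graph is a co-attendance graph whose edge weight counts shared previous meetings. Each pass keeps the speakers the SSR pass actually recognized and adds their graph neighbors whose weight is at least `min_weight`. It stops when the list reaches a fixed point or after `max_iter` passes. `run_ssr` is a deliberately toy stand-in that picks the nearest enrolled voice model per frame. All function names, thresholds, and data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def build_social_graph(previous_meetings):
    """Co-attendance graph: graph[a][b] counts previous meetings that a and b both attended."""
    graph = defaultdict(lambda: defaultdict(int))
    for attendees in previous_meetings:
        for a, b in combinations(sorted(set(attendees)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def _cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms if norms else 0.0


def run_ssr(frames, voice_models, candidates):
    """Toy SSR stand-in (assumption): label each frame with the closest enrolled candidate."""
    allowed = [n for n in sorted(candidates) if n in voice_models]
    if not allowed:
        raise ValueError("no candidate has an enrolled voice model")
    return [max(allowed, key=lambda n: _cosine(f, voice_models[n])) for f in frames]


def revise_candidates(frames, voice_models, candidates, graph,
                      min_frames=1, min_weight=2, max_iter=5):
    """Re-run SSR, keeping recognized speakers plus their strong social-graph neighbors."""
    current = set(candidates)
    for _ in range(max_iter):
        labels = run_ssr(frames, voice_models, current)
        recognized = {n for n, c in Counter(labels).items() if c >= min_frames}
        revised = set(recognized)
        for name in recognized:
            neighbors = graph.get(name, {})
            revised.update(m for m, w in neighbors.items() if w >= min_weight)
        if revised == current:  # fixed point: the list no longer changes
            break
        current = revised
    else:
        # No fixed point within max_iter: relabel so labels match the final list.
        labels = run_ssr(frames, voice_models, current)
    return sorted(current), labels


# Hypothetical previous meetings, voice models, and per-frame features.
meetings = [["alice", "carol", "erin"], ["alice", "carol"], ["bob", "dave"]]
graph = build_social_graph(meetings)
models = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [0.7, 0.7]}
frames = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.7, 0.72]]
speakers, labels = revise_candidates(frames, models, ["alice", "bob"], graph)
print(speakers)  # ['alice', 'bob', 'carol']
print(labels)    # ['alice', 'alice', 'bob', 'carol']
```

Starting from `["alice", "bob"]`, the first pass adds `carol` because she attended two previous meetings with `alice`. The second pass then reassigns the last frame to her, after which the list stops changing. The minimum co-attendance weight keeps one-off contacts (`erin`, `dave`) off the list; this is one design choice among many that the claims would permit.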
Specification