Fusion of audio and video based speaker identification for multimedia information access
Abstract
A method and apparatus are disclosed for identifying a speaker in an audio-video source using both audio and video information. An audio-based speaker identification system identifies one or more potential speakers for a given segment using an enrolled speaker database. A video-based speaker identification system identifies one or more potential speakers for a given segment using a face detector/recognizer and an enrolled face database. An audio-video decision fusion process evaluates the individuals identified by the audio-based and video-based speaker identification systems and determines the speaker of an utterance in accordance with the present invention. A linear variation is imposed on the ranked-lists produced using the audio and video information. The decision fusion scheme of the present invention is based on a linear combination of the audio and the video ranked-lists. The line with the higher slope is assumed to convey more discriminative information. The normalized slopes of the two lines are used as the weight of the respective results when combining the scores from the audio-based and video-based speaker analysis. In this manner, the weights are derived from the data itself.
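The abstract says only that a line is imposed on each ranked list and that the normalized slopes serve as the fusion weights. The Python sketch below fills in the unstated details with assumptions of its own. Each line is taken to be a least-squares fit of confidence score against rank position. The slope magnitudes are normalized to sum to one. An individual missing from one modality's list contributes a score of zero from that modality. The function names `fit_slope` and `fuse_ranked_lists` are hypothetical, and the audio and video confidence scores are assumed to already lie on comparable scales.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: least-squares slope of confidence score vs. rank (0, 1, 2, ...).
def fit_slope(scores: List[float]) -> float:
    n = len(scores)
    if n < 2:
        return 0.0  # a single candidate carries no slope information
    mean_x = (n - 1) / 2.0
    mean_y = sum(scores) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(scores))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def fuse_ranked_lists(
    audio: List[Tuple[str, float]], video: List[Tuple[str, float]]
) -> Tuple[str, Dict[str, float]]:
    # Order each modality's candidates by descending confidence (the ranked list).
    audio = sorted(audio, key=lambda p: p[1], reverse=True)
    video = sorted(video, key=lambda p: p[1], reverse=True)

    # Scores fall with rank, so slopes are <= 0; use magnitudes so that the
    # steeper (assumed more discriminative) list gets the larger weight.
    s_audio = abs(fit_slope([s for _, s in audio]))
    s_video = abs(fit_slope([s for _, s in video]))
    total = s_audio + s_video
    if total == 0.0:
        w_audio = w_video = 0.5  # assumption: fall back to equal weights
    else:
        w_audio, w_video = s_audio / total, s_video / total

    # Assumption: a candidate absent from one list scores 0 in that modality.
    a, v = dict(audio), dict(video)
    fused = {
        name: w_audio * a.get(name, 0.0) + w_video * v.get(name, 0.0)
        for name in set(a) | set(v)
    }
    speaker = max(fused, key=fused.get)
    return speaker, fused


if __name__ == "__main__":
    audio = [("alice", 0.90), ("bob", 0.85), ("carol", 0.80)]  # flat: slope -0.05
    video = [("bob", 0.90), ("alice", 0.50), ("carol", 0.10)]  # steep: slope -0.40
    speaker, scores = fuse_ranked_lists(audio, video)
    print(speaker)  # → bob
```

In this example the audio list alone would favour alice. The video list falls off eight times more steeply, so it receives eight ninths of the weight, and the fused decision is bob. Because the weights come from the spread of each list's own scores, no fixed audio/video weighting has to be tuned in advance, which matches the abstract's point that the weights are derived from the data itself.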
14 Claims
1. A method for identifying a speaker in an audio-video source, said audio-video source having audio information and video information, said method comprising the steps of:
processing said audio information to identify a plurality of potential speakers, each of said identified speakers having an associated confidence score;
processing said video information to identify a plurality of potential individuals in an image, each of said identified individuals having an associated confidence score; and
identifying said speaker in said audio-video source based on said audio and video information, wherein said audio and video information is weighted based on slope information derived from said confidence scores.
6. A method for identifying a speaker in an audio-video source, said audio-video source having audio information and video information, said method comprising the steps of:
processing said audio information to identify a ranked-list of potential speakers, each of said identified speakers having an associated confidence score;
processing said video information to identify a ranked-list of potential individuals in an image, each of said identified individuals having an associated confidence score; and
identifying said speaker in said audio-video source based on said audio and video information, wherein said audio and video information is weighted based on slope information derived from said confidence scores.
11. A system for identifying a speaker in an audio-video source, said audio-video source having audio information and video information, said system comprising:
a memory that stores computer-readable code; and
a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
process said audio information to identify a plurality of potential speakers, each of said identified speakers having an associated confidence score;
process said video information to identify a plurality of potential individuals in an image, each of said identified individuals having an associated confidence score; and
identify said speaker in said audio-video source based on said audio and video information, wherein said audio and video information is weighted based on slope information derived from said confidence scores.
12. A system for identifying a speaker in an audio-video source, said audio-video source having audio information and video information, said system comprising:
a memory that stores computer-readable code; and
a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
process said audio information to identify a ranked-list of potential speakers, each of said identified speakers having an associated confidence score;
process said video information to identify a ranked-list of potential individuals in an image, each of said identified individuals having an associated confidence score; and
identify said speaker in said audio-video source based on said audio and video information, wherein said audio and video information is weighted based on slope information derived from said confidence scores.
13. An article of manufacture for identifying a speaker in an audio-video source, said audio-video source having audio information and video information, said article of manufacture comprising:
a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising:
a step to process said audio information to identify a plurality of potential speakers, each of said identified speakers having an associated confidence score;
a step to process said video information to identify a plurality of potential individuals in an image, each of said identified individuals having an associated confidence score; and
a step to identify said speaker in said audio-video source based on said audio and video information, wherein said audio and video information is weighted based on slope information derived from said confidence scores.
14. An article of manufacture for identifying a speaker in an audio-video source, said audio-video source having audio information and video information, said article of manufacture comprising:
a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising:
a step to process said audio information to identify a ranked-list of potential speakers, each of said identified speakers having an associated confidence score;
a step to process said video information to identify a ranked-list of potential individuals in an image, each of said identified individuals having an associated confidence score; and
a step to identify said speaker in said audio-video source based on said audio and video information, wherein said audio and video information is weighted based on slope information derived from said confidence scores.
Specification