Method and apparatus for transcribing speech when a plurality of speakers are participating
First Claim
1. A method for transcribing speech of a plurality of speakers, comprising:
- providing said speech to a plurality of speech decoders, each of said decoders using a speaker model corresponding to a different one of said speakers and generating a confidence score for each decoded output;
- selecting a decoded output based on said confidence score; and
- presenting said decoded output as a string of words for the decoded output having the highest confidence score and as phones or syllables for all other decoded outputs.
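The selecting and presenting steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `DecodedOutput` record, its field names, the `present` function, and the sample speakers and scores are all hypothetical, and each decoder is assumed to return a word hypothesis and a phone hypothesis for the same stretch of audio.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecodedOutput:
    # Hypothetical record for one decoder's result on a stretch of speech.
    speaker: str          # speaker whose model this decoder used
    words: List[str]      # word-level hypothesis
    phones: List[str]     # phone-level hypothesis for the same audio
    confidence: float     # confidence score reported by the decoder


def present(outputs: List[DecodedOutput]) -> List[str]:
    """Render the highest-confidence output as words and all others as phones."""
    if not outputs:
        return []
    # Index of the decoded output with the highest confidence score.
    best = max(range(len(outputs)), key=lambda i: outputs[i].confidence)
    lines = []
    for i, out in enumerate(outputs):
        tokens = out.words if i == best else out.phones
        lines.append(f"{out.speaker}: {' '.join(tokens)}")
    return lines


# Made-up example values, for illustration only.
outputs = [
    DecodedOutput("alice", ["hello", "world"],
                  ["HH", "AH", "L", "OW", "W", "ER", "L", "D"], 0.91),
    DecodedOutput("bob", ["yellow", "word"],
                  ["Y", "EH", "L", "OW", "W", "ER", "D"], 0.42),
]
rendered = present(outputs)
```

Here the decoder using the hypothetical "alice" model scores highest, so its line is shown as words, while the "bob" line falls back to phones.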
Abstract
A method and apparatus are disclosed for transcribing speech when a number of speakers are participating. A number of different speech recognition systems, each with a different speaker model, are executed in parallel. When the identity of all of the participating speakers is known and a speaker model is available for each participant, each speech recognition system employs a different speaker model suitable for a corresponding participant. Each speech recognition system decodes the speech and generates a corresponding confidence score. The decoded output having the highest confidence score is selected for presentation to a user. When not all participating speakers are known, or when there are too many participants to implement a unique speaker model for each participant, a speaker-independent speech recognition system is employed together with a speaker-specific speech recognition system. A controller selects between the decoded outputs of the speaker-independent speech recognition system and the speaker-specific speech recognition system based on information received from a speaker identification system and a speaker change detector.
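The abstract names the controller's inputs but does not state its decision rule. The sketch below assumes one plausible rule purely for illustration: prefer the speaker-specific output only when the identified speaker matches the speaker whose model is loaded and no speaker change has been flagged, and otherwise fall back to the speaker-independent output. The `Hypothesis` type, the `select_output` function, and all parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    # Hypothetical container for one recognizer's decoded output.
    text: str
    confidence: float


def select_output(
    independent: Hypothesis,
    specific: Hypothesis,
    identified_speaker: Optional[str],
    model_speaker: str,
    change_detected: bool,
) -> Hypothesis:
    """Choose between speaker-independent and speaker-specific outputs.

    Assumed rule (not taken from the patent text): trust the
    speaker-specific recognizer only when the speaker identification
    result matches the loaded speaker model and the change detector has
    not flagged a speaker change for this segment.
    """
    if change_detected:
        return independent
    if identified_speaker is None or identified_speaker != model_speaker:
        return independent
    return specific


# Made-up example values, for illustration only.
si = Hypothesis("generic transcript", 0.60)
ss = Hypothesis("speaker adapted transcript", 0.85)
chosen = select_output(si, ss, identified_speaker="alice",
                       model_speaker="alice", change_detected=False)
```

A real controller might also weigh the confidence scores or reload the speaker-specific model after a detected change; those choices are left out of this sketch.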
9 Claims
1. A method for transcribing speech of a plurality of speakers, comprising:
- providing said speech to a plurality of speech decoders, each of said decoders using a speaker model corresponding to a different one of said speakers and generating a confidence score for each decoded output;
- selecting a decoded output based on said confidence score; and
- presenting said decoded output as a string of words for the decoded output having the highest confidence score and as phones or syllables for all other decoded outputs.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
Specification