Multi-microphone speech recognition systems and related techniques
Abstract
A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
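The abstract mentions beamforming across various look directions without naming a particular beamformer. As an illustration only, the sketch below assumes the simplest variant, delay-and-sum with integer sample delays, and produces one stream per look direction. The names `delay_and_sum`, `steer`, and the direction labels are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Average the channels after advancing channel i by delays[i] samples.

    Assumption: all channels have equal length and delays are whole samples.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    length = len(channels[0])
    out = []
    for n in range(length):
        total = 0.0
        for signal, delay in zip(channels, delays):
            idx = n + delay
            if 0 <= idx < length:  # samples outside the buffer count as zero
                total += signal[idx]
        out.append(total / len(channels))
    return out


def steer(channels: Sequence[Sequence[float]],
          look_directions: Dict[str, Sequence[int]]) -> Dict[str, List[float]]:
    """Build one stream (representation of the utterance) per look direction."""
    return {name: delay_and_sum(channels, delays)
            for name, delays in look_directions.items()}


# Toy input: the same pulse reaches microphone 1 one sample after microphone 0.
mics = [
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
]
streams = steer(mics, {"left": [0, 1], "right": [1, 0]})
print(streams["left"])   # pulse aligned across channels
print(streams["right"])  # pulse smeared across two samples
```

In this toy case the "left" stream reinforces the pulse while the "right" stream spreads it out, which is why each stream can lead a recognizer to a different set of candidates.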
20 Claims
1. A speech recognition system for resolving impaired utterances, comprising:
- a speech recognition engine configured to receive a plurality of representations of an utterance corresponding to output from one or more microphone transducers in a plurality of microphone transducers exposed to the utterance, and further configured to determine, concurrently, a plurality of highest-likelihood transcription candidates, each corresponding to a respective representation of the utterance;
- a selector configured to determine a most-likely accurate transcription of the utterance from among the plurality of highest-likelihood transcription candidates; and
- an output device configured to output recognized speech corresponding to the most-likely accurate transcription of the utterance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A speech recognition method comprising:
- receiving a plurality of representations of an utterance, wherein each representation corresponds to output from one or more microphone transducers among a plurality of microphone transducers exposed to the utterance;
- concurrently determining a plurality of highest-likelihood transcription candidates corresponding to each representation of the utterance;
- selecting a most-likely accurate transcription from among the transcription candidates corresponding to the plurality of representations of the utterance; and
- outputting recognized speech corresponding to the most-likely accurate transcription of the utterance.
Dependent claims: 10, 11, 12, 13
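The four steps of the method in claim 9 (receive, concurrently determine, select, output) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the recognizer is a stub backed by made-up results, the representations are plain string labels, and the names `recognize_all`, `run_pipeline`, `toy_recognizer`, and `highest_likelihood` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[str, float]  # (transcription, likelihood)


def recognize_all(representations: Sequence[str],
                  recognizer: Callable[[str], List[Candidate]],
                  top_n: int = 3) -> List[List[Candidate]]:
    """Run the recognizer on every representation concurrently and keep
    the top_n highest-likelihood candidates for each one."""
    workers = max(1, len(representations))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(recognizer, representations))  # order preserved
    return [sorted(r, key=lambda c: c[1], reverse=True)[:top_n] for r in results]


def run_pipeline(representations: Sequence[str],
                 recognizer: Callable[[str], List[Candidate]],
                 selector: Callable[[List[List[Candidate]]], str],
                 output: Callable[[str], None] = print) -> str:
    """Receive, recognize concurrently, select, and output."""
    candidate_lists = recognize_all(representations, recognizer)
    best = selector(candidate_lists)
    output(best)
    return best


# Stub recognizer with fabricated, illustrative results (assumption).
FAKE_RESULTS = {
    "beam_0": [("call tom", 0.62), ("call mom", 0.30)],
    "beam_1": [("call mom", 0.81), ("cold mom", 0.10)],
}


def toy_recognizer(representation: str) -> List[Candidate]:
    return FAKE_RESULTS[representation]


def highest_likelihood(candidate_lists: List[List[Candidate]]) -> str:
    """One possible selector: the single candidate with the largest likelihood."""
    flat = [c for cands in candidate_lists for c in cands]
    return max(flat, key=lambda c: c[1])[0]


run_pipeline(["beam_0", "beam_1"], toy_recognizer, highest_likelihood)
```

A thread pool is used here only to show the concurrent structure; a real recognizer might instead run in separate processes or on separate hardware.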
14. A speech recognition method comprising:
- concurrently determining a plurality of highest-likelihood transcription candidates corresponding to each of a plurality of representations of an utterance; and
- selecting a most-likely accurate transcription from among the transcription candidates corresponding to the plurality of representations of the utterance,
- wherein each of the highest-likelihood transcription candidates has an associated likelihood of being an accurate transcription of the utterance, and the act of selecting the most-likely accurate transcription comprises selecting from among the transcription candidates a transcription candidate having a largest likelihood among the plurality of transcription candidates, a transcription candidate having a largest net likelihood among the plurality of transcription candidates, a transcription candidate having a largest frequency of being a highest likelihood transcription candidate from each representation of the utterance, or a transcription candidate having a highest cumulative rank order among the representations of the utterance, wherein a rank order for each transcription candidate corresponding to a given representation corresponds to the relative likelihood of the respective transcription candidate compared to a likelihood of each of the other transcription candidates corresponding to the given utterance.
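Claim 14 lists four alternative ways of selecting among the candidates. The sketch below gives one possible reading of each, and two of those readings are assumptions rather than definitions from the claim: "net likelihood" is taken as the sum of a transcription's likelihoods across representations, and "cumulative rank order" is taken as a Borda-style point total in which a higher-ranked candidate earns more points. All function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]           # (transcription, likelihood)
CandidateLists = List[List[Candidate]]  # one candidate list per representation


def _ranked(cands: List[Candidate]) -> List[Candidate]:
    """Candidates sorted from most to least likely."""
    return sorted(cands, key=lambda c: c[1], reverse=True)


def largest_likelihood(lists: CandidateLists) -> str:
    """Candidate with the single largest likelihood anywhere."""
    return max((c for cands in lists for c in cands), key=lambda c: c[1])[0]


def largest_net_likelihood(lists: CandidateLists) -> str:
    """Assumed reading: sum each transcription's likelihoods across lists."""
    totals: Dict[str, float] = defaultdict(float)
    for cands in lists:
        for text, likelihood in cands:
            totals[text] += likelihood
    return max(totals, key=totals.get)


def most_frequent_top(lists: CandidateLists) -> str:
    """Transcription that is most often the top candidate of a list."""
    tops = Counter(_ranked(cands)[0][0] for cands in lists if cands)
    return tops.most_common(1)[0][0]


def highest_cumulative_rank(lists: CandidateLists) -> str:
    """Assumed reading: Borda-style points, best rank earns the most."""
    points: Dict[str, int] = defaultdict(int)
    for cands in lists:
        ranked = _ranked(cands)
        for position, (text, _) in enumerate(ranked):
            points[text] += len(ranked) - position
    return max(points, key=points.get)


# Illustrative data only: three representations, two candidates each.
sample = [
    [("call tom", 0.90), ("call mom", 0.05)],
    [("call mom", 0.60), ("call tom", 0.30)],
    [("call mom", 0.55), ("call tom", 0.35)],
]
for strategy in (largest_likelihood, largest_net_likelihood,
                 most_frequent_top, highest_cumulative_rank):
    print(strategy.__name__, "->", strategy(sample))
```

With this sample the likelihood-based strategies favor "call tom" because of one very confident stream, while the frequency and rank strategies favor "call mom" because most streams agree on it. Tie-breaking is not addressed by the claim; here ties resolve to the first candidate encountered.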
15. A non-transitory, computer-readable media containing instructions that, when executed by a processor, cause a computing environment to perform a speech recognition method comprising:
- receiving a plurality of representations of an utterance, wherein each representation corresponds to output from one or more microphone transducers among a plurality of microphone transducers exposed to the utterance;
- concurrently determining a plurality of highest-likelihood transcription candidates corresponding to each representation of the utterance;
- selecting a most-likely accurate transcription from among the transcription candidates corresponding to the plurality of representations of the utterance; and
- outputting recognized speech corresponding to the most-likely accurate transcription of the utterance.
Dependent claims: 16, 17, 18, 19, 20
Specification