Multi-microphone speech recognition systems and related techniques
Abstract
A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
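The arrangement described in the abstract, in which each representation of the utterance is decoded concurrently and a selector then picks the most-likely transcription, can be sketched as follows. This is a minimal illustration only, not the patented implementation: the `Candidate` type, the `recognize_all` and `select_most_likely` functions, and the energy-based `toy_recognizer` are hypothetical names introduced here, and the log-likelihood score merely stands in for whatever measure a real recognition engine would report.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """Top transcription hypothesis for one representation (hypothetical type)."""
    stream_id: int
    text: str
    log_likelihood: float  # higher means more likely


def recognize_all(
    streams: Sequence[Sequence[float]],
    recognize: Callable[[Sequence[float]], tuple[str, float]],
) -> list[Candidate]:
    """Decode every stream concurrently and keep one candidate per stream."""
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so index i matches streams[i].
        results = list(pool.map(recognize, streams))
    return [Candidate(i, text, score) for i, (text, score) in enumerate(results)]


def select_most_likely(candidates: Sequence[Candidate]) -> Candidate:
    """Selector: return the candidate with the highest log-likelihood."""
    if not candidates:
        raise ValueError("no transcription candidates to select from")
    return max(candidates, key=lambda c: c.log_likelihood)


def toy_recognizer(stream: Sequence[float]) -> tuple[str, float]:
    """Stand-in for a real recognizer (assumption): louder streams score higher."""
    energy = sum(x * x for x in stream) / len(stream)
    text = "turn on the lights" if energy > 0.1 else "turn on the flights"
    return text, -1.0 / (energy + 1e-9)


if __name__ == "__main__":
    # Three streams, e.g. beamformer outputs for three look directions.
    streams = [
        [0.1, -0.1, 0.1, -0.1],
        [0.5, -0.5, 0.5, -0.5],
        [0.2, -0.2, 0.2, -0.2],
    ]
    best = select_most_likely(recognize_all(streams, toy_recognizer))
    print(best.stream_id, best.text)
```

In this sketch the selector compares scores across streams, so a stream degraded by noise or reverberation loses to a cleaner one even if both yield a plausible transcription.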
19 Claims
1. A speech recognition system for resolving far-field utterances, comprising:
- an acoustic appliance comprising a processor, a memory and a communication connection to communicate with one or more spatially distributed acoustic appliances, wherein the memory stores instructions which, when executed by the processor, cause the system to
  - concurrently receive over the communication connection a plurality of representations of an utterance observed by the one or more spatially distributed acoustic appliances;
  - determine a highest-probability representation of the utterance based on the plurality of utterance representations; and
  - determine a most-likely transcription corresponding to the highest-probability representation of the utterance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
13. A speech-recognition method, comprising:
- over a communication connection, receiving from a plurality of spatially distributed audio appliances a corresponding plurality of representations of an utterance observed by the acoustic appliances, wherein each audio appliance has one or more microphone transducers and associated circuitry to convert observed audio to an acoustic signal representative of the audio;
- selecting a highest-probability representation of the utterance based on the plurality of representations of the utterance;
- determining a most-likely transcription of the utterance in correspondence to the highest-probability representation of the utterance; and
- responsive to the most-likely transcription of the utterance, invoking one or more instructions.
Dependent claims: 14, 15, 16.
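The steps recited in claim 13 (receive, select, transcribe, invoke) can be sketched as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: the `Representation` type, its `probability` field, and the `handle_utterance` function are hypothetical, and the claim text shown here does not say how the probability of a representation is computed, so the sketch simply takes it as a given number per appliance.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Representation:
    """One appliance's view of the utterance (hypothetical type)."""
    appliance_id: str
    samples: Sequence[float]
    probability: float  # assumed per-appliance score; its origin is not specified


def select_highest_probability(reps: Sequence[Representation]) -> Representation:
    """Pick the representation with the highest probability score."""
    if not reps:
        raise ValueError("no representations received")
    return max(reps, key=lambda r: r.probability)


def handle_utterance(
    reps: Sequence[Representation],
    transcribe: Callable[[Sequence[float]], str],
    commands: Mapping[str, Callable[[], str]],
) -> Optional[str]:
    """Select the best representation, transcribe only that one, then act on it."""
    best = select_highest_probability(reps)
    text = transcribe(best.samples)
    action = commands.get(text)
    # Invoke an instruction only if the transcription maps to a known command.
    return action() if action is not None else None


if __name__ == "__main__":
    reps = [
        Representation("kitchen", [0.1, 0.2], 0.2),
        Representation("living-room", [0.4, 0.5, 0.6], 0.9),
        Representation("hallway", [0.3], 0.5),
    ]
    # Stand-in transcriber (assumption): recognizes only the 3-sample stream.
    transcribe = lambda s: "lights on" if len(s) == 3 else "unknown"
    commands = {"lights on": lambda: "LIGHTS_ON"}
    print(handle_utterance(reps, transcribe, commands))
```

Unlike the arrangement in the abstract, which transcribes every representation and then selects among the transcriptions, this ordering selects a single representation first and runs transcription once.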
17. A non-transitory, computer-readable media containing instructions that, when executed by a processor, cause a computing environment to perform a speech recognition method comprising:
- over a communication connection, receiving from a plurality of spatially distributed audio appliances a corresponding plurality of representations of an utterance observed by the acoustic appliances, wherein each audio appliance has one or more microphone transducers and associated circuitry to convert observed audio to an acoustic signal representative of the audio;
- selecting a highest-probability representation of the utterance based on the plurality of representations of the utterance;
- determining a most-likely transcription of the utterance in correspondence to the highest-probability representation of the utterance; and
- responsive to the most-likely transcription of the utterance, invoking one or more instructions.
Dependent claims: 18, 19.
Specification