Multi-microphone speech recognition systems and related techniques

US 9,865,265 B2
Filed: 06/06/2015
Issued: 01/09/2018
Est. Priority Date: 06/06/2015
Status: Active Grant

Abstract

A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
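
The engine-plus-selector arrangement described in the abstract can be illustrated with a minimal Python sketch. It assumes each representation (for example, one beamformed look direction) is decoded independently into a single (transcription, log-likelihood) candidate, and that the selector simply keeps the candidate with the highest score. The names `decode_streams`, `select_transcription`, `toy_recognize` and the toy scores are hypothetical and do not come from the patent; a real recognizer would replace the lookup table.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# A candidate is a (transcription, log-likelihood) pair; higher is better.
Candidate = Tuple[str, float]


def decode_streams(streams: Sequence[str],
                   recognize: Callable[[str], Candidate]) -> List[Candidate]:
    """Decode every stream concurrently; results keep the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(recognize, streams))


def select_transcription(candidates: Sequence[Candidate]) -> Candidate:
    """Selector: keep the candidate with the highest log-likelihood."""
    return max(candidates, key=lambda c: c[1])


# Hypothetical stand-in for a recognizer: one fixed result per look direction.
TOY_RESULTS = {
    "look_0_deg": ("turn on the light", -12.5),
    "look_90_deg": ("turn on the lights", -4.2),
    "look_180_deg": ("burn on the lights", -9.8),
}


def toy_recognize(stream_id: str) -> Candidate:
    return TOY_RESULTS[stream_id]


candidates = decode_streams(list(TOY_RESULTS), toy_recognize)
best = select_transcription(candidates)
print(best[0])
```

Choosing the single best-scoring candidate is only one possible selection rule; the abstract does not commit to a particular one.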

19 Claims

1. A speech recognition system for resolving far-field utterances, comprising:
- an acoustic appliance comprising a processor, a memory and a communication connection to communicate with one or more spatially distributed acoustic appliances, wherein the memory stores instructions which, when executed by the processor, cause the system to
  
  concurrently receive over the communication connection a plurality of representations of an utterance observed by the one or more spatially distributed acoustic appliances;
  
  determine a highest-probability representation of the utterance based on the plurality of utterance representations; and
  
  determine a most-likely transcription corresponding to the highest-probability representation of the utterance.
  - 2. The speech recognition system according to claim 1, wherein the plurality of representations of the utterance comprises a concatenation of utterance representations and corresponding posterior probabilities.
  - 3. The speech recognition system according to claim 1, wherein each of the utterance representations has an associated posterior probability, and wherein the highest-probability representation of the utterance is further based in part on a combination of the plurality of posterior probabilities corresponding to the utterance representations.
  - 4. The speech recognition system according to claim 1, wherein utterance representations having relatively higher signal-to-noise ratio as compared to utterance representations having relatively lower signal-to-noise ratio are more heavily weighted to determine the highest-probability representation of the utterance.
  - 5. The speech recognition system according to claim 1, wherein the memory further contains a recognition parameter store, wherein the instructions, when executed by the processor, further cause the system to combine one or more recognition parameters from the recognition parameter store with the highest-probability representations of the utterance.
  - 6. The speech recognition system according to claim 5, wherein the plurality of representations of the utterance comprises a plurality of acoustic features, each having a corresponding posterior probability, and wherein the instructions, when executed by the processor, further cause the system to identify the acoustic features having a highest-probability of correctly representing the utterance, and to combine the one or more recognition parameters from the recognition parameter store with the highest probability acoustic features.
  - 7. The speech recognition system of claim 6, wherein the recognition parameter store comprises an acoustic feature dictionary, a language model, or both.
  - 8. The speech recognition system of claim 7, wherein the plurality of utterance representations comprises a plurality of phonemes and the acoustic feature dictionary comprises a phonetic dictionary.
  - 9. The speech recognition system according to claim 1, wherein each representation of the utterance comprises one or more respective acoustic features and corresponding posterior probabilities, and wherein the instructions, when executed by the processor, further cause the system to aggregate the plurality of streams and corresponding posterior probabilities; and to select from the aggregated plurality of streams those acoustic features most likely to accurately reflect the utterance.
  - 10. The speech recognition system according to claim 9, wherein the acoustic appliance comprises a first acoustic appliance, the system further comprising a second acoustic appliance to extract acoustic features from an acoustic signal received by the second appliance and to stream the extracted acoustic features over the communication connection to the first acoustic appliance.
  - 11. The speech recognition system according to claim 10, wherein the first appliance and/or the second appliance comprises a near-field acoustic-feature extractor, a far-field acoustic-feature extractor, or both.
  - 12. The speech recognition system according to claim 10, wherein the first appliance is configured to synchronize the plurality of received streams of acoustic features and associated posterior probabilities.
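
Claims 3, 4 and 9 describe combining per-appliance posterior probabilities, weighting higher-SNR representations more heavily, and selecting the most likely acoustic features. A minimal Python sketch of one way this could work follows. It assumes the streams are already time-aligned (as in claim 12), that each frame is a mapping from an acoustic feature label to a posterior probability, and that the weights are proportional to linear-scale SNR. The weighting rule and all names (`snr_weights`, `combine_frame`, `highest_probability_representation`) are illustrative assumptions, not the patent's method.

```python
from typing import Dict, List

# One frame: acoustic feature label (e.g. a phoneme) -> posterior probability.
Posterior = Dict[str, float]


def snr_weights(snr_db: List[float]) -> List[float]:
    """Turn per-appliance SNR values (dB) into normalized linear weights."""
    linear = [10.0 ** (s / 10.0) for s in snr_db]
    total = sum(linear)
    return [v / total for v in linear]


def combine_frame(posteriors: List[Posterior], weights: List[float]) -> Posterior:
    """Weighted sum of the posteriors reported by each appliance for one frame."""
    combined: Posterior = {}
    for post, w in zip(posteriors, weights):
        for label, p in post.items():
            combined[label] = combined.get(label, 0.0) + w * p
    return combined


def highest_probability_representation(streams: List[List[Posterior]],
                                       snr_db: List[float]) -> List[str]:
    """Per frame, keep the label with the highest SNR-weighted posterior."""
    weights = snr_weights(snr_db)
    n_frames = min(len(s) for s in streams)  # assumes aligned streams
    best = []
    for t in range(n_frames):
        combined = combine_frame([s[t] for s in streams], weights)
        best.append(max(combined, key=combined.get))
    return best


# Toy data: a nearby appliance and a distant one observing the same utterance.
near = [{"k": 0.7, "g": 0.3}, {"ae": 0.6, "eh": 0.4}, {"t": 0.55, "d": 0.45}]
far = [{"k": 0.4, "g": 0.6}, {"ae": 0.2, "eh": 0.8}, {"t": 0.1, "d": 0.9}]

weighted = highest_probability_representation([near, far], snr_db=[20.0, 0.0])
unweighted = highest_probability_representation([near, far], snr_db=[0.0, 0.0])
print(weighted, unweighted)
```

With the toy data, the SNR-weighted result follows the nearby appliance, while equal weights let the noisier stream change two of the three frames, which is the effect claim 4 is aimed at avoiding.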

13. A speech-recognition method, comprising:
- over a communication connection, receiving from a plurality of spatially distributed audio appliances a corresponding plurality of representations of an utterance observed by the acoustic appliances, wherein each audio appliance has one or more microphone transducers and associated circuitry to convert observed audio to an acoustic signal representative of the audio;
  
  selecting a highest-probability representation of the utterance based on the plurality of representations of the utterance;
  
  determining a most-likely transcription of the utterance in correspondence to the highest-probability representation of the utterance; and
  
  responsive to the most-likely transcription of the utterance, invoking one or more instructions.
  - 14. The speech recognition method according to claim 13, wherein each of the utterance representations has an associated posterior probability, and wherein the act of selecting the highest-probability representation comprises combining the plurality of posterior probabilities corresponding to the utterance representations.
  - 15. The speech recognition method according to claim 14, wherein the act of selecting the highest-probability representation further comprises more heavily weighting utterance representations having relatively higher signal-to-noise ratio as compared to utterance representations having relatively lower signal-to-noise ratio.
  - 16. The speech recognition method according to claim 13, further comprising combining one or more recognition parameters from a recognition parameter store with the highest-probability utterance representation.
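
The last two steps of claim 13, together with the recognition parameter store of claim 16, can be sketched in Python as follows. The sketch assumes the highest-probability representation is a phoneme sequence, that the parameter store holds a phonetic dictionary and a unigram language model (as in claims 7 and 8), and that "invoking one or more instructions" means calling a handler registered for the transcription. The dictionary entries, probabilities, and the names `transcribe` and `invoke` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical recognition parameter store: a tiny phonetic dictionary
# and a unigram language model. Real stores would be far larger.
PHONETIC_DICTIONARY: Dict[Tuple[str, ...], List[str]] = {
    ("l", "ay", "t", "s"): ["lights", "lites"],
    ("s", "t", "aa", "p"): ["stop"],
}
LANGUAGE_MODEL: Dict[str, float] = {"lights": 0.6, "lites": 0.01, "stop": 0.39}


def transcribe(phonemes: List[str]) -> str:
    """Map a phoneme sequence to its most probable word, or '' if unknown."""
    candidates = PHONETIC_DICTIONARY.get(tuple(phonemes), [])
    if not candidates:
        return ""
    return max(candidates, key=lambda w: LANGUAGE_MODEL.get(w, 0.0))


def invoke(transcription: str, handlers: Dict[str, Callable[[], str]]) -> str:
    """Run the handler registered for the transcription, if there is one."""
    handler = handlers.get(transcription)
    return handler() if handler else "no-op"


handlers = {"lights": lambda: "toggled lights", "stop": lambda: "stopped playback"}
word = transcribe(["l", "ay", "t", "s"])
print(word, "->", invoke(word, handlers))
```

The language model is what separates the homophones "lights" and "lites" here; the phoneme sequence alone cannot.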

17. A non-transitory, computer-readable media containing instructions that, when executed by a processor, cause a computing environment to perform a speech recognition method comprising:
- over a communication connection, receiving from a plurality of spatially distributed audio appliances a corresponding plurality of representations of an utterance observed by the acoustic appliances, wherein each audio appliance has one or more microphone transducers and associated circuitry to convert observed audio to an acoustic signal representative of the audio;
  
  selecting a highest-probability representation of the utterance based on the plurality of representations of the utterance;
  
  determining a most-likely transcription of the utterance in correspondence to the highest-probability representation of the utterance; and
  
  responsive to the most-likely transcription of the utterance, invoking one or more instructions.
  - 18. The non-transitory, computer-readable media according to claim 17, wherein each of the utterance representations has an associated posterior probability, and wherein the act of selecting the highest-probability representation comprises combining the plurality of posterior probabilities corresponding to the utterance representations.
  - 19. The non-transitory, computer-readable media according to claim 18, wherein the act of selecting the highest-probability representation further comprises more heavily weighting utterance representations having relatively higher signal-to-noise ratio as compared to utterance representations having relatively lower signal-to-noise ratio.

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Ramprashad, Sean A.; Thornburg, Harvey D.; Krishnaswamy, Arvindh; Lindahl, Aram M.
Primary Examiner(s): SINGH, SATWANT K
Application Number: US14/732,715
Publication Number: US 20160358619A1
Time in Patent Office: 948 Days
Field of Search: None

CPC Class Codes
G10L 15/16   using artificial neural net...
G10L 15/20   Speech recognition techniqu...
G10L 15/34   Adaptation of a single reco...
G10L 2015/022   Demisyllables, biphones or ...
G10L 2021/02166   Microphone arrays; Beamforming
