Multi-Microphone Speech Recognition Systems and Related Techniques

US 20160358619A1
Filed: 06/06/2015
Published: 12/08/2016
Est. Priority Date: 06/06/2015
Status: Active Grant

Abstract

A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
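The beamforming step mentioned in the abstract can be illustrated with a minimal sketch that forms one output stream per look direction from a linear microphone array. The record does not say which beamformer is used; integer-sample delay-and-sum is assumed here, and the array geometry, the speed-of-sound constant, and the function names (`steering_delays`, `delay_and_sum`, `beamform_look_directions`) are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(mic_positions_m, angle_deg, sample_rate_hz):
    """Integer sample delays that align a far-field plane wave from angle_deg.

    Assumes a linear array along the x-axis, with the angle measured from
    broadside (positive angles point toward +x).
    """
    sin_a = math.sin(math.radians(angle_deg))
    raw = [x * sin_a / SPEED_OF_SOUND * sample_rate_hz for x in mic_positions_m]
    base = min(raw)  # shift so that every delay is non-negative
    return [round(r - base) for r in raw]


def delay_and_sum(channels, delays):
    """Delay each channel by its sample count and average the results."""
    n = len(channels[0])
    out = [0.0] * n
    for sig, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += sig[i - d]
    return [v / len(channels) for v in out]


def beamform_look_directions(channels, mic_positions_m, angles_deg, sample_rate_hz):
    """Return one beamformed stream per look direction, keyed by angle."""
    return {
        angle: delay_and_sum(
            channels, steering_delays(mic_positions_m, angle, sample_rate_hz)
        )
        for angle in angles_deg
    }
```

Each returned stream is one "representation of the utterance" in the sense of the abstract. A stream steered toward the talker adds the channels coherently, while streams steered elsewhere smear the signal, which is what gives a downstream selector something to choose between.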

191 Citations

31 Claims

1. A speech recognition system for resolving far-field utterances, comprising:
- a recognition engine configured to concurrently receive a plurality of representations of an utterance and to determine a highest-probability representation of the utterance; and
  
  an utterance decoder configured to determine a most-likely transcription corresponding to the highest-probability representation of the utterance.
  - 2. The speech recognition system according to claim 1, wherein the plurality of representations of the utterance comprises a concatenation of utterance representations and corresponding posterior probabilities.
  - 3. The speech recognition system according to claim 1, wherein each of the utterance representations has an associated posterior probability, and wherein the speech recognition engine is configured to determine the highest-probability representation of the utterance based in part on a combination of the plurality of posterior probabilities corresponding to the utterance representations.
  - 4. The speech recognition system according to claim 1, wherein the speech recognition engine more heavily weights utterance representations having relatively higher signal-to-noise ratio as compared to utterance representations having relatively lower signal-to-noise ratio.
  - 9. The speech recognition system according to claim 1, further comprising a recognition parameter store, wherein the utterance decoder is configured to combine one or more recognition parameters from the recognition parameter store with the highest-probability representations of the utterance.
  - 10. The speech recognition system according to claim 9, wherein the plurality of representations of the utterance comprises a plurality of acoustic features, each having a corresponding posterior probability, and wherein the speech recognition engine is configured to identify the acoustic features having a highest-probability of correctly representing the utterance, and wherein the utterance decoder is further configured to combine the one or more recognition parameters from the recognition parameter store with the highest probability acoustic features.
  - 11. The speech recognition system of claim 10, wherein the recognition parameter store comprises an acoustic feature dictionary, a language model, or both.
  - 12. The speech recognition system of claim 11, wherein the plurality of utterance representations comprises a plurality of phonemes and the acoustic feature dictionary comprises a phonetic dictionary.
  - 13. The speech recognition system according to claim 1, wherein each representation of the utterance comprises one or more respective acoustic features and corresponding posterior probabilities, the system further comprising:
    - a hub configured to receive a stream of the acoustic features and corresponding posterior probabilities representative of the utterance from each of a plurality of acoustic feature extractors;
      
      an aggregator configured to aggregate the plurality of streams and corresponding posterior probabilities; and
      
      a communication connection configured to transmit the aggregated plurality of streams and corresponding posterior probabilities to the speech recognition engine configured to select from the concatenated plurality of streams those acoustic features most likely to accurately reflect the utterance.
  - 14. The distributed speech recognition system according to claim 13, further comprising an appliance configured to extract acoustic features from an acoustic signal received by the appliance and to stream the extracted acoustic features to the hub.
  - 15. The distributed speech recognition system according to claim 14, wherein the appliance comprises a first appliance, the speech recognition system further comprising a second appliance configured to extract acoustic features from an acoustic signal received by the second appliance and to stream the extracted acoustic features to the hub.
  - 16. The distributed speech recognition system according to claim 15, wherein the first appliance comprises a near-field acoustic-feature extractor, a far-field acoustic-feature extractor, or both.
  - 17. The distributed speech recognition system according to claim 15, wherein the hub is configured to synchronize the plurality of received streams of acoustic features and associated posterior probabilities.
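
The hub arrangement of claims 13 through 17 can be pictured with a small sketch: several appliances stream acoustic features with posterior probabilities, the hub lines the streams up frame by frame, and the aggregated frames are handed on for selection. Everything named here is an assumption for illustration (the `Frame` fields, the shared frame counter used for synchronization, the `Hub` class, and the per-frame maximum in `select_features`); the claims do not prescribe any of it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    source: str       # hypothetical appliance identifier
    index: int        # frame counter, assumed to follow a clock shared by all appliances
    feature: str      # acoustic feature, e.g. a phoneme label
    posterior: float  # posterior probability reported by the extractor


class Hub:
    """Collects frames from several extractors and groups them by frame index."""

    def __init__(self):
        self._by_index = defaultdict(list)

    def receive(self, frame):
        self._by_index[frame.index].append(frame)

    def aggregate(self, expected_sources):
        """Return (index, frames) pairs for indices every expected source has reported."""
        expected = set(expected_sources)
        aggregated = []
        for index in sorted(self._by_index):
            frames = self._by_index[index]
            if {f.source for f in frames} >= expected:
                aggregated.append((index, sorted(frames, key=lambda f: f.source)))
        return aggregated


def select_features(aggregated):
    """Stand-in for the recognition engine: keep the highest-posterior feature per frame."""
    return [max(frames, key=lambda f: f.posterior).feature for _, frames in aggregated]
```

Frames may arrive out of order and from appliances with different latencies; holding back an index until every expected source has reported is one simple way to realize the synchronization recited in claim 17.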

5. (canceled)

6. (canceled)

7. (canceled)

8. (canceled)

18. A speech-recognition method, comprising:
- selecting a highest-probability representation of an utterance from a plurality of concurrently generated representations of the utterance; and
  
  determining a most-likely transcription of the utterance in correspondence to the highest-probability representation of the utterance.
  - 19. The speech recognition method according to claim 18, wherein each of the utterance representations has an associated posterior probability, and wherein the act of selecting the highest-probability representation comprises combining the plurality of posterior probabilities corresponding to the utterance representations.
  - 20. The speech recognition method according to claim 19, wherein the act of selecting the highest-probability representation further comprises more heavily weighting utterance representations having relatively higher signal-to-noise ratio as compared to utterance representations having relatively lower signal-to-noise ratio.
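
Claims 19 and 20 (like claims 3 and 4) recite combining posterior probabilities while weighting higher signal-to-noise representations more heavily, without fixing a formula. The sketch below shows one possible reading: weights proportional to linear SNR, a weighted sum of posteriors per candidate label, and an argmax over the result. The weighting rule and the function names are assumptions, not taken from the record.

```python
def snr_weights(snrs_db):
    """Normalized weights proportional to linear SNR (an assumed weighting rule)."""
    linear = [10 ** (snr / 10.0) for snr in snrs_db]
    total = sum(linear)
    return [value / total for value in linear]


def combine_posteriors(posteriors, snrs_db):
    """Weighted sum of per-stream posteriors; each posterior maps label -> probability."""
    combined = {}
    for posterior, weight in zip(posteriors, snr_weights(snrs_db)):
        for label, p in posterior.items():
            combined[label] = combined.get(label, 0.0) + weight * p
    return combined


def select_highest_probability(posteriors, snrs_db):
    """Return the label with the largest combined posterior."""
    combined = combine_posteriors(posteriors, snrs_db)
    return max(combined, key=combined.get)
```

With two streams that disagree, the cleaner stream dominates: a 20 dB stream favoring "lights" outweighs a 0 dB stream favoring "nights", whereas at equal SNR the two posteriors are simply averaged.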

21. (canceled)

22. (canceled)

23. (canceled)
  - 24. The speech recognition method according to claim 23, further comprising combining one or more recognition parameters from a recognition parameter store with the highest-probability utterance representations.
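
Claim 24 (and claims 9 through 12 on the system side) describe combining recognition parameters, such as a phonetic dictionary and a language model, with the highest-probability representation. As a toy sketch of that idea, the code below looks up words whose pronunciation matches a selected phoneme sequence and lets a unigram language model break ties between homophones. The dictionary entries, the probabilities, and the `decode` function are invented for illustration only.

```python
# Hypothetical recognition parameter store: a phonetic dictionary and a unigram language model.
PHONETIC_DICTIONARY = {
    "two": ("T", "UW"),
    "too": ("T", "UW"),
    "tea": ("T", "IY"),
}
LANGUAGE_MODEL = {"two": 0.5, "too": 0.3, "tea": 0.2}


def decode(phonemes, dictionary, language_model):
    """Return the most likely word whose pronunciation matches the phoneme sequence.

    Returns None when no dictionary entry matches exactly.
    """
    target = tuple(phonemes)
    candidates = [word for word, pron in dictionary.items() if pron == target]
    if not candidates:
        return None
    return max(candidates, key=lambda word: language_model.get(word, 0.0))
```

A practical decoder would score partial matches and word sequences rather than require an exact single-word match; the point here is only that the dictionary proposes candidates and the language model ranks them.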

25. A non-transitory, computer-readable media containing instructions that, when executed by a processor, cause a computing environment to perform a speech recognition method comprising:
- selecting a highest-probability representation of an utterance from a plurality of concurrently generated representations of the utterance; and
  
  determining a most-likely transcription of the utterance in correspondence to the highest-probability representation of the utterance.
  - 26. The non-transitory, computer-readable media according to claim 25, wherein each of the utterance representations has an associated posterior probability, and wherein the act of selecting the highest-probability representation comprises combining the plurality of posterior probabilities corresponding to the utterance representations.
  - 27. The non-transitory, computer-readable media according to claim 26, wherein the act of selecting the highest-probability representation further comprises more heavily weighting utterance representations having relatively higher signal-to-noise ratio as compared to utterance representations having relatively lower signal-to-noise ratio.

28. (canceled)

29. (canceled)

30. (canceled)

31. (canceled)

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Ramprashad, Sean A.; Thornburg, Harvey D.; Krishnaswamy, Arvindh; Lindahl, Aram M.
Granted Patent: US 9,865,265 B2
US Class Current: 1/1
CPC Class Codes

G10L 15/16   using artificial neural net...

G10L 15/20   Speech recognition techniqu...

G10L 15/34   Adaptation of a single reco...

G10L 2015/022   Demisyllables, biphones or ...

G10L 2021/02166   Microphone arrays; Beamforming
