Speech recognition using associative mapping

US 10,204,619 B2
Filed: 02/22/2016
Issued: 02/12/2019
Est. Priority Date: 10/22/2014
Status: Active Grant

Abstract

Methods, systems, and apparatus are described that receive audio data for an utterance. Association data is accessed that indicates associations between data corresponding to uncorrupted audio segments, and data corresponding to corrupted versions of the uncorrupted audio segments, where the associations are determined before receiving the audio data for the utterance. Using the association data and the received audio data for the utterance, data corresponding to at least one uncorrupted audio segment is selected. A transcription of the utterance is determined based on the selected data corresponding to the at least one uncorrupted audio segment.
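The two phases in the abstract can be illustrated with a minimal sketch. It assumes feature vectors are plain lists of floats, that keys are compared by Euclidean distance, and that `toy_score_fn` stands in for an acoustic model; none of these names or choices come from the patent, which only requires that the associations be built before the utterance is received.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_association_data(pairs, score_fn):
    """Offline phase: for each (clean, corrupted) pair, store a key taken
    from the corrupted features alongside scores computed on the clean ones."""
    return [(tuple(corrupted), score_fn(clean)) for clean, corrupted in pairs]

def select_scores(association_data, retrieval_key):
    """Online phase: return the scores whose stored key is closest to the
    retrieval key derived from the incoming utterance."""
    _, scores = min(association_data,
                    key=lambda entry: distance(entry[0], retrieval_key))
    return scores

def toy_score_fn(clean):
    # Hypothetical stand-in for an acoustic model scoring two phonetic units.
    return {"a": 0.9, "b": 0.1} if clean[0] > 0 else {"a": 0.2, "b": 0.8}

# Each pair is (clean features, corrupted features) for the same segment.
pairs = [([1.0, 0.0], [0.8, 0.3]), ([-1.0, 0.0], [-0.7, -0.2])]
table = build_association_data(pairs, toy_score_fn)

# A noisy incoming frame retrieves scores that were computed on clean audio.
selected = select_scores(table, [0.9, 0.1])
```

In this sketch the expensive scoring runs only in the offline phase; the online phase reduces to a key comparison.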

117 Citations

20 Claims

1. A method performed by one or more computers that provide an automated speech recognition service, the method comprising:
- receiving, by the one or more computers, audio data for an utterance detected by a device;
  
  accessing, by the one or more computers, association data that specifies, for each key in a set of multiple predetermined keys, an association between (i) a set of one or more precomputed speech recognition probability scores that are each determined based on first audio data indicating characteristics of an audio segment, and (ii) a corresponding key that is determined based on second audio data indicating characteristics of a corrupted version of the audio segment;
  
  determining, by the one or more computers, a retrieval key based on the audio data for the utterance;
  
  selecting, by the one or more computers and from among the sets of precomputed speech recognition probability scores, a particular set of precomputed speech recognition probability scores, based at least on comparing the determined retrieval key and the multiple predetermined keys; and
  
  determining, by the one or more computers, a transcription for the utterance using the selected particular set of precomputed speech recognition probability scores.
- Dependent claims (2, 3, 4, 5, 6, 7, 15, 18, 20):
  - 2. The method of claim 1, wherein each precomputed speech recognition probability score of a set of one or more precomputed speech recognition probability scores indicates a likelihood that a particular audio segment corresponds to a different phonetic unit.
  - 3. The method of claim 1, wherein at least some of the sets of one or more precomputed speech recognition probability scores are generated based on different audio segments from a set of multiple audio segments, and the association data specifies, for each key in the set of multiple predetermined keys, an association between (i) a set of one or more precomputed speech recognition probability scores that are each generated based on a particular audio segment of the set of multiple audio segments, and (ii) a key that is generated based on a corrupted version of the particular audio segment of the set of multiple audio segments, the keys and sets of scores being generated such that each key and its corresponding set of scores is derived using a version of a same audio segment from the set of multiple audio segments.
  - 4. The method of claim 1, wherein each of the precomputed speech recognition probability scores is an acoustic model score indicating a likelihood that a particular audio segment represents a particular phonetic unit.
  - 5. The method of claim 1, wherein the corrupted version of the audio segment is a version of the audio segment that has been modified to add noise, reverberation, echo, or distortion after the audio segment has been recorded.
  - 6. The method of claim 1, wherein the association data comprises a hash table that maps keys to values, wherein each key is a hash of audio data for an audio segment, and each value comprises one or more precomputed speech recognition probability scores.
  - 7. The method of claim 1, comprising:
    - determining, by the one or more computers, one or more other retrieval keys based on the audio data for the utterance;
      
      selecting, by the one or more computers and from among the sets of precomputed speech recognition probability scores, one or more other sets of precomputed speech recognition probability scores, based at least on comparing the determined one or more other retrieval keys and the multiple predetermined keys; and
      
      determining, by the one or more computers, a transcription for the utterance using the selected particular set of precomputed recognition probability scores and the selected one or more other sets of precomputed speech recognition probability scores.
  - 15. The method of claim 1, wherein each set of one or more precomputed speech recognition probability scores comprises an output of an acoustic model that was determined before the utterance was spoken.
  - 18. The method of claim 1, wherein, for at least one of the multiple predetermined keys, the association data indicates an association between the key and a set of multiple precomputed speech recognition probability scores that each correspond to a different phonetic unit.
  - 20. The method of claim 1, wherein the association data comprises a hash table that maps keys to values, wherein each key is a hash of a feature vector indicating characteristics of an audio segment, and each value comprises one or more precomputed speech recognition probability scores.
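The hash-table arrangement of claims 6 and 20, combined with the multiple retrieval keys of claim 7, might look roughly like the sketch below. The quantize-then-hash scheme in `hash_key`, the `step` parameter, the phonetic unit labels, and the exact-match lookup are all assumptions made for illustration; the claims do not say how the hash is computed or how keys are compared.

```python
import hashlib
import math

def hash_key(features, step=0.5):
    """Quantize a feature vector to a coarse grid, then hash the grid cell.
    Nearby vectors tend to land in the same cell and so share a key."""
    cells = [math.floor(x / step + 0.5) for x in features]
    return hashlib.sha1(",".join(map(str, cells)).encode()).hexdigest()

def lookup_all(table, frames, step=0.5):
    """Derive one retrieval key per frame and collect every score set found."""
    hits = []
    for frame in frames:
        scores = table.get(hash_key(frame, step))
        if scores is not None:
            hits.append(scores)
    return hits

# Keys are hashes of corrupted-segment features; values are precomputed
# per-phonetic-unit scores (toy numbers, hypothetical unit labels).
table = {
    hash_key([0.8, 0.3]): {"ah": 0.9, "s": 0.1},
    hash_key([-0.7, -0.2]): {"ah": 0.2, "s": 0.8},
}

# Three frames from one utterance; the last has no matching key.
frames = [[0.9, 0.4], [-0.6, -0.1], [5.0, 5.0]]
hits = lookup_all(table, frames)

# Use all retrieved score sets together: pick the best unit per matched frame.
best_units = [max(scores, key=scores.get) for scores in hits]
```

An exact-match lookup like this misses frames that fall just across a grid boundary, so a practical system would likely need a more tolerant comparison.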

8. A system comprising:
- one or more computers that provide an automated speech recognition service and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving, by the one or more computers, audio data for an utterance detected by a device;
  
  accessing, by the one or more computers, association data that specifies, for each key in a set of multiple predetermined keys, an association between (i) a set of one or more precomputed speech recognition probability scores that are each determined based on first audio data indicating characteristics of an audio segment, and (ii) a corresponding key that is determined based on second audio data indicating characteristics of a corrupted version of the audio segment;
  
  determining, by the one or more computers, a retrieval key based on the audio data for the utterance;
  
  selecting, by the one or more computers and from among the sets of precomputed speech recognition probability scores, a particular set of precomputed speech recognition probability scores, based at least on comparing the determined retrieval key and the multiple predetermined keys; and
  
  determining, by the one or more computers, a transcription for the utterance using the selected particular set of precomputed speech recognition probability scores.
- Dependent claims (9, 10, 11, 12, 13, 16, 19):
  - 9. The system of claim 8, wherein each precomputed speech recognition probability score of a set of one or more precomputed speech recognition probability scores indicates a likelihood that a particular audio segment corresponds to a different phonetic unit.
  - 10. The system of claim 8, wherein each of the precomputed speech recognition probability scores is an acoustic model score indicating a likelihood that a particular audio segment represents a particular phonetic unit.
  - 11. The system of claim 8, wherein the corrupted version of the audio segment is a version of the audio segment that has been modified to add noise, reverberation, echo, or distortion after the audio segment has been recorded.
  - 12. The system of claim 8, wherein the association data comprises a hash table that maps keys to values, wherein each key is a hash of audio data for an audio segment, and each value comprises one or more precomputed speech recognition probability scores.
  - 13. The system of claim 8, wherein the operations comprise:
    - determining, by the one or more computers, one or more other retrieval keys based on the audio data for the utterance;
      
      selecting, by the one or more computers and from among the sets of precomputed speech recognition probability scores, one or more other sets of precomputed speech recognition probability scores, based at least on comparing the determined one or more other retrieval keys and the multiple predetermined keys; and
      
      determining, by the one or more computers, a transcription for the utterance using the selected particular set of precomputed recognition probability scores and the selected one or more other sets of precomputed speech recognition probability scores.
  - 16. The system of claim 8, wherein each set of one or more precomputed speech recognition probability scores comprises an output of an acoustic model that was determined before the utterance was spoken.
  - 19. The system of claim 8, wherein, for at least one of the multiple predetermined keys, the association data indicates an association between the key and a set of multiple precomputed speech recognition probability scores that each correspond to a different phonetic unit.
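Claims 5 and 11 describe the corrupted version as a recorded segment modified afterwards to add noise, reverberation, echo, or distortion. As one hedged illustration of the noise case only, the sketch below adds Gaussian noise at a chosen signal-to-noise ratio. The function names, the 16 kHz sample rate, the 440 Hz test tone, and the 10 dB target are assumptions for the example, not values from the patent.

```python
import math
import random

def add_noise(samples, snr_db, seed=0):
    """Return a copy of `samples` with white Gaussian noise added so that
    the expected signal-to-noise ratio is `snr_db` decibels."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]

def measured_snr_db(clean, corrupted):
    """Estimate the SNR actually achieved, from the clean and corrupted copies."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((c - x) ** 2 for x, c in zip(clean, corrupted)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)

# A 0.1 s, 440 Hz tone at 16 kHz stands in for a recorded clean segment.
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
noisy = add_noise(clean, snr_db=10.0)
```

Under this sketch, keys would be derived from `noisy` while the stored scores are computed from `clean`, giving the key-to-scores pairing that the independent claims recite.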

14. A non-transitory computer-readable storage device storing software comprising instructions executable by one or more computers that provide an automated speech recognition service, wherein the instructions, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
- receiving, by the one or more computers, audio data for an utterance detected by a device;
  
  accessing, by the one or more computers, association data that specifies, for each key in a set of multiple predetermined keys, an association between (i) a set of one or more precomputed speech recognition probability scores that are each determined based on first audio data indicating characteristics of an audio segment, and (ii) a corresponding key that is determined based on second audio data indicating characteristics of a corrupted version of the audio segment;
  
  determining, by the one or more computers, a retrieval key based on the audio data for the utterance;
  
  selecting, by the one or more computers and from among the sets of precomputed speech recognition probability scores, a particular set of precomputed speech recognition probability scores, based at least on comparing the determined retrieval key and the multiple predetermined keys; and
  
  determining, by the one or more computers, a transcription for the utterance using the selected particular set of precomputed speech recognition probability scores.
- Dependent claims (17):
  - 17. The computer-readable storage device of claim 14, wherein each set of one or more precomputed speech recognition probability scores comprises an output of an acoustic model that was determined before the utterance was spoken.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Siohan, Olivier; Moreno Mengibar, Pedro J.
Primary Examiner(s): Baker, Matthew H
Application Number: US15/049,892
Publication Number: US 20160171977A1
Time in Patent Office: 1,086 Days
CPC Class Codes

G10L 15/02   Feature extraction for spee...

G10L 15/10   using distance or distortio...

G10L 15/20   Speech recognition techniqu...

G10L 15/26   Speech to text systems G10L...

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 21/0308   characterised by the type o...
