Speech recognition using associative mapping
First Claim
1. A method performed by one or more computers that provide an automated speech recognition service, the method comprising:
receiving, by the one or more computers, audio data for an utterance detected by a device;
accessing, by the one or more computers, association data that specifies, for each key in a set of multiple predetermined keys, an association between (i) a set of one or more precomputed speech recognition probability scores that are each determined based on first audio data indicating characteristics of an audio segment, and (ii) a corresponding key that is determined based on second audio data indicating characteristics of a corrupted version of the audio segment;
determining, by the one or more computers, a retrieval key based on the audio data for the utterance;
selecting, by the one or more computers and from among the sets of precomputed speech recognition probability scores, a particular set of precomputed speech recognition probability scores, based at least on comparing the determined retrieval key and the multiple predetermined keys; and
determining, by the one or more computers, a transcription for the utterance using the selected particular set of precomputed speech recognition probability scores.
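The lookup steps recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the feature vectors, the score values, the L2-normalization used to derive a key, the Euclidean nearest-key comparison, and the names `ASSOCIATION_DATA`, `retrieval_key`, `select_scores`, and `transcribe` are all hypothetical. Picking the top-scoring label per frame stands in for a full decoder, which the claim does not specify.

```python
import math

# Hypothetical association data: each entry pairs a predetermined key
# (features derived from a corrupted audio segment) with a set of
# precomputed recognition scores (derived from the clean segment).
# All numbers are made up for illustration.
ASSOCIATION_DATA = [
    ((0.9, 0.1, 0.3), {"k": 0.8, "ae": 0.1, "t": 0.1}),
    ((0.2, 0.7, 0.5), {"k": 0.1, "ae": 0.8, "t": 0.1}),
    ((0.4, 0.4, 0.9), {"k": 0.1, "ae": 0.1, "t": 0.8}),
]


def retrieval_key(features):
    """Assumed key derivation: L2-normalize a feature vector."""
    norm = math.sqrt(sum(x * x for x in features)) or 1.0
    return tuple(x / norm for x in features)


def select_scores(key, association_data):
    """Compare the retrieval key with every stored key and return the
    score set associated with the closest one (Euclidean distance)."""
    def distance(entry):
        stored_key = retrieval_key(entry[0])
        return math.dist(key, stored_key)
    return min(association_data, key=distance)[1]


def transcribe(frames, association_data):
    """Simplified stand-in for decoding: take the best label per frame."""
    labels = []
    for frame in frames:
        scores = select_scores(retrieval_key(frame), association_data)
        labels.append(max(scores, key=scores.get))
    return labels


# Frames from a (hypothetical) noisy utterance.
noisy_frames = [(0.85, 0.15, 0.25), (0.25, 0.65, 0.5), (0.35, 0.45, 0.95)]
result = transcribe(noisy_frames, ASSOCIATION_DATA)
```

The point of the design is that the recognition scores are never computed from the noisy input itself; the noisy input only serves to find the closest stored key, and the scores returned were computed in advance from clean audio.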
Abstract
Methods, systems, and apparatus are described that receive audio data for an utterance. Association data is accessed that indicates associations between data corresponding to uncorrupted audio segments, and data corresponding to corrupted versions of the uncorrupted audio segments, where the associations are determined before receiving the audio data for the utterance. Using the association data and the received audio data for the utterance, data corresponding to at least one uncorrupted audio segment is selected. A transcription of the utterance is determined based on the selected data corresponding to the at least one uncorrupted audio segment.
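The abstract notes that the associations are determined before any utterance is received. A minimal sketch of that offline step is shown here, assuming a hypothetical corruption model (small additive noise) and a placeholder scoring function; neither `corrupt`, `score_clean`, nor `build_association_data` comes from the patent, and a real system would use an actual noise or channel model and an acoustic model.

```python
import random


def corrupt(features, rng, noise_level=0.05):
    """Hypothetical corruption: add small uniform noise to each feature."""
    return tuple(x + rng.uniform(-noise_level, noise_level) for x in features)


def score_clean(features):
    """Placeholder for an acoustic model run on clean audio: here the
    features are simply normalized so that they sum to 1."""
    total = sum(features)
    return tuple(x / total for x in features)


def build_association_data(clean_segments, seed=0):
    """Pair a key derived from the corrupted segment with scores derived
    from the uncorrupted segment, ahead of any recognition request."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    table = []
    for clean in clean_segments:
        key = corrupt(clean, rng)      # key: from the corrupted version
        scores = score_clean(clean)    # scores: from the clean version
        table.append((key, scores))
    return table


clean_segments = [(0.6, 0.3, 0.1), (0.1, 0.2, 0.7)]
table = build_association_data(clean_segments)
```

Each table entry therefore links what a segment looks like after corruption to what a recognizer would have produced for the uncorrupted segment.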
20 Claims
1. A method performed by one or more computers that provide an automated speech recognition service, the method comprising:
receiving, by the one or more computers, audio data for an utterance detected by a device;
accessing, by the one or more computers, association data that specifies, for each key in a set of multiple predetermined keys, an association between (i) a set of one or more precomputed speech recognition probability scores that are each determined based on first audio data indicating characteristics of an audio segment, and (ii) a corresponding key that is determined based on second audio data indicating characteristics of a corrupted version of the audio segment;
determining, by the one or more computers, a retrieval key based on the audio data for the utterance;
selecting, by the one or more computers and from among the sets of precomputed speech recognition probability scores, a particular set of precomputed speech recognition probability scores, based at least on comparing the determined retrieval key and the multiple predetermined keys; and
determining, by the one or more computers, a transcription for the utterance using the selected particular set of precomputed speech recognition probability scores.
Dependent claims: 2, 3, 4, 5, 6, 7, 15, 18, 20
8. A system comprising:
one or more computers that provide an automated speech recognition service and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving, by the one or more computers, audio data for an utterance detected by a device;
accessing, by the one or more computers, association data that specifies, for each key in a set of multiple predetermined keys, an association between (i) a set of one or more precomputed speech recognition probability scores that are each determined based on first audio data indicating characteristics of an audio segment, and (ii) a corresponding key that is determined based on second audio data indicating characteristics of a corrupted version of the audio segment;
determining, by the one or more computers, a retrieval key based on the audio data for the utterance;
selecting, by the one or more computers and from among the sets of precomputed speech recognition probability scores, a particular set of precomputed speech recognition probability scores, based at least on comparing the determined retrieval key and the multiple predetermined keys; and
determining, by the one or more computers, a transcription for the utterance using the selected particular set of precomputed speech recognition probability scores.
Dependent claims: 9, 10, 11, 12, 13, 16, 19
14. A non-transitory computer-readable storage device storing software comprising instructions executable by one or more computers that provide an automated speech recognition service, wherein the instructions, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
receiving, by the one or more computers, audio data for an utterance detected by a device;
accessing, by the one or more computers, association data that specifies, for each key in a set of multiple predetermined keys, an association between (i) a set of one or more precomputed speech recognition probability scores that are each determined based on first audio data indicating characteristics of an audio segment, and (ii) a corresponding key that is determined based on second audio data indicating characteristics of a corrupted version of the audio segment;
determining, by the one or more computers, a retrieval key based on the audio data for the utterance;
selecting, by the one or more computers and from among the sets of precomputed speech recognition probability scores, a particular set of precomputed speech recognition probability scores, based at least on comparing the determined retrieval key and the multiple predetermined keys; and
determining, by the one or more computers, a transcription for the utterance using the selected particular set of precomputed speech recognition probability scores.
Dependent claims: 17
Specification