Acoustic model training using corrected terms
First Claim
1. A method comprising:
receiving, from a client device and by a voice search system that includes (i) an automated speech recognizer that uses an acoustic model to transcribe utterances, (ii) a search engine, (iii) an acoustic model trainer that periodically retrains the acoustic model using portions of audio data that correspond to manually specified terms of first transcriptions, (iv) a user interface component, and (v) a correction classifier, first audio data corresponding to an utterance of a user;
obtaining, by the automated speech recognizer of the voice search system, a first transcription of the first audio data;
receiving, by the user interface component of the voice search system, data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more replacement terms that the user has manually specified as a replacement for the one or more terms;
determining, by the correction classifier of the voice search system, a minimum edit distance between the one or more terms of the first transcription and the one or more replacement terms;
determining, by the correction classifier of the voice search system and based at least on the minimum edit distance between the one or more terms of the first transcription and the one or more replacement terms that the user has manually specified as a replacement for the one or more terms, whether one or more of the replacement terms that the user has manually specified as a replacement for the one or more terms likely represent a correction of one or more of the one or more terms of the first transcription;
in response to determining, based at least on the minimum edit distance between the one or more terms of the first transcription and the one or more replacement terms that the user has manually specified as a replacement for the one or more terms, whether the one or more of the replacement terms that the user has manually specified as a replacement for the one or more terms likely represent a correction of the one or more terms of the first transcription, selectively retraining, by the acoustic model trainer of the voice search system, the acoustic model, comprising (i) retraining the acoustic model of the automated speech recognizer using a first portion of the first audio data that is associated with the one or more terms of the first transcription when the correction classifier indicates that the replacement terms likely represent a correction, or (ii) bypassing retraining of the acoustic model of the automated speech recognizer using the first portion of the first audio data that is associated with the one or more terms of the first transcription when the correction classifier indicates that the replacement terms do not likely represent a correction;
obtaining, by the automated speech recognizer of the voice search system and using the retrained acoustic model, a transcription of audio data corresponding to a subsequently received utterance; and
providing, by the user interface component of the voice search system, a user interface that includes one or more search results that the search engine of the voice search system has identified in response to the transcription of the audio data corresponding to the subsequently received utterance.
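The edit-distance test and the selective-retraining branch of claim 1 can be illustrated with a short sketch. It is not the claimed implementation: it assumes character-level Levenshtein distance over the space-joined terms, normalization by the longer string, and a hypothetical threshold `max_normalized_distance=0.5`, and it assumes that a small distance indicates a correction of a misrecognition while a large one indicates the user rewrote the query. The claim fixes none of these details; `likely_correction`, `select_for_retraining`, and `training_queue` are illustrative names.

```python
from typing import List, Sequence, Tuple


def min_edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn source into target."""
    prev = list(range(len(target) + 1))  # distances from "" to each target prefix
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # distance from source[:i] to ""
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            curr.append(min(prev[j] + 1,          # delete s_char
                            curr[j - 1] + 1,      # insert t_char
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def likely_correction(original_terms: Sequence[str],
                      replacement_terms: Sequence[str],
                      max_normalized_distance: float = 0.5) -> bool:
    """Hypothetical correction classifier: edits that are small relative to
    the longer string are treated as fixes of a misrecognition."""
    original = " ".join(original_terms).lower()
    replacement = " ".join(replacement_terms).lower()
    longest = max(len(original), len(replacement), 1)  # avoid division by zero
    return min_edit_distance(original, replacement) / longest <= max_normalized_distance


def select_for_retraining(original_terms: Sequence[str],
                          replacement_terms: Sequence[str],
                          audio_portion: bytes,
                          training_queue: List[Tuple[bytes, str]]) -> bool:
    """Queue the audio portion, labeled with the replacement text, only if the
    edit looks like a correction; otherwise bypass retraining.
    Returns True if the example was queued."""
    if likely_correction(original_terms, replacement_terms):
        training_queue.append((audio_portion, " ".join(replacement_terms)))
        return True
    return False
```

Under these assumptions, replacing "forth" with "fourth" (distance 1 over 6 characters) counts as a correction and its audio is queued, whereas replacing "cat" with "dog" (distance 3 over 3 characters) is treated as a rewrite and retraining is bypassed.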
2 Assignments
0 Petitions
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving first audio data corresponding to an utterance; obtaining a first transcription of the first audio data; receiving data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more replacement terms; determining that one or more of the replacement terms are classified as a correction of one or more of the selected terms; in response to determining that the one or more of the replacement terms are classified as a correction of the one or more of the selected terms, obtaining a first portion of the first audio data that corresponds to one or more terms of the first transcription; and using the first portion of the first audio data that is associated with the one or more terms of the first transcription to train an acoustic model for recognizing the one or more of the replacement terms.
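Obtaining "a first portion of the first audio data that corresponds to one or more terms of the first transcription" presupposes some alignment between words and audio. The sketch below assumes, hypothetically, that the recognizer reports a start and end time for each word and that the audio is a flat list of samples at a known sample rate; `RecognizedWord`, `audio_portion_for_terms`, and `training_example` are illustrative names, not taken from the source.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RecognizedWord:
    """One transcribed word with hypothetical alignment times."""
    text: str
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds


def audio_portion_for_terms(samples: Sequence[float], sample_rate: int,
                            words: List[RecognizedWord],
                            first: int, last: int) -> List[float]:
    """Return the samples spanning words[first] through words[last], inclusive."""
    if not 0 <= first <= last < len(words):
        raise ValueError("invalid word selection")
    start = int(round(words[first].start_s * sample_rate))
    end = int(round(words[last].end_s * sample_rate))
    return list(samples[start:end])


def training_example(samples: Sequence[float], sample_rate: int,
                     words: List[RecognizedWord], first: int, last: int,
                     replacement_terms: Sequence[str]) -> Tuple[List[float], str]:
    """Pair the audio of the selected terms with the user's replacement
    terms, which serve as the label for acoustic model training."""
    portion = audio_portion_for_terms(samples, sample_rate, words, first, last)
    return portion, " ".join(replacement_terms)
```

The key point the sketch captures is that the training label comes from the user's replacement terms while the audio comes from the span the recognizer originally assigned to the misrecognized terms.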
17 Citations
18 Claims
8. A voice search system including (i) an automated speech recognizer that uses an acoustic model to transcribe utterances, (ii) a search engine, (iii) an acoustic model trainer that periodically retrains the acoustic model using portions of audio data that correspond to manually specified terms of first transcriptions, (iv) a user interface component, and (v) a correction classifier, the voice search system comprising:
a processor configured to execute computer program instructions; and
a computer storage medium encoded with the computer program instructions that, when executed by the processor, cause the system to perform operations comprising:
receiving, from a client device, first audio data corresponding to an utterance of a user;
obtaining, by the automated speech recognizer, a first transcription of the first audio data;
receiving, by the user interface component, data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more replacement terms that the user has manually specified as a replacement for the one or more terms;
determining, by the correction classifier, a minimum edit distance between the one or more terms of the first transcription and the one or more replacement terms;
determining, by the correction classifier of the voice search system and based at least on the minimum edit distance between the one or more terms of the first transcription and the one or more replacement terms that the user has manually specified as a replacement for the one or more terms, whether one or more of the replacement terms that the user has manually specified as a replacement for the one or more terms likely represent a correction of one or more of the one or more terms of the first transcription;
in response to determining, based at least on the minimum edit distance between the one or more terms of the first transcription and the one or more replacement terms that the user has manually specified as a replacement for the one or more terms, whether the one or more of the replacement terms that the user has manually specified as a replacement for the one or more terms likely represent a correction of the one or more terms of the first transcription, selectively retraining, by the acoustic model trainer of the voice search system, the acoustic model, comprising (i) retraining the acoustic model of the automated speech recognizer using a first portion of the first audio data that is associated with the one or more terms of the first transcription when the correction classifier indicates that the replacement terms likely represent a correction, or (ii) bypassing retraining of the acoustic model of the automated speech recognizer using the first portion of the first audio data that is associated with the one or more terms of the first transcription when the correction classifier indicates that the replacement terms do not likely represent a correction;
obtaining, by the automated speech recognizer and using the retrained acoustic model, a transcription of audio data corresponding to a subsequently received utterance; and
providing, by the user interface component, a user interface that includes one or more search results that the search engine of the voice search system has identified in response to the transcription of the audio data corresponding to the subsequently received utterance.
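The system of claim 8 includes an acoustic model trainer that retrains "periodically" on accumulated audio portions. One way to realize that is sketched below, under the assumption that "periodically" means once a fixed number of accepted examples have accumulated (a time-based schedule would fit the claim language equally well). `PeriodicAcousticModelTrainer`, `retrain_fn`, and `batch_size` are hypothetical names, and `retrain_fn` stands in for whatever training routine the recognizer actually uses.

```python
from typing import Callable, List, Tuple

# A training example: (audio samples of the corrected terms, replacement text).
Example = Tuple[List[float], str]


class PeriodicAcousticModelTrainer:
    """Hypothetical trainer that buffers examples accepted by the correction
    classifier and retrains once `batch_size` of them have accumulated."""

    def __init__(self, retrain_fn: Callable[[List[Example]], None],
                 batch_size: int = 100) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be positive")
        self._retrain_fn = retrain_fn  # placeholder for the real training routine
        self._batch_size = batch_size
        self._pending: List[Example] = []
        self.retrain_count = 0

    def add(self, example: Example) -> bool:
        """Buffer one example; returns True if this call triggered retraining."""
        self._pending.append(example)
        if len(self._pending) >= self._batch_size:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        """Retrain on everything buffered so far (no-op if the buffer is empty)."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self._retrain_fn(batch)
        self.retrain_count += 1
```

In this design the "bypass" branch needs no code of its own: examples the correction classifier rejects are simply never passed to `add`, so they never reach a retraining batch.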
15. A computer-readable storage device encoded with a computer program, the computer program comprising instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
receiving, from a client device and by a voice search system that includes (i) an automated speech recognizer that uses an acoustic model to transcribe utterances, (ii) a search engine, (iii) an acoustic model trainer that periodically retrains the acoustic model using portions of audio data that correspond to manually specified terms of first transcriptions, (iv) a user interface component, and (v) a correction classifier, first audio data corresponding to an utterance of a user;
obtaining, by the automated speech recognizer of the voice search system, a first transcription of the first audio data;
receiving, by the user interface component of the voice search system, data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more replacement terms that the user has manually specified as a replacement for the one or more terms;
determining, by the correction classifier of the voice search system, a minimum edit distance between the one or more terms of the first transcription and the one or more replacement terms;
determining, by the correction classifier of the voice search system and based at least on the minimum edit distance between the one or more terms of the first transcription and the one or more replacement terms that the user has manually specified as a replacement for the one or more terms, whether one or more of the replacement terms that the user has manually specified as a replacement for the one or more terms likely represent a correction of one or more of the one or more terms of the first transcription;
in response to determining, based at least on the minimum edit distance between the one or more terms of the first transcription and the one or more replacement terms that the user has manually specified as a replacement for the one or more terms, whether the one or more of the replacement terms that the user has manually specified as a replacement for the one or more terms likely represent a correction of the one or more terms of the first transcription, selectively retraining, by the acoustic model trainer of the voice search system, the acoustic model, comprising (i) retraining the acoustic model of the automated speech recognizer using a first portion of the first audio data that is associated with the one or more terms of the first transcription when the correction classifier indicates that the replacement terms likely represent a correction, or (ii) bypassing retraining of the acoustic model of the automated speech recognizer using the first portion of the first audio data that is associated with the one or more terms of the first transcription when the correction classifier indicates that the replacement terms do not likely represent a correction;
obtaining, by the automated speech recognizer of the voice search system and using the retrained acoustic model, a transcription of audio data corresponding to a subsequently received utterance; and
providing, by the user interface component of the voice search system, a user interface that includes one or more search results that the search engine of the voice search system has identified in response to the transcription of the audio data corresponding to the subsequently received utterance.
Specification