Recognizing accented speech
Abstract
Techniques (300, 400, 500) and apparatuses (100, 200, 700) for recognizing accented speech are described. In some embodiments, an accent module recognizes accented speech using an accent library based on device data, uses different speech recognition correction levels based on an application field into which recognized words are set to be provided, or updates an accent library based on corrections made to incorrectly recognized speech.
21 Claims
1. A computer-implemented method comprising:
receiving, by an automated speech recognition system that is configured to perform speech recognition on received audio data using a selected linguistic library and one or more selected accent libraries, audio data of an utterance that was spoken while focus is set on a field of a form;
based at least on a field type associated with the field of the form, determining to select at least two accent libraries for the automated speech recognition system to use in combination with a linguistic library to perform speech recognition on the audio data of the utterance;
based on determining to select at least two accent libraries for the automated speech recognition system to use in combination with the linguistic library to perform speech recognition on the audio data of the utterance, selecting, from among multiple accent libraries, a first accent library and a second, different accent library;
obtaining a transcription of the utterance by performing speech recognition on the audio data of the utterance using the first accent library, the second, different accent library, and the linguistic library; and
providing, for output to the field of the form, the transcription of the utterance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving, by an automated speech recognition system that is configured to perform speech recognition on received audio data using a selected linguistic library and one or more selected accent libraries, audio data of an utterance that was spoken while focus is set on a field of a form;
based at least on a field type associated with the field of the form, determining to select at least two accent libraries for the automated speech recognition system to use in combination with a linguistic library to perform speech recognition on the audio data of the utterance;
based on determining to select at least two accent libraries for the automated speech recognition system to use in combination with the linguistic library to perform speech recognition on the audio data of the utterance, selecting, from among multiple accent libraries, a first accent library and a second, different accent library;
obtaining a transcription of the utterance by performing speech recognition on the audio data of the utterance using the first accent library, the second, different accent library, and the linguistic library; and
providing, for output to the field of the form, the transcription of the utterance.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving, by an automated speech recognition system that is configured to perform speech recognition on received audio data using a selected linguistic library and one or more selected accent libraries, audio data of an utterance that was spoken while focus is set on a field of a form;
based at least on a field type associated with the field of the form, determining to select at least two accent libraries for the automated speech recognition system to use in combination with a linguistic library to perform speech recognition on the audio data of the utterance;
based on determining to select at least two accent libraries for the automated speech recognition system to use in combination with the linguistic library to perform speech recognition on the audio data of the utterance, selecting, from among multiple accent libraries, a first accent library and a second, different accent library;
obtaining a transcription of the utterance by performing speech recognition on the audio data of the utterance using the first accent library, the second, different accent library, and the linguistic library; and
providing, for output to the field of the form, the transcription of the utterance.
Dependent claims: 17, 18, 19, 20, 21
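
The sequence of operations recited in the independent claims can be illustrated with a minimal Python sketch. It is not the patented implementation and does not reflect any real speech recognition API. Everything in it is an assumption made for illustration: audio is modeled as a list of already-segmented phonetic tokens, a linguistic library is a plain set of known words, an accent library is a mapping from accented variants to words, and the names `AccentLibrary`, `FormField`, `MULTI_ACCENT_FIELD_TYPES`, and `transcribe_into_field` are hypothetical. The policy of which field types call for two accent libraries is likewise invented here.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set


@dataclass
class AccentLibrary:
    """Hypothetical accent library: maps accented variants to known words."""
    name: str
    variants: Dict[str, str]


@dataclass
class FormField:
    """Hypothetical form field that currently has focus."""
    name: str
    field_type: str
    value: str = ""


# Assumed policy: these field types call for at least two accent libraries.
MULTI_ACCENT_FIELD_TYPES = {"name", "address"}


def select_accent_libraries(available: Sequence[AccentLibrary],
                            count: int) -> List[AccentLibrary]:
    """Pick `count` distinct accent libraries from those available."""
    selected: List[AccentLibrary] = []
    for lib in available:
        if all(lib.name != chosen.name for chosen in selected):
            selected.append(lib)
        if len(selected) == count:
            return selected
    raise ValueError("not enough distinct accent libraries available")


def recognize(audio_tokens: Sequence[str],
              linguistic_library: Set[str],
              accent_libraries: Sequence[AccentLibrary]) -> str:
    """Toy recognizer: resolve each token via the linguistic library first,
    then via each selected accent library in order."""
    words = []
    for token in audio_tokens:
        if token in linguistic_library:
            words.append(token)
            continue
        for lib in accent_libraries:
            if token in lib.variants:
                words.append(lib.variants[token])
                break
        else:
            words.append("<unk>")  # no library recognized this token
    return " ".join(words)


def transcribe_into_field(audio_tokens: Sequence[str],
                          form_field: FormField,
                          linguistic_library: Set[str],
                          available: Sequence[AccentLibrary]) -> str:
    """Decide from the field type how many accent libraries to use, select
    them, transcribe, and write the transcription to the field."""
    count = 2 if form_field.field_type in MULTI_ACCENT_FIELD_TYPES else 1
    selected = select_accent_libraries(available, count)
    form_field.value = recognize(audio_tokens, linguistic_library, selected)
    return form_field.value


# Illustrative data only; the library names and variants are made up.
LINGUISTIC = {"main", "street"}
LIBRARIES = [
    AccentLibrary("accent_a", {"strit": "street"}),
    AccentLibrary("accent_b", {"mayn": "main"}),
]
address_field = FormField("shipping_address", "address")
notes_field = FormField("notes", "free_text")
address_result = transcribe_into_field(["mayn", "strit"], address_field,
                                       LINGUISTIC, LIBRARIES)
notes_result = transcribe_into_field(["mayn", "strit"], notes_field,
                                     LINGUISTIC, LIBRARIES)
```

In this sketch the address field triggers selection of both accent libraries, so both tokens resolve. The free-text field uses only the first library and leaves one token unrecognized. A real system would operate on acoustic features and statistical models, not token lookup.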
Specification