Recognizing accented speech
Abstract
Techniques (300, 400, 500) and apparatuses (100, 200, 700) for recognizing accented speech are described. In some embodiments, an accent module recognizes accented speech using an accent library based on device data, uses different speech recognition correction levels based on an application field into which recognized words are set to be provided, or updates an accent library based on corrections made to incorrectly recognized speech.
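As a rough illustration of the second technique named in the abstract, the sketch below chooses a correction level from the application field that will receive the recognized words. The field names, the two levels, and the mapping between them are assumptions made for illustration only; the abstract does not say which fields receive which level.

```python
from enum import Enum


class CorrectionLevel(Enum):
    """Hypothetical correction levels; the abstract does not enumerate them."""
    LOW = 1
    HIGH = 2


# Assumed mapping from application field to correction level (illustrative only).
FIELD_LEVELS = {
    "address": CorrectionLevel.HIGH,
    "email_body": CorrectionLevel.LOW,
}


def correction_level(field_type: str) -> CorrectionLevel:
    """Return the level for a field, defaulting to LOW for unknown fields."""
    return FIELD_LEVELS.get(field_type, CorrectionLevel.LOW)


address_level = correction_level("address")
unknown_level = correction_level("search_box")
```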
20 Claims
1. A computer-implemented method comprising:
receiving, at a client device and from a server-based automated speech recognizer that has access to multiple accent libraries, a transcription of an utterance received at the client device;
providing, for output on a display of the client device, the transcription of the utterance received at the client device;
receiving, at the client device and from a user, an additional transcription and data indicating that the additional transcription is a correction to the transcription of the utterance that was incorrectly recognized by the server-based automated speech recognizer that has access to the multiple accent libraries;
in response to receiving the additional transcription and the data indicating that the additional transcription is a correction to the transcription of the utterance, accessing data stored on the client device;
based on the data stored on the client device, identifying, by the client device, an accent library of the multiple accent libraries to be updated using the additional transcription;
transmitting, by the client device and to the server-based automated speech recognizer, a request to update the accent library of the multiple accent libraries based at least on the additional transcription and the data indicating that the additional transcription is a correction to the transcription of the utterance;
receiving, by the client device, a respective update for the accent library of the multiple accent libraries; and
obtaining a transcription of a subsequently received utterance by the server-based automated speech recognizer that has access to the multiple accent libraries including the updated accent library.
Dependent claims: 2, 3, 4, 5, 6, 7
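The steps of claim 1 can be sketched as a small client/server exchange. This is a minimal in-memory sketch, not the patented implementation: `FakeRecognizer`, `Client`, the locale-based choice of accent library, and the use of plain strings in place of audio are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRecognizer:
    """Hypothetical stand-in for the server-based recognizer."""
    accent_libraries: dict = field(
        default_factory=lambda: {"en-US": {}, "en-IN": {}}
    )

    def transcribe(self, utterance: str, accent: str) -> str:
        # Use a learned correction for this accent library if one exists;
        # otherwise echo the input unchanged.
        return self.accent_libraries[accent].get(utterance, utterance)

    def update_library(self, accent: str, heard: str, corrected: str) -> dict:
        # Store the correction and return the update sent back to the client.
        self.accent_libraries[accent][heard] = corrected
        return {"accent": accent, "entries": {heard: corrected}}


@dataclass
class Client:
    recognizer: FakeRecognizer
    device_data: dict  # assumed shape, e.g. {"locale": "en-IN"}
    received_updates: list = field(default_factory=list)

    def identify_accent_library(self) -> str:
        # Choose the library to update from data stored on the client device.
        locale = self.device_data.get("locale", "en-US")
        return locale if locale in self.recognizer.accent_libraries else "en-US"

    def recognize(self, utterance: str) -> str:
        return self.recognizer.transcribe(utterance, self.identify_accent_library())

    def submit_correction(self, heard: str, corrected: str) -> dict:
        # Request an update to the identified library and keep the reply.
        accent = self.identify_accent_library()
        update = self.recognizer.update_library(accent, heard, corrected)
        self.received_updates.append(update)
        return update


client = Client(FakeRecognizer(), {"locale": "en-IN"})
first = client.recognize("wreck a nice beach")        # incorrect transcription
client.submit_correction(first, "recognize speech")   # user-supplied correction
second = client.recognize("wreck a nice beach")       # subsequent utterance
```

In this sketch only the library selected from the device data is changed by the correction; the other library is left untouched.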
8. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving, at a client device and from a server-based automated speech recognizer that has access to multiple accent libraries, a transcription of an utterance received at the client device;
providing, for output on a display of the client device, the transcription of the utterance received at the client device;
receiving, at the client device and from a user, an additional transcription and data indicating that the additional transcription is a correction to the transcription of the utterance that was incorrectly recognized by the server-based automated speech recognizer that has access to the multiple accent libraries;
in response to receiving the additional transcription and the data indicating that the additional transcription is a correction to the transcription of the utterance, accessing data stored on the client device;
based on the data stored on the client device, identifying, by the client device, an accent library of the multiple accent libraries to be updated using the additional transcription;
transmitting, by the client device and to the server-based automated speech recognizer, a request to update the accent library of the multiple accent libraries based at least on the additional transcription and the data indicating that the additional transcription is a correction to the transcription of the utterance;
receiving, by the client device, a respective update for the accent library of the multiple accent libraries; and
obtaining a transcription of a subsequently received utterance by the server-based automated speech recognizer that has access to the multiple accent libraries including the updated accent library.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving, at a client device and from a server-based automated speech recognizer that has access to multiple accent libraries, a transcription of an utterance received at the client device;
providing, for output on a display of the client device, the transcription of the utterance received at the client device;
receiving, at the client device and from a user, an additional transcription and data indicating that the additional transcription is a correction to the transcription of the utterance that was incorrectly recognized by the server-based automated speech recognizer that has access to the multiple accent libraries;
in response to receiving the additional transcription and the data indicating that the additional transcription is a correction to the transcription of the utterance, accessing data stored on the client device;
based on the data stored on the client device, identifying, by the client device, an accent library of the multiple accent libraries to be updated using the additional transcription;
transmitting, by the client device and to the server-based automated speech recognizer, a request to update the accent library of the multiple accent libraries based at least on the additional transcription and the data indicating that the additional transcription is a correction to the transcription of the utterance;
receiving, by the client device, a respective update for the accent library of the multiple accent libraries; and
obtaining a transcription of a subsequently received utterance by the server-based automated speech recognizer that has access to the multiple accent libraries including the updated accent library.
Dependent claims: 16, 17, 18, 19, 20
Specification