Recognizing accented speech

US 10,347,239 B2
Filed: 03/21/2017
Issued: 07/09/2019
Est. Priority Date: 02/21/2013
Status: Active Grant

Abstract

Techniques (300, 400, 500) and apparatuses (100, 200, 700) for recognizing accented speech are described. In some embodiments, an accent module recognizes accented speech using an accent library based on device data, uses different speech recognition correction levels based on an application field into which recognized words are set to be provided, or updates an accent library based on corrections made to incorrectly recognized speech.
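
The abstract's last technique, updating an accent library from corrections to misrecognized speech, can be illustrated with a minimal sketch. The abstract does not say how such a library is stored or updated, so everything here is an assumption made for illustration: the `UpdatableAccentLibrary` class, the count-based scoring, the `apply_correction` helper, and the use of space-separated phoneme strings as keys.

```python
from collections import defaultdict


class UpdatableAccentLibrary:
    """Hypothetical accent library: phoneme string -> candidate words with counts."""

    def __init__(self, name):
        self.name = name
        # phoneme string -> {word: how often this pronunciation meant that word}
        self._counts = defaultdict(lambda: defaultdict(int))

    def add(self, phonemes, word, weight=1):
        """Record evidence that `phonemes` was a pronunciation of `word`."""
        self._counts[phonemes][word] += weight

    def best_word(self, phonemes):
        """Return the most frequently confirmed word, or None if unknown."""
        candidates = self._counts.get(phonemes)
        if not candidates:
            return None
        # Highest count wins; ties are broken alphabetically for determinism.
        return min(candidates, key=lambda word: (-candidates[word], word))


def apply_correction(library, phonemes, corrected_word):
    """Assumed update rule: a user correction adds one count for the corrected word."""
    library.add(phonemes, corrected_word)


library = UpdatableAccentLibrary("example-accent")
library.add("b ih n", "bin", weight=2)
for _ in range(3):
    # The user repeatedly corrects "bin" to "been" for this pronunciation.
    apply_correction(library, "b ih n", "been")
print(library.best_word("b ih n"))
```

With this rule a single correction does not override a well-established mapping; the corrected word takes over only once its count exceeds the existing one. A real system could weight corrections differently.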

21 Claims

1. A computer-implemented method comprising:
- receiving, by an automated speech recognition system that is configured to perform speech recognition on received audio data using a selected linguistic library and one or more selected accent libraries, audio data of an utterance that was spoken while focus is set on a field of a form;
  
  based at least on a field type associated with the field of the form, determining to select at least two accent libraries for the automated speech recognition system to use in combination with a linguistic library to perform speech recognition on the audio data of the utterance;
  
  based on determining to select at least two accent libraries for the automated speech recognition system to use in combination with the linguistic library to perform speech recognition on the audio data of the utterance, selecting, from among multiple accent libraries, a first accent library and a second, different accent library;
  
  obtaining a transcription of the utterance by performing speech recognition on the audio data of the utterance using the first accent library, the second, different accent library, and the linguistic library; and
  
  providing, for output to the field of the form, the transcription of the utterance.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The method of claim 1, wherein the first accent library and the second, different accent library are selected based on demographic data of the user.
  - 3. The method of claim 2, wherein the demographic data of the user includes an age range, gender, native language, and a geographic location where the user is located.
  - 4. The method of claim 2, wherein the demographic data of the user is based on countries of addresses stored in an address book of a computing device that receives the utterance.
  - 5. The method of claim 1, comprising:
    - receiving, by the automated speech recognition system, additional audio data of an additional utterance that was spoken while focus is set on a different field of the form;
      
      based at least on a field type associated with the different field of the form, determining to select at least three accent libraries for the automated speech recognition system to use in combination with the linguistic library to perform speech recognition on the additional audio data of the additional utterance;
      
      based on determining to select at least three accent libraries for the automated speech recognition system to use in combination with the linguistic library to perform speech recognition on the additional audio data of the additional utterance, selecting, from among the multiple accent libraries, the first accent library, the second, different accent library, and a third, different accent library;
      
      obtaining an additional transcription of the additional utterance by performing speech recognition on the additional audio data of the additional utterance using the first accent library, the second, different accent library, the third, different accent library, and the linguistic library; and
      
      providing, for output to the additional field of the form, the additional transcription of the additional utterance.
  - 6. The method of claim 1, wherein:
    - the form is an email form,
      
      the field is an email body field, a to field, a cc field, or a subject field, and
      
      the field type is a general text field or an address field.
  - 7. The method of claim 1, wherein increasing a quantity of accent libraries used by the automated speech recognition system increases an accuracy level of speech recognition.
  - 8. The method of claim 1, wherein:
    - the linguistic library includes words of a language, and
      
      the first accent library and the second, different library each include phonemes for different pronunciations for the words of the language.
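
The flow of claims 1, 5, 6, and 8 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: the `LIBRARIES_PER_FIELD_TYPE` policy, the `AccentLibrary` class, the function names, and the phoneme strings are all hypothetical, and the "audio data" is assumed to be already segmented into one phoneme string per word, where a real recognizer would decode raw audio.

```python
from dataclasses import dataclass

# Hypothetical policy: number of accent libraries to use per field type.
# Claim 1 needs at least two for one field type; claim 5 at least three for another.
LIBRARIES_PER_FIELD_TYPE = {"general_text": 2, "address": 3}


@dataclass
class AccentLibrary:
    name: str
    pronunciations: dict  # phoneme string -> word (claim 8: accent-specific pronunciations)


def select_accent_libraries(field_type, ranked_libraries):
    """Choose the first N libraries, where N depends on the focused field's type."""
    count = LIBRARIES_PER_FIELD_TYPE.get(field_type, 2)
    if len(ranked_libraries) < count:
        raise ValueError(f"need {count} accent libraries, have {len(ranked_libraries)}")
    return ranked_libraries[:count]


def transcribe(phoneme_groups, linguistic_library, accent_libraries):
    """Map each phoneme string to a word found in an accent library.

    A candidate is accepted only if the linguistic library (a set of the
    language's words) contains it; unmatched groups become "<unk>".
    """
    words = []
    for phonemes in phoneme_groups:
        match = "<unk>"
        for library in accent_libraries:
            candidate = library.pronunciations.get(phonemes)
            if candidate is not None and candidate in linguistic_library:
                match = candidate
                break
        words.append(match)
    return " ".join(words)


# Illustrative data only; the phoneme spellings are made up for the example.
linguistic_library = {"schedule", "meeting"}
libraries = [
    AccentLibrary("accent-a", {"s k eh jh uh l": "schedule", "m iy t ih ng": "meeting"}),
    AccentLibrary("accent-b", {"sh eh d y uh l": "schedule"}),
    AccentLibrary("accent-c", {"m ay t ih ng": "meeting"}),
]
selected = select_accent_libraries("general_text", libraries)
print(transcribe(["sh eh d y uh l", "m iy t ih ng"], linguistic_library, selected))
```

In this sketch the second library is what lets the first word resolve: with only `accent-a` selected, the same input would yield `<unk>` for that word.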

9. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving, by an automated speech recognition system that is configured to perform speech recognition on received audio data using a selected linguistic library and one or more selected accent libraries, audio data of an utterance that was spoken while focus is set on a field of a form;
  
  based at least on a field type associated with the field of the form, determining to select at least two accent libraries for the automated speech recognition system to use in combination with a linguistic library to perform speech recognition on the audio data of the utterance;
  
  based on determining to select at least two accent libraries for the automated speech recognition system to use in combination with the linguistic library to perform speech recognition on the audio data of the utterance, selecting, from among multiple accent libraries, a first accent library and a second, different accent library;
  
  obtaining a transcription of the utterance by performing speech recognition on the audio data of the utterance using the first accent library, the second, different accent library, and the linguistic library; and
  
  providing, for output to the field of the form, the transcription of the utterance.
- Dependent Claims (10, 11, 12, 13, 14, 15)
  - 10. The system of claim 9, wherein the first accent library and the second, different accent library are selected based on demographic data of the user.
  - 11. The system of claim 10, wherein the demographic data of the user includes an age range, gender, native language, and a geographic location where the user is located.
  - 12. The system of claim 10, wherein the demographic data of the user is based on countries of addresses stored in an address book of a computing device that receives the utterance.
  - 13. The system of claim 9, wherein the operations further comprise:
    - receiving, by the automated speech recognition system, additional audio data of an additional utterance that was spoken while focus is set on a different field of the form;
      
      based at least on a field type associated with the different field of the form, determining to select at least three accent libraries for the automated speech recognition system to use in combination with the linguistic library to perform speech recognition on the additional audio data of the additional utterance;
      
      based on determining to select at least three accent libraries for the automated speech recognition system to use in combination with the linguistic library to perform speech recognition on the additional audio data of the additional utterance, selecting, from among the multiple accent libraries, the first accent library, the second, different accent library, and a third, different accent library;
      
      obtaining an additional transcription of the additional utterance by performing speech recognition on the additional audio data of the additional utterance using the first accent library, the second, different accent library, the third, different accent library, and the linguistic library; and
      
      providing, for output to the additional field of the form, the additional transcription of the additional utterance.
  - 14. The system of claim 9, wherein:
    - the form is an email form,
      
      the field is an email body field, a to field, a cc field, or a subject field, and
      
      the field type is a general text field or an address field.
  - 15. The system of claim 9, wherein increasing a quantity of accent libraries used by the automated speech recognition system increases an accuracy level of speech recognition.
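
Claims 10 and 12 select accent libraries from demographic data that is based on the countries of addresses in the device's address book. One possible reading is sketched below. The claims do not specify how countries translate into library choices, so the `ACCENT_LIBRARY_FOR_COUNTRY` mapping, the frequency-based ranking, and both function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical mapping from a contact's country code to an accent library name.
ACCENT_LIBRARY_FOR_COUNTRY = {
    "IN": "en-IN",
    "GB": "en-GB",
    "AU": "en-AU",
    "US": "en-US",
}


def rank_accent_libraries(address_book_countries):
    """Rank accent libraries by how many address-book entries point to them."""
    counts = Counter(
        ACCENT_LIBRARY_FOR_COUNTRY[country]
        for country in address_book_countries
        if country in ACCENT_LIBRARY_FOR_COUNTRY  # skip countries with no library
    )
    # Most frequent first; ties are broken by name so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked]


def pick_accent_libraries(address_book_countries, how_many):
    """Return the `how_many` best-ranked libraries (fewer if not enough are known)."""
    return rank_accent_libraries(address_book_countries)[:how_many]


countries = ["IN", "IN", "GB", "US", "IN", "GB", "FR"]
print(pick_accent_libraries(countries, 2))
```

The value passed as `how_many` would come from the field-type decision in claim 9 (two libraries) or claim 13 (three libraries).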

16. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving, by an automated speech recognition system that is configured to perform speech recognition on received audio data using a selected linguistic library and one or more selected accent libraries, audio data of an utterance that was spoken while focus is set on a field of a form;
  
  based at least on a field type associated with the field of the form, determining to select at least two accent libraries for the automated speech recognition system to use in combination with a linguistic library to perform speech recognition on the audio data of the utterance;
  
  based on determining to select at least two accent libraries for the automated speech recognition system to use in combination with the linguistic library to perform speech recognition on the audio data of the utterance, selecting, from among multiple accent libraries, a first accent library and a second, different accent library;
  
  obtaining a transcription of the utterance by performing speech recognition on the audio data of the utterance using the first accent library, the second, different accent library, and the linguistic library; and
  
  providing, for output to the field of the form, the transcription of the utterance.
- Dependent Claims (17, 18, 19, 20, 21)
  - 17. The medium of claim 16, wherein the first accent library and the second, different accent library are selected based on demographic data of the user.
  - 18. The medium of claim 17, wherein the demographic data of the user is based on countries of addresses stored in an address book of a computing device that receives the utterance.
  - 19. The medium of claim 16, wherein the operations further comprise:
    - receiving, by the automated speech recognition system, additional audio data of an additional utterance that was spoken while focus is set on a different field of the form;
      
      based at least on a field type associated with the different field of the form, determining to select at least three accent libraries for the automated speech recognition system to use in combination with the linguistic library to perform speech recognition on the additional audio data of the additional utterance;
      
      based on determining to select at least three accent libraries for the automated speech recognition system to use in combination with the linguistic library to perform speech recognition on the additional audio data of the additional utterance, selecting, from among the multiple accent libraries, the first accent library, the second, different accent library, and a third, different accent library;
      
      obtaining an additional transcription of the additional utterance by performing speech recognition on the additional audio data of the additional utterance using the first accent library, the second, different accent library, the third, different accent library, and the linguistic library; and
      
      providing, for output to the additional field of the form, the additional transcription of the additional utterance.
  - 20. The medium of claim 16, wherein:
    - the form is an email form,
      
      the field is an email body field, a to field, a cc field, or a subject field, and
      
      the field type is a general text field or an address field.
  - 21. The medium of claim 16, wherein increasing a quantity of accent libraries used by the automated speech recognition system increases an accuracy level of speech recognition.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Inventors: Gray, Kristin A.
Primary Examiner(s): Ky, Kevin
Application Number: US15/465,345
Publication Number: US 20170193990A1
Time in Patent Office: 840 Days
Field of Search: None

CPC Class Codes

G06F 40/174   Form filling; Merging
G10L 15/00   Speech recognition G10L17/0...
G10L 15/005   Language recognition
G10L 15/01   Assessment or evaluation of...
G10L 15/063   Training
G10L 15/1807   using prosody or stress
G10L 15/187   Phonemic context, e.g. pron...
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 2015/0635   updating or merging of old ...
G10L 2015/227   of the speaker; Human-fact...
