Recognizing different versions of a language

US 9,275,635 B1
Filed: 11/09/2012
Issued: 03/01/2016
Est. Priority Date: 03/08/2012
Status: Active Grant

Abstract

Speech recognition systems may perform the following operations: receiving audio at a computing device; identifying a language associated with the audio; recognizing the audio using recognition models for different versions of the language to produce recognition candidates for the audio, where the recognition candidates are associated with corresponding information; comparing the information of the recognition candidates to identify agreement between at least two of the recognition models; selecting a recognition candidate based on information of the recognition candidate and agreement between the at least two of the recognition models; and outputting data corresponding to the selected recognition candidate as a recognized version of the audio.
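
The agreement-then-selection step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Candidate` type, the `select_representative` function, the dialect labels, and the rule of ranking agreed candidates by mean confidence are all hypothetical. The abstract only says that selection uses the candidates' information and agreement between at least two recognition models.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    """One recognition candidate from one model (hypothetical type)."""
    model: str         # illustrative label, e.g. "en-US" or "en-GB"
    text: str          # recognized text for the audio
    confidence: float  # score reported by that model


def select_representative(candidates: List[Candidate]) -> Optional[str]:
    """Pick a candidate text that at least two models agree on."""
    by_text: Dict[str, Dict[str, float]] = defaultdict(dict)
    for c in candidates:
        # Keep the best score each model gave to this exact text.
        prev = by_text[c.text].get(c.model, float("-inf"))
        by_text[c.text][c.model] = max(prev, c.confidence)

    # Agreement: the same text produced by two or more distinct models.
    agreed = {t: s for t, s in by_text.items() if len(s) >= 2}
    if not agreed:
        return None  # no agreement; a fallback policy is left open here

    # Assumed ranking rule: highest mean confidence among agreeing models.
    return max(agreed, key=lambda t: sum(agreed[t].values()) / len(agreed[t]))


candidates = [
    Candidate("en-US", "search for lifts", 0.70),
    Candidate("en-GB", "search for lifts", 0.90),
    Candidate("en-IN", "search for lifts", 0.60),
    Candidate("en-US", "search for lists", 0.95),
    Candidate("en-AU", "search for gifts", 0.80),
]
print(select_representative(candidates))
```

In this sample, "search for lists" has the single highest score but comes from only one model, so it is excluded before confidence is considered at all.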

20 Claims

1. A computer-implemented method comprising:
- receiving audio data that encodes an utterance;
  
  providing the audio data to multiple speech recognizers that are each trained on a different dialect or accent of a same language;
  
  receiving, from each of the multiple speech recognizers that are each trained on a different dialect or accent of a same language, (i) a transcription of the utterance, and (ii) a speech recognition confidence score;
  
  selecting, from among the transcriptions of the utterance that are received from the multiple speech recognizers, one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers;
  
  selecting, from among the one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers, a representative transcription based at least on the speech recognition confidence scores associated with the particular transcriptions; and
  
  providing the representative transcription for output.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The method of claim 1, comprising:
    - identifying a language associated with the utterance prior to, or concurrent with, providing the audio data to the multiple speech recognizers that are each trained on a different dialect or accent of a same language.
  - 3. The method of claim 1, comprising:
    - selecting the multiple speech recognizers that are each trained on a different dialect or accent of a same language.
  - 4. The method of claim 3, wherein selecting the multiple speech recognizers that are each trained on a different dialect or accent of a same language comprises:
    - identifying all available speech recognizers that are trained on a dialect or accent of the language; and
      
      selecting all of the available speech recognizers that are trained on a dialect or accent of the language.
  - 5. The method of claim 3, comprising selecting the multiple speech recognizers that are each trained on a different dialect or accent of a same language based on input from a user.
  - 6. The method of claim 3, wherein selecting the multiple speech recognizers that are each trained on a different dialect or accent of a same language comprises:
    - identifying a language associated with the utterance based on previously received audio data; and
      
      selecting the multiple speech recognizers that are each trained on a different dialect or accent of the identified language.
  - 7. The method of claim 3, wherein selecting the multiple speech recognizers that are each trained on a different dialect or accent of a same language based on input from a user comprises:
    - identifying a language associated with the utterance based on previously received audio data;
      
      providing, for display at a user interface, information indicating the identified language;
      
      receiving data indicating one or more selections corresponding to one or more dialects or accents of the identified language, wherein the selections are made from the user interface; and
      
      selecting multiple speech recognizers that are each trained on one of the selected dialects or accents of the identified language.
  - 8. The method of claim 1, wherein selecting the representative transcription based at least on the speech recognition confidence scores associated with the particular transcriptions comprises:
    - determining, for each of the two or more of the multiple speech recognizers that generated the particular transcription selected as the representative transcription, that the speech recognition confidence score associated with the particular transcription and received from the speech recognizer is a highest speech recognition confidence score among all speech recognition confidence scores associated with transcriptions of the utterance generated by the speech recognizer.
  - 9. The method of claim 1, wherein the multiple speech recognizers that are each trained on a different dialect or accent of a same language are constituents of a single, composite speech recognizer for a language.
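
Claims 1 and 8 can be read together as a filter over each recognizer's scored hypotheses: a transcription must be generated by two or more recognizers, and (per claim 8) its score must be the highest score within every recognizer that generated it. The following is a minimal sketch under that reading. It assumes each recognizer returns several scored hypotheses as a mapping from text to confidence; the `NBest` alias, the function names, and the sample data are hypothetical.

```python
from typing import Dict, List

# Hypothetical output shape: recognizer name -> {transcription: confidence}.
NBest = Dict[str, Dict[str, float]]


def generated_by(nbest: NBest, text: str) -> List[str]:
    """Names of the recognizers whose output includes this transcription."""
    return [name for name, hyps in nbest.items() if text in hyps]


def is_top_for_all_generators(nbest: NBest, text: str) -> bool:
    """Claim 8 style check: for every recognizer that generated `text`,
    its score for `text` is the highest among that recognizer's scores."""
    generators = generated_by(nbest, text)
    if len(generators) < 2:  # claim 1 requires two or more recognizers
        return False
    return all(nbest[g][text] == max(nbest[g].values()) for g in generators)


nbest = {
    "en-US": {"color of the lorry": 0.81, "collar of the lorry": 0.40},
    "en-GB": {"color of the lorry": 0.92, "colour of the lurry": 0.15},
    "en-AU": {"collar of the lorry": 0.77},
}
print(is_top_for_all_generators(nbest, "color of the lorry"))
print(is_top_for_all_generators(nbest, "collar of the lorry"))
```

Here "collar of the lorry" is generated by two recognizers but is not the top hypothesis of the hypothetical "en-US" recognizer, so it fails the check even though agreement exists.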

10. One or more non-transitory machine-readable media storing instructions that are executable to perform operations comprising:
- receiving audio data that encodes an utterance;
  
  providing the audio data to multiple speech recognizers that are each trained on a different dialect or accent of a same language;
  
  receiving, from each of the multiple speech recognizers that are each trained on a different dialect or accent of a same language, (i) a transcription of the utterance, and (ii) a speech recognition confidence score;
  
  selecting, from among the transcriptions of the utterance that are received from the multiple speech recognizers, one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers;
  
  selecting, from among the one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers, a representative transcription based at least on the speech recognition confidence scores associated with the particular transcriptions; and
  
  providing the representative transcription for output.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18)
- - 11. The non-transitory machine-readable media of claim 10, wherein the operations comprise:
    - identifying a language associated with the utterance prior to, or concurrent with, providing the audio data to the multiple speech recognizers that are each trained on a different dialect or accent of a same language.
  - 12. The non-transitory machine-readable media of claim 10, wherein the operations comprise:
    - selecting the multiple speech recognizers that are each trained on a different dialect or accent of a same language.
  - 13. The non-transitory machine-readable media of claim 12, wherein selecting the multiple speech recognizers that are each trained on a different dialect or accent of a same language comprises:
    - identifying all available speech recognizers that are trained on a dialect or accent of the language; and
      
      selecting all of the available speech recognizers that are trained on a dialect or accent of the language.
  - 14. The non-transitory machine-readable media of claim 12, wherein the operations comprise selecting the multiple speech recognizers that are each trained on a different dialect or accent of a same language based on input from a user.
  - 15. The non-transitory machine-readable media of claim 12, wherein selecting the multiple speech recognizers that are each trained on a different dialect or accent of a same language comprises:
    - identifying a language associated with the utterance based on previously received audio data; and
      
      selecting the multiple speech recognizers that are each trained on a different dialect or accent of the identified language.
  - 16. The non-transitory machine-readable media of claim 12, wherein selecting the multiple speech recognizers that are each trained on a different dialect or accent of a same language based on input from a user comprises:
    - identifying a language associated with the utterance based on previously received audio data;
      
      providing, for display at a user interface, information indicating the identified language;
      
      receiving data indicating one or more selections corresponding to one or more dialects or accents of the identified language, wherein the selections are made from the user interface; and
      
      selecting the multiple speech recognizers that are each trained on one of the selected dialects or accents of the identified language.
  - 17. The non-transitory machine-readable media of claim 10, wherein selecting the representative transcription based at least on the speech recognition confidence scores associated with the particular transcriptions comprises:
    - determining, for each of the two or more of the multiple speech recognizers that generated the particular transcription selected as the representative transcription, that the speech recognition confidence score associated with the particular transcription and received from the speech recognizer is a highest speech recognition confidence score among all speech recognition confidence scores associated with transcriptions of the utterance generated by the speech recognizer.
  - 18. The non-transitory machine-readable media of claim 10, wherein the multiple speech recognizers that are each trained on a different dialect or accent of a same language are constituents of a single, composite speech recognizer for a language.
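
The recognizer-selection variants in claims 12 through 16 (all available recognizers for the identified language, or only those matching dialects or accents chosen by a user) can be illustrated with a small sketch. It assumes recognizers are registered under language-region tags such as "en-GB"; the `AVAILABLE` registry, the tag format, and the `select_recognizers` function are hypothetical and not taken from the patent.

```python
from typing import Iterable, List, Optional

# Hypothetical registry of available recognizers, keyed by a
# language-region tag. The tags are illustrative only.
AVAILABLE = ["en-US", "en-GB", "en-IN", "es-ES", "es-MX"]


def select_recognizers(language: str,
                       user_choices: Optional[Iterable[str]] = None,
                       available: Iterable[str] = AVAILABLE) -> List[str]:
    """Return recognizer tags for the identified `language`.

    With no user input, every available recognizer for the language is
    selected; otherwise only the dialects or accents the user chose.
    """
    for_language = [t for t in available if t.split("-")[0] == language]
    if user_choices is None:
        return for_language
    chosen = set(user_choices)
    return [t for t in for_language if t in chosen]


print(select_recognizers("en"))
print(select_recognizers("en", user_choices=["en-GB", "es-MX"]))
```

User choices outside the identified language are ignored in this sketch, which mirrors the claim language that selections correspond to dialects or accents of the identified language.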

19. A system comprising:
- one or more data processing apparatus;
  
  a non-transitory computer-readable storage device having stored thereon instructions that, when executed by the one or more data processing apparatus, cause the one or more data processing apparatus to perform operations comprising:
  
  receiving audio data that encodes an utterance;
  
  providing the audio data to multiple speech recognizers that are each trained on a different dialect or accent of a same language;
  
  receiving, from each of the multiple speech recognizers that are each trained on a different dialect or accent of a same language, (i) a transcription of the utterance, and (ii) a speech recognition confidence score;
  
  selecting, from among the transcriptions of the utterance that are received from the multiple speech recognizers, one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers;
  
  selecting, from among the one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers, a representative transcription based at least on the speech recognition confidence scores associated with the particular transcriptions; and
  
  providing the representative transcription for output.
- Dependent claims (20)
- - 20. The system of claim 19, wherein the operations comprise:
    - selecting the multiple speech recognizers that are each trained on a different dialect or accent of a same language.
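
The "providing the audio data to multiple speech recognizers" and "receiving, from each" operations of claim 19 can be sketched as a simple fan-out. This is an illustration only: the claim does not say the recognizers run concurrently, so the thread pool is an assumption, and the `Recognizer` alias, `recognize_all` function, and stub recognizers are hypothetical stand-ins for real speech models.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

# A recognizer is modeled as a callable: audio bytes -> (text, confidence).
Recognizer = Callable[[bytes], Tuple[str, float]]


def recognize_all(audio: bytes,
                  recognizers: Dict[str, Recognizer]) -> Dict[str, Tuple[str, float]]:
    """Send the same audio to every recognizer and collect each result."""
    with ThreadPoolExecutor(max_workers=max(1, len(recognizers))) as pool:
        futures = {name: pool.submit(rec, audio)
                   for name, rec in recognizers.items()}
        # result() blocks until that recognizer has finished.
        return {name: fut.result() for name, fut in futures.items()}


# Stand-in recognizers that ignore the audio and return fixed results.
stubs: Dict[str, Recognizer] = {
    "en-US": lambda audio: ("call mum", 0.62),
    "en-GB": lambda audio: ("call mum", 0.91),
}
results = recognize_all(b"\x00\x01", stubs)
print(results)
```

The returned mapping pairs each recognizer with its transcription and confidence score, which is the input the later selection operations of the claim work on.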

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Beaufays, Francoise; Strope, Brian; Sung, Yun-hsuan
Primary Examiner(s)
Han, Qi

Application Number

US13/672,945
Time in Patent Office

1,208 Days
Field of Search

704/200, 704/231, 704/246, 704/251, 704/255, 704/257
US Class Current

1/1
CPC Class Codes

G10L 15/183 using context dependencies,...

G10L 15/32 Multiple recognisers used i...
