Cross-lingual initialization of language models

US 8,260,615 B1
Filed: 04/25/2011
Issued: 09/04/2012
Est. Priority Date: 04/25/2011
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for initializing language models for automatic speech recognition. In one aspect, a method includes receiving logged speech recognition results from an existing corpus that is specific to a given language and a target context, generating a target corpus by machine-translating the logged speech recognition results from the given language to a different, target language, and estimating a language model that is specific to the different, target language and the same, target context, using the target corpus.
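
The corpus-generation flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the `(language, context)` keying of corpora, the function `get_or_generate_corpus`, and the word-by-word `toy_translate` stand-in for a machine translation engine are all hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical keying scheme: one corpus per (language, context) pair.
CorpusKey = Tuple[str, str]


def get_or_generate_corpus(
    corpora: Dict[CorpusKey, List[str]],
    target_language: str,
    target_context: str,
    translate: Callable[[str, str, str], str],
) -> List[str]:
    """Return the target corpus, generating it by translation if it is unavailable."""
    key = (target_language, target_context)
    if key in corpora:
        return corpora[key]  # a target corpus already exists

    # Look for logged results in the same context but a different language.
    for (language, context), results in list(corpora.items()):
        if context == target_context and language != target_language:
            generated = [translate(r, language, target_language) for r in results]
            corpora[key] = generated  # store the generated target corpus
            return generated

    raise LookupError(f"no corpus in any language for context {target_context!r}")


# Toy stand-in for a machine translation engine (assumption, English to Spanish only).
_TOY_LEXICON = {"call": "llamar", "home": "casa", "office": "oficina"}


def toy_translate(text: str, source: str, target: str) -> str:
    return " ".join(_TOY_LEXICON.get(word, word) for word in text.lower().split())


corpora = {("en", "voice-dialing"): ["call home", "call office"]}
generated = get_or_generate_corpus(corpora, "es", "voice-dialing", toy_translate)
print(generated)
```

The language model estimation step is left out of this sketch; the generated list of sentences would be its input.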

18 Claims

1. A computer-implemented method performed by at least one processor, the method comprising:
- receiving audio input in a target language, and a target context of the audio input;
  
  determining that a target corpus that corresponds to the target language and the target context is unavailable;
  
  receiving logged speech recognition results that correspond to an existing corpus that is specific to a given language that differs from the target language, and to the same target context that corresponds to the received target language audio input and the logged speech recognition results;
  
  generating the target corpus that corresponds to the target language and the target context by machine-translating the logged speech recognition results corresponding to the given language to the target language; and
  
  estimating a context-specific language model that is specific to both the target language and the target context using the generated target corpus.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1, wherein estimating the context-specific language model comprises counting each occurrence of each distinctive word or phrase in the target corpus.
  - 3. The method of claim 2, wherein estimating the context-specific language model comprises determining a relative frequency of occurrence of each distinctive word or phrase in the target corpus from among all distinctive words or phrases in the target corpus.
  - 4. The method of claim 1, wherein the target context is associated with a particular application or application state, operating system, geographic location or region, or environmental or ambient characteristic.
  - 5. The method of claim 1, wherein the target context is a text messaging context, an e-mail context, a search query context, a voice-dialing context, or a navigation context.
  - 6. The method of claim 1, wherein generating the target corpus comprises filtering the speech recognition results, then machine-translating only the filtered speech recognition results.
  - 7. The method of claim 6, wherein filtering the speech recognition results comprises filtering the speech recognition results that are associated with a speech recognition confidence score that is below a predefined threshold.
  - 8. The method of claim 6, wherein filtering the speech recognition results comprises filtering the speech recognition results that represent abbreviations.
  - 9. The method of claim 1, wherein generating the target corpus comprises machine-translating the speech recognition results of the existing corpus in real time as the speech recognition results are received.
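
The filtering described in claims 6 to 8 might look something like the sketch below. It reads "filtering" as removing results, which is an interpretation rather than something the claims spell out. The `LoggedResult` record, the 0.8 threshold, the 0-to-1 confidence scale, and the upper-case heuristic for spotting abbreviations are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoggedResult:
    """Hypothetical record for one logged speech recognition result."""
    text: str
    confidence: float  # assumed to lie in [0, 1]


def looks_like_abbreviation(text: str) -> bool:
    """Crude heuristic (assumption): any all-caps alphabetic token of length >= 2."""
    return any(
        len(token) >= 2 and token.isalpha() and token.isupper()
        for token in text.split()
    )


def filter_results(results: List[LoggedResult], min_confidence: float = 0.8) -> List[str]:
    """Keep only results at or above the threshold that contain no abbreviations."""
    return [
        r.text
        for r in results
        if r.confidence >= min_confidence and not looks_like_abbreviation(r.text)
    ]


logged = [
    LoggedResult("call mom", 0.93),      # kept
    LoggedResult("coal mum", 0.41),      # dropped: low confidence
    LoggedResult("send it ASAP", 0.90),  # dropped: abbreviation
]
kept = filter_results(logged)
print(kept)
```

Only the texts in `kept` would then be passed to machine translation, per claim 6.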

10. A system comprising:
- one or more non-transitory computer-readable storage media storing data that represents an existing corpus;
  
  an automated speech recognition engine, executable on one or more processors having access to the computer-readable storage media, and operable to receive audio input in a target language, and a target context of the audio, and further operable to determine that a target corpus that corresponds to the target language and the target context is unavailable;
  
  a machine translation engine, executable on one or more processors having access to the computer-readable storage media, and operable to receive logged speech recognition results that correspond to an existing corpus that is specific for a given language that differs from the target language, and to the same target context that corresponds to the received target language audio input and the logged speech recognition results, and further operable to generate the target corpus that corresponds to the target language and the target context by machine-translating the logged speech recognition results corresponding to the given language to the target language, wherein results of the translation are stored in the computer-readable storage medium as the target corpus; and
  
  a language model generator, executable on one or more processors having access to the computer-readable storage media, and operable to estimate a context-specific language model that is specific to both the target language and the target context using the generated target corpus.
- Dependent claims (11, 12, 13)
  - 11. The system of claim 10, wherein the machine translation engine is further operable to translate logged text data of the existing corpus in the given language to the target language and include translation results of the logged text data in the target corpus.
  - 12. The system of claim 10, wherein estimating the context-specific language model comprises determining a relative frequency of occurrence of each distinctive word or phrase in the target corpus from among all distinctive words or phrases in the target corpus.
  - 13. The system of claim 10, wherein the target context is a text messaging context, an e-mail context, a search query context, a voice-dialing context, or a navigation context.
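
The relative-frequency estimate in claim 12 can be illustrated with a minimal unigram sketch. It assumes whitespace tokenization, treats each token as a "distinctive word", and applies no smoothing; none of these choices comes from the patent text, and `relative_frequencies` is a name introduced here.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_frequencies(corpus: Iterable[str]) -> Dict[str, float]:
    """Count each word in the corpus and divide by the total word count."""
    counts = Counter(word for sentence in corpus for word in sentence.split())
    total = sum(counts.values())
    if total == 0:
        return {}  # empty corpus: nothing to estimate
    return {word: count / total for word, count in counts.items()}


model = relative_frequencies(["llamar casa", "llamar oficina"])
print(model)
```

Here "llamar" accounts for two of the four tokens, so its relative frequency is 0.5, and the other two words get 0.25 each. Extending the same counting to phrases (for example, adjacent word pairs) would follow the same pattern.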

14. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations comprising:
- receiving audio input in a target language, and a target context of the audio input;
  
  determining that a target corpus that corresponds to the target language and the target context is unavailable;
  
  receiving logged speech recognition results that correspond to an existing corpus that is specific to a given language that differs from the target language, and to the same target context that corresponds to the received target language audio input and the logged speech recognition results;
  
  generating the target corpus that corresponds to the target language and the target context by machine-translating the logged speech recognition results corresponding to the given language to the target language; and
  
  estimating a context-specific language model that is specific to both the target language and the target context using the generated target corpus.
- Dependent claims (15, 16, 17, 18)
  - 15. The computer storage medium of claim 14, wherein generating the target corpus comprises filtering the speech recognition results, then machine-translating only the filtered speech recognition results.
  - 16. The computer storage medium of claim 15, wherein filtering the speech recognition results comprises filtering the speech recognition results that are associated with a speech recognition confidence score that is below a predefined threshold.
  - 17. The computer storage medium of claim 14, wherein the target context is associated with a particular application or application state, operating system, geographic location or region, or environmental or ambient characteristic.
  - 18. The computer storage medium of claim 14, wherein generating the target corpus further comprises including the machine-translated speech recognition results and an existing, partial corpus specific to the target language and the target context in the target corpus.
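
Claim 18 combines machine-translated results with an existing, partial target-language corpus. One simple way to picture that is a plain concatenation, sketched below. The `native_repeat` parameter, which repeats the native sentences to give them more weight, is purely a hypothetical extra and is not described in the claim.

```python
from typing import List, Sequence


def combine_corpora(
    partial_native: Sequence[str],
    machine_translated: Sequence[str],
    native_repeat: int = 1,
) -> List[str]:
    """Concatenate a partial native corpus with machine-translated results.

    native_repeat (hypothetical) repeats the native sentences so that they
    contribute more counts than the translated ones.
    """
    if native_repeat < 1:
        raise ValueError("native_repeat must be at least 1")
    return list(partial_native) * native_repeat + list(machine_translated)


combined = combine_corpora(
    ["llamar a casa"],                # existing partial target-language corpus
    ["llamar casa", "llamar oficina"],  # machine-translated logged results
    native_repeat=2,
)
print(combined)
```

The combined list would then serve as the target corpus from which the context-specific language model is estimated.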

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Strope, Brian; Nakajima, Kaisuke
Primary Examiner(s)
AZAD, ABUL K

Application Number

US13/093,176
Time in Patent Office

498 Days
Field of Search

704/257, 704/235, 704/277, 704/1-10
US Class Current

704/257
CPC Class Codes

G06F 40/58   Use of machine translation,...

G10L 15/005   Language recognition

G10L 15/183   using context dependencies,...

G10L 15/197   Probabilistic grammars, e.g...
