Disambiguation language model

US 20050171761A1
Filed: 03/29/2005
Published: 08/04/2005
Est. Priority Date: 01/31/2001
Status: Active Grant

1 Assignment

0 Petitions

Abstract

A language model for a language processing system, such as a speech recognition system, is constructed from a training corpus formed from associated characters, word phrases and context cues. A method and apparatus for generating the training corpus used to train the language model, and a system or module using such a language model, are disclosed.

20 Claims

1. A computer readable medium including instructions readable by a computer which, when implemented, cause the computer to construct a language model for a speech recognition system to disambiguate characters of an Asian language, the method comprising constructing a training corpus comprising the steps of:
- obtaining a dictionary of word phrases;
- for each word phrase of the dictionary of word phrases comprising Kanji-based characters, associating a character of the word phrase and the word phrase with a context cue indicative of disambiguating the character to automatically generate context cue phrases of the training corpus; and
- using the training corpus to build the language model.

2. The computer readable medium of claim 1, wherein the language model comprises an N-gram language model having probability information for the generated phrases.
3. The computer readable medium of claim 1, wherein associating includes associating a first or last character of each word phrase with the word phrase.
4. The method of claim 2, and further comprising adjusting a probability score for each of the associated characters and word phrases in the language model.
5. The method of claim 1, wherein the context cue comprises in Chinese.
6. The method of claim 1, wherein the context cue comprises in Japanese.
7. The method of claim 1, wherein each character is a single Kanji-based character.
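
The corpus-construction steps of claim 1, with the first-or-last-character option of claim 3, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `is_kanji` and `build_cue_corpus` are hypothetical, the `<word phrase><cue><character>` ordering ("the character of the word phrase") is an assumption, the Kanji test only covers the CJK Unified Ideographs block U+4E00 to U+9FFF, and the cue string is left as a parameter (the usage below passes the Chinese 的, "of", purely as an illustrative choice).

```python
from typing import Iterable, List


def is_kanji(ch: str) -> bool:
    """Rough test for a Kanji-based (CJK) character.

    Simplification (assumption): only the CJK Unified Ideographs block
    U+4E00..U+9FFF is checked; extension blocks are ignored.
    """
    return "\u4e00" <= ch <= "\u9fff"


def build_cue_corpus(dictionary: Iterable[str], cue: str,
                     first_last_only: bool = True) -> List[str]:
    """Turn a dictionary of word phrases into context cue phrases.

    Each output string uses the hypothetical layout
    <word phrase><cue><character>, i.e. "the <character> of <word phrase>".
    """
    corpus: List[str] = []
    for phrase in dictionary:
        if not any(is_kanji(ch) for ch in phrase):
            continue  # only phrases made of Kanji-based characters are used
        # Claim 3 variant: associate only the first and last characters;
        # otherwise associate every character of the phrase.
        candidates = [phrase[0], phrase[-1]] if first_last_only else list(phrase)
        for ch in dict.fromkeys(candidates):  # de-duplicate, keep order
            if is_kanji(ch):
                corpus.append(f"{phrase}{cue}{ch}")
    return corpus
```

For example, `build_cue_corpus(["国王", "王国"], cue="的")` returns `["国王的国", "国王的王", "王国的王", "王国的国"]`, so each character is paired with every dictionary phrase that can disambiguate it, with no manual authoring of the corpus.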

8. A computer readable medium including instructions which, when implemented, cause a computer to recognize Kanji-based characters when spoken, the instructions comprising:
- a first module adapted to construct a training corpus by performing the steps of:
  - obtaining a dictionary of word phrases;
  - for each word phrase of the dictionary of word phrases comprising Kanji-based characters, associating a character of the word phrase and the word phrase with a context cue indicative of disambiguating the character to automatically generate context cue phrases of the training corpus; and
- a second module adapted to construct a language model using the training corpus.

9. The computer readable medium of claim 8, wherein associating includes associating a first or last character of each word phrase with the word phrase.
10. The computer readable medium of claim 9, wherein the context cue comprises in Chinese.
11. The method of claim 9, wherein the context cue comprises in Japanese.
12. The computer readable medium of claim 8, and further comprising a third module adapted to recognize Kanji-based characters when spoken by performing the step of receiving input speech having a context cue phrase, the context cue phrase comprising a Kanji-based character, a word phrase having the Kanji-based character, and a context cue, wherein the context cue is indicative of disambiguating the Kanji-based character.
13. The computer readable medium of claim 12, wherein the third module is further adapted to perform the steps of:
  - detecting the context cue phrase in the received input speech without prompting;
  - executing instructions for accessing the language model, wherein the language model comprises a statistical language model having probability information for the context cue phrases; and
  - outputting the character as text without the word phrase and the context cue for the detected context cue phrase.
14. The computer readable medium of claim 13, wherein the statistical language model is an N-gram language model.
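
Claims 2 and 14 call for an N-gram language model with probability information for the context cue phrases. The sketch below assumes each cue phrase is tokenized as (word phrase, cue, character) and uses unsmoothed maximum-likelihood estimates; `train_ngram` and `sentence_prob` are hypothetical names, and a practical recognizer would presumably add smoothing and the probability adjustment of claim 4, neither of which is shown here.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Gram = Tuple[str, ...]


def _pad(tokens: Sequence[str], n: int) -> List[str]:
    # n-1 start markers give every token a full-length history;
    # one end marker closes the phrase.
    return ["<s>"] * (n - 1) + list(tokens) + ["</s>"]


def train_ngram(sentences: Sequence[Sequence[str]], n: int = 3) -> Dict[Gram, float]:
    """Maximum-likelihood N-gram model: (history..., word) -> P(word | history)."""
    gram_counts: Counter = Counter()
    history_counts: Counter = Counter()
    for tokens in sentences:
        padded = _pad(tokens, n)
        for i in range(n - 1, len(padded)):
            history = tuple(padded[i - n + 1:i])
            gram_counts[history + (padded[i],)] += 1
            history_counts[history] += 1
    return {g: c / history_counts[g[:-1]] for g, c in gram_counts.items()}


def sentence_prob(model: Dict[Gram, float], tokens: Sequence[str], n: int = 3) -> float:
    """Probability of a token sequence; unseen N-grams get 0 (no smoothing)."""
    padded = _pad(tokens, n)
    prob = 1.0
    for i in range(n - 1, len(padded)):
        prob *= model.get(tuple(padded[i - n + 1:i + 1]), 0.0)
    return prob


# Context cue phrases tokenized as (word phrase, cue, character).
corpus = [("国王", "的", "国"), ("国王", "的", "王"),
          ("王国", "的", "王"), ("王国", "的", "国")]
model = train_ngram(corpus, n=3)
print(model[("国王", "的", "王")])                 # 0.5
print(sentence_prob(model, ["国王", "的", "王"]))  # 0.25
```

A trigram is used so that the probability of the final character is conditioned on both the word phrase and the cue; a bigram would only see the cue and could not tie the character to the phrase that disambiguates it. Here the model assigns zero probability to a character that never follows a given phrase, such as 玉 after 国王的.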

15. A method of recognizing Kanji-based characters when spoken, the method comprising the steps of:
- receiving input speech having a context cue phrase, the context cue phrase comprising a Kanji-based character, a word phrase having the Kanji-based character, and a context cue, wherein the context cue is indicative of disambiguating the Kanji-based character;
- detecting the context cue phrase in the received input speech without prompting;
- executing instructions for accessing a language model, wherein the language model comprises an N-gram language model having probability information for the context cue phrases; and
- outputting the character as text without the word phrase and the context cue for the detected context cue phrase.

16. The computer readable medium of claim 15, wherein outputting the character includes outputting the character string based on recognizing the character using the language model.
17. The computer readable medium of claim 15, wherein the language model comprises a context-free grammar.
18. The computer readable medium of claim 15, wherein outputting the character includes outputting the character based on a comparison of a recognized character with a recognized word phrase.
19. The computer readable medium of claim 18, wherein when the recognized character is not present in the recognized word phrase, the character that is output is a character of the recognized word phrase.
20. The computer readable medium of claim 15, wherein each of the word phrases is a single word, and wherein each of the characters is a single Kanji-based character.
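
The output step of claim 15, together with the comparison of claims 18 and 19, can be sketched as a post-processing pass over recognized tokens. Assumptions: the recognizer emits the word phrase, cue, and character as separate tokens; a cue phrase is detected by a simple pattern (a word phrase of at least two characters, then the cue, then a single character); and when the recognized character is not in the recognized word phrase, falling back to the phrase's last character is a hypothetical choice, since claim 19 only says that a character of the word phrase is output. `render_cue_phrases` is a hypothetical name.

```python
from typing import List, Sequence


def render_cue_phrases(tokens: Sequence[str], cue: str) -> str:
    """Replace each <word phrase><cue><character> run with the character alone."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        is_cue_phrase = (
            i + 2 < len(tokens)
            and len(tokens[i]) >= 2        # a multi-character word phrase
            and tokens[i + 1] == cue       # followed by the context cue
            and len(tokens[i + 2]) == 1    # followed by a single character
        )
        if is_cue_phrase:
            phrase, char = tokens[i], tokens[i + 2]
            if char not in phrase:
                # Claim 19 variant: trust the recognized word phrase over the
                # recognized character. Hypothetical choice: its last character.
                char = phrase[-1]
            out.append(char)  # emit the character without phrase or cue
            i += 3
        else:
            out.append(tokens[i])  # ordinary text passes through unchanged
            i += 1
    return "".join(out)


print(render_cue_phrases(["我", "姓", "国王", "的", "王"], "的"))  # 我姓王
print(render_cue_phrases(["国王", "的", "玉"], "的"))              # 王
```

Because detection here is purely pattern-based, an ordinary phrase that happens to contain the cue (for example a possessive 的 followed by a one-character noun) would be misread as a cue phrase; in the claimed method that decision is presumably driven by the language model's probabilities for context cue phrases instead.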

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Alleva, Fileno A.; Ju, Yun-cheng

Granted Patent

US 7,251,600 B2
US Class Current

704/10
CPC Class Codes

G10L 15/063 Training

G10L 15/18 using natural language mode...