Disambiguation language model
Abstract
A language model for a language processing system such as a speech recognition system is constructed from a training corpus formed from associated characters, word phrases and context cues. A method and apparatus for generating the training corpus used to train the language model, and a system or module using such a language model, are disclosed.
20 Claims
1. A computer readable medium including instructions readable by a computer which, when implemented, cause the computer to construct a language model for a speech recognition system to disambiguate characters of an Asian language, the method comprising constructing a training corpus comprising the steps of:
obtaining a dictionary of word phrases;
for each word phrase of the dictionary of word phrases comprising Kanji-based characters, associating a character of the word phrase and the word phrase with a context cue indicative of disambiguating the character to automatically generate context cue phrases of the training corpus; and
using the training corpus to build the language model.
Dependent claims: 2, 3, 4, 5, 6, 7.
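The corpus-construction steps of claim 1 can be illustrated with a minimal sketch. It assumes the context cue is the Japanese particle "の" and that each context cue phrase takes the form "<word phrase>の<character>"; the claim itself fixes neither the cue nor the phrase order. The names `is_kanji` and `context_cue_phrases`, the sample dictionary, and the use of the CJK Unified Ideographs block (U+4E00 to U+9FFF) as the test for a Kanji-based character are likewise hypothetical choices made for illustration.

```python
from typing import Iterable, List

CUE = "の"  # assumed context cue; the claim does not name a specific one


def is_kanji(ch: str) -> bool:
    """True if ch lies in the CJK Unified Ideographs block (U+4E00-U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def context_cue_phrases(dictionary: Iterable[str], cue: str = CUE) -> List[str]:
    """For each Kanji in each word phrase, emit '<word phrase><cue><character>'."""
    corpus: List[str] = []
    for phrase in dictionary:
        seen = set()  # avoid duplicate lines when a character repeats in a phrase
        for ch in phrase:
            if is_kanji(ch) and ch not in seen:
                seen.add(ch)
                corpus.append(f"{phrase}{cue}{ch}")
    return corpus


# Hypothetical three-entry dictionary; the last entry has no Kanji and is skipped.
corpus = context_cue_phrases(["音楽", "東京", "ひらがな"])
print(corpus)  # ['音楽の音', '音楽の楽', '東京の東', '東京の京']
```

Each output line pairs a character with a word phrase that contains it, which is what lets a spoken phrase single out one character among homophones.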
8. A computer readable medium including instructions which, when implemented, cause a computer to recognize Kanji-based characters when spoken, the instructions comprising:
a first module adapted to construct a training corpus by performing the steps:
obtaining a dictionary of word phrases;
for each word phrase of the dictionary of word phrases comprising Kanji-based characters, associating a character of the word phrase and the word phrase with a context cue indicative of disambiguating the character to automatically generate context cue phrases of the training corpus; and
a second module adapted to construct a language model using the training corpus.
Dependent claims: 9, 10, 11, 12, 13, 14.
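One way to picture the second module is a small N-gram estimator run over the context cue phrases. The sketch below assumes a character-level bigram model with unsmoothed relative-frequency estimates, which is a simplification chosen for brevity and not something the claim specifies. The function name `build_bigram_model`, the boundary tokens, and the three sample corpus lines are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable

BOS, EOS = "<s>", "</s>"  # assumed sentence-boundary tokens


def build_bigram_model(corpus: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next | prev) by relative frequency over character tokens."""
    counts = defaultdict(Counter)
    for line in corpus:
        tokens = [BOS, *line, EOS]  # unpacking a str yields its characters
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(following.values()) for nxt, c in following.items()}
        for prev, following in counts.items()
    }


# Illustrative context cue phrases of the assumed form '<word phrase>の<character>'.
model = build_bigram_model(["音楽の音", "音楽の楽", "東京の東"])
print(model["の"])  # distribution over the characters that follow the cue
```

A production recognizer would more likely use word-level tokens and a smoothed model; the point here is only that the generated phrases are ordinary training text for an N-gram estimator.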
15. A method of recognizing Kanji-based characters when spoken, the method comprising the steps of:
receiving input speech having a context cue phrase, the context cue phrase comprising a Kanji-based character, a word phrase having the Kanji-based character, and a context cue, wherein the context cue is indicative of disambiguating the Kanji-based character;
detecting the context cue phrase in the received input speech without prompting;
executing instructions for accessing a language model, wherein the language model comprises an N-gram language model having probability information for the context cue phrases; and
outputting the character as text without the word phrase and the context cue for the detected context cue phrase.
Dependent claims: 16, 17, 18, 19, 20.
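The final step of claim 15, outputting only the character, can be sketched at the text level. This is an illustration under strong assumptions: it operates on an already-recognized transcript, not on speech; it does not consult the N-gram language model; and it assumes the cue "の" and the phrase form "<word phrase>の<character>". The function name `collapse_context_cues` and the sample dictionary are hypothetical.

```python
from typing import Iterable


def collapse_context_cues(transcript: str, dictionary: Iterable[str], cue: str = "の") -> str:
    """Replace each '<word phrase><cue><character>' with the bare character."""
    out = transcript
    # Try longer phrases first so a long entry is not shadowed by a shorter one.
    for phrase in sorted(dictionary, key=len, reverse=True):
        for ch in dict.fromkeys(phrase):  # unique characters, in original order
            out = out.replace(f"{phrase}{cue}{ch}", ch)
    return out


print(collapse_context_cues("音楽の音", ["音楽", "東京"]))  # prints 音
```

Text that contains no context cue phrase passes through unchanged, which mirrors the claim's requirement that the phrase be detected without prompting the user.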
Specification