Sub-lexical language models with word level pronunciation lexicons
First Claim
1. A method performed by a data processing apparatus, the method comprising:
accessing a word level pronunciation lexicon and a word level training text corpus for a natural language;
segmenting, using a word decomposition system, the word level training text corpus into sub-lexical units;
training an n-gram language model over the sub-lexical units to produce a sub-lexical language model;
constructing, using the word decomposition system, a word to sub-lexical unit mapping transducer;
constructing a word level language model by:
obtaining a result of composing the mapping transducer with the sub-lexical language model, and
performing a projection on the result of the composition of the mapping transducer and the sub-lexical language model;
constructing a speech decoding network at least by composing a context dependency model with the word level pronunciation lexicon and with the word level language model;
receiving an audio stream from a user; and
recognizing the audio stream, using the speech decoding network.
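The language-modeling steps of this claim (segmenting, training, mapping, and scoring at the word level) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `DECOMPOSITION` table, the Finnish toy words, the add-one smoothing, and the function names do not come from the patent. It also replaces the claimed transducer composition and projection with a plain dictionary lookup, and it leaves out the decoding network and audio steps entirely.

```python
from collections import Counter
import math

# Hypothetical word decomposition table (illustrative only; the claim does
# not say how the word decomposition system segments words).
DECOMPOSITION = {
    "talo": ["talo"],
    "talossa": ["talo", "+ssa"],
    "auto": ["auto"],
    "autossa": ["auto", "+ssa"],
}

def segment(sentence):
    """Segment a word-level sentence into sub-lexical units."""
    units = []
    for word in sentence.split():
        units.extend(DECOMPOSITION[word])
    return units

def train_bigram(corpus):
    """Train an add-one-smoothed bigram model over sub-lexical units."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in corpus:
        units = ["<s>"] + segment(sentence) + ["</s>"]
        vocab.update(units[1:])
        unigrams.update(units[:-1])
        bigrams.update(zip(units[:-1], units[1:]))

    def logprob(prev, unit):
        return math.log((bigrams[(prev, unit)] + 1) / (unigrams[prev] + len(vocab)))

    return logprob

def word_level_score(words, logprob):
    """Score a word sequence by expanding each word into its units.

    This stands in for composing a word-to-unit mapping with the
    sub-lexical model; it is a lookup, not a transducer operation.
    """
    units = ["<s>"] + [u for w in words for u in DECOMPOSITION[w]] + ["</s>"]
    return sum(logprob(p, u) for p, u in zip(units[:-1], units[1:]))

corpus = ["talo", "talossa", "auto"]
logprob = train_bigram(corpus)
seen = word_level_score(["talossa"], logprob)
unseen = word_level_score(["autossa"], logprob)  # word never seen in the corpus
```

In this toy setup the word "autossa" never occurs in the training corpus, yet it still receives a finite score because both of its units were observed in other words.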
Abstract
An automatic speech recognition (ASR) system and method are provided for using sub-lexical language models together with word level pronunciation lexicons. These approaches operate by introducing a transduction between sequences of sub-lexical units and sequences of words.
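As a rough illustration of a transduction between sub-lexical unit sequences and word sequences, the sketch below maps a unit sequence back to every word sequence that expands to it. The `WORD_TO_UNITS` table and the function name are hypothetical, and the recursive search is a simple stand-in for a transducer, not the method described in the specification.

```python
# Hypothetical word-to-unit mapping (illustrative only).
WORD_TO_UNITS = {
    "talo": ("talo",),
    "talossa": ("talo", "+ssa"),
    "auto": ("auto",),
    "autossa": ("auto", "+ssa"),
}

def units_to_words(units):
    """Return every word sequence whose unit expansion equals `units`."""
    units = tuple(units)
    results = []

    def extend(pos, words):
        # All units consumed: the words collected so far form a valid reading.
        if pos == len(units):
            results.append(words)
            return
        # Try every word whose expansion matches the units starting at `pos`.
        for word, expansion in WORD_TO_UNITS.items():
            end = pos + len(expansion)
            if units[pos:end] == expansion:
                extend(end, words + [word])

    extend(0, [])
    return results

readings = units_to_words(["talo", "+ssa", "auto"])
```

A unit sequence that no combination of words can produce, such as a lone suffix unit, yields an empty list.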
23 Claims
1. A method performed by a data processing apparatus, the method comprising:
accessing a word level pronunciation lexicon and a word level training text corpus for a natural language;
segmenting, using a word decomposition system, the word level training text corpus into sub-lexical units;
training an n-gram language model over the sub-lexical units to produce a sub-lexical language model;
constructing, using the word decomposition system, a word to sub-lexical unit mapping transducer;
constructing a word level language model by:
obtaining a result of composing the mapping transducer with the sub-lexical language model, and
performing a projection on the result of the composition of the mapping transducer and the sub-lexical language model;
constructing a speech decoding network at least by composing a context dependency model with the word level pronunciation lexicon and with the word level language model;
receiving an audio stream from a user; and
recognizing the audio stream, using the speech decoding network.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
accessing a word level pronunciation lexicon and a word level training text corpus for a natural language;
segmenting, using a word decomposition system, the word level training text corpus into sub-lexical units;
training an n-gram language model over the sub-lexical units to produce a sub-lexical language model;
constructing, using the word decomposition system, a word to sub-lexical unit mapping transducer;
constructing a word level language model by:
obtaining a result of composing the mapping transducer with the sub-lexical language model, and
performing a projection on the result of the composition of the mapping transducer and the sub-lexical language model;
constructing a speech decoding network at least by composing a context dependency model with the word level pronunciation lexicon and with the word level language model;
receiving an audio stream from a user; and
recognizing the audio stream, using the speech decoding network.
Dependent claims: 12, 13, 14, 15
16. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
accessing a word level pronunciation lexicon and a word level training text corpus for a natural language;
segmenting, using a word decomposition system, the word level training text corpus into sub-lexical units;
training an n-gram language model over the sub-lexical units to produce a sub-lexical language model;
constructing, using the word decomposition system, a word to sub-lexical unit mapping transducer;
constructing a word level language model by:
obtaining a result of composing the mapping transducer with the sub-lexical language model, and
performing a projection on the result of the composition of the mapping transducer and the sub-lexical language model;
constructing a speech decoding network at least by composing a context dependency model with the word level pronunciation lexicon and with the word level language model;
receiving an audio stream from a user; and
recognizing the audio stream, using the speech decoding network.
Dependent claims: 17, 18, 19, 20, 21, 22, 23
Specification