Method of multilingual speech recognition by reduction to single-language recognizer engine components
Abstract
In some speech recognition applications, the language of an utterance is not known in advance, and a single utterance may contain words in more than one language. At the same time, it is impractical to build speech recognizers for all expected combinations of languages. Moreover, business needs may require a new combination of languages to be supported in a short period of time. The invention addresses this issue by a novel way of combining and controlling the components of single-language speech recognizers to produce multilingual speech recognition functionality capable of recognizing multilingual utterances at a modest increase in computational complexity.
14 Claims
1. A method of speech recognition comprising:
providing a multilingual dispatcher engine in a computing system having at least a processor, a memory, text and audio input devices and a text input device;
providing individual components of language-specific single-language speech recognizers in the computing system; and
integrating the multilingual dispatcher engine in the computing system with language-independent functions and executing end-of-utterance detection and coordinating execution of the single-language recognizer components, the single-language recognizer components including at least
    a first component executed by the computing system to produce one or more language-specific word models for any given word in a recognition vocabulary,
    a second component executed by the computing system to compute a language-specific speech feature vector from a slice of acoustic data,
    a third component executed by the computing system to update a state of a word hypothesis by the latest computed feature vector, a language-specific pronunciation, and a facility to initialize a state, and
    a fourth component executed by the computing system to update a state of a word hypothesis by an exit state of any state of any word hypothesis as immediately preceding segment of an utterance, the fourth component referred to as seeding, the seeding comprising computing the entry state of a pronunciation's model by iterating over the exit states supplied to the seeding component,
wherein coordination of execution comprises at least dispatching a step of recognition of a slice of speech, obtaining the scores of word hypotheses, cross-language seeding of word hypothesis and recovering the winning multilingual text from the winning word hypothesis at end of utterance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
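As an informal illustration of the seeding and text-recovery steps recited in claim 1, the following Python sketch iterates over supplied exit states to choose an entry state and then follows back pointers from the winning hypothesis. It is not taken from the specification: the `ExitState` structure, the function names `seed_entry` and `recover_text`, the example words, and the convention that a lower cost is better are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ExitState:
    """Exit state of a finished word hypothesis (hypothetical structure)."""
    word: str
    language: str
    cost: float                     # assumption: lower cost is better
    previous: Optional[ExitState]   # back pointer recorded when seeded


def seed_entry(exit_states: Iterable[ExitState]) -> Optional[ExitState]:
    """Iterate over the supplied exit states and keep the cheapest one."""
    best: Optional[ExitState] = None
    for state in exit_states:
        if best is None or state.cost < best.cost:
            best = state
    return best


def recover_text(winner: ExitState) -> list[tuple[str, str]]:
    """Follow back pointers from the winning hypothesis to the first word."""
    words = []
    node: Optional[ExitState] = winner
    while node is not None:
        words.append((node.word, node.language))
        node = node.previous
    words.reverse()
    return words


# Two competing first words from different single-language recognizers.
hola = ExitState("hola", "es", 4.0, None)
hello = ExitState("hello", "en", 5.5, None)

# Cross-language seeding: the English word "world" is seeded by the best
# predecessor, whichever language produced it.
predecessor = seed_entry([hello, hola])
world = ExitState("world", "en", predecessor.cost + 3.0, predecessor)

print(recover_text(world))
```

In this toy run the Spanish hypothesis seeds the English word, so the recovered utterance mixes both languages and carries a language tag for each word.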
13. A method of speech recognition comprising:
in a memory of a computing system having a processor, text/audio input devices and a text output device, providing a multilingual dispatcher component and individual components of language-specific single-language speech recognizers in a memory executed by the computing system, the multilingual dispatcher component integrating language-independent functions and executing in parallel an end-of-utterance detection process and speech recognition process, the speech recognition process comprising;
processing a slice of acoustic data into a language-independent feature vector in the multilingual dispatcher component, dispatching the feature vector to each active single-language recognizer and receiving the language-specific feature vector from each recognizer;
dispatching in the multilingual dispatcher component the language-specific feature vectors each to its corresponding single-language recognizer to execute state update of word hypotheses by the corresponding recognizer;
obtaining in the multilingual dispatcher component word hypotheses numeric scores from each single-language recognizer and scaling them to an internal scale of the multilingual dispatcher component;
determining in the multilingual dispatcher component of the computing system a resulting numeric score of each word hypothesis as a least scaled score of a hypothesis among all single-language recognizers and, for the winning word hypothesis of each word, remembering the language that produced it;
dispatching in the multilingual dispatcher component the execution of updating each word hypothesis using all winning word hypotheses within, the updating referred to as seeding, and using each of the single-language recognizer components, wherein;
    if the seeding state with the winning multilingual dispatcher component score belongs to the same language as the language of the recognizer then the built-in update facility of the recognizer is executed without modifications on the pair of the seeding and the seeded word hypotheses;
    if the seeding state belongs to a different language, applying language transition adjustments to the seeding score in the multilingual dispatcher component format which is then rescaled to the scale of the single-language recognizer and replaces the seeding score, after which replacement the single-language recognizer's built-in update facility is executed on the pair of the modified seeding and the seeded word hypotheses;
declaring in the multilingual dispatcher component the winning hypothesis at the end of utterance as the utterance ending with the winning word hypothesis, and recovering the words of the utterance along with the language of each word from back trace information associated with seeding.
Dependent claims: 14
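The score handling recited in claim 13 (scaling to the dispatcher's internal scale, keeping the least scaled score per word, and adjusting a cross-language seeding score before rescaling it) can be sketched roughly as follows. The claim does not say how scores are scaled or how a language transition adjustment is computed, so the linear `Scale` mapping, the additive penalty table, the function names, and all numeric values below are hypothetical choices for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    """Hypothetical linear map between a recognizer's native scores and the
    dispatcher's internal scale (the claim does not specify the mapping)."""
    factor: float
    offset: float = 0.0

    def to_dispatcher(self, native: float) -> float:
        return native * self.factor + self.offset

    def to_native(self, common: float) -> float:
        return (common - self.offset) / self.factor


def fuse_scores(native_scores, scales):
    """For each word, keep the least scaled score and the language that
    produced it. native_scores maps language -> {word: native score}."""
    winners = {}
    for language, scores in native_scores.items():
        for word, native in scores.items():
            scaled = scales[language].to_dispatcher(native)
            if word not in winners or scaled < winners[word][0]:
                winners[word] = (scaled, language)
    return winners


def seeding_score(seed_common, seed_native, seed_language,
                  target_language, scales, penalties):
    """Seeding score handed to the target recognizer's built-in update."""
    if seed_language == target_language:
        # Same language: the seeding state is used without modification.
        return seed_native
    # Different language: adjust on the dispatcher scale (additive penalty
    # is an assumption), then rescale to the target recognizer's scale.
    adjusted = seed_common + penalties.get((seed_language, target_language), 0.0)
    return scales[target_language].to_native(adjusted)


scales = {"en": Scale(1.0), "es": Scale(0.5)}
native_scores = {
    "en": {"taco": 9.0, "please": 3.0},
    "es": {"taco": 8.0, "please": 10.0},
}
penalties = {("es", "en"): 1.5, ("en", "es"): 1.0}

winners = fuse_scores(native_scores, scales)
taco_score, taco_language = winners["taco"]

# Seed the English recognizer from the Spanish winner for "taco".
into_english = seeding_score(taco_score, native_scores["es"]["taco"],
                             taco_language, "en", scales, penalties)
# Seed the Spanish recognizer from the same winner: no modification.
into_spanish = seeding_score(taco_score, native_scores["es"]["taco"],
                             taco_language, "es", scales, penalties)
print(winners, into_english, into_spanish)
```

Keeping the winning language next to each fused score is what lets the same-language and different-language branches be told apart at seeding time.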
Specification