Method and system for identifying and correcting accent-induced speech recognition difficulties
First Claim
1. A method of generating speech recognition output, the method comprising:
providing a plurality of different acoustic models specific to different languages, and a selected lexicon model;
generating a first speech recognition output for a speech input in a first language using a first model combination that combines one of the plurality of acoustic models with the selected lexicon model, wherein said one of said acoustic models and said selected lexicon model are specific to said first language;
in response to a confidence score for said first speech recognition output falling below a minimum acceptable score, generating one or more other speech recognition outputs for said speech input in said first language, wherein generating the one or more other speech recognition outputs comprises:
identifying a sequence of phonemes corresponding to the speech input using a different one of the plurality of acoustic models specific to a second language different from said first language, and
matching the sequence of phonemes to one or more speech segments and/or words using the selected lexicon model, wherein the selected lexicon model is specific to the first language and not to the second language; and
outputting a speech recognition output having a best confidence score among the generated speech recognition outputs.
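The control flow recited in claim 1 can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the names `Recognition`, `recognize_with_fallback`, `acoustic_models`, `lexicon`, and `min_score` are hypothetical, and the acoustic models and lexicon are assumed to be plain callables standing in for real recognizer components.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Recognition:
    """One candidate output (hypothetical type, for illustration only)."""
    text: str
    confidence: float
    acoustic_language: str = ""


# Assumed interfaces: an acoustic model turns audio into a phoneme sequence,
# and the first-language lexicon turns phonemes into text plus a confidence.
AcousticModel = Callable[[bytes], List[str]]
Lexicon = Callable[[Sequence[str]], Recognition]


def recognize_with_fallback(
    audio: bytes,
    acoustic_models: Dict[str, AcousticModel],
    lexicon: Lexicon,
    first_language: str,
    min_score: float,
) -> Recognition:
    """Try the first-language acoustic model, then fall back to the others."""

    def run(language: str) -> Recognition:
        phonemes = acoustic_models[language](audio)
        result = lexicon(phonemes)  # the lexicon stays first-language throughout
        return Recognition(result.text, result.confidence, language)

    outputs = [run(first_language)]

    # Only when the first output scores below the minimum acceptable score
    # are the other-language acoustic models tried, each paired with the
    # same first-language lexicon.
    if outputs[0].confidence < min_score:
        for language in acoustic_models:
            if language != first_language:
                outputs.append(run(language))

    # Output the candidate with the best confidence score.
    return max(outputs, key=lambda r: r.confidence)
```

In this sketch every remaining acoustic model is tried once the threshold test fails. The claim requires only "one or more" other outputs, so trying all of them is a simplifying assumption.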
Abstract
A system for use in speech recognition includes an acoustic module accessing a plurality of distinct-language acoustic models, each based upon a different language; a lexicon module accessing at least one lexicon model; and a speech recognition output module. The speech recognition output module generates a first speech recognition output using a first model combination that combines one of the plurality of distinct-language acoustic models with the at least one lexicon model. In response to a threshold determination, the speech recognition output module generates a second speech recognition output using a second model combination that combines a different one of the plurality of distinct-language acoustic models with the at least one distinct-language lexicon model.
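A model combination pairs an acoustic model, which yields a phoneme sequence, with a lexicon model, which maps phonemes to words. The matching step might look something like the sketch below. The source does not specify a matching algorithm or a confidence measure. The similarity ratio from Python's `difflib.SequenceMatcher` is used here purely as a stand-in, and `TOY_LEXICON`, `match_phonemes`, and the phoneme symbols are hypothetical. The sketch also assumes that every acoustic model emits symbols from a phoneme set shared with the lexicon.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Tuple

# Hypothetical first-language lexicon: word -> pronunciation as phoneme symbols.
TOY_LEXICON: Dict[str, List[str]] = {
    "this": ["dh", "ih", "s"],
    "thin": ["th", "ih", "n"],
    "these": ["dh", "iy", "z"],
}


def match_phonemes(
    phonemes: Sequence[str], lexicon: Dict[str, List[str]]
) -> Tuple[str, float]:
    """Return the lexicon word closest to the phoneme sequence and its score.

    The score is SequenceMatcher's similarity ratio in [0, 1], used here as
    an assumed stand-in for a recognizer's confidence score.
    """
    best_word, best_score = "", 0.0
    for word, pronunciation in lexicon.items():
        score = SequenceMatcher(None, list(phonemes), pronunciation).ratio()
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score
```

A phoneme sequence that only partly agrees with a pronunciation still matches a word, but with a lower score. A score like this is what a threshold test would compare against a minimum acceptable value.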
17 Claims
1. A method of generating speech recognition output, the method comprising:
providing a plurality of different acoustic models specific to different languages, and a selected lexicon model;
generating a first speech recognition output for a speech input in a first language using a first model combination that combines one of the plurality of acoustic models with the selected lexicon model, wherein said one of said acoustic models and said selected lexicon model are specific to said first language;
in response to a confidence score for said first speech recognition output falling below a minimum acceptable score, generating one or more other speech recognition outputs for said speech input in said first language, wherein generating the one or more other speech recognition outputs comprises:
identifying a sequence of phonemes corresponding to the speech input using a different one of the plurality of acoustic models specific to a second language different from said first language, and
matching the sequence of phonemes to one or more speech segments and/or words using the selected lexicon model, wherein the selected lexicon model is specific to the first language and not to the second language; and
outputting a speech recognition output having a best confidence score among the generated speech recognition outputs.
Dependent claims: 2, 3, 4, 5
6. A system for use in speech recognition, the system comprising a combination of hardware and software that implements:
an audio capture module for recording a speech input in a first language from a speaker;
an acoustic module for accessing a plurality of different acoustic models specific to different languages;
a lexicon module for accessing a selected lexicon model; and
a speech recognition output module for generating a first speech recognition output for said speech input in the first language using a first model combination that combines one of the plurality of acoustic models with the selected lexicon model, wherein said one of said acoustic models and said selected lexicon model are specific to said first language,
in response to a confidence score for said first speech recognition output falling below a minimum acceptable score, generating one or more other speech recognition outputs for said speech input in said first language, wherein generating the one or more other speech recognition outputs comprises:
identifying a sequence of phonemes corresponding to the speech input using a different one of the plurality of acoustic models specific to a second language different from said first language, and
matching the sequence of phonemes to one or more speech segments and/or words using the selected lexicon model, wherein the selected lexicon model is specific to the first language and not to the second language, and
outputting a speech recognition output having a best confidence score among the generated speech recognition outputs.
Dependent claims: 7, 8, 9, 10, 11, 12
13. A non-transitory computer readable storage medium storing computer instructions for:
providing a plurality of different acoustic models specific to different languages, and a selected lexicon model;
generating a first speech recognition output for a speech input in a first language using a first model combination that combines one of the plurality of acoustic models with the selected lexicon model, wherein said one of said acoustic models and said selected lexicon model are specific to said first language;
in response to a confidence score for said first speech recognition output falling below a minimum acceptable score, generating one or more other speech recognition outputs for said speech input in said first language, wherein generating the one or more other speech recognition outputs comprises:
identifying a sequence of phonemes corresponding to the speech input using a different one of the plurality of acoustic models specific to a second language different from said first language, and
matching the sequence of phonemes to one or more speech segments and/or words using the selected lexicon model, wherein the selected lexicon model is specific to the first language and not to the second language; and
outputting a speech recognition output having a best confidence score among the generated speech recognition outputs.
Dependent claims: 14, 15, 16, 17
Specification