Method and system for identifying and correcting accent-induced speech recognition difficulties

US 8,036,893 B2
Filed: 07/22/2004
Issued: 10/11/2011
Est. Priority Date: 07/22/2004
Status: Expired due to Fees


Abstract

A system for use in speech recognition includes an acoustic module accessing a plurality of distinct-language acoustic models, each based upon a different language; a lexicon module accessing at least one lexicon model; and a speech recognition output module. The speech recognition output module generates a first speech recognition output using a first model combination that combines one of the plurality of distinct-language acoustic models with the at least one lexicon model. In response to a threshold determination, the speech recognition output module generates a second speech recognition output using a second model combination that combines a different one of the plurality of distinct-language acoustic models with the at least one distinct-language lexicon model.

17 Claims

1. A method of generating speech recognition output, the method comprising:
- providing a plurality of different acoustic models specific to different languages, and a selected lexicon model;
- generating a first speech recognition output for a speech input in a first language using a first model combination that combines one of the plurality of acoustic models with the selected lexicon model, wherein said one of said acoustic models and said selected lexicon model are specific to said first language;
- in response to a confidence score for said first speech recognition output falling below a minimum acceptable score, generating one or more other speech recognition outputs for said speech input in said first language, wherein generating the one or more other speech recognition outputs comprises:
  - identifying a sequence of phonemes corresponding to the speech input using a different one of the plurality of acoustic models specific to a second language different from said first language, and
  - matching the sequence of phonemes to one or more speech segments and/or words using the selected lexicon model, wherein the selected lexicon model is specific to the first language and not to the second language; and
- outputting a speech recognition output having a best confidence score among the generated speech recognition outputs.

2. The method as defined in claim 1, wherein generating the one or more other speech recognition outputs further comprises matching the sequence of phonemes to one or more speech segments and/or words using a lexicon model specific to a language different from the first language.

3. The method of claim 1, further comprising generating a subsequent speech recognition output for a subsequent speech input from the speaker using a model combination associated with the speech recognition output having the best confidence score.

4. The method of claim 3, further comprising:
- calculating a confidence score for said subsequent speech recognition output; and
- in response to the confidence score for said subsequent speech recognition output being below a minimum acceptable score, repeating the step of generating one or more other speech recognition outputs for the subsequent speech input.

5. The method of claim 1, wherein each of the first and second languages is a national language.
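
The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `AcousticModel` and `Lexicon` callables, the `Result` type, and the toy data are all hypothetical stand-ins, and the sketch includes the first output when picking the best score, which is only one reading of "the generated speech recognition outputs".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-ins (not defined by the patent): an acoustic model maps
# audio to a phoneme sequence, and a lexicon maps a phoneme sequence to
# (recognized text, confidence score).
AcousticModel = Callable[[bytes], List[str]]
Lexicon = Callable[[Sequence[str]], Tuple[str, float]]


@dataclass(frozen=True)
class Result:
    text: str
    confidence: float
    acoustic_lang: str  # language of the acoustic model that produced it


def decode(audio: bytes, lang: str,
           acoustic_models: Dict[str, AcousticModel],
           lexicon: Lexicon) -> Result:
    """Run one model combination: acoustic model `lang` plus the selected lexicon."""
    phonemes = acoustic_models[lang](audio)
    text, confidence = lexicon(phonemes)
    return Result(text, confidence, lang)


def recognize(audio: bytes, first_lang: str,
              acoustic_models: Dict[str, AcousticModel],
              lexicon: Lexicon, min_score: float) -> Result:
    """First pass with the first-language pair; fall back only on low confidence."""
    first = decode(audio, first_lang, acoustic_models, lexicon)
    if first.confidence >= min_score:
        return first
    # Fallback: other-language acoustic models, same first-language lexicon.
    candidates = [first]
    for lang in acoustic_models:
        if lang != first_lang:
            candidates.append(decode(audio, lang, acoustic_models, lexicon))
    return max(candidates, key=lambda r: r.confidence)


# Toy data: the phoneme labels and scores are invented for illustration.
toy_acoustic_models = {
    "en": lambda audio: ["p1", "p9"],
    "fr": lambda audio: ["p1", "p2"],
}


def toy_english_lexicon(phonemes: Sequence[str]) -> Tuple[str, float]:
    known = {("p1", "p2"): ("hello", 0.9)}
    return known.get(tuple(phonemes), ("<unk>", 0.1))


best = recognize(b"", "en", toy_acoustic_models, toy_english_lexicon, min_score=0.5)
print(best.acoustic_lang, best.text)
```

In the toy run, the English acoustic model yields a phoneme sequence the English lexicon cannot match well, so the fallback tries the French acoustic model against the same English lexicon and returns that higher-scoring output.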

6. A system for use in speech recognition, the system comprising a combination of hardware and software that implements:
- an audio capture module for recording a speech input in a first language from a speaker;
- an acoustic module for accessing a plurality of different acoustic models specific to different languages;
- a lexicon module for accessing a selected lexicon model; and
- a speech recognition output module for
  - generating a first speech recognition output for said speech input in the first language using a first model combination that combines one of the plurality of acoustic models with the selected lexicon model, wherein said one of said acoustic models and said selected lexicon model are specific to said first language,
  - in response to a confidence score for said first speech recognition output falling below a minimum acceptable score, generating one or more other speech recognition outputs for said speech input in said first language, wherein generating the one or more other speech recognition outputs comprises:
    - identifying a sequence of phonemes corresponding to the speech input using a different one of the plurality of acoustic models specific to a second language different from said first language, and
    - matching the sequence of phonemes to one or more speech segments and/or words using the selected lexicon model, wherein the selected lexicon model is specific to the first language and not to the second language, and
  - outputting a speech recognition output having a best confidence score among the generated speech recognition outputs.

7. The system as defined in claim 6, wherein the combination of hardware and software further implements a threshold determination module for comparing a first confidence score for the first speech recognition output with a minimum acceptable score.

8. The system as defined in claim 6, wherein the combination of hardware and software further implements a confidence score generation module generating confidence scores for each speech recognition output.

9. The system as defined in claim 8, wherein the combination of hardware and software further implements a best model combination determination module for determining the one of said other speech recognition outputs having a best confidence score, based upon the confidence scores generated for each of said other speech recognition outputs.

10. The system as defined in claim 6, wherein generating the one or more other speech recognition outputs further comprises matching the sequence of phonemes to one or more speech segments and/or words using a lexicon model specific to a language different from the first language.

11. The system as defined in claim 6, wherein the speech recognition output module further generates a speech recognition output for a subsequent speech input from the speaker using a model combination associated with the speech recognition output having the best confidence score.

12. The system of claim 6, wherein each of the first and second languages is a national language.
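
The per-speaker reuse described in claim 11, together with the threshold check of claim 7, might be organized roughly as follows. This is a sketch under stated assumptions: `SpeakerSession`, the `Decoder` callable, and the `(acoustic language, lexicon language)` tuple are hypothetical names, and the current combination is kept in the comparison for simplicity even though claim 9 speaks of the best among the "other" outputs.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical types: a model combination is (acoustic-model language,
# lexicon language); a decoder runs one combination on the audio and
# returns (text, confidence).
Combination = Tuple[str, str]
Decoder = Callable[[Combination, bytes], Tuple[str, float]]


class SpeakerSession:
    """Remembers the best-scoring model combination for one speaker."""

    def __init__(self, decoder: Decoder, first: Combination,
                 alternatives: Iterable[Combination], min_score: float) -> None:
        self.decoder = decoder
        self.current = first
        self.alternatives: List[Combination] = list(alternatives)
        self.min_score = min_score

    def below_threshold(self, confidence: float) -> bool:
        # Threshold determination: compare the score with the minimum.
        return confidence < self.min_score

    def recognize(self, audio: bytes) -> str:
        text, confidence = self.decoder(self.current, audio)
        if not self.below_threshold(confidence):
            return text
        # Low confidence: score every other combination on the same input.
        scored = [(confidence, self.current, text)]
        for combo in self.alternatives:
            if combo != self.current:
                other_text, other_conf = self.decoder(combo, audio)
                scored.append((other_conf, combo, other_text))
        _, best_combo, best_text = max(scored, key=lambda item: item[0])
        # Keep the winning combination for this speaker's later inputs.
        self.current = best_combo
        return best_text


# Toy decoder with invented scores; it records which combinations were run.
calls: List[Combination] = []


def toy_decoder(combo: Combination, audio: bytes) -> Tuple[str, float]:
    calls.append(combo)
    table = {("en", "en"): ("<unk>", 0.2), ("fr", "en"): ("hello", 0.8)}
    return table[combo]


session = SpeakerSession(toy_decoder, ("en", "en"), [("fr", "en")], min_score=0.5)
first_text = session.recognize(b"")   # triggers the fallback
calls.clear()
second_text = session.recognize(b"")  # reuses the remembered combination
```

Because the session keeps the winning combination, the second input is decoded once with that combination, and the fallback search runs again only if its score drops below the minimum.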

13. A non-transitory computer readable storage medium storing computer instructions for:
- providing a plurality of different acoustic models specific to different languages, and a selected lexicon model;
- generating a first speech recognition output for a speech input in a first language using a first model combination that combines one of the plurality of acoustic models with the selected lexicon model, wherein said one of said acoustic models and said selected lexicon model are specific to said first language;
- in response to a confidence score for said first speech recognition output falling below a minimum acceptable score, generating one or more other speech recognition outputs for said speech input in said first language, wherein generating the one or more other speech recognition outputs comprises:
  - identifying a sequence of phonemes corresponding to the speech input using a different one of the plurality of acoustic models specific to a second language different from said first language, and
  - matching the sequence of phonemes to one or more speech segments and/or words using the selected lexicon model, wherein the selected lexicon model is specific to the first language and not to the second language; and
- outputting a speech recognition output having a best confidence score among the generated speech recognition outputs.

14. The non-transitory computer readable storage medium as defined in claim 13, wherein generating the one or more other speech recognition outputs further comprises matching the sequence of phonemes to one or more speech segments and/or words using a lexicon model specific to a language different from the first language.

15. The non-transitory computer readable storage medium of claim 13, storing further computer instructions for:
- generating a subsequent speech recognition output for a subsequent speech input from the speaker using a model combination associated with the speech recognition output having the best confidence score.

16. The non-transitory computer readable storage medium of claim 15, storing further computer instructions for:
- calculating a confidence score for said subsequent speech recognition output; and
- in response to the confidence score for said subsequent speech recognition output being below a minimum acceptable score, repeating the step of generating one or more other speech recognition outputs for the subsequent speech input.

17. The non-transitory computer readable storage medium of claim 13, wherein each of the first and second languages is a national language.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Reich, David E.
Primary Examiner(s): Wozniak, James S.
Assistant Examiner(s): Kovacek, David
Application Number: US10/896,405
Publication Number: US 20060020463A1
Time in Patent Office: 2,637 Days
Field of Search: 704/1-10, 704/231-232, 704/235-257, 704/E17.001-E17.016, 704/E15.001-E15.032, 704/E15.037-E15.05, 379/67.1-68, 379/88.01-88.09, 379/88.11-88.18
US Class Current: 704/257
CPC Class Codes:
- G10L 15/187   Phonemic context, e.g. pron...
- G10L 2015/025   Phonemes, fenemes or fenone...
- G10L 2015/227   of the speaker; Human-fact...
