Method and system for identifying and correcting accent-induced speech recognition difficulties

US 8,285,546 B2
Filed: 09/09/2011
Issued: 10/09/2012
Est. Priority Date: 07/22/2004
Status: Active Grant

Abstract

A system for use in speech recognition includes an acoustic module accessing a plurality of distinct-language acoustic models, each based upon a different language; a lexicon module accessing at least one lexicon model; and a speech recognition output module. The speech recognition output module generates a first speech recognition output using a first model combination that combines one of the plurality of distinct-language acoustic models with the at least one lexicon model. In response to a threshold determination, the speech recognition output module generates a second speech recognition output using a second model combination that combines a different one of the plurality of distinct-language acoustic models with the at least one distinct-language lexicon model.

399 Citations

23 Claims

1. A method comprising:
- generating a first speech recognition result for a first speech input in a first language provided by a speaker; and
  
  outputting the first speech recognition result;
  
  wherein generating the first speech recognition result comprises:
  
  identifying a sequence of phonemes corresponding to the first speech input using a native acoustic model specific to a second language different from the first language, wherein the second language is the speaker's native language; and
  
  matching the sequence of phonemes to one or more speech segments and/or words using a lexicon model specific to the first language and not to the second language.
- - 2. The method of claim 1, further comprising:
    - generating a second speech recognition result for the first speech input using an acoustic model specific to the first language and the lexicon model specific to the first language;
      
      wherein the first speech recognition result is generated in response to the second speech recognition result having a confidence score below a minimum acceptable score.
  - 3. The method of claim 2, further comprising:
    - in response to the second speech recognition result having the confidence score below the minimum acceptable score, generating a third speech recognition result for the first speech input using the lexicon model specific to the first language and an acoustic model specific to a third language different from the first and second languages;
      
      wherein the first speech recognition result is generated in response to the third speech recognition result having a confidence score below the minimum acceptable score.
  - 4. The method of claim 2, further comprising:
    - in response to the second speech recognition result having the confidence score below the minimum acceptable score, generating a third speech recognition result for the first speech input using the lexicon model specific to the first language and an acoustic model specific to a third language different from the first and second languages;
      
      wherein the first speech recognition result is output in response to the first speech recognition result having a better confidence score than the third speech recognition result.
  - 5. The method of claim 1, further comprising:
    - generating a second speech recognition result for the first speech input using an acoustic model specific to the first language and the lexicon model specific to the first language;
      
      wherein the first speech recognition result is output in response to the first speech recognition result having a better confidence score than the second speech recognition result.
  - 6. The method of claim 1, further comprising:
    - generating a second speech recognition result for the first speech input using the lexicon model specific to the first language and an acoustic model specific to a third language different from the first and second languages;
      
      wherein the first speech recognition result is output in response to the first speech recognition result having a better confidence score than the second speech recognition result.
  - 7. The method of claim 1, further comprising designating the native acoustic model specific to the second language for use in recognizing subsequent speech input spoken by the speaker.
  - 8. The method of claim 1, further comprising generating a second speech recognition result for a second speech input in the second language, wherein generating the second speech recognition result comprises:
    - identifying a sequence of phonemes corresponding to the second speech input using the native acoustic model specific to the second language; and
      
      matching the sequence of phonemes to one or more speech segments and/or words using a lexicon model specific to the second language.
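
The two steps recited in claim 1 can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption, not the patented implementation: the acoustic model is reduced to a lookup table from hypothetical frame labels to phonemes, the lexicon is a small dictionary, and the names `identify_phonemes`, `match_words`, and `recognize` are invented for this example.

```python
from typing import Dict, List, Sequence, Tuple

# Toy stand-ins (assumptions): real acoustic and lexicon models are statistical.
AcousticModel = Dict[str, str]        # frame label -> phoneme
Lexicon = Dict[Tuple[str, ...], str]  # phoneme sequence -> word


def identify_phonemes(frames: Sequence[str], acoustic_model: AcousticModel) -> List[str]:
    """Step 1: label each frame with a phoneme using the chosen acoustic model."""
    return [acoustic_model[f] for f in frames if f in acoustic_model]


def match_words(phonemes: Sequence[str], lexicon: Lexicon) -> List[str]:
    """Step 2: greedy longest-match segmentation of phonemes into lexicon words."""
    longest = max((len(key) for key in lexicon), default=0)
    words: List[str] = []
    i = 0
    while i < len(phonemes):
        for n in range(min(longest, len(phonemes) - i), 0, -1):
            word = lexicon.get(tuple(phonemes[i:i + n]))
            if word is not None:
                words.append(word)
                i += n
                break
        else:
            i += 1  # no lexicon entry starts here; skip one phoneme
    return words


def recognize(frames: Sequence[str], acoustic_model: AcousticModel, lexicon: Lexicon) -> List[str]:
    """Combine an acoustic model for one language with a lexicon for another."""
    return match_words(identify_phonemes(frames, acoustic_model), lexicon)


# Hypothetical data: a French (second-language) acoustic model paired with
# an English (first-language) lexicon.
FRENCH_ACOUSTIC_MODEL: AcousticModel = {
    "f1": "G", "f2": "UH", "f3": "D", "f4": "D", "f5": "EY",
}
ENGLISH_LEXICON: Lexicon = {("G", "UH", "D"): "good", ("D", "EY"): "day"}

print(recognize(["f1", "f2", "f3", "f4", "f5"], FRENCH_ACOUSTIC_MODEL, ENGLISH_LEXICON))
# ['good', 'day']
```

The point of the sketch is only the pairing: the phoneme labels come from a model of the speaker's native language, while the word lookup stays in the language being spoken.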

9. A system comprising a combination of hardware and software that implements:
- an audio capture module configured to capture a first speech input in a first language from a speaker; and
  
  a speech recognition module configured to generate a first speech recognition result for the first speech input, and output the first speech recognition result;
  
  wherein generating the first speech recognition result comprises:
  
  identifying a sequence of phonemes corresponding to the first speech input using an acoustic model specific to a second language different from the first language, wherein the acoustic model is trained on speech of a training speaker, wherein the second language is the training speaker's native language; and
  
  matching the sequence of phonemes to one or more speech segments and/or words using a lexicon model specific to the first language and not to the second language.
- - 10. The system of claim 9, wherein the speech recognition module is configured to:
    - generate a second speech recognition result for the first speech input using an acoustic model specific to the first language and the lexicon model specific to the first language; and
      
      generate the first speech recognition result in response to the second speech recognition result having a confidence score below a minimum acceptable score.
  - 11. The system of claim 10, wherein the speech recognition module is configured to:
    - in response to the second speech recognition result having the confidence score below the minimum acceptable score, generate a third speech recognition result for the first speech input using the lexicon model specific to the first language and an acoustic model specific to a third language different from the first and second languages; and
      
      generate the first speech recognition result in response to the third speech recognition result having a confidence score below the minimum acceptable score.
  - 12. The system of claim 10, wherein the speech recognition module is configured to:
    - in response to the second speech recognition result having the confidence score below the minimum acceptable score, generate a third speech recognition result for the first speech input using the lexicon model specific to the first language and an acoustic model specific to a third language different from the first and second languages; and
      
      output the first speech recognition result in response to the first speech recognition result having a better confidence score than the third speech recognition result.
  - 13. The system of claim 9, wherein the speech recognition module is configured to:
    - generate a second speech recognition result for the first speech input using an acoustic model specific to the first language and the lexicon model specific to the first language; and
      
      output the first speech recognition result in response to the first speech recognition result having a better confidence score than the second speech recognition result.
  - 14. The system of claim 9, wherein the speech recognition module is configured to:
    - generate a second speech recognition result for the first speech input using the lexicon model specific to the first language and an acoustic model specific to a third language different from the first and second languages; and
      
      output the first speech recognition result in response to the first speech recognition result having a better confidence score than the second speech recognition result.
  - 15. The system of claim 9, wherein the speech recognition module is further configured to generate a second speech recognition result for a second speech input in the second language, wherein generating the second speech recognition result comprises:
    - identifying a sequence of phonemes corresponding to the second speech input using the acoustic model specific to the second language; and
      
      matching the sequence of phonemes to one or more speech segments and/or words using a lexicon model specific to the second language.
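
The confidence-driven behavior in claims 10 through 14 can be sketched as a fallback loop. This is one possible reading under stated assumptions, not the claimed implementation: `Result`, `recognize_with_fallback`, and the stub recognizers are hypothetical, each recognizer is assumed to pair one acoustic model with the first-language lexicon, and the confidence values are made up.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Result:
    text: str
    confidence: float
    acoustic_language: str


# Hypothetical signature: decode audio with one acoustic model plus the
# first-language lexicon, returning a scored result.
Recognizer = Callable[[bytes], Result]


def recognize_with_fallback(
    audio: bytes,
    primary: Recognizer,
    fallbacks: Sequence[Recognizer],
    min_score: float,
) -> Result:
    """Try the first-language acoustic model, then others while below min_score."""
    best = primary(audio)
    if best.confidence >= min_score:
        return best
    for recognizer in fallbacks:
        candidate = recognizer(audio)
        if candidate.confidence > best.confidence:
            best = candidate  # keep the better-scoring result
        if best.confidence >= min_score:
            break  # acceptable result found; stop trying further models
    return best


def stub(language: str, text: str, confidence: float) -> Recognizer:
    """Build a fake recognizer that always returns the same scored result."""
    return lambda audio: Result(text, confidence, language)


english = stub("en", "tank you", 0.40)
spanish = stub("es", "thank you", 0.55)
french = stub("fr", "thank you", 0.82)

result = recognize_with_fallback(b"", english, [spanish, french], min_score=0.7)
print(result.acoustic_language)  # fr
```

If no combination reaches the minimum score, this sketch returns the best-scoring result it saw; a real system could instead reject the input or ask the speaker to repeat it.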

16. An article of manufacture comprising a computer-readable storage medium storing computer instructions for:
- generating a first speech recognition result for a first speech input in a first language; and
  
  outputting the first speech recognition result;
  
  wherein generating the first speech recognition result comprises:
  
  identifying a sequence of phonemes corresponding to the first speech input using an acoustic model specific to a second language different from the first language, wherein the acoustic model is trained on speech of a training speaker, wherein the second language is the training speaker's native language; and
  
  matching the sequence of phonemes to one or more speech segments and/or words using a lexicon model specific to the first language and not to the second language.
- - 17. The article of manufacture of claim 16, wherein the computer-readable storage medium stores further computer instructions for:
    - generating a second speech recognition result for the first speech input using an acoustic model specific to the first language and the lexicon model specific to the first language;
      
      wherein the first speech recognition result is generated in response to the second speech recognition result having a confidence score below a minimum acceptable score.
  - 18. The article of manufacture of claim 17, wherein the computer-readable storage medium stores further computer instructions for:
    - in response to the second speech recognition result having the confidence score below the minimum acceptable score, generating a third speech recognition result for the first speech input using the lexicon model specific to the first language and an acoustic model specific to a third language different from the first and second languages;
      
      wherein the first speech recognition result is generated in response to the third speech recognition result having a confidence score below the minimum acceptable score.
  - 19. The article of manufacture of claim 17, wherein the computer-readable storage medium stores further computer instructions for:
    - in response to the second speech recognition result having the confidence score below the minimum acceptable score, generating a third speech recognition result for the first speech input using the lexicon model specific to the first language and an acoustic model specific to a third language different from the first and second languages;
      
      wherein the first speech recognition result is output in response to the first speech recognition result having a better confidence score than the third speech recognition result.
  - 20. The article of manufacture of claim 16, wherein the computer-readable storage medium stores further computer instructions for:
    - generating a second speech recognition result for the first speech input using an acoustic model specific to the first language and the lexicon model specific to the first language;
      
      wherein the first speech recognition result is output in response to the first speech recognition result having a better confidence score than the second speech recognition result.
  - 21. The article of manufacture of claim 16, wherein the computer-readable storage medium stores further computer instructions for:
    - generating a second speech recognition result for the first speech input using the lexicon model specific to the first language and an acoustic model specific to a third language different from the first and second languages;
      
      wherein the first speech recognition result is output in response to the first speech recognition result having a better confidence score than the second speech recognition result.
  - 22. The article of manufacture of claim 16, wherein the computer-readable storage medium stores further computer instructions for designating the acoustic model specific to the second language for use in recognizing subsequent speech input spoken by a speaker who spoke the first speech input.
  - 23. The article of manufacture of claim 16, wherein the computer-readable storage medium stores further computer instructions for generating a second speech recognition result for a second speech input in the second language, wherein generating the second speech recognition result comprises:
    - identifying a sequence of phonemes corresponding to the second speech input using the acoustic model specific to the second language; and
      
      matching the sequence of phonemes to one or more speech segments and/or words using a lexicon model specific to the second language.
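
Claims 7 and 22 recite designating the second-language acoustic model for the same speaker's subsequent input. A minimal sketch of that bookkeeping follows, assuming a hypothetical `SpeakerModelRegistry` keyed by a speaker identifier; the claims do not say how a speaker is identified or where the designation is stored.

```python
from typing import Dict


class SpeakerModelRegistry:
    """Remembers which acoustic-model language to use for each speaker."""

    def __init__(self, default_language: str) -> None:
        self._default = default_language
        self._designated: Dict[str, str] = {}

    def designate(self, speaker_id: str, acoustic_language: str) -> None:
        """Record the acoustic model that worked for this speaker."""
        self._designated[speaker_id] = acoustic_language

    def model_for(self, speaker_id: str) -> str:
        """Return the designated model language, or the default if none is set."""
        return self._designated.get(speaker_id, self._default)


registry = SpeakerModelRegistry(default_language="en")
registry.designate("caller-17", "fr")   # hypothetical speaker identifier
print(registry.model_for("caller-17"))  # fr
print(registry.model_for("caller-42"))  # en
```

Once a designation exists, later utterances from that speaker can skip the first-language acoustic model and start with the designated one.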

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Reich, David E.
Primary Examiner(s)
Desir, Pierre-Louis
Assistant Examiner(s)
Kovacek, David M.

Application Number

US13/228,879
Publication Number

US 20110320203A1
Time in Patent Office

396 Days
Field of Search

704/1-10, 704/231-232, 704/235-257, 704/E17.001-E17.016, 704/E15.001-E15.05, 379/67.1-68, 379/88.01-88.09, 379/88.11-88.18
US Class Current

704/257
CPC Class Codes

G10L 15/187   Phonemic context, e.g. pron...

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 2015/227   of the speaker; Human-fact...
