System and method for accented modification of a language model

US 20050165602A1
Filed: 12/07/2004
Published: 07/28/2005
Est. Priority Date: 12/31/2003
Status: Active Grant

8 Assignments

0 Petitions

Abstract

A system and method for speech recognition that allows language models for a particular language to be customized by adding alternate pronunciations, specific to the accent of the dictator, for a subset of the words in the language model. The system includes the steps of identifying the pronunciation differences that are best handled by modifying the pronunciations of the language model, identifying target words in the language model for pronunciation modification, and creating an accented speech file used to modify the language model.

29 Claims

1. A method for modifying a language model, the method comprising the steps of:
- identifying accented speech pronunciations of words of a language;
  
  identifying pronunciation differences between customary speech pronunciations and the accented speech pronunciations;
  
  identifying, for each of said pronunciation differences, a first list of words in the language model that instantiate said pronunciation differences;
  
  selectively adding the first list of words and their accented speech pronunciations to an accented speech file; and
  
  modifying the language model according to the accented speech file.
- Dependent claims:
  - 2. The method of claim 1, wherein identifying said pronunciation differences excludes subphonemic differences.
  - 3. The method of claim 1, further comprising:
    - selectively reducing the first list to words that are most frequently used in the language model.
  - 4. The method of claim 3, further comprising:
    - selectively reducing the first list to words that intrude on other words if they are not given accented speech pronunciations.
  - 5. The method of claim 4, further comprising:
    - selectively reducing the first list to short words.
  - 6. The method of claim 5, further comprising:
    - selectively reducing the first list to words with unrecognizable accented speech pronunciations.
  - 7. The method of claim 1, wherein the modifying the language model includes supplementing the language model pronunciations with accented speech pronunciations.
  - 8. The method of claim 1, wherein the modifying the language model includes replacing the language model pronunciations with accented speech pronunciations.
  - 9. The method of claim 1, further comprising:
    - identifying clone pronunciations between customary speech and accented speech; and
      
      selectively adding the clone pronunciations to the accented speech file.
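
The steps of claim 1, together with the supplement and replace options of claims 7 and 8, can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy lexicon, the phoneme symbols, the single TH-to-S substitution, and every function name are hypothetical and do not come from the patent, which does not specify data structures.

```python
# Toy lexicon (hypothetical): word -> list of customary pronunciations,
# each written as a tuple of phoneme symbols.
LEXICON = {
    "think": [("TH", "IH", "NG", "K")],
    "sink": [("S", "IH", "NG", "K")],
    "cat": [("K", "AE", "T")],
}

# Hypothetical accent: customary TH is realized as S.
DIFFERENCES = [("TH", "S")]


def accent_pronunciation(pron, differences):
    """Apply every (customary, accented) phoneme substitution to one pronunciation."""
    mapping = dict(differences)
    return tuple(mapping.get(phoneme, phoneme) for phoneme in pron)


def build_accented_speech_file(lexicon, differences):
    """Collect the words whose pronunciation changes under the accent."""
    accented = {}
    for word, prons in lexicon.items():
        for pron in prons:
            variant = accent_pronunciation(pron, differences)
            if variant != pron:  # the word instantiates a difference
                entries = accented.setdefault(word, [])
                if variant not in entries:
                    entries.append(variant)
    return accented


def modify_language_model(lexicon, accented, mode="supplement"):
    """Return a new lexicon with accented forms appended or substituted."""
    if mode not in ("supplement", "replace"):
        raise ValueError(f"unknown mode: {mode}")
    modified = {word: list(prons) for word, prons in lexicon.items()}
    for word, variants in accented.items():
        if mode == "replace":
            modified[word] = list(variants)
        else:
            existing = modified.setdefault(word, [])
            for variant in variants:
                if variant not in existing:
                    existing.append(variant)
    return modified


accented_file = build_accented_speech_file(LEXICON, DIFFERENCES)
supplemented = modify_language_model(LEXICON, accented_file, mode="supplement")
replaced = modify_language_model(LEXICON, accented_file, mode="replace")
```

In this sketch, supplementing keeps the customary pronunciation of "think" next to the accented one, while replacing leaves only the accented form. Words the accent does not affect, such as "cat", are unchanged in both modes.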

10. A method for modifying a language model, the method comprising the steps of:
- identifying accented speech pronunciations of a language;
  
  identifying pronunciation differences between customary speech pronunciations and the accented speech pronunciations;
  
  identifying, for each of said pronunciation differences, words in the language model that instantiate said pronunciation differences;
  
  adding said words and said accented speech pronunciations corresponding to said words to an accented speech file according to a predetermined category; and
  
  modifying the language model according to the accented speech file.
- Dependent claims:
  - 11. The method of claim 10, wherein identifying said pronunciation differences excludes subphonemic differences.
  - 12. The method of claim 10, wherein said predetermined category is all said words.
  - 13. The method of claim 10, wherein said predetermined category is the most frequently used words in the language model.
  - 14. The method of claim 10, wherein said predetermined category is the words intruding on other words if they are not given accented speech pronunciations.
  - 15. The method of claim 10, wherein said predetermined category is short words.
  - 16. The method of claim 10, wherein said predetermined category is the words with unrecognizable accented speech pronunciations.
  - 17. The method of claim 10, wherein the modifying the language model includes supplementing the language model pronunciations with accented speech pronunciations.
  - 18. The method of claim 10, wherein the modifying the language model includes replacing the language model pronunciations with accented speech pronunciations.
  - 19. The method of claim 10, further comprising:
    - identifying clone pronunciations between customary speech and accented speech; and
      
      selectively adding the clone pronunciations to the accented speech file.
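
One way to picture the "predetermined category" of claims 10 and 12 to 16 is as a named filter applied to candidate words before they are written to the accented speech file. The sketch below is an assumption-laden illustration: the `Candidate` record, its flags, and both thresholds are hypothetical, since the claims do not say how frequency, shortness, intrusion, or recognizability are measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    word: str
    accented_pron: tuple
    frequency: int    # hypothetical usage count in the language model
    intrudes: bool    # hypothetical flag: accented form collides with another word
    recognized: bool  # hypothetical flag: recognizer already handles the accented form


FREQUENCY_FLOOR = 1000  # assumed cutoff for "most frequently used"
MAX_SHORT_LENGTH = 4    # assumed cutoff, in letters, for "short words"

# Category name -> predicate, one entry per category named in claims 12 to 16.
CATEGORIES = {
    "all": lambda c: True,
    "frequent": lambda c: c.frequency >= FREQUENCY_FLOOR,
    "intruding": lambda c: c.intrudes,
    "short": lambda c: len(c.word) <= MAX_SHORT_LENGTH,
    "unrecognizable": lambda c: not c.recognized,
}


def add_by_category(candidates, category):
    """Build an accented speech file from the candidates in one category."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    keep = CATEGORIES[category]
    return {c.word: c.accented_pron for c in candidates if keep(c)}


CANDIDATES = [
    Candidate("think", ("S", "IH", "NG", "K"), 5000, True, True),
    Candidate("thin", ("S", "IH", "N"), 1500, True, False),
    Candidate("thermometer", ("S", "ER", "M", "AA", "M", "AH", "T", "ER"), 40, False, True),
]

frequent_file = add_by_category(CANDIDATES, "frequent")
short_file = add_by_category(CANDIDATES, "short")
```

Keeping the categories in a lookup table makes the choice of category a configuration value rather than a code change, which matches the claim language of a category fixed in advance.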

20. A method for customizing a language model for accented speakers, the method comprising the steps of:
- identifying an accent;
  
  determining pronunciation differences between the identified accent and the language model;
  
  selecting a first subset of the pronunciation differences based on a first set of pre-determined criteria;
  
  listing a first set of instantiations based on said first subset;
  
  compiling an accent speech word list from the first set of instantiations;
  
  determining accent-specific pronunciations corresponding to words in the accent speech word list; and
  
  applying the accent speech word list and the accent-specific pronunciations to the language model.
- Dependent claims:
  - 21. The method of claim 20, wherein determining the pronunciation differences includes system rule governed differences.
  - 22. The method of claim 20, wherein the first set of pre-determined criteria is phonemic differences.
  - 23. The method of claim 20, wherein the first set of pre-determined criteria is idiosyncratic criteria.
  - 24. The method of claim 20, wherein compiling an accent speech word list from the first set of instantiations is based on a second set of pre-determined criteria.
  - 25. The method of claim 24, wherein the second set of pre-determined criteria includes at least one of word frequency, pronunciation intrusions, and word length.
  - 26. The method of claim 25, wherein pronunciation intrusions are based on a third set of pre-determined criteria.
  - 27. The method of claim 24, wherein the second set of pre-determined criteria includes pronunciation intrusion wherein intruding and intruded-upon words may be distinguished by means other than pronunciation.
  - 28. The method of claim 20, further comprising:
    - identifying clone pronunciations between customary speech and accented speech; and
      
      selectively adding the clone pronunciations to the accent speech word file.
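
Claims 25 to 27 refer to pronunciation intrusions without defining a test for them. A minimal sketch, assuming an intrusion means that a word's accented pronunciation coincides with the customary pronunciation of a different word, might look like this. The lexicon, the accented forms, and the function name are all hypothetical.

```python
# Hypothetical customary lexicon: word -> list of pronunciations.
LEXICON = {
    "think": [("TH", "IH", "NG", "K")],
    "sink": [("S", "IH", "NG", "K")],
    "thin": [("TH", "IH", "N")],
    "sin": [("S", "IH", "N")],
    "three": [("TH", "R", "IY")],
}

# Hypothetical accented forms for the TH words (TH realized as S).
ACCENTED_FILE = {
    "think": [("S", "IH", "NG", "K")],
    "thin": [("S", "IH", "N")],
    "three": [("S", "R", "IY")],
}


def find_intrusions(lexicon, accented_file):
    """Return sorted (intruding_word, intruded_upon_word) pairs."""
    # Index the customary lexicon by pronunciation for direct lookup.
    words_by_pron = {}
    for word, prons in lexicon.items():
        for pron in prons:
            words_by_pron.setdefault(pron, set()).add(word)

    pairs = set()
    for word, variants in accented_file.items():
        for variant in variants:
            for other in words_by_pron.get(variant, set()):
                if other != word:
                    pairs.add((word, other))
    return sorted(pairs)


intrusions = find_intrusions(LEXICON, ACCENTED_FILE)
```

Here "think" and "thin" are flagged because their accented forms match "sink" and "sin", while "three" is not flagged because its accented form matches no other entry. A fuller treatment in the spirit of claim 27 would also ask whether context could separate each pair, which this sketch does not attempt.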

29. A method for modifying a language model, the method comprising the steps of:
- identifying accented speech pronunciations of words of a language;
  
  identifying pronunciation differences between customary speech pronunciations and the accented speech pronunciations;
  
  identifying, for each of said pronunciation differences, a first list of words in the language model that instantiate said pronunciation differences;
  
  selectively adding the first list of words and their accented speech pronunciations to an accented speech file;
  
  selectively reducing the first list to a second list of words that are most frequently used in the language model;
  
  selectively adding the second list of words and their accented speech pronunciations to the accented speech file;
  
  selectively reducing the second list to a third list of words, wherein said third list includes words that intrude on other words if they are not given accented speech pronunciations;
  
  selectively adding the third list of words and their accented speech pronunciations to the accented speech file;
  
  selectively reducing the third list to a fourth list of short words;
  
  selectively adding the fourth list of words and their accented speech pronunciations to the accented speech file;
  
  selectively reducing the fourth list to a fifth list of words with unrecognizable accented speech pronunciations;
  
  selectively adding the fifth list of words and their accented speech pronunciations to the accented speech file; and
  
  modifying the language model according to the accented speech file.
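
The successive reductions of claim 29 amount to a filter cascade in which each list is derived from the one before it, and any stage's list may then be written to the accented speech file. The following sketch illustrates that shape only; the word data, the thresholds, and the helper names are hypothetical, and the patent does not prescribe how each test is computed.

```python
# Hypothetical inputs for an accent in which TH is realized as S.
FIRST_LIST = ["three", "think", "thermometer", "bath", "thin"]
FREQUENCY = {"three": 9000, "think": 5000, "thermometer": 40, "bath": 2000, "thin": 1500}
INTRUDERS = {"think", "bath", "thin"}  # assumed to collide with "sink", "bass", "sin"
UNRECOGNIZED = {"thin"}                # assumed recognizer failures
ACCENTED_PRON = {
    "three": ("S", "R", "IY"),
    "think": ("S", "IH", "NG", "K"),
    "thermometer": ("S", "ER", "M", "AA", "M", "AH", "T", "ER"),
    "bath": ("B", "AE", "S"),
    "thin": ("S", "IH", "N"),
}

# Ordered (list name, keep-predicate) pairs, one per reduction step.
STAGES = [
    ("second", lambda w: FREQUENCY[w] >= 1000),  # most frequently used (assumed cutoff)
    ("third", lambda w: w in INTRUDERS),         # words that intrude on other words
    ("fourth", lambda w: len(w) <= 4),           # short words (assumed cutoff)
    ("fifth", lambda w: w in UNRECOGNIZED),      # unrecognizable accented forms
]


def reduce_in_stages(first_list, stages):
    """Filter the previous stage's output at each step; return every list by name."""
    lists = {"first": list(first_list)}
    current = lists["first"]
    for name, keep in stages:
        current = [word for word in current if keep(word)]
        lists[name] = current
    return lists


def write_stage(lists, stage, accented_pron):
    """Build an accented speech file from the list produced at one stage."""
    return {word: accented_pron[word] for word in lists[stage]}


lists = reduce_in_stages(FIRST_LIST, STAGES)
accented_file = write_stage(lists, "third", ACCENTED_PRON)
```

Because each list is a subset of the previous one, choosing a later stage yields a smaller accented speech file. The sketch writes the third list, but any stage name could be passed instead.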

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Dictaphone Corporation (Microsoft Corporation)
Inventors: Uhrbach, Amy J.; Cote, William F.; Carrier, Jill; Han, Wensheng

Granted Patent: US 7,315,811 B2
US Class Current: 704/9
CPC Class Codes: G10L 15/183 using context dependencies,...
