Method and system for learning linguistically valid word pronunciations from acoustic data

US 7,266,495 B1
Filed: 09/12/2003
Issued: 09/04/2007
Est. Priority Date: 09/12/2003
Status: Active Grant


5 Assignments


0 Petitions


Abstract

A computerized pronunciation system is provided for generating pronunciations for words and storing the pronunciations in a pronunciation dictionary. The system includes a word list including at least one word; transcribed acoustic data including at least one waveform for the word and transcribed text associated with the waveform; a pronunciation-learning module configured to accept as input the word list and the transcribed acoustic data, the pronunciation-learning module including: sets of initial pronunciations of the word, a scoring module configured to score pronunciations and to generate phone probabilities, and a set of alternate pronunciations of the word, wherein the set of alternate pronunciations include a highest-scoring set of initial pronunciations with a highest-scoring substitute phone substituted for a lowest-probability phone; and a pronunciation dictionary configured to receive the highest-scoring set of initial pronunciations and the set of alternate pronunciations.
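The substitution step named in the abstract can be illustrated with a minimal sketch. It assumes the scoring module has already produced one probability per phone of the highest-scoring initial pronunciation, plus probabilities for candidate substitute phones at the weakest position. The function name `substitute_weakest_phone`, its arguments, and the example phone symbols are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Tuple


def substitute_weakest_phone(
    phones: List[str],
    phone_probs: List[float],
    substitute_probs: Dict[str, float],
) -> Tuple[int, List[str]]:
    """Replace the lowest-probability phone with the best-scoring substitute.

    phones           -- highest-scoring initial pronunciation (assumed input)
    phone_probs      -- one probability per phone, same order as phones
    substitute_probs -- probability of each candidate substitute phone at
                        the weakest position (assumed to be precomputed)
    Returns the index of the weakest phone and the alternate pronunciation.
    """
    if not phones or len(phones) != len(phone_probs):
        raise ValueError("phones and phone_probs must be non-empty and aligned")

    # Position of the phone the scorer is least confident about.
    weakest = min(range(len(phones)), key=lambda i: phone_probs[i])

    # A substitute must differ from the phone it replaces.
    candidates = {p: s for p, s in substitute_probs.items() if p != phones[weakest]}
    if not candidates:
        return weakest, list(phones)

    best = max(candidates, key=candidates.get)
    alternate = list(phones)
    alternate[weakest] = best
    return weakest, alternate


if __name__ == "__main__":
    index, alt = substitute_weakest_phone(
        ["t", "ah", "m", "ey", "t", "ow"],
        [0.90, 0.40, 0.95, 0.80, 0.85, 0.90],
        {"ow": 0.70, "ah": 0.40, "ax": 0.20},
    )
    print(index, alt)
```

In this sketch both the original and the alternate pronunciation would then be handed to the dictionary, which matches the abstract's statement that the dictionary receives the highest-scoring initial pronunciations and the alternates.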

220 Citations

31 Claims

1. A computerized pronunciation system configured to generate pronunciations for words that are represented by waveforms and text, such that the pronunciations are spelled by phones in a phonetic alphabet for storage in a pronunciation dictionary, the system comprising:
- a word list including at least one word;
  
  transcribed acoustic data including at least one waveform for the word and transcribed text associated with the waveform;
  
  a pronunciation-learning module configured to accept as input the word list and the transcribed acoustic data, the pronunciation-learning module including:
    - sets of initial pronunciations of the word,
      
      a scoring module configured to score pronunciations and to generate phone probabilities, and
      
      a set of alternate pronunciations of the word, wherein the set of alternate pronunciations include a highest-scoring set of initial pronunciations with a highest-scoring substitute phone substituted for a lowest-probability phone; and
  
  a pronunciation dictionary configured to receive the highest-scoring set of initial pronunciations and the set of alternate pronunciations.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
- - 2. The system of claim 1, wherein the transcribed acoustic data includes a plurality of waveforms for the word, and transcribed text for each waveform of the plurality of waveforms.
  - 3. The system of claim 2, wherein the plurality of waveforms are acoustic representations of the word spoken by a plurality of speakers.
  - 4. The system of claim 1, wherein the word list includes a plurality of words.
  - 5. The system of claim 4, wherein the transcribed acoustic data includes a plurality of waveforms for the plurality of words, and transcribed text for each waveform of the plurality of waveforms.
  - 6. The system of claim 5, wherein the waveforms of the plurality of waveforms are acoustic representations of the plurality of words spoken by a plurality of speakers.
  - 7. The system of claim 1, wherein the pronunciation-learning module is further configured to:
    - force-align the sets of initial pronunciations to the waveform;
      
      thereafter generate the set of alternate pronunciations; and
      
      add the set of alternate pronunciations to the pronunciation dictionary.
  - 8. The system of claim 7, wherein the scoring module is configured to score the sets of initial pronunciations.
  - 9. The system of claim 8, wherein the scoring module is configured to generate a phone probability for each phone in a highest-scoring set of initial pronunciations and for each substitute phone in a set of substitute phones.
  - 10. The system of claim 1, wherein the phone probabilities are posterior probabilities.
  - 11. The system of claim 1, further comprising a letter-to-phone engine configured to generate initial pronunciations from which the sets of initial pronunciations are generated.
  - 12. The system of claim 1, wherein initial pronunciations from which the sets of initial pronunciations are generated are extracted from the pronunciation dictionary.
  - 13. The system of claim 1, wherein the scoring module includes an automatic speech recognition (ASR) system configured to score the sets of initial pronunciations.
  - 14. The system of claim 13, wherein the pronunciation-learning module is further configured to graph the sets of initial pronunciations, and the ASR system is configured to score graphed sets of initial pronunciations.
  - 15. The system of claim 13, wherein the ASR system is further configured to generate transcriptions of acoustic data spoken by a plurality of speakers, and wherein the transcriptions are included in the transcribed acoustic data.
  - 16. The system of claim 15, wherein the ASR system is further configured to collect feedback from the plurality of speakers to affirm correct recognition by the ASR system, and if recognition is correct, enter the transcribed words in the transcribed acoustic data.
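Claim 10 states that the phone probabilities are posterior probabilities, but this record does not say how they are computed. One common way to obtain posteriors, shown here only as a sketch, is to normalize the acoustic log-likelihoods of competing phones over the same force-aligned segment, assuming a uniform prior over the phones. The function name `phone_posteriors` and the input layout are assumptions, not the patent's method.

```python
import math
from typing import Dict


def phone_posteriors(log_likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Turn per-phone log-likelihoods for one segment into posteriors.

    Assumes a uniform prior over the competing phones, so the posterior is
    the normalized likelihood. The maximum is subtracted before
    exponentiating to avoid numeric underflow.
    """
    if not log_likelihoods:
        raise ValueError("at least one phone is required")

    peak = max(log_likelihoods.values())
    unnormalized = {p: math.exp(ll - peak) for p, ll in log_likelihoods.items()}
    total = sum(unnormalized.values())
    return {p: value / total for p, value in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical log-likelihoods for three phones on one aligned segment.
    print(phone_posteriors({"ah": -12.0, "ow": -10.0, "ax": -15.0}))
```

A phone whose posterior is low relative to its competitors is the kind of lowest-probability phone that the claims single out for substitution.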

17. A computerized pronunciation system configured to generate pronunciations for words that are represented by waveforms and text, such that the pronunciations are spelled by phones in a phonetic alphabet for storage in a pronunciation dictionary, the system comprising:
- a word list including at least one word;
  
  transcribed acoustic data including at least one waveform for the word and transcribed text associated with the waveform;
  
  a pronunciation-learning module configured to accept as input the word list and the transcribed acoustic data, the pronunciation-learning module including:
    - sets of initial pronunciations of the word,
      
      an automatic speech recognition (ASR) system configured to score pronunciations,
      
      a scoring module configured to generate phone probabilities, and
      
      a set of alternate pronunciations of the word, wherein the set of alternate pronunciations include a highest-scoring set of initial pronunciations with a highest-scoring substitute phone substituted for a lowest-probability phone; and
  
  a pronunciation dictionary configured to receive the highest-scoring initial pronunciation and a highest-scoring set of alternate pronunciations.
- View Dependent Claims (18, 19, 20, 21, 22, 23, 24, 25)
- - 18. The system of claim 17, wherein the word list includes a plurality of words.
  - 19. The system of claim 18, wherein the transcribed acoustic data includes a plurality of waveforms and transcribed text for the plurality of words.
  - 20. The system of claim 19, wherein the waveforms of the plurality of waveforms are acoustic representations of the plurality of words spoken by a plurality of speakers.
  - 21. The system of claim 17, further comprising a letter-to-phone engine configured to generate initial pronunciations from which the sets of initial pronunciations are generated.
  - 22. The system of claim 17, wherein initial pronunciations from which the sets of initial pronunciations are generated are extracted from the pronunciation dictionary.
  - 23. The system of claim 17, wherein the ASR system is configured to score graphed sets of initial pronunciations.
  - 24. The system of claim 17, wherein the ASR system is configured to generate transcriptions of acoustic data spoken by a plurality of speakers, wherein the transcriptions are included in the transcribed acoustic data.
  - 25. The system of claim 24, wherein the ASR system is further configured to collect feedback from the plurality of speakers that the transcriptions generated by the ASR system are words spoken by the plurality of speakers, and wherein if the collected feedback affirms correct recognition by the ASR system, the transcriptions are entered in the pronunciation dictionary.
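Claims 15, 16, 24 and 25 describe an ASR system that transcribes speech from many speakers and keeps a transcription only when speaker feedback affirms that recognition was correct. The filtering step might look like the sketch below. The `Recognition` record and its fields are hypothetical, and the sketch treats missing feedback as unconfirmed, which is an assumption the claims do not spell out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Recognition:
    """One ASR result plus optional speaker feedback (hypothetical record)."""
    waveform_id: str
    hypothesis: str
    confirmed: Optional[bool]  # True/False from the speaker, None if no feedback


def confirmed_transcripts(results: Iterable[Recognition]) -> List[Tuple[str, str]]:
    """Keep only (waveform, text) pairs the speaker explicitly confirmed."""
    return [(r.waveform_id, r.hypothesis) for r in results if r.confirmed is True]


if __name__ == "__main__":
    batch = [
        Recognition("utt-001", "boston", True),
        Recognition("utt-002", "austin", False),
        Recognition("utt-003", "worcester", None),
    ]
    print(confirmed_transcripts(batch))
```

Only the confirmed pairs would be added to the transcribed acoustic data that feeds the pronunciation-learning module.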

26. A computerized pronunciation system configured to generate pronunciations for words that are represented by waveforms and text, such that the pronunciations are spelled by phones in a phonetic alphabet for storage in a pronunciation dictionary, the system comprising:
- a word list including a plurality of words;
  
  transcribed acoustic data including a set of waveforms for each of the words and a set of transcribed text corresponding to the waveforms;
  
  a pronunciation-learning module configured to accept as input the word list and the transcribed acoustic data, the pronunciation-learning module including:
    - sets of initial pronunciations of the plurality of words,
      
      sets of alternate pronunciations of the plurality of words, wherein each set of alternate pronunciations includes a highest-scoring set of initial pronunciations with a unique substitute phone substituted for a lowest-probability phone of the highest-scoring set of initial pronunciations;
  
  a scoring module configured to score the sets of initial and alternate pronunciations and to generate phone probabilities; and
  
  a pronunciation dictionary configured to receive the highest-scoring initial pronunciation and a highest-scoring set of alternate pronunciations.
- View Dependent Claims (27, 28, 29, 30, 31)
- - 27. The system of claim 26, wherein the sets of alternate pronunciations further include a set of alternate pronunciations that include the highest-scoring initial pronunciation with the lowest-probability phone removed.
  - 28. The system of claim 26, wherein the sets of alternate pronunciations further include additional sets of alternate pronunciations that include the highest-scoring initial pronunciation having a unique phone inserted adjacent to the lowest-probability phone.
  - 29. The system of claim 26, wherein the sets of alternate pronunciations further include additional sets of alternate pronunciations that include the highest-scoring initial pronunciation having a sequence of two phones substituted for the lowest-probability phone.
  - 30. The system of claim 26, wherein the sets of alternate pronunciations further include additional sets of alternate pronunciations that include the highest-scoring initial pronunciation having the lowest-probability phone and a right neighboring phone substituted with a unique phone.
  - 31. The system of claim 26, wherein the sets of alternate pronunciations further include additional sets of alternate pronunciations that include the highest-scoring initial pronunciation with the lowest-probability phone and a left neighboring phone substituted with a unique phone.
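Claims 26 through 31 enumerate several ways to edit the highest-scoring initial pronunciation around its lowest-probability phone: substitution, deletion, insertion of an adjacent phone, replacement by a two-phone sequence, and merging with the right or left neighbor. The sketch below generates all of these candidates for one pronunciation. The function name `alternate_candidates`, the toy phone inventory, and the choice to return one combined set are assumptions made for illustration; each candidate would still need to be scored before any of them reached the dictionary.

```python
from itertools import product
from typing import Sequence, Set, Tuple


def alternate_candidates(
    phones: Sequence[str], weakest: int, inventory: Sequence[str]
) -> Set[Tuple[str, ...]]:
    """Enumerate alternate pronunciations around the weakest phone.

    phones    -- highest-scoring initial pronunciation
    weakest   -- index of its lowest-probability phone
    inventory -- phone set to draw substitutes from (assumed input)
    """
    if not 0 <= weakest < len(phones):
        raise IndexError("weakest must index into phones")

    left = tuple(phones[:weakest])
    right = tuple(phones[weakest + 1:])
    target = phones[weakest]
    out: Set[Tuple[str, ...]] = set()

    for p in inventory:
        if p != target:
            out.add(left + (p,) + right)          # claim 26: substitute one phone
        out.add(left + (p, target) + right)       # claim 28: insert before
        out.add(left + (target, p) + right)       # claim 28: insert after
        if right:
            out.add(left + (p,) + right[1:])      # claim 30: merge with right neighbor
        if left:
            out.add(left[:-1] + (p,) + right)     # claim 31: merge with left neighbor

    for a, b in product(inventory, repeat=2):
        out.add(left + (a, b) + right)            # claim 29: two-phone substitution

    if len(phones) > 1:
        out.add(left + right)                     # claim 27: delete the weakest phone

    out.discard(tuple(phones))                    # never re-propose the original
    return out


if __name__ == "__main__":
    for candidate in sorted(alternate_candidates(["k", "ae", "t"], 1, ["ae", "eh"])):
        print(" ".join(candidate))
```

With a realistic inventory of several dozen phones the two-phone substitutions dominate the candidate count, so a practical system would likely prune candidates before scoring; that pruning is not shown here.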


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Beaufays, Francoise; Sankar, Ananth; Weintraub, Mitchel; Williams, Shaun
Primary Examiner(s): Lerner, Martin
Application Number: US10/661,106
Time in Patent Office: 1,453 Days
Field of Search: 704/231, 704/236, 704/238, 704/239, 704/240, 704/243, 704/244, 704/251, 704/252, 704/254
US Class Current: 704/236
CPC Class Codes:
G10L 15/06 Creation of reference templ...
G10L 15/187 Phonemic context, e.g. pron...
