Method for adding phonetic descriptions to a speech recognition lexicon

US 6,973,427 B2
Filed: 12/26/2000
Issued: 12/06/2005
Est. Priority Date: 12/26/2000
Status: Expired due to Fees

Abstract

A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, two possible phonetic descriptions are generated. One phonetic description is formed from the text of the word. The other phonetic description is formed by decoding a speech signal representing the user's pronunciation of the word. Both phonetic descriptions are scored based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
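
The selection step described in the abstract can be illustrated with a short sketch. This is not the patented implementation: `Candidate`, `select_description`, the tie-breaking rule, and the example scores are all hypothetical, and the scores are assumed to be log-likelihood-style values where a higher value means a closer match to the user's pronunciation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Candidate:
    """A candidate phonetic description and its score (hypothetical type)."""
    phonemes: Tuple[str, ...]
    score: float  # assumed: higher means a closer match to the speech signal
    source: str   # "text" (derived from spelling) or "speech" (decoded from audio)


def select_description(text_based: Candidate, speech_based: Candidate) -> Candidate:
    """Return the higher-scoring candidate; ties go to the text-based one (an assumption)."""
    return speech_based if speech_based.score > text_based.score else text_based


# Made-up scores, for illustration only.
from_text = Candidate(("k", "ae", "t"), score=-42.0, source="text")
from_speech = Candidate(("k", "ah", "t"), score=-37.5, source="speech")
chosen = select_description(from_text, from_speech)
print(chosen.source)  # prints "speech"
```

Both candidates are scored against the same speech signal, so the comparison is between like quantities; the sketch leaves out how those scores are produced.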

18 Claims

1. A method for adding an acoustic description of a word to a speech recognition lexicon, the method comprising:
- converting the text of the word into at least one orthographically derived acoustic description of the word;
- generating a score for an orthographically derived acoustic description based in part on a comparison between the orthographically derived acoustic description and a speech signal representing a user's pronunciation of the word;
- identifying a speech-based acoustic description of the word and a score for the speech-based acoustic description from the speech signal representing the user's pronunciation of the word, wherein the speech-based acoustic description is not associated with the text of the word; and
- selecting one of the orthographically derived acoustic description and the speech-based acoustic description as the acoustic description of the word based on the score for the orthographically derived acoustic description and the score for the speech-based acoustic description.
- Dependent claims (2 to 11):
  - 2. The method of claim 1 wherein generating a score for an orthographically derived acoustic description comprises generating an acoustic model score.
  - 3. The method of claim 2 wherein identifying a score for the speech-based acoustic description comprises generating an acoustic model score for at least one speech-based acoustic description and using the score as at least part of the score for the speech-based acoustic description.
  - 4. The method of claim 3 wherein generating an acoustic model score for the orthographically derived acoustic description and generating an acoustic model score for at least one speech-based acoustic description comprises using the same acoustic model to generate both acoustic model scores.
  - 5. The method of claim 3 wherein identifying a score for the speech-based acoustic description further comprises generating a language model score for the at least one speech-based acoustic description and using the language model score as part of the score for the at least one speech-based acoustic description.
  - 6. The method of claim 5 wherein generating an acoustic model score and generating a language model score for at least one speech-based acoustic description comprises generating an acoustic model score and a language model score for a sequence of syllable-like units and wherein the speech-based acoustic description is derived from the sequence of syllable-like units.
  - 7. The method of claim 6 wherein deriving the speech-based acoustic description from the sequence of syllable-like units comprises dividing the sequence of syllable-like units into a sequence of phonemes.
  - 8. The method of claim 6 wherein generating a language model score comprises generating a language model score based on a trigram language model for syllable-like units.
  - 9. The method of claim 6 wherein generating an acoustic model score for a sequence of syllable-like units comprises generating acoustic model scores for each of a sequence of phonemes that form the sequence of syllable-like units.
  - 10. The method of claim 1 further comprising displaying a user interface comprising an edit box in which a user may enter the text of the word and a list box that displays words for which an acoustic description has been previously added to the speech recognition lexicon.
  - 11. The method of claim 10 further comprising:
    - receiving an indication that a user has selected a word in the list box;
    - retrieving the added acoustic description of the word from the speech recognition lexicon; and
    - converting the retrieved acoustic description into an audible signal.
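
Claims 6 through 8 describe scoring a sequence of syllable-like units with a trigram language model and dividing those units into phonemes. The sketch below illustrates that idea under stated assumptions: the unit inventory, the probability table, the floor value for unseen trigrams, and all function names are hypothetical and do not come from the patent.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical inventory: each syllable-like unit maps to its phonemes.
UNIT_PHONEMES: Dict[str, List[str]] = {
    "mih": ["m", "ih"],
    "krow": ["k", "r", "ow"],
    "sao": ["s", "ao"],
    "ft": ["f", "t"],
}

# Hypothetical trigram probabilities P(unit | previous two units).
# "<s>" pads the start of the sequence.
TRIGRAM_PROB: Dict[Tuple[str, str, str], float] = {
    ("<s>", "<s>", "mih"): 0.20,
    ("<s>", "mih", "krow"): 0.50,
    ("mih", "krow", "sao"): 0.40,
    ("krow", "sao", "ft"): 0.25,
}

UNSEEN_PROB = 1e-6  # assumed floor for trigrams missing from the table


def trigram_log_score(units: Sequence[str]) -> float:
    """Sum log P(u_i | u_{i-2}, u_{i-1}) over the unit sequence."""
    padded = ["<s>", "<s>", *units]
    total = 0.0
    for i in range(2, len(padded)):
        key = (padded[i - 2], padded[i - 1], padded[i])
        total += math.log(TRIGRAM_PROB.get(key, UNSEEN_PROB))
    return total


def units_to_phonemes(units: Sequence[str]) -> List[str]:
    """Divide a sequence of syllable-like units into a flat phoneme sequence."""
    return [p for u in units for p in UNIT_PHONEMES[u]]


units = ["mih", "krow", "sao", "ft"]
print(units_to_phonemes(units))
# prints ['m', 'ih', 'k', 'r', 'ow', 's', 'ao', 'f', 't']
print(round(trigram_log_score(units), 3))
# prints -4.605
```

Working in log space turns the product of trigram probabilities into a sum, which avoids underflow on long sequences and lets the language model score be added to an acoustic score expressed the same way.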

12. A computer-readable medium having computer-executable instructions for performing steps comprising:
- receiving text of a word for which a phonetic description is to be added to a speech recognition lexicon;
- receiving a representation of a speech signal produced by a person pronouncing the word;
- converting the text of the word into a text-based phonetic description of the word;
- generating a speech-based phonetic description of the word from the representation of the speech signal without using the text of the word; and
- selecting a phonetic description of the word to add to the speech recognition lexicon by selecting between the text-based phonetic description and the speech-based phonetic description based in part on the correspondence between each phonetic description and the representation of the speech signal.
- Dependent claims (13 to 18):
  - 13. The computer-readable medium of claim 12 wherein generating a speech-based phonetic description comprises:
    - generating a plurality of possible phonetic descriptions;
    - using at least one model to score each possible phonetic description; and
    - selecting the possible phonetic description with the highest score as the speech-based phonetic description.
  - 14. The computer-readable medium of claim 13 wherein using at least one model comprises using an acoustic model and a language model.
  - 15. The computer-readable medium of claim 14 wherein using a language model comprises using a language model that is based on syllable-like units.
  - 16. The computer-readable medium of claim 15 wherein each syllable-like unit comprises a sequence of phonemes and wherein using an acoustic model to score a possible phonetic description comprises generating acoustic model scores for each of the phonemes in a syllable-like unit and summing the acoustic model scores of the phonemes to generate an acoustic model score for the syllable-like unit.
  - 17. The computer-readable medium of claim 12 wherein:
    - converting the text of the word into a text-based phonetic description further comprises generating a score for the text-based phonetic description based on the correspondence between the text-based phonetic description and the representation of the speech signal;
    - generating a speech-based phonetic description further comprises generating a score for the speech-based phonetic description based on the correspondence between the speech-based phonetic description and the representation of the speech signal; and
    - selecting between the text-based phonetic description and the speech-based phonetic description comprises selecting the phonetic description with the highest score.
  - 18. The computer-readable medium of claim 12 wherein the steps further comprise:
    - receiving an instruction to generate an audible pronunciation of a phonetic description previously added to the speech recognition lexicon;
    - retrieving the added phonetic description from the speech recognition lexicon; and
    - causing an audible pronunciation to be generated based on the retrieved phonetic description.
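
Claims 13 through 16 describe generating several possible phonetic descriptions, scoring each with an acoustic model and a language model, and keeping the highest-scoring one, where the acoustic score of a syllable-like unit is the sum of its phoneme scores. The following is a minimal sketch of that flow. The fixed phoneme score table, the made-up language model scores, the choice to combine the two scores by plain addition, and all names are assumptions for illustration only.

```python
from typing import Dict, List, Sequence

Unit = Sequence[str]          # a syllable-like unit as a sequence of phonemes
Description = Sequence[Unit]  # a candidate phonetic description

# Hypothetical per-phoneme acoustic log scores. A real recognizer would compute
# these against the speech signal rather than look them up in a fixed table.
PHONEME_SCORE: Dict[str, float] = {
    "t": -1.0, "ah": -2.0, "m": -1.5, "ey": -2.5, "aa": -4.0, "ow": -2.0,
}


def unit_acoustic_score(unit: Unit) -> float:
    """Sum the phoneme scores to get the score of one syllable-like unit."""
    return sum(PHONEME_SCORE[p] for p in unit)


def description_score(desc: Description, lm_score: float) -> float:
    """Combine acoustic and language model scores (plain addition is an assumption)."""
    return sum(unit_acoustic_score(u) for u in desc) + lm_score


def best_description(candidates: List[Description], lm_scores: List[float]) -> Description:
    """Pick the candidate with the highest combined score."""
    scored = zip(candidates, lm_scores)
    return max(scored, key=lambda pair: description_score(*pair))[0]


candidates = [
    [["t", "ah"], ["m", "ey"], ["t", "ow"]],
    [["t", "ah"], ["m", "aa"], ["t", "ow"]],
]
lm_scores = [-3.0, -2.5]  # made-up language model log scores
print(best_description(candidates, lm_scores))
# prints [['t', 'ah'], ['m', 'ey'], ['t', 'ow']]
```

Here the first candidate totals -13.0 and the second -14.0, so the first is kept even though its language model score is lower.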

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Hwang, Mei-Yuh; Weiss, Rebecca C.; Alleva, Fileno A.
Primary Examiner(s): ARMSTRONG, ANGELA A
Application Number: US09/748,453
Publication Number: US 20020082831A1
Time in Patent Office: 1,806 Days
Field of Search: 704/249, 704/9, 704/1, 704/10, 704/251, 704/270, 704/254, 704/220, 704/260, 704/244, 704/240, 704/256, 704/255, 704/243, 704/245
US Class Current: 704/249
CPC Class Codes:
G10L 15/063 Training
G10L 2015/0636 Threshold criteria for the ...
