SPEECH RECOGNITION BASED ON PRONUNCIATION MODELING

US 20120271635A1
Filed: 07/02/2012
Published: 10/25/2012
Est. Priority Date: 04/27/2006
Status: Active Grant


Abstract

A system and method for performing speech recognition are disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations, and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
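The idea in the abstract, giving each pronunciation of a word its own identifier so that the language model carries the pronunciation probabilities, can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the `word#n` label format, the toy phone strings, and the choice to leave the most frequent pronunciation unsuffixed are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def label_pronunciations(transcribed):
    """Map each (word, phones) pair to a unique label (hypothetical 'word#n' scheme)."""
    counts = defaultdict(Counter)
    for word, phones in transcribed:
        counts[word][phones] += 1
    labels = {}
    for word, variants in counts.items():
        # most_common() sorts by count, highest first.
        for rank, (phones, _count) in enumerate(variants.most_common()):
            # Assumption: the most frequent variant keeps the bare word as its label.
            labels[(word, phones)] = word if rank == 0 else f"{word}#{rank + 1}"
    return labels

def unigram_model(transcribed, labels):
    """Maximum-likelihood unigram probabilities over pronunciation-labeled tokens."""
    tokens = [labels[(word, phones)] for word, phones in transcribed]
    total = len(tokens)
    return {token: n / total for token, n in Counter(tokens).items()}

# Toy phonemic transcription: (word, phone string) pairs.
data = [("the", "dh ah"), ("the", "dh iy"), ("the", "dh ah"), ("cat", "k ae t")]
labels = label_pronunciations(data)
lm = unigram_model(data, labels)
```

Here `lm` assigns a probability to each labeled variant (`the`, `the#2`, `cat`), so the pronunciation probabilities are stored in the language model and do not need to be stored in the dictionary.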

318 Citations

20 Claims

1. A method comprising:
- approximating transcribed speech using a phonemic transcription dataset associated with a speaker, to yield a language model, where the phonemic transcription dataset is based on a pronunciation model of the speaker;
  
  incorporating, into the language model, pronunciation probabilities associated with respective unique labels for each different pronunciation of a word, wherein the respective unique label for a most frequent word indicates a special status in the language model; and
  
  after incorporating the pronunciation probabilities into the language model, recognizing an utterance using the language model.
- - 2. The method of claim 1, further comprising:
    - removing the pronunciation probabilities from a pronunciation dictionary.
  - 3. The method of claim 1, wherein the language model is generated by modeling pronunciation dependencies across word boundaries.
  - 4. The method of claim 1, wherein at least one of contextual dependencies and consistency in pronunciation style exist throughout the utterance.
  - 5. The method of claim 1, wherein the pronunciation probabilities in the language model further comprise pronunciation dependent word pairs as lexical items that change a behavior of the language model to approximate higher order n-gram language models.
  - 6. The method of claim 1, further comprising:
    - creating a wide context pronunciation model based on having the pronunciation probabilities in the language model; and
      
      determining a probability of observing a particular word in the utterance using the wide context pronunciation model.
  - 7. The method of claim 1, wherein the pronunciation probabilities comprise a set of most frequent words each with more than one pronunciation alternative.
  - 8. The method of claim 7, wherein the more than one pronunciation alternative is in the language model.
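Claims 3 and 5 refer to pronunciation dependencies across word boundaries and to pronunciation-dependent word pairs treated as lexical items. The sketch below shows one way these could look in practice, under stated assumptions: the tokens are already pronunciation-labeled in a hypothetical `word#n` format, the bigram estimate is plain maximum likelihood without smoothing, and the `merge_pair` helper and its underscore-joined labels are invented for illustration.

```python
from collections import Counter

def bigram_probs(tokens):
    """Maximum-likelihood P(cur | prev) over pronunciation-labeled tokens."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])
    return {(prev, cur): n / prev_counts[prev]
            for (prev, cur), n in pair_counts.items()}

def merge_pair(tokens, pair):
    """Rewrite each occurrence of `pair` as a single lexical item."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(f"{pair[0]}_{pair[1]}")
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# "to#2" stands for a reduced pronunciation of "to" (hypothetical label).
tokens = ["i", "am", "going", "to#2", "go", "i", "am", "going", "to", "town"]
probs = bigram_probs(tokens)
merged = merge_pair(tokens, ("going", "to#2"))
```

Because the labels encode pronunciation, `probs[("going", "to#2")]` captures how often the reduced form follows "going". After merging, a bigram whose history is `going_to#2` conditions on two original words, which is how a word-pair item can let a lower-order model approximate a higher-order one.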

9. A system comprising:
- a processor; and
  
  a computer-readable storage medium storing instructions which, when executed on the processor, cause the processor to perform a method comprising:
  
  approximating transcribed speech using a phonemic transcription dataset associated with a speaker, to yield a language model, where the phonemic transcription dataset is based on a pronunciation model of the speaker;
  
  incorporating, into the language model, pronunciation probabilities associated with respective unique labels for each different pronunciation of a word, wherein the respective unique label for a most frequent word indicates a special status in the language model; and
  
  after incorporating the pronunciation probabilities into the language model, recognizing an utterance using the language model.
- - 10. The system of claim 9, wherein the computer-readable storage medium stores additional instructions which, when executed on the processor, cause the processor to perform a step comprising:
    - removing the pronunciation probabilities from a pronunciation dictionary.
  - 11. The system of claim 9, wherein the language model is generated by modeling pronunciation dependencies across word boundaries.
  - 12. The system of claim 9, wherein at least one of contextual dependencies and consistency in pronunciation style exist throughout the utterance.
  - 13. The system of claim 9, wherein the pronunciation probabilities in the language model further comprise pronunciation dependent word pairs as lexical items that change a behavior of the language model to approximate higher order n-gram language models.
  - 14. The system of claim 9, the computer-readable storage medium storing additional instructions which, when executed on the processor, cause the processor to perform steps comprising:
    - creating a wide context pronunciation model based on having the pronunciation probabilities in the language model; and
      
      determining a probability of observing a particular word in the utterance using the wide context pronunciation model.
  - 15. The system of claim 9, wherein the pronunciation probabilities comprise a set of most frequent words each with more than one pronunciation alternative.
  - 16. The system of claim 15, wherein the more than one pronunciation alternative is in the language model.

17. A computer-readable storage medium storing instructions which, when executed on a processor, cause the processor to perform a method comprising:
- approximating transcribed speech using a phonemic transcription dataset associated with a speaker, to yield a language model, where the phonemic transcription dataset is based on a pronunciation model of the speaker;
  
  incorporating, into the language model, pronunciation probabilities associated with respective unique labels for each different pronunciation of a word, wherein the respective unique label for a most frequent word indicates a special status in the language model; and
  
  after incorporating the pronunciation probabilities into the language model, recognizing an utterance using the language model.
- - 18. The computer-readable storage medium of claim 17, storing additional instructions which, when executed on the processor, cause the processor to perform a step comprising:
    - removing the pronunciation probabilities from a pronunciation dictionary.
  - 19. The computer-readable storage medium of claim 17, wherein the language model is generated by modeling pronunciation dependencies across word boundaries.
  - 20. The computer-readable storage medium of claim 17, wherein at least one of contextual dependencies and consistency in pronunciation style exist throughout the utterance.
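Claims 2, 10, and 18 recite removing the pronunciation probabilities from a pronunciation dictionary. The sketch below shows what that step might look like, assuming a dictionary stored as `{word: [(phones, probability), ...]}`. The data layout, the `strip_dictionary_probs` name, and the `word#n` labels are hypothetical and are not taken from the patent.

```python
def strip_dictionary_probs(dictionary):
    """Return {label: phones}, dropping the per-variant probabilities.

    Each pronunciation variant gets its own label so that a language model
    over those labels can carry the probabilities in place of the dictionary.
    """
    stripped = {}
    for word, variants in dictionary.items():
        # Order variants by probability, highest first (the sort is stable).
        ranked = sorted(variants, key=lambda v: v[1], reverse=True)
        for rank, (phones, _prob) in enumerate(ranked):
            # Assumption: the most probable variant keeps the bare word label.
            label = word if rank == 0 else f"{word}#{rank + 1}"
            stripped[label] = phones
    return stripped

dictionary = {"either": [("iy dh er", 0.6), ("ay dh er", 0.4)]}
lexicon = strip_dictionary_probs(dictionary)
```

The resulting `lexicon` maps each label to exactly one phone string and holds no probabilities. The choice between `either` and `either#2` would then be scored by the language model.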


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property II LP (AT&T, Inc.)
Inventors: Ljolje, Andrej
Granted Patent: US 8,532,993 B2
US Class Current: 704/254
CPC Class Codes:
G10L 15/063 Training
G10L 15/187 Phonemic context, e.g. pron...
