GRAPHEME-TO-PHONEME CONVERSION USING ACOUSTIC DATA

US 20090150153A1
Filed: 12/07/2007
Published: 06/11/2009
Est. Priority Date: 12/07/2007
Status: Active Grant

Abstract

Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phoneme sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
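
The joint model named in the abstract can be written down in one plausible form. This is a sketch under assumptions not stated in this record: the symbols x (acoustic data), g (grapheme sequence), \varphi (phoneme sequence), s (alignment) and \theta (graphoneme model parameters) are introduced here for illustration, and the factorization assumes the acoustics depend on the graphemes only through the phonemes.

```latex
% Assumed factorization of the joint model (illustrative notation)
p(x, g, \varphi, s) = p(x \mid \varphi)\, p_{\theta}(g, \varphi, s)

% Maximum likelihood retraining over N collected (acoustics, graphemes) pairs,
% with the unobserved phoneme sequence and alignment summed out
\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{N}
  \log \sum_{\varphi, s} p(x_i \mid \varphi)\, p_{\theta}(g_i, \varphi, s)
```

Under this reading, only the graphoneme term carries trainable parameters; the acoustic term acts as a fixed scorer that tells the model which pronunciations the recordings actually support.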

53 Citations

20 Claims

1. In a computing environment, a method comprising:
- modeling acoustic data, a phoneme sequence, a grapheme sequence and an alignment between the phoneme sequence and the grapheme sequence to provide a graphoneme model; and
  
  retraining a grapheme to phoneme model usable in speech recognition by optimizing the graphoneme model using acoustic data.
  - 2. The method of claim 1 wherein optimizing the graphoneme model comprises performing maximum likelihood training of graphoneme model parameters using the acoustic data.
  - 3. The method of claim 2 wherein the maximum likelihood training includes using a current graphoneme model to generate a set of best phoneme sequence hypotheses for a given grapheme sequence, and re-ranking the set of best hypotheses based on acoustic data and the current graphoneme model.
  - 4. The method of claim 1 wherein optimizing the graphoneme model comprises performing discriminative training of graphoneme model parameters using the acoustic data.
  - 5. The method of claim 1 wherein retraining the grapheme to phoneme model comprises combining a pronunciation lexicon and acoustic information.
  - 6. The method of claim 5 wherein combining the pronunciation lexicon and the acoustic information comprises interpolating graphoneme model parameters trained using a pronunciation lexicon with those trained via maximum likelihood training or discriminative training using acoustic data.
  - 7. The method of claim 5 wherein combining the pronunciation lexicon and the acoustic information comprises obtaining a phoneme sequence corresponding to an acoustic waveform sample and a grapheme sequence corresponding to the same acoustic waveform sample to obtain a grapheme-phoneme pair, and combining a substantial number of such grapheme-phoneme pairs with data in the pronunciation lexicon.
  - 8. The method of claim 1 further comprising, collecting a grapheme label for an acoustic waveform received as speech input from a speaker, including by recording the acoustic data, recognizing the acoustic data as a potential grapheme label, obtaining confirmation from the speaker that the potential grapheme label correctly applies to the speech input, and persisting the acoustic data in association with an actual grapheme label that corresponds to the potential grapheme label upon the confirmation.
  - 9. The method of claim 8 further comprising performing a plurality of interactions with the speaker to receive the acoustic data and obtain the confirmation.
  - 10. The method of claim 8 further comprising filtering out the acoustic data and the associated grapheme label for speech input that does not meet a confidence threshold.
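
The re-ranking step recited in claim 3 can be illustrated with a minimal Python sketch. It assumes the current graphoneme model has already produced an n-best list of phoneme sequences with log-probabilities, and that some acoustic scorer returns a log-likelihood for each phoneme sequence. The names `rerank_nbest`, `acoustic_log_likelihood` and `acoustic_weight` are hypothetical, and the additive log-score combination is an assumption rather than the method stated in the claims.

```python
from typing import Callable, List, Sequence, Tuple

Phonemes = Tuple[str, ...]
# (phoneme sequence, log-probability under the current graphoneme model)
Hypothesis = Tuple[Phonemes, float]


def rerank_nbest(
    nbest: Sequence[Hypothesis],
    acoustic_log_likelihood: Callable[[Phonemes], float],
    acoustic_weight: float = 1.0,
) -> List[Hypothesis]:
    """Re-rank hypotheses by weighted acoustic score plus graphoneme score.

    The additive combination of log scores is an illustrative assumption.
    """
    rescored = [
        (phonemes, acoustic_weight * acoustic_log_likelihood(phonemes) + model_logp)
        for phonemes, model_logp in nbest
    ]
    # Highest combined log score first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy data: the graphoneme model alone prefers "k ey t",
# but the (made-up) acoustic scores favour "k ae t".
nbest = [
    (("k", "ae", "t"), -1.0),
    (("k", "ey", "t"), -0.5),
]
toy_acoustic_scores = {("k", "ae", "t"): -2.0, ("k", "ey", "t"): -4.0}

ranked = rerank_nbest(nbest, toy_acoustic_scores.__getitem__)
best_phonemes = ranked[0][0]
```

In a retraining loop, the top re-ranked hypotheses would serve as the pronunciations used to update the graphoneme parameters before the next pass.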

11. In a computing environment, a system comprising:
- a grapheme to phoneme model;
  
  a recognizer coupled to the grapheme to phoneme model to recognize input speech as a corresponding grapheme sequence; and
  
  a retraining mechanism coupled to the recognizer that retrains the grapheme to phoneme model into a retrained grapheme to phoneme model based upon acoustic data and associated graphemes collected by a recognition system.
  - 12. The system of claim 11 wherein the retraining mechanism performs maximum likelihood training or discriminative training of grapheme to phoneme model parameters based on the acoustic data.
  - 13. The system of claim 11 wherein the retraining mechanism combines a pronunciation lexicon and acoustic information by interpolating grapheme to phoneme model parameters trained using a pronunciation lexicon with those trained using acoustic data, or by obtaining a phoneme sequence corresponding to an acoustic waveform sample and a grapheme sequence corresponding to the same acoustic waveform sample to obtain a grapheme-phoneme pair, and combining a substantial number of such grapheme-phoneme pairs with data in the pronunciation lexicon.
  - 14. The system of claim 11 wherein the recognition system collects the acoustic data and associated graphemes by recording acoustic data input as speech by a speaker, recognizing the acoustic data into data corresponding to a grapheme label, and associating the acoustic data with the grapheme label upon obtaining confirmation from the speaker that the grapheme label correctly applies to the speech input.
  - 15. The system of claim 14 wherein the recognition system comprises a mechanism that receives speech in the form of a name, records the acoustic data corresponding to the name, recognizes the name as the grapheme label, and persists the acoustic data in conjunction with the grapheme label.
  - 16. The system of claim 14 further comprising means for filtering out the acoustic data and the associated grapheme label for speech input that does not meet a confidence threshold.
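
Claims 6 and 13 recite interpolating parameters trained on a pronunciation lexicon with parameters trained on acoustic data. A minimal sketch of one way to do this follows, assuming the parameters are probabilities over graphoneme units within a single context and that the interpolation is linear. The function name, the `acoustic_weight` parameter and the "grapheme:phoneme" unit labels are hypothetical.

```python
from typing import Dict


def interpolate_parameters(
    lexicon_probs: Dict[str, float],
    acoustic_probs: Dict[str, float],
    acoustic_weight: float,
) -> Dict[str, float]:
    """Linearly interpolate two probability tables over graphoneme units.

    Units missing from one table are treated as having probability 0 there.
    Linear interpolation is an assumption made for this sketch.
    """
    if not 0.0 <= acoustic_weight <= 1.0:
        raise ValueError("acoustic_weight must lie in [0, 1]")
    units = set(lexicon_probs) | set(acoustic_probs)
    return {
        unit: acoustic_weight * acoustic_probs.get(unit, 0.0)
        + (1.0 - acoustic_weight) * lexicon_probs.get(unit, 0.0)
        for unit in units
    }


# Toy tables for the grapheme "c" mapped to /k/ or /s/ (made-up numbers).
lexicon_probs = {"c:k": 0.8, "c:s": 0.2}
acoustic_probs = {"c:k": 0.4, "c:s": 0.6}
mixed = interpolate_parameters(lexicon_probs, acoustic_probs, acoustic_weight=0.5)
```

Because both inputs sum to 1 and the weights sum to 1, the interpolated table remains a valid distribution; the weight controls how far the collected acoustic evidence is allowed to pull the model away from the lexicon.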

17. A computer-readable medium having computer-executable instructions, which when executed perform steps, comprising:
- receiving acoustic data from a speaker;
  
  recognizing the acoustic data as a result and associated potential grapheme sequence;
  
  confirming with the speaker whether the result correctly applies to the acoustic data, and if so, associating the acoustic data with an actual grapheme sequence corresponding to the potential grapheme sequence, and if not, further interacting with the speaker until a result is confirmed as correctly applying to the acoustic data and associating the corresponding grapheme sequence as the actual grapheme sequence; and
  
  using the acoustic data and associated actual grapheme sequence for subsequent speech recognition.
  - 18. The computer-readable medium of claim 17 wherein using the acoustic data and associated grapheme sequence for subsequent speech recognition comprises retraining a model that maps between grapheme sequences and phoneme sequences based on the acoustic data and associated grapheme.
  - 19. The computer-readable medium of claim 18 wherein the retraining is done using maximum likelihood or discriminative training.
  - 20. The computer-readable medium of claim 17 wherein further interacting with the speaker comprises receiving further acoustic data from a speaker, using the further acoustic data to determine another grapheme, confirming with the speaker that the other grapheme correctly applies to the further acoustic data, and associating the other grapheme with the acoustic data and the further acoustic data.
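
The confirmation loop of claims 17 and 20, together with the confidence filtering of claims 10 and 16, can be sketched roughly as below. Everything named here (`LabeledSample`, `collect_label`, `filter_by_confidence`, the callback parameters and the `max_turns` limit) is hypothetical. As a simplification, the confidence of the confirmed recognition result is attached to every recording from the same interaction.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class LabeledSample:
    audio: bytes        # recorded acoustic data
    graphemes: str      # grapheme label confirmed by the speaker
    confidence: float   # recognizer confidence of the confirmed result


def collect_label(
    get_audio: Callable[[], bytes],
    recognize: Callable[[bytes], Tuple[str, float]],
    confirm: Callable[[str], bool],
    max_turns: int = 3,
) -> List[LabeledSample]:
    """Prompt the speaker until a recognition result is confirmed.

    On confirmation, every recording from this interaction is associated
    with the confirmed label. Returns an empty list if nothing is confirmed.
    """
    recordings: List[bytes] = []
    for _ in range(max_turns):
        audio = get_audio()
        recordings.append(audio)
        graphemes, confidence = recognize(audio)
        if confirm(graphemes):
            return [LabeledSample(a, graphemes, confidence) for a in recordings]
    return []


def filter_by_confidence(
    samples: List[LabeledSample], threshold: float
) -> List[LabeledSample]:
    """Keep only samples whose confidence meets the threshold."""
    return [s for s in samples if s.confidence >= threshold]


# Toy interaction: the first attempt is misrecognized, the second is confirmed.
utterances = iter([b"take-1", b"take-2"])
fake_results = {b"take-1": ("Jon Smith", 0.55), b"take-2": ("John Smyth", 0.91)}

samples = collect_label(
    get_audio=lambda: next(utterances),
    recognize=lambda audio: fake_results[audio],
    confirm=lambda text: text == "John Smyth",
)
kept = filter_by_confidence(samples, threshold=0.8)
```

The retained samples pair raw audio with a speaker-confirmed spelling, which is the kind of data the retraining claims consume.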

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Acero, Alejandro; Li, Xiao; Gunawardana, Asela J. R.

Granted Patent

US 7,991,615 B2
US Class Current

704/254
CPC Class Codes

G10L 13/08   Text analysis or generation...

G10L 15/063   Training

G10L 15/187   Phonemic context, e.g. pron...
