Grapheme-to-phoneme conversion using acoustic data

US 7,991,615 B2
Filed: 12/07/2007
Issued: 08/02/2011
Est. Priority Date: 12/07/2007
Status: Active Grant

Abstract

Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phoneme sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
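
As a rough illustration of what a joint model over these four quantities could look like, the block below writes one possible factorization and the corresponding maximum likelihood objective. The symbols (x for acoustic data, g for the grapheme sequence, φ for the phoneme sequence, s for the alignment, θ for the graphoneme model parameters) and the assumption that the acoustics depend on the graphemes only through the phonemes are illustrative choices, not statements taken from the abstract or the claims.

```latex
% Assumed factorization: acoustics x are conditionally independent of
% graphemes g and alignment s given the phoneme sequence \phi.
p_\theta(x, g, \phi, s) = p(x \mid \phi)\; p_\theta(g, \phi, s)

% Maximum likelihood retraining over N collected (x_i, g_i) pairs,
% marginalizing over the unobserved phoneme sequence and alignment.
\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{N}
  \log \sum_{\phi,\, s} p(x_i \mid \phi)\; p_\theta(g_i, \phi, s)
```

Under this reading, only the graphoneme term carries the parameters being adapted, while the acoustic model is held fixed.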

17 Claims

1. A method performed by a computer comprising storage and a processor, the method comprising:
- modeling acoustic data, a phoneme sequence, a grapheme sequence and an alignment between the phoneme sequence and the grapheme sequence to provide a graphoneme model stored in the storage; and
  
  retraining, by the processor, a grapheme to phoneme model usable in speech recognition by optimizing the graphoneme model using acoustic data stored in the storage.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method of claim 1 wherein optimizing the graphoneme model comprises performing maximum likelihood training of graphoneme model parameters using the acoustic data.
  - 3. The method of claim 2 wherein the maximum likelihood training includes using a current graphoneme model to generate a set of best phoneme sequence hypotheses for a given grapheme sequence, and re-ranking the set of best hypotheses based on acoustic data and the current graphoneme model.
  - 4. The method of claim 1 wherein optimizing the graphoneme model comprises performing discriminative training of graphoneme model parameters using the acoustic data.
  - 5. The method of claim 1 wherein retraining the grapheme to phoneme model comprises combining a pronunciation lexicon and acoustic information.
  - 6. The method of claim 5 wherein combining the pronunciation lexicon and the acoustic information comprises interpolating graphoneme model parameters trained using a pronunciation lexicon with those trained via maximum likelihood training or discriminative training using acoustic data.
  - 7. The method of claim 5 wherein combining the pronunciation lexicon and the acoustic information comprises obtaining a phoneme sequence corresponding to an acoustic waveform sample and a grapheme sequence corresponding to the same acoustic waveform sample to obtain a grapheme-phoneme pair, and combining a substantial number of such grapheme-phoneme pairs with data in the pronunciation lexicon.
  - 8. The method of claim 1 further comprising, collecting a grapheme label for an acoustic waveform received as speech input from a speaker, including by recording the acoustic data, recognizing the acoustic data as a potential grapheme label, obtaining confirmation from the speaker that the potential grapheme label correctly applies to the speech input, and persisting the acoustic data in association with an actual grapheme label that corresponds to the potential grapheme label upon the confirmation.
  - 9. The method of claim 8 further comprising performing a plurality of interactions with the speaker to receive the acoustic data and obtain the confirmation.
  - 10. The method of claim 8 further comprising filtering out the acoustic data and the associated grapheme label for speech input that does not meet a confidence threshold.
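
The re-ranking step recited in claim 3 can be pictured with a small sketch. It assumes each hypothesis arrives as a phoneme sequence paired with a log-probability from the current graphoneme model, and that an acoustic scorer returns a log-likelihood for a phoneme sequence; the function name `rerank_nbest`, the `acoustic_weight` parameter, the additive log-domain combination, and the toy scores are all hypothetical and not drawn from the patent.

```python
import math
from typing import Callable, List, Tuple

Phones = Tuple[str, ...]


def rerank_nbest(
    hypotheses: List[Tuple[Phones, float]],
    acoustic_loglik: Callable[[Phones], float],
    acoustic_weight: float = 1.0,
) -> List[Tuple[Phones, float]]:
    """Re-rank n-best phoneme hypotheses for one grapheme sequence.

    hypotheses: (phoneme sequence, graphoneme log-probability) pairs
    produced by the current model. The combined score is an assumed
    log-linear mix of the graphoneme score and the acoustic score.
    """
    rescored = [
        (phones, model_logp + acoustic_weight * acoustic_loglik(phones))
        for phones, model_logp in hypotheses
    ]
    # Highest combined score first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy n-best list: the graphoneme model alone prefers the first entry.
nbest = [
    (("k", "ae", "t"), math.log(0.6)),
    (("k", "ey", "t"), math.log(0.4)),
]
# Made-up acoustic log-likelihoods standing in for a real acoustic model.
toy_acoustic = {("k", "ae", "t"): -12.0, ("k", "ey", "t"): -9.0}

ranked = rerank_nbest(nbest, toy_acoustic.__getitem__)
print(ranked[0][0])
```

In this toy case the acoustic evidence outweighs the model prior, so the second hypothesis moves to the top of the list.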

11. A computer system comprising:
- a grapheme to phoneme model stored in storage of the computer;
  
  a recognizer coupled to the grapheme to phoneme model to recognize input speech as a corresponding grapheme sequence, the recognizer executed by a processor of the computer; and
  
  a retraining mechanism coupled to the recognizer that retrains the grapheme to phoneme model into a retrained grapheme to phoneme model based upon acoustic data and associated graphemes collected by a recognition system, the retraining mechanism executed by the processor of the computer.
- Dependent claims (12, 13, 14, 15, 16)
  - 12. The computer of claim 11 wherein the retraining mechanism performs maximum likelihood training or discriminative training of grapheme to phoneme model parameters based on the acoustic data.
  - 13. The computer of claim 11 wherein the retraining mechanism combines a pronunciation lexicon and acoustic information by interpolating grapheme to phoneme model parameters trained using a pronunciation lexicon with those trained using acoustic data, or by obtaining a phoneme sequence corresponding to an acoustic waveform sample and a grapheme sequence corresponding to the same acoustic waveform sample to obtain a grapheme-phoneme pair, and combining a substantial number of such grapheme-phoneme pairs with data in the pronunciation lexicon.
  - 14. The computer of claim 11 wherein the recognition system collects the acoustic data and associated graphemes by recording acoustic data input as speech by a speaker, recognizing the acoustic data into data corresponding to a grapheme label, and associating the acoustic data with the grapheme label upon obtaining confirmation from the speaker that the grapheme label correctly applies to the speech input.
  - 15. The computer of claim 14 wherein the recognition system comprises a mechanism that receives speech in the form of a name, records the acoustic data corresponding to the name, recognizes the name as the grapheme label, and persists the acoustic data in conjunction with the grapheme label.
  - 16. The computer of claim 14 further comprising means for filtering out the acoustic data and the associated grapheme label for speech input that does not meet a confidence threshold.
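
Two of the mechanisms in claims 13, 14 and 16 lend themselves to a short sketch: keeping only speaker-confirmed samples that meet a confidence threshold, and interpolating parameters trained on a pronunciation lexicon with parameters trained on acoustic data. The `Sample` record, the 0.8 threshold, the dictionary representation of model parameters, and the use of simple linear interpolation with weight `lam` are assumptions made for illustration; the claims do not specify any of them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    audio_id: str          # hypothetical handle to the recorded waveform
    grapheme_label: str    # recognized label, e.g. a contact name
    confidence: float      # recognizer confidence in [0, 1]
    confirmed: bool        # speaker confirmed the label


def select_training_samples(
    samples: List[Sample], threshold: float = 0.8
) -> List[Sample]:
    """Keep confirmed samples whose confidence meets the threshold."""
    return [s for s in samples if s.confirmed and s.confidence >= threshold]


def interpolate_params(
    lexicon_params: Dict[str, float],
    acoustic_params: Dict[str, float],
    lam: float = 0.5,
) -> Dict[str, float]:
    """Linearly interpolate two parameter sets (assumed scheme).

    A parameter missing from one set is treated as 0.0 in that set.
    """
    keys = set(lexicon_params) | set(acoustic_params)
    return {
        k: lam * lexicon_params.get(k, 0.0)
        + (1.0 - lam) * acoustic_params.get(k, 0.0)
        for k in keys
    }


samples = [
    Sample("utt1", "Xiao Li", 0.93, True),
    Sample("utt2", "Alex Acero", 0.55, True),   # below threshold
    Sample("utt3", "Asela", 0.97, False),       # never confirmed
]
kept = select_training_samples(samples)

# Toy graphoneme probabilities keyed by "grapheme:phoneme" units.
mixed = interpolate_params({"x:sh": 0.8, "x:z": 0.2}, {"x:sh": 0.4, "x:ks": 0.6})
print([s.audio_id for s in kept], mixed)
```

A weight closer to 1.0 keeps the retrained model near the lexicon-trained parameters, which may be preferable when little confirmed acoustic data has been collected.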

17. One or more tangible computer-readable storage media storing information to enable a computer to perform a process, the process comprising:
- modeling, by the computer, acoustic data, a phoneme sequence, a grapheme sequence and an alignment between the phoneme sequence and the grapheme sequence to provide a graphoneme model stored by the computer;
  
  retraining, by the computer, a grapheme to phoneme model by optimizing the graphoneme model using acoustic data stored in the storage; and
  
  using the grapheme to phoneme model to perform speech recognition by the computer.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Gunawardana, Asela J. R.; Acero, Alejandro; Li, Xiao
Primary Examiner(s): Vo, Huyen X.
Application Number: US11/952,267
Publication Number: US 20090150153A1
Time in Patent Office: 1,334 Days
Field of Search: 704/251, 704/260, 704/9, 704/10, 704/231, 704/235, 704/244, 704/254, 704/255
US Class Current: 704/254
CPC Class Codes:
G10L 13/08   Text analysis or generation...
G10L 15/063   Training
G10L 15/187   Phonemic context, e.g. pron...
