Grapheme-to-phoneme conversion using acoustic data
Abstract
Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phoneme sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
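The confidence-based filtering mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Utterance` record, its fields, the function name `filter_by_confidence`, and the default threshold of 0.8 are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    """One collected sample (hypothetical structure, for illustration only)."""
    audio_id: str      # assumed handle to the stored acoustic data
    graphemes: str     # grapheme label obtained without manual transcription
    confidence: float  # assumed recognizer confidence score in [0, 1]


def filter_by_confidence(samples: List[Utterance],
                         threshold: float = 0.8) -> List[Utterance]:
    """Keep only samples whose confidence meets the threshold.

    Samples below the threshold are dropped so that they are not
    passed on to retraining.
    """
    return [s for s in samples if s.confidence >= threshold]


if __name__ == "__main__":
    collected = [
        Utterance("utt-001", "john smith", 0.93),
        Utterance("utt-002", "jon smyth", 0.41),
    ]
    for sample in filter_by_confidence(collected):
        print(sample.audio_id, sample.graphemes)
```

How the confidence score is computed, and what threshold value is appropriate, would depend on the recognizer; the sketch only shows where such a filter would sit between data collection and retraining.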
17 Claims
1. A method performed by a computer comprising storage and a processor, the method comprising:
modeling acoustic data, a phoneme sequence, a grapheme sequence and an alignment between the phoneme sequence and the grapheme sequence to provide a graphoneme model stored in the storage; and
retraining, by the processor, a grapheme to phoneme model usable in speech recognition by optimizing the graphoneme model using acoustic data stored in the storage.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer system comprising:
a grapheme to phoneme model stored in storage of the computer;
a recognizer coupled to the grapheme to phoneme model to recognize input speech as a corresponding grapheme sequence, the recognizer executed by a processor of the computer; and
a retraining mechanism coupled to the recognizer that retrains the grapheme to phoneme model into a retrained grapheme to phoneme model based upon acoustic data and associated graphemes collected by a recognition system, the retraining mechanism executed by the processor of the computer.
Dependent claims: 12, 13, 14, 15, 16
17. One or more tangible computer-readable storage media storing information to enable a computer to perform a process, the process comprising:
modeling, by the computer, acoustic data, a phoneme sequence, a grapheme sequence and an alignment between the phoneme sequence and the grapheme sequence to provide a graphoneme model stored by the computer;
retraining, by the computer, a grapheme to phoneme model by optimizing the graphoneme model using acoustic data stored in the storage; and
using the grapheme to phoneme model to perform speech recognition by the computer.
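To make the notion of a graphoneme model in the claims more concrete, the sketch below treats each graphoneme as an aligned (grapheme chunk, phoneme chunk) pair and estimates bigram probabilities over those units by relative frequency. This is an assumption for illustration only: the claims above do not specify the model form, the bigram choice, the function name `train_graphoneme_bigram`, or the toy alignments, and the sketch covers only a text-based baseline, not the acoustic-data retraining that the claims recite.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A graphoneme pairs a grapheme chunk with its aligned phoneme chunk,
# e.g. ("tt", "t"). The representation is assumed for this sketch.
Graphoneme = Tuple[str, str]
Bigram = Tuple[Graphoneme, Graphoneme]

START: Graphoneme = ("<s>", "<s>")
END: Graphoneme = ("</s>", "</s>")


def train_graphoneme_bigram(
    aligned_words: List[List[Graphoneme]],
) -> Dict[Bigram, float]:
    """Estimate P(current | previous) over graphoneme units by counting."""
    bigram_counts: Counter = Counter()
    context_counts: Counter = Counter()
    for word in aligned_words:
        seq = [START] + word + [END]
        for prev, cur in zip(seq, seq[1:]):
            bigram_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    # Relative-frequency (maximum likelihood) estimate, no smoothing.
    return {
        (prev, cur): count / context_counts[prev]
        for (prev, cur), count in bigram_counts.items()
    }


if __name__ == "__main__":
    # Toy alignments (hypothetical); phoneme symbols are illustrative.
    data = [
        [("a", "ae"), ("b", "b")],
        [("a", "ae"), ("t", "t")],
    ]
    model = train_graphoneme_bigram(data)
    for (prev, cur), prob in sorted(model.items()):
        print(prev, "->", cur, round(prob, 2))
```

A retraining step of the kind claimed would then adjust such parameters using collected acoustic data and grapheme labels; that step is deliberately left out here because the claims do not give enough detail to sketch it without guessing.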
Specification