Methods and apparatus for automatic generation of multiple pronunciations from acoustic data
Abstract
Methods and apparatus for automatically deriving multiple phonetic baseforms of a word from a speech utterance of this word are provided in accordance with the present invention. In one embodiment, a method of automatically generating two or more phonetic baseforms from a spoken utterance representing a word includes the steps of: transforming the spoken utterance into a stream of acoustic observations; generating two or more strings of subphone units, wherein each string of subphone units represents a string of subphone units substantially maximizing a log-likelihood of the stream of acoustic observations, and wherein the log-likelihood is computed as a weighted sum of a transition score associated with a transition model and of an acoustic score associated with an acoustic model; and converting the two or more strings of subphone units into two or more phonetic baseforms.
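The weighted log-likelihood described in the abstract can be written out as a formula. The notation is an assumption made here for illustration and does not come from the patent text: o_1, ..., o_T are the acoustic observations, s_1, ..., s_T are the subphone units aligned with them (s_0 is a start symbol), and w_tr and w_ac are the weights on the transition and acoustic scores.

```latex
\log L(s_1,\dots,s_T)
  = \sum_{t=1}^{T} \Big[\, w_{\mathrm{tr}} \log P(s_t \mid s_{t-1})
  + w_{\mathrm{ac}} \log p(o_t \mid s_t) \,\Big]
```

Under this reading, each generated string of subphone units is one that (substantially) maximizes this sum for some setting of the weights.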
21 Claims
1. A method of automatically generating two or more phonetic baseforms from a spoken utterance representing a word, the method comprising the steps of:
transforming the spoken utterance into a stream of acoustic observations;
generating two or more strings of subphone units, wherein each string of subphone units represents a string of subphone units substantially maximizing a log-likelihood of the stream of acoustic observations, and wherein the log-likelihood is computed as a weighted sum of a transition score associated with a transition model between the subphone units and of an acoustic score associated with a separate context-dependent acoustic model of the subphone units; and
converting the two or more strings of subphone units into two or more phonetic baseforms.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
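The generating step of claim 1 can be sketched as a Viterbi search whose path score is the weighted sum of a transition score and an acoustic score. This is a minimal sketch under stated assumptions, not the patented procedure: the claim text here does not say how the two or more strings arise, so the sketch assumes they come from decoding with different transition weights. The function names, the unit labels `AX_1` and `IH_1`, and the toy probabilities are all hypothetical, and the per-frame acoustic scores stand in for the output of a context-dependent acoustic model.

```python
import math


def viterbi(acoustic, log_trans, log_init, weight):
    """Best unit-index path under weight * transition score + acoustic score."""
    n_units = len(log_init)
    # Score of the best partial path ending in each unit at the first frame.
    score = [weight * log_init[u] + acoustic[0][u] for u in range(n_units)]
    backpointers = []
    for frame in acoustic[1:]:
        new_score, pointers = [], []
        for u in range(n_units):
            # Best predecessor of unit u under the weighted transition score.
            prev = max(range(n_units),
                       key=lambda p: score[p] + weight * log_trans[p][u])
            pointers.append(prev)
            new_score.append(score[prev] + weight * log_trans[prev][u] + frame[u])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final unit.
    last = max(range(n_units), key=lambda u: score[u])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def generate_unit_strings(acoustic, log_trans, log_init, weights, unit_names):
    """Decode once per weight (an assumption) and keep the distinct strings."""
    strings = []
    for w in weights:
        path = viterbi(acoustic, log_trans, log_init, w)
        s = tuple(unit_names[i] for i in path)
        if s not in strings:
            strings.append(s)
    return strings


# Toy data (hypothetical): two subphone units, three acoustic frames.
UNITS = ["AX_1", "IH_1"]
LOG_INIT = [math.log(0.5), math.log(0.5)]
LOG_TRANS = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
ACOUSTIC = [[math.log(0.6), math.log(0.4)],
            [math.log(0.4), math.log(0.6)],
            [math.log(0.6), math.log(0.4)]]

unit_strings = generate_unit_strings(ACOUSTIC, LOG_TRANS, LOG_INIT, [0.0, 1.0], UNITS)
```

With this toy data, a weight of 0.0 follows the acoustic scores frame by frame, while a weight of 1.0 lets the self-loop-favoring transition model hold the path in one unit, so the two weights give two different strings.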
9. Apparatus for automatically generating two or more phonetic baseforms from a spoken utterance representing a word, the apparatus comprising:
at least one processor operative to:
(i) transform the spoken utterance into a stream of acoustic observations;
(ii) generate two or more strings of subphone units, wherein each string of subphone units represents a string of subphone units substantially maximizing a log-likelihood of the stream of acoustic observations, and wherein the log-likelihood is computed as a weighted sum of a transition score associated with a transition model between the subphone units and of an acoustic score associated with a separate context-dependent acoustic model of the subphone units; and
(iii) convert the two or more strings of subphone units into two or more phonetic baseforms.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
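The conversion step (iii) can be illustrated with a short sketch. It assumes a hypothetical labeling convention in which each subphone unit is named `PHONE_index` (for example `K_1`, `K_2`, `K_3` for the parts of the phone `K`). The claim text does not specify this convention or the conversion rule, so both are assumptions made for illustration.

```python
from itertools import groupby


def subphones_to_baseform(units):
    """Map each subphone unit to its phone, then merge consecutive repeats."""
    # "K_2" -> "K"; rsplit keeps any underscores inside the phone name itself.
    phones = [unit.rsplit("_", 1)[0] for unit in units]
    # groupby yields one group per run of identical adjacent phones.
    return [phone for phone, _run in groupby(phones)]


baseform = subphones_to_baseform(
    ["K_1", "K_2", "K_3", "AE_1", "AE_1", "AE_2", "T_1", "T_3"]
)
```

Merging adjacent repeats is a simplification: it would also merge a genuinely doubled phone, so a fuller implementation would need unit-position information to tell the two cases apart.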
17. An article of manufacture for automatically generating two or more phonetic baseforms from a spoken utterance representing a word, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
transforming the spoken utterance into a stream of acoustic observations;
generating two or more strings of subphone units, wherein each string of subphone units represents a string of subphone units substantially maximizing a log-likelihood of the stream of acoustic observations, and wherein the log-likelihood is computed as a weighted sum of a transition score associated with a transition model between the subphone units and of an acoustic score associated with a separate context-dependent acoustic model of the subphone units; and
converting the two or more strings of subphone units into two or more phonetic baseforms.
Dependent claims: 18, 19, 20.
21. A computing device having a speech recognition engine comprising apparatus for automatically generating two or more phonetic baseforms from a spoken utterance representing a word, the apparatus operative to:
(i) transform the spoken utterance into a stream of acoustic observations;
(ii) generate two or more strings of subphone units, wherein each string of subphone units represents a string of subphone units substantially maximizing a log-likelihood of the stream of acoustic observations, and wherein the log-likelihood is computed as a weighted sum of a transition score associated with a transition model between the subphone units and of an acoustic score associated with a separate context-dependent acoustic model of the subphone units;
(iii) convert the two or more strings of subphone units into two or more phonetic baseforms;
(iv) add the two or more phonetic baseforms to a recognition lexicon associated with the speech recognition engine.
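Step (iv) of claim 21 adds the generated baseforms to a recognition lexicon. A minimal sketch of such a lexicon follows, assuming it can be modeled as a mapping from a word to a list of pronunciations. The `Lexicon` class and its method names are hypothetical and do not describe any particular speech recognition engine.

```python
from collections import defaultdict


class Lexicon:
    """Toy recognition lexicon: each word maps to a list of phonetic baseforms."""

    def __init__(self):
        self._entries = defaultdict(list)

    def add_baseforms(self, word, baseforms):
        """Add each baseform for the word, skipping ones already present."""
        for baseform in baseforms:
            key = tuple(baseform)
            if key not in self._entries[word]:
                self._entries[word].append(key)

    def pronunciations(self, word):
        """Return the stored baseforms for the word (empty list if unknown)."""
        return list(self._entries.get(word, []))


lexicon = Lexicon()
lexicon.add_baseforms("cat", [["K", "AE", "T"], ["K", "EH", "T"]])
lexicon.add_baseforms("cat", [["K", "AE", "T"]])  # duplicate, not added again
```

Storing every distinct baseform under the same word lets a recognizer match any of the pronunciation variants derived from the utterance.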
Specification