Method for recognizing speech using linguistically-motivated hidden Markov models
Abstract
An automatic speech recognition methodology takes advantage of linguistic constraints wherein words are modeled as probabilistic networks of phonetic segments (herein phones), and each phone is represented as a context-independent hidden Markov phone model mixed with a number of context-dependent phone models. Recognition is based on use of methods to design phonological rule sets based on measures of coverage and overgeneration of pronunciations which achieves high coverage of pronunciations with compact representations. Further, a method estimates probabilities of the different possible pronunciations of words. A further method models cross-word coarticulatory effects. In a specific embodiment of the system, a specific method determines the single most-likely pronunciation of words. In further specific embodiments of the system, methods generate speaker-dependent pronunciation networks.
302 Citations
6 Claims
1. A method for modeling word-initial acoustic cross-word effects for use in generating word pronunciation networks from a set of pronunciation rules and a training database on a data processing apparatus, wherein said training database comprises representations of speech signals of words being pronounced in a continuous speech manner, said speech signal representations being linked to textual representations of said words, wherein word pronunciation networks include textual representations of words and corresponding phonetic networks, said modeling method comprising the steps, for every word network w, of:
for every initial arc ai, setting a variable pi equal to a phone label on said arc ai; and
for every phone label pj in a phonetic inventory,
a) counting a number of occurrences c in said training database of the word w preceded by any word ending with said phone pj; thereafter
b) if c is greater than a preselected threshold,
   i) adding an initial arc ak to said word network w with phone label pi which connects to a common "to-node" as said arc ai; thereafter
   ii) constraining said arc ak to only connect to arcs with label pj; and
   iii) constraining said arc ai not to connect to arcs having phone label pj.
Dependent claims: 2, 3
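The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `Arc` and `WordNetwork` classes, the `only_after` and `never_after` fields used to record the connection constraints, and the training-database layout (a list of utterances, each a list of `(word, phones)` pairs) are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Arc:
    label: str                        # phone label on the arc
    to_node: int                      # node the arc leads to
    only_after: Optional[str] = None  # if set, may only follow arcs with this label
    never_after: set = field(default_factory=set)  # labels this arc may not follow


@dataclass
class WordNetwork:
    word: str
    initial_arcs: list


def count_left_contexts(utterances):
    """Count how often each word is preceded by a word ending in each phone.

    `utterances` is a hypothetical stand-in for the training database:
    a list of utterances, each a list of (word, phones) pairs.
    """
    counts = Counter()
    for utt in utterances:
        for (_, prev_phones), (word, _) in zip(utt, utt[1:]):
            if prev_phones:
                counts[(word, prev_phones[-1])] += 1
    return counts


def add_word_initial_arcs(network, inventory, counts, threshold):
    # Iterate over a snapshot so arcs added below are not revisited.
    for arc in list(network.initial_arcs):
        p_i = arc.label
        for p_j in inventory:
            c = counts[(network.word, p_j)]       # step a)
            if c > threshold:                     # step b)
                # i) new arc with the same label and the same to-node
                # ii) restricted to follow only arcs labeled p_j
                network.initial_arcs.append(Arc(p_i, arc.to_node, only_after=p_j))
                # iii) the original arc may no longer follow p_j
                arc.never_after.add(p_j)
    return network
```

With this layout, a word that is frequently preceded by a given phone ends up with a dedicated initial arc for that left context, while the original arc covers all remaining contexts.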
4. A method for modeling word-final acoustic cross-word effects for use in generating word pronunciation networks from a set of pronunciation rules and a training database on a data processing apparatus, wherein said training database comprises representations of speech signals of words being pronounced in a continuous speech manner, said speech signal representations being linked to textual representations of said words, wherein word pronunciation networks include textual representations of words and corresponding phonetic networks, said modeling method comprising the steps, for every word network w, of:
for every final arc ai, setting a variable pi equal to a phone label on said arc ai; and
for every phone label pj in a phonetic inventory,
a) counting a number of occurrences c in said training database of the word w followed by any word beginning with said phone pj; thereafter
b) if c is greater than a preselected threshold,
   i) adding a final arc ak to said word network w with phone label pi which connects to a common "to-node" as said arc ai; thereafter
   ii) constraining said arc ak to only connect to arcs with label pj; and
   iii) constraining said arc ai not to connect to arcs having phone label pj.
Dependent claims: 5, 6
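Claim 4 mirrors claim 1 with the context on the right side of the word. The sketch below is again only illustrative: the `FinalArc` class, the `only_before` and `never_before` constraint fields, and the `(word, phones)` utterance layout are assumptions. The claim names only the "to-node" of the new arc; copying the from-node as well, so that the new arc runs parallel to the original, is an additional assumption of this example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FinalArc:
    label: str                         # phone label on the arc
    from_node: int                     # node the arc leaves (assumed field)
    to_node: int                       # node the arc leads to
    only_before: Optional[str] = None  # if set, may only precede arcs with this label
    never_before: set = field(default_factory=set)  # labels this arc may not precede


def count_right_contexts(utterances):
    """Count how often each word is followed by a word beginning with each phone."""
    counts = Counter()
    for utt in utterances:
        for (word, _), (_, next_phones) in zip(utt, utt[1:]):
            if next_phones:
                counts[(word, next_phones[0])] += 1
    return counts


def add_word_final_arcs(word, final_arcs, inventory, counts, threshold):
    # Iterate over a snapshot so arcs added below are not revisited.
    for arc in list(final_arcs):
        for p_j in inventory:
            if counts[(word, p_j)] > threshold:   # steps a) and b)
                # i) and ii): parallel arc, restricted to precede only p_j
                final_arcs.append(
                    FinalArc(arc.label, arc.from_node, arc.to_node, only_before=p_j)
                )
                # iii): the original arc may no longer precede p_j
                arc.never_before.add(p_j)
    return final_arcs
```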
Specification