Method for recognizing speech using linguistically-motivated hidden Markov models
Abstract
In an automatic speech recognition methodology, words are modeled as probabilistic networks of allophones, and nodes in those networks are collected into equivalence classes when they have the same allophonic choices governed by the same phonological rules. The allophonic choices allow dialectal pronunciation variations between different speakers to be represented. Training data is shared among the nodes in an equivalence class, so that accurate pronunciation probabilities can be determined even for words that have only a limited amount of training data. Probabilities are determined for each of a multitude of pronunciation models for each word in the vocabulary, based on automatic extraction of linguistic knowledge from sets of phonological rules, in order to model dialectal variation robustly and accurately.
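The sharing of training data among nodes in an equivalence class can be illustrated with a minimal Python sketch. It is not the patented procedure: it assumes each training sample has already been aligned so that the phone taken at each node is known, and it uses plain relative-frequency estimates with no smoothing. The function name `estimate_pooled_probabilities`, the class label `flap_t`, the phone symbols, and the sample counts are all hypothetical.

```python
from collections import Counter, defaultdict


def estimate_pooled_probabilities(node_class, observations):
    """Estimate arc probabilities by pooling counts within equivalence classes.

    node_class:   dict mapping (word, node_index) -> equivalence-class label.
    observations: iterable of (word, node_index, phone) tuples, assumed to come
                  from training samples already aligned to the networks.
    Returns a dict mapping (word, node_index) -> {phone: probability}.
    """
    # Count phone choices per equivalence class, not per individual node.
    class_counts = defaultdict(Counter)
    for word, index, phone in observations:
        class_counts[node_class[(word, index)]][phone] += 1

    # Every node receives the relative frequencies of its whole class.
    probabilities = {}
    for node, label in node_class.items():
        counts = class_counts[label]
        total = sum(counts.values())
        probabilities[node] = (
            {phone: count / total for phone, count in counts.items()}
            if total
            else {}
        )
    return probabilities


# Hypothetical example: the /t/ nodes of "water" and "butter" share a class.
node_class = {("water", 2): "flap_t", ("butter", 2): "flap_t"}

# Only "water" has training samples: three flapped ("dx"), one unflapped ("t").
observations = [
    ("water", 2, "dx"),
    ("water", 2, "dx"),
    ("water", 2, "dx"),
    ("water", 2, "t"),
]

probs = estimate_pooled_probabilities(node_class, observations)
print(probs[("butter", 2)])
```

In this sketch the node for "butter" obtains the same estimate as the node for "water" even though "butter" never appears in the training data, which is the effect the abstract attributes to equivalence classes.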
286 Citations
7 Claims
1. A method for estimating pronunciation probabilities in word pronunciation networks that incorporate dialectal variations in pronunciation for use in a speech recognition system, wherein said word networks comprise a plurality of nodes, each node connected to its successive node by one or more arcs, each arc having associated with it a phone and a numerical variable for storing the pronunciation probability that the arc is taken, comprising the steps of:
- determining equivalence classes for a plurality of nodes in the word pronunciation networks by:
  - if context with surrounding nodes is not relevant to phone choice at a node, classifying that node with nodes having similar phone choices into the same equivalence class; and
  - if context with surrounding nodes is important in determining phone choice at a node, classifying that node in an equivalence class with similar nodes having similar phone choices and sharing identical relevant contextual constraints, such that all nodes in the same equivalence class may share training samples for estimating pronunciation probabilities; and
- using a set of training samples to estimate the pronunciation probabilities in the word pronunciation network such that training samples for a given word will contribute to the training of networks for all other words that have any nodes in an equivalence class with any of the nodes of that word.
(Dependent claims: 2, 3, 4)
5. A method for building word pronunciation networks that incorporate dialectal variations in pronunciation useful for recognizing speech by a data processing computer comprising the steps of:
- acquiring a set of baseform word models for each of the words in the vocabulary to be recognized;
- acquiring a set of phonological rules that define allowed pronunciation variations, at least one of said phonological rules specifying allophonic choices allowed in a particular context;
- applying the phonological rules to the baseform word models to obtain a set of word models that incorporate dialectal variations in pronunciation; and
- determining equivalence classes, each equivalence class being a grouping of nodes within said word models that incorporate dialectal variations in pronunciation wherein each node in an equivalence class represents the same possible pronunciation variations governed by the same phonological rules.
(Dependent claims: 6, 7)
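The steps of claim 5 can be sketched roughly in Python as follows. This is an illustrative assumption, not the representation used in the patent: baseforms are flat phone lists, a rule is a dictionary with a context predicate over the neighboring phones, and a node's equivalence class is keyed by the rules that fired together with the resulting phone choices. The names `VOWELS`, `RULES`, `build_network`, and `equivalence_classes`, the single flapping rule, and the phone symbols are all hypothetical.

```python
from collections import defaultdict

# Hypothetical vowel inventory used by the context test below.
VOWELS = {"aa", "ae", "ah", "ao", "er", "ih", "iy", "ow"}

# Hypothetical rule set: /t/ may surface as a flap ("dx") between vowels.
RULES = [
    {
        "name": "flap_t",
        "target": "t",
        "allophones": ("t", "dx"),
        "context": lambda prev, nxt: prev in VOWELS and nxt in VOWELS,
    },
]


def build_network(baseform, rules):
    """Apply the rules to one baseform, giving one node per baseform phone."""
    nodes = []
    for i, phone in enumerate(baseform):
        prev = baseform[i - 1] if i > 0 else None
        nxt = baseform[i + 1] if i + 1 < len(baseform) else None
        choices, applied = {phone}, []
        for rule in rules:
            if rule["target"] == phone and rule["context"](prev, nxt):
                choices.update(rule["allophones"])
                applied.append(rule["name"])
        nodes.append(
            {"choices": tuple(sorted(choices)), "rules": tuple(sorted(applied))}
        )
    return nodes


def equivalence_classes(networks):
    """Group nodes with the same phone choices governed by the same rules."""
    classes = defaultdict(list)
    for word, nodes in networks.items():
        for i, node in enumerate(nodes):
            classes[(node["rules"], node["choices"])].append((word, i))
    return dict(classes)


# Hypothetical baseform word models.
baseforms = {
    "water": ["w", "ao", "t", "er"],
    "butter": ["b", "ah", "t", "er"],
    "top": ["t", "aa", "p"],
}

networks = {word: build_network(phones, RULES) for word, phones in baseforms.items()}
classes = equivalence_classes(networks)
print(classes[(("flap_t",), ("dx", "t"))])
```

Under these assumptions the /t/ nodes of "water" and "butter" fall into one class because the same rule applies in the same context, while the word-initial /t/ of "top" stays in a separate class with no alternative pronunciation.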
Specification