Method and apparatus for the automatic determination of phonological rules as for a continuous speech recognition system
Abstract
A continuous speech recognition system includes an automatic phonological rules generator which determines variations in the pronunciation of phonemes based on the context in which they occur. This phonological rules generator associates sequences of labels derived from vocalizations of a training text with respective phonemes inferred from the training text. These sequences are then annotated with their phoneme context from the training text and clustered into groups representing similar pronunciations of each phoneme. A decision tree is generated using the context information of the sequences to predict the clusters to which the sequences belong. The training data is processed by the decision tree to divide the sequences into leaf-groups representing similar pronunciations of each phoneme. The sequences in each leaf-group are clustered into sub-groups representing respectively different pronunciations of their corresponding phoneme in a given context. A Markov model is generated for each sub-group. The various Markov models of a leaf-group are combined into a single compound model by assigning common initial and final states to each model. The compound Markov models are used by a speech recognition system to analyze an unknown sequence of labels given its context.
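The last combining step, in which the Markov models of a leaf-group receive common initial and final states, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `MarkovModel` class, the `combine` function, the state names, and in particular the weighting of each sub-model's entry transitions by a relative weight (for example its sub-group size), which the abstract does not specify. The toy models are assumed to have no transition back into their initial state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkovModel:
    """Toy model: named states and (source, destination, probability) transitions."""
    initial: str
    final: str
    transitions: tuple


def combine(models, weights):
    """Merge sub-group models into one compound model sharing START and END.

    Assumption: transitions leaving each model's initial state are scaled by
    that model's relative weight so the probabilities leaving START sum to 1.
    """
    if len(models) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    merged = []
    for index, (model, weight) in enumerate(zip(models, weights)):
        def rename(state, index=index, model=model):
            if state == model.initial:
                return "START"
            if state == model.final:
                return "END"
            # Internal states get a per-model prefix so they never collide.
            return f"m{index}:{state}"

        for src, dst, prob in model.transitions:
            scale = weight / total if src == model.initial else 1.0
            merged.append((rename(src), rename(dst), prob * scale))
    return MarkovModel("START", "END", tuple(merged))


# Two made-up pronunciation models for the same phoneme in the same context.
model_a = MarkovModel("s0", "s2", (("s0", "s1", 1.0), ("s1", "s1", 0.5), ("s1", "s2", 0.5)))
model_b = MarkovModel("q0", "q2", (("q0", "q1", 1.0), ("q1", "q2", 1.0)))
compound = combine([model_a, model_b], weights=[3, 1])
```

Prefixing internal states keeps the alternative pronunciations as parallel paths between the shared states, so a recognizer scoring the compound model effectively sums over the alternatives.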
298 Citations
24 Claims
1. A method for automatically separating vocalizations of language components into a plurality of groups representing pronunciations of the language components in respectively different contexts, said method comprising the steps of:
A) processing a training text and vocalizations representing the training text to obtain a plurality of samples representing the language components of said vocalizations;
B) selecting, from among the plurality of samples, a set of samples representing respective instances of a selected language component in the vocalizations;
C) annotating each of said selected samples with a context indicator, representing at least one language component in a contextual relationship with the selected sample, to produce annotated samples;
D) separating the selected samples into respectively different leaf groups based on the respective context indicators of said annotated samples, each of said leaf groups representing a pronunciation of said selected language component in a respectively different context.
Dependent claims: 2, 3, 4, 5, 6, 7
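Steps B) through D) of claim 1 can be sketched in a few lines. This is only an illustration under stated assumptions: the phoneme string, the numeric label sequences, the helper names `annotate` and `leaf_groups`, and the single hand-written question `right_is_nasal` are all hypothetical, and a single question stands in for the decision tree described in the abstract.

```python
from collections import defaultdict


def annotate(phonemes, label_sequences, target, width=1):
    """Steps B and C: pick each instance of `target` and attach its context.

    The context indicator here is (left neighbours, right neighbours),
    an assumed encoding chosen for this sketch.
    """
    annotated = []
    for i, (phoneme, labels) in enumerate(zip(phonemes, label_sequences)):
        if phoneme != target:
            continue
        left = tuple(phonemes[max(0, i - width):i])
        right = tuple(phonemes[i + 1:i + 1 + width])
        annotated.append((labels, (left, right)))
    return annotated


def leaf_groups(annotated, question):
    """Step D: separate samples by the answer a context question gives."""
    groups = defaultdict(list)
    for labels, context in annotated:
        groups[question(context)].append(labels)
    return dict(groups)


def right_is_nasal(context):
    """Hypothetical context question: is the following phoneme a nasal?"""
    _, right = context
    return bool(right) and right[0] in {"N", "M", "NG"}


# Made-up training data: one label sequence per phoneme in the training text.
phonemes = ["K", "AE", "T", "AE", "N"]
label_sequences = [[3], [7, 7, 8], [4], [7, 9, 9], [5]]

annotated = annotate(phonemes, label_sequences, target="AE")
groups = leaf_groups(annotated, right_is_nasal)
```

Each key of `groups` corresponds to one leaf group, that is, one context in which the selected phoneme may be pronounced differently.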
8. In an automatic speech recognition system, a method for associating vocalizations of a continuously spoken sequence of words with respective language components, comprising the steps of:
generating a sampled data signal representing the vocalizations of said continuously spoken sequence of words;
A) associating a first language component with a first set of samples of said sampled data signal;
B) associating a second set of samples of said sampled data signal with a second language component;
C) accessing a decision means with the first language component as a context indicator to define a probabilistic model to be used to relate the second set of samples to the second language component, the probabilistic model defined by said decision means representing a distinct pronunciation of a selected language component in terms of a context provided for the selected language component;
D) calculating, from said defined probabilistic model, a likelihood that the second language component corresponds to the second set of samples;
E) repeating steps A) through D) for a plurality of second language components; and
F) associating the one of the second language component and the plurality of second language components having the greatest likelihood with said second set of samples.
Dependent claims: 9, 10
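The selection loop of claim 8, steps C) through F), amounts to scoring each candidate component with a context-dependent model and keeping the most likely one. The sketch below is a simplification under loud assumptions: a lookup table `MODELS` stands in for the decision means, each model is an independent per-label probability table rather than a Markov model, and all names and probabilities are invented for illustration.

```python
import math

# Hypothetical stand-in for the decision means: (candidate, context) -> model.
# Each model maps a label value to its probability; the numbers are made up.
MODELS = {
    ("AE", "K"): {1: 0.7, 2: 0.2, 3: 0.1},
    ("IH", "K"): {1: 0.1, 2: 0.3, 3: 0.6},
}


def decide(candidate, context):
    """Step C: use the preceding component as context to pick a model."""
    return MODELS[(candidate, context)]


def log_likelihood(model, samples, floor=1e-6):
    """Step D: log-likelihood of the samples, treating labels as independent."""
    return sum(math.log(model.get(sample, floor)) for sample in samples)


def best_component(context, samples, candidates):
    """Steps E and F: score every candidate and keep the most likely one."""
    return max(candidates, key=lambda c: log_likelihood(decide(c, context), samples))


choice = best_component("K", [1, 1, 2], ["AE", "IH"])
```

Working in log-likelihoods avoids numeric underflow on long sample sets and does not change which candidate wins.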
11. In a speech recognition system, a method for dividing a set of sample values representing respective vocalizations into first and second groups of sample values using an automatically generated test question, the method comprising the steps of:
A) annotating each of the sample values in said set with an indicator of a first attribute of said sample values;
B) further annotating each of the sample values in said set with an indicator of a second attribute of said sample values, said set of further annotated samples having a predetermined entropy value measured with respect to said second attribute indicator; and
C) generating the test question, in terms of the first attribute indicator of said further annotated samples, wherein, the test question is applied to divide the further annotated sample values into said first and second groups, wherein said first and second groups of samples have a combined entropy value, measured with respect to said second attribute indicator, that is less than said predetermined entropy value.
Dependent claims: 12, 13, 14, 15, 16
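The entropy criterion of claim 11 can be made concrete with a short sketch. It assumes that the first attribute is a context phoneme, that the second attribute is a cluster number, that a question has the form "is the first attribute in set S?", and that the combined entropy is the size-weighted average of the two groups' entropies. None of these choices is fixed by the claim, and the helper names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a list of attribute values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def split_entropy(samples, question):
    """Size-weighted entropy of the second attribute after applying `question`."""
    yes = [second for first, second in samples if question(first)]
    no = [second for first, second in samples if not question(first)]
    n = len(samples)
    return sum(len(group) / n * entropy(group) for group in (yes, no) if group)


def best_question(samples, candidate_sets):
    """Pick the set S whose membership question yields the lowest combined entropy."""
    return min(candidate_sets, key=lambda s: split_entropy(samples, lambda a: a in s))


# Made-up samples: (context phoneme, cluster number).
samples = [("N", 0), ("M", 0), ("T", 1), ("K", 1), ("N", 0), ("T", 1)]
candidates = [frozenset({"N", "M"}), frozenset({"N", "T"}), frozenset({"K"})]

before = entropy([second for _, second in samples])
chosen = best_question(samples, candidates)
after = split_entropy(samples, lambda a: a in chosen)
```

In this toy data the nasal set separates the two clusters completely, so the combined entropy after the split falls below the entropy of the undivided set, which is the condition the claim places on the generated question.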
17. In a voice recognition system, a method for grouping a plurality of vocalizations, represented by respective sequences of samples having respective sample values, into clusters comprising the steps of:
A) assigning each sequence of samples to a respectively different prototype cluster;
B) calculating an expected frequency value for each sample value in each prototype cluster;
C) statistically comparing the expected frequency value of each sample value of each prototype cluster to the expected frequencies of all other sample values in all other prototype clusters to generate a plurality of statistical difference values, one for each pair of clusters;
D) combining pairs of prototype clusters that exhibit a statistical difference value which is less than a threshold value to generate new prototype clusters.
Dependent claims: 18, 19
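The bottom-up grouping of claim 17 can be sketched as follows. The claim does not say which statistic to use or in what order pairs are merged, so this sketch makes two assumptions: the statistical difference is a log-likelihood ratio under a Poisson model of label counts, and merging is greedy, closest pair first. The function names, the threshold, and the sample sequences are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def difference(a, b):
    """Log-likelihood ratio between keeping two clusters apart and pooling them.

    Each cluster is (label counts, number of sequences); the expected
    frequency of a label is its count divided by the number of sequences.
    """
    (counts_a, n_a), (counts_b, n_b) = a, b
    total = 0.0
    for label in set(counts_a) | set(counts_b):
        pooled = (counts_a[label] + counts_b[label]) / (n_a + n_b)
        for count, n in ((counts_a[label], n_a), (counts_b[label], n_b)):
            if count:
                total += count * math.log((count / n) / pooled)
    return total


def cluster(sequences, threshold):
    """Start with one cluster per sequence and merge the closest pairs."""
    clusters = [(Counter(seq), 1) for seq in sequences]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: difference(clusters[pair[0]], clusters[pair[1]]))
        if difference(clusters[i], clusters[j]) >= threshold:
            break  # no remaining pair is similar enough to combine
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Made-up label sequences: two pronunciations, two examples each.
sequences = [[1, 1, 2], [1, 1, 2], [5, 5, 6], [5, 6, 6]]
result = cluster(sequences, threshold=1.0)
```

The threshold trades off the number of clusters against their purity: a larger value merges more aggressively and yields fewer, broader pronunciation groups.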
20. Apparatus for automatically separating vocalizations of language components into a plurality of groups representing pronunciations of the language components in respectively different contexts, comprising:
sampling means for converting a training text and vocalizations of the training text into a plurality of samples representing the language components of said vocalizations;
means for selecting, from among the plurality of samples, a set of samples representing respective instances of a selected language component in the vocalizations;
processing means including means for annotating each of said selected samples with a context indicator representing at least one language component in a contextual relationship with the selected sample, to produce annotated samples;
means for separating the selected samples into respectively different leaf groups based on the respective context indicators of said annotated samples, each of said leaf groups representing a pronunciation of said selected language component in a respectively different context.
Dependent claims: 21, 22, 23, 24
Specification