Speech recognition system
Abstract
Speech words are recognized by first recognizing each spectral vector identified by a label (feneme), then identifying the word by matching the string of labels against phones using simplified phone machines based on label and transition probabilities and Markov chains. In one embodiment, a detailed acoustic match word score is combined with an approximate acoustic match word score to provide a total word score for a subject word. In another embodiment, a polling word score is combined with an acoustic match word score to provide a total word score for a subject word. The acoustic models employed in the acoustic matching may correspond, alternatively, to phonetic elements or to fenemes. Fenemes represent labels generated by an acoustic processor in response to a spoken input. Apparatus and method for determining word scores according to approximate acoustic matching and for determining word scores according to a polling methodology are disclosed.
338 Citations
97 Claims
1. In a speech recognition system which represents each vocabulary word or a portion thereof by at least one sequence of phones wherein each phone corresponds to a respective phone machine, each phone machine having associated therewith (a) a plurality of transitions and (b) actual label output probabilities, each actual label probability representing the probability that a subject label is generated at a given transition in the phone machine, a method of performing an acoustic match between phones and a string of labels produced by an acoustic processor in response to a speech input, the method comprising the steps of:
forming simplified phone machines which includes the step of replacing by a single specific value the actual label probabilities for a given label at all transitions at which the given label may be generated in a particular phone machine; and
determining the probability of a phone generating the labels in the string based on the simplified phone machine corresponding thereto.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13

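The replacement step recited in claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the data layout (a dict mapping each transition to its label probabilities), the function name `simplify_phone_machine`, and the choice of the per-label maximum as the "single specific value" are all assumptions. The claim itself only requires that one value per label be used at every transition where that label may be generated.

```python
from typing import Dict

# Hypothetical layout: transition name -> {label -> actual output probability}.
PhoneMachine = Dict[str, Dict[str, float]]


def simplify_phone_machine(machine: PhoneMachine) -> PhoneMachine:
    """Replace each label's probabilities by one value shared across transitions.

    Assumption: the shared value is the maximum actual probability of the
    label over the transitions that can generate it.
    """
    single_value: Dict[str, float] = {}
    for label_probs in machine.values():
        for label, prob in label_probs.items():
            single_value[label] = max(single_value.get(label, 0.0), prob)

    # Only transitions that could already generate a label receive its value.
    return {
        transition: {label: single_value[label] for label in label_probs}
        for transition, label_probs in machine.items()
    }


# Made-up numbers for illustration only.
actual = {
    "tr1": {"A": 0.6, "B": 0.1},
    "tr2": {"A": 0.2, "B": 0.5},
    "tr3": {"B": 0.3},
}
simplified = simplify_phone_machine(actual)
```

With the maximum as the shared value, no label probability decreases, so the simplified machine cannot score a label string lower than the detailed machine does with the same transition probabilities. Other choices of the single value would not have that property.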
14. A method of performing an acoustic match of words in a vocabulary against a string of labels which represent a speech input, the method comprising the steps of:
entering as inputs to the phone machine of a given phone (a) the string of labels and (b) a respective start-time distribution for the given phone; and
generating (a) an end-time distribution and (b) a match value of the given phone relative to the entered labels based on the generated end-time distribution;
the match value generated for each particular phone corresponding to the probability that said each particular phone produced the entered string of labels.
Dependent claims: 15, 16, 17

18. Apparatus for matching words with a string of incoming labels in a pattern recognition system, the apparatus comprising:
at least one phone machine;
each phone machine being characterized by having (a) a plurality of states and transition paths between states, (b) transition probabilities T(i→j) representing the probability of state Sj given a current state Si where Si and Sj may be the same state or different states, and (c) actual label probabilities wherein each actual label probability p(yk) indicates the probability that a label yk is produced by a given phone machine at a given transition from one state to a subsequent state where k is a label identifying notation; and
each phone machine including (a) means for assigning to each yk in said each phone machine a single value p'(yk) and (b) means for replacing each actual output probability p(yk) at each transition in a given phone machine by the single value p'(yk) assigned to the corresponding yk.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26

27. A method of determining at least one word in a vocabulary having the highest probability of having generated a given string of incoming labels that were produced in response to a speech input, the method comprising the steps of:
characterizing each word as a sequence of phonetic elements, wherein each phonetic element has (a) a start-time distribution of probabilities qn corresponding to respective successive start times tn, (b) a plurality of states between which transitions occur, (c) a plurality of transition probabilities, each indicating the probability that a given transition in a given phonetic element occurs, (d) a plurality of actual label probabilities, each actual output probability indicating the probability that a particular phonetic element generates a particular label at a particular transition in the particular phonetic element; and
forming an approximate match for a subject word including the steps of:
replacing all actual probabilities associated with a given label generated by a given phonetic element at any transition therein with a corresponding specific replacement value; and
determining for one phonetic element after another in the subject word the probability Φn of a phonetic element ending at a respective one of a plurality of successive end times tn as a function of start-time distribution, the probability of the phonetic element generating a label string of each of various lengths, and the replacement value p'(yk) for each respective label yk that is to be generated by the phonetic element to produce the incoming string of labels.
Dependent claims: 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39

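One plausible reading of the end-time computation in claims 14 and 27 is sketched below. It assumes that a phonetic element which starts at time s and emits exactly n - s labels ends at time n, so that Φn is the sum over start times s of q_s times the length probability times the product of the replacement values p'(y) for the labels emitted in between. The function name, the `length_dist` argument, and the use of the sum of Φ as the match value are assumptions made for illustration; the claims do not fix these details.

```python
from typing import Dict, List, Sequence


def end_time_distribution(
    start_dist: Sequence[float],    # q_s for start times t_0, t_1, ...
    length_dist: Sequence[float],   # length_dist[k] = P(element emits exactly k labels)
    replacement: Dict[str, float],  # p'(y): single replacement value per label
    labels: Sequence[str],          # incoming labels y_1, y_2, ...
) -> List[float]:
    """Return phi[n], the probability that the element ends at time t_n."""
    last_start = len(start_dist) - 1
    longest = len(length_dist) - 1
    max_end = min(len(labels), last_start + longest)

    phi: List[float] = []
    for n in range(max_end + 1):
        total = 0.0
        for s in range(min(n, last_start) + 1):
            k = n - s  # number of labels emitted between t_s and t_n
            if k > longest:
                continue
            emit = 1.0
            for label in labels[s:n]:
                emit *= replacement.get(label, 0.0)
            total += start_dist[s] * length_dist[k] * emit
        phi.append(total)
    return phi


# Made-up numbers for illustration only.
phi = end_time_distribution(
    start_dist=[0.5, 0.5],
    length_dist=[0.2, 0.5, 0.3],
    replacement={"A": 0.5, "B": 0.25},
    labels=["A", "B", "A"],
)
match_value = sum(phi)  # assumption: match value taken as the sum of phi
```

In a word made of several phonetic elements, the end-time distribution of one element could serve as the start-time distribution of the next after normalization; that chaining is likewise an assumption here, not something the claim text spells out.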
40. A method of performing an approximate acoustic match between incoming labels and words in a vocabulary where each word is represented by a sequence of Markov model phone machines, the Markov model of each phone machine including states and transitions therebetween, the method comprising the steps of:
determining the probability of being at a subject state at a given time, which includes the steps of
(a) identifying each previous state that has a transition which leads to the subject state and determining the respective probability of each such previous state;
(b) recognizing, for each such previous state, a value representing the probability of the label that must be generated at the transition between each such previous state and the current state in order to conform to the label string; and
(c) combining the probability of each previous state and the respective value representing the label probability to provide a subject state probability over a corresponding transition;
wherein, for each label in a given phone, setting the probability thereof to be a specific value throughout the given phone;
determining the overall probability of being at the subject state from the subject state probabilities over all transitions leading thereto; and
combining the overall probabilities for a given phone in a word to determine the likelihood of the given phone producing the string of labels.

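The state-probability recursion of claim 40 resembles a forward pass over a Markov model in which every label has one fixed value throughout the phone. The sketch below illustrates that idea under several assumptions that are not taken from the claim: the phone starts in state 0, the last state is final, every transition emits exactly one label (no null transitions), and each contribution is weighted by a transition probability. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical transition record: (previous_state, next_state, transition_probability).
Transition = Tuple[int, int, float]


def phone_likelihood(
    transitions: Sequence[Transition],
    num_states: int,
    label_value: Dict[str, float],  # one specific value per label for this phone
    labels: Sequence[str],
) -> float:
    """Likelihood that the phone produced `labels`, ending in its final state."""
    # alpha[s] = probability of being at state s after the labels seen so far.
    alpha: List[float] = [0.0] * num_states
    alpha[0] = 1.0

    for label in labels:
        value = label_value.get(label, 0.0)
        nxt = [0.0] * num_states
        for prev, cur, t_prob in transitions:
            # Steps (a) to (c): previous-state probability combined with the
            # label value; summing over incoming transitions gives the
            # overall probability of the subject state.
            nxt[cur] += alpha[prev] * t_prob * value
        alpha = nxt

    return alpha[num_states - 1]


# Made-up two-state phone for illustration only.
likelihood = phone_likelihood(
    transitions=[(0, 0, 0.4), (0, 1, 0.6), (1, 1, 1.0)],
    num_states=2,
    label_value={"A": 0.5, "B": 0.25},
    labels=["A", "B"],
)
```

Because the label value does not depend on the transition, it can be factored out of the inner sum, which is where the computational saving of the approximate match would come from.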
41. In a speech recognition system, a method of measuring the likelihood of a word corresponding to a spoken input where the word is from a vocabulary of words, the method comprising the steps of:
(a) generating a string of labels in response to a spoken input, each label (i) being from an alphabet of labels and (ii) representing a respective sound type;
(b) determining label votes, each label vote representing the likelihood that a respective label is produced when a given word is uttered; and
(c) for a subject word, accumulating a label vote for each of at least some of the labels generated in the string;
the accumulated label votes providing information indicative of the likelihood of the subject word.
Dependent claims: 42, 43, 44, 45, 46, 47, 48, 49, 50

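The polling idea in claim 41 can be illustrated with a short sketch. It assumes that each vote is the logarithm of the probability that a word produces a label, so that accumulating votes amounts to adding log probabilities; the claim only says a vote represents a likelihood, so the logarithm, the probability floor, and the function names are assumptions.

```python
import math
from typing import Dict, Sequence


def label_votes(label_prob: Dict[str, float], floor: float = 1e-6) -> Dict[str, float]:
    """Turn per-label probabilities for one word into votes (assumed log form)."""
    return {label: math.log(max(p, floor)) for label, p in label_prob.items()}


def poll_word(votes: Dict[str, float], labels: Sequence[str], default_vote: float) -> float:
    """Accumulate the word's vote for every label generated in the string."""
    return sum(votes.get(label, default_vote) for label in labels)


# Made-up label probabilities for two hypothetical words.
word_label_prob = {
    "cat": {"A": 0.5, "B": 0.25, "C": 0.25},
    "dog": {"A": 0.1, "B": 0.1, "C": 0.8},
}
string_of_labels = ["A", "A", "B"]

scores = {
    word: poll_word(label_votes(probs), string_of_labels, default_vote=math.log(1e-6))
    for word, probs in word_label_prob.items()
}
```

The accumulated score ignores label order, which is why a polling score is cheap to compute and is suited to shortlisting candidate words rather than to a final decision.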
51. In a speech recognition system having (i) an acoustic processor which generates acoustic labels, (ii) a vocabulary of words each of which is represented by a word model comprising a sequence of Markov model phone machines, and (iii) a set of trained statistics indicating the label output probabilities and transition probabilities of each phone machine in a word model, a method of selecting likely candidate words from the vocabulary comprising the steps of:
(a) for a subject word from the vocabulary, determining a respective label vote for each label wherein a label vote for a given label represents the likelihood of the subject word producing the given label; and
(b) for a given string of labels generated by the acoustic processor in response to an unknown spoken input, combining the label votes for the subject word for labels generated in the string.
Dependent claims: 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63

64. A speech recognition method of selecting likely words from a vocabulary of words wherein each word is represented by a sequence of at least one probabilistic finite state phone machine and wherein an acoustic processor generates acoustic labels in response to a spoken input, the method comprising the steps of:
(a) forming a first table in which each label in the alphabet provides a vote for each word in the vocabulary, each label vote for a subject word indicating the likelihood of the subject word producing the label providing the vote.
Dependent claims: 65, 66, 67, 68

69. A speech recognition apparatus for selecting likely words from a vocabulary of words wherein each word is represented by a sequence of at least one probabilistic finite state phone machine and wherein an acoustic processor generates acoustic labels in response to a spoken input, the apparatus comprising:
(a) means for forming a first table in which each label in the alphabet provides a vote for each word in the vocabulary, each label vote for a subject word indicating the likelihood of the subject word producing the label providing the vote; and
(b) means for forming a second table in which each label is assigned a penalty for each word in the vocabulary, the penalty assigned to a given label for a given word being indicative of the likelihood of the given label not being produced according to the model for the given word.
Dependent claims: 70

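The two tables of claim 69 might be built as in the sketch below. The numeric forms are assumptions: the vote is taken as log P(label | word) and the penalty as log(1 - P(label | word)). How the tables are then applied is also an assumption; here a word receives the vote of every label observed in the string plus the penalty of every alphabet label that never appears in it.

```python
import math
from typing import Dict, Sequence, Tuple

Table = Dict[str, Dict[str, float]]  # word -> label -> value


def build_tables(
    word_label_prob: Table, alphabet: Sequence[str], floor: float = 1e-6
) -> Tuple[Table, Table]:
    """Form the vote table and the penalty table (assumed log forms)."""
    votes: Table = {}
    penalties: Table = {}
    for word, probs in word_label_prob.items():
        votes[word] = {}
        penalties[word] = {}
        for label in alphabet:
            # Clamp so that both logarithms stay finite.
            p = min(max(probs.get(label, 0.0), floor), 1.0 - floor)
            votes[word][label] = math.log(p)
            penalties[word][label] = math.log(1.0 - p)
    return votes, penalties


def score_word(word: str, labels: Sequence[str], votes: Table, penalties: Table) -> float:
    """Votes for labels in the string plus penalties for labels absent from it."""
    seen = set(labels)
    score = sum(votes[word][label] for label in labels)
    score += sum(pen for label, pen in penalties[word].items() if label not in seen)
    return score


# Made-up numbers for illustration only.
vote_table, penalty_table = build_tables(
    {"cat": {"A": 0.5, "B": 0.25, "C": 0.25}, "dog": {"A": 0.1, "B": 0.1, "C": 0.8}},
    alphabet=["A", "B", "C"],
)
cat_score = score_word("cat", ["A", "B"], vote_table, penalty_table)
dog_score = score_word("dog", ["A", "B"], vote_table, penalty_table)
```

In this example the penalty lowers the score of "dog" sharply, because its model expects label C with high probability and C never occurs in the string.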
71. A method of evaluating the likelihood of a word corresponding to a speech input in a speech-recognition system comprising the steps of:
for a subject word in a vocabulary of words, generating a first word score representing the subject word likelihood based on an acoustic match first algorithm;
for the subject word, generating a second word score based on a second independent algorithm which differs from the first algorithm; and
forming a total word score for the subject word from at least the first word score and the second word score.
Dependent claims: 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83

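Claim 71 leaves open how the total word score is formed from the two scores. A weighted sum of log-domain scores is one simple possibility, sketched below; the weights, the function name, and the example scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def total_word_scores(
    first: Dict[str, float],   # e.g. acoustic match scores, log domain (assumed)
    second: Dict[str, float],  # e.g. polling scores, log domain (assumed)
    weight_first: float = 1.0,
    weight_second: float = 1.0,
) -> List[Tuple[str, float]]:
    """Combine two independent word scores and rank words, best first."""
    totals = {
        word: weight_first * first[word] + weight_second * second[word]
        for word in first
        if word in second
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for illustration only.
ranked = total_word_scores(
    first={"cat": -2.0, "dog": -1.5, "cot": -2.5},
    second={"cat": -1.0, "dog": -3.0, "cot": -1.5},
)
```

Here "dog" has the best first score but falls to last place once the second score is added, which illustrates why combining independent algorithms can change the ranking.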
84. A method of measuring the likelihood that a word in a vocabulary of words corresponds to a speech input in a speech recognition system, wherein the system generates a string of labels where each label represents a sound class and is selected from an alphabet of predefined labels in response to speech input, the method comprising the steps of:
calculating each of a plurality of word scores for a subject word, each word score being calculated based on (a) a string of generated labels and (b) an independent algorithm; and
combining the plurality of word scores to provide a total word score for the subject word.
Dependent claims: 85, 86, 87, 88, 89, 90, 91

92. In a speech recognition system, apparatus for measuring word likelihood for words in a vocabulary, wherein the system generates a string of labels where each label represents a sound type and is selected from an alphabet of predefined labels in response to speech input, the apparatus comprising:
means for calculating each of a plurality of word scores for a subject word from (a) a string of generated labels and (b) an algorithm which is independent with regard to each algorithm associated with another calculated word score; and
means for combining the plurality of word scores to provide a total word score for the subject word.
Dependent claims: 93, 94, 95, 96

97. In a speech recognition system in which each word is represented as a sequence of phones and in which an acoustic processor generates a string of successive labels in response to the utterance of speech, wherein each label corresponds to one of an alphabet of pre-defined sound types, a machine-implementable method of determining a vote of each label of the alphabet for each vocabulary word, the method comprising the steps of:
generating, in the acoustic processor, a string of labels in response to the uttering of a known script of sequential phones;
evaluating a count indicative of the number of times each label in the label alphabet is generated for a given phone in response to the utterance of the known script; and
repeating the evaluating step for each label as applied to each phone uttered in the known script.

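The counting step of claim 97 can be sketched as below. The sketch assumes that each generated label has already been attributed to a phone of the known script, given here as (phone, label) pairs; how that alignment is obtained is not stated in the claim and is outside the sketch. The conversion of counts to relative frequencies is likewise an assumed next step toward a vote, and all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def count_labels_per_phone(
    aligned: Iterable[Tuple[str, str]],
) -> DefaultDict[str, Counter]:
    """Count how often each label was generated for each phone of the script."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for phone, label in aligned:
        counts[phone][label] += 1
    return counts


def relative_frequencies(counts: Dict[str, Counter]) -> Dict[str, Dict[str, float]]:
    """Normalize the counts per phone (assumed step toward a vote)."""
    return {
        phone: {label: n / sum(label_counts.values()) for label, n in label_counts.items()}
        for phone, label_counts in counts.items()
    }


# Made-up alignment of a known script for illustration only.
aligned_script = [
    ("k", "A"), ("k", "A"), ("k", "B"),
    ("ae", "C"), ("ae", "C"),
    ("t", "B"),
]
counts = count_labels_per_phone(aligned_script)
freqs = relative_frequencies(counts)
```

Per-phone counts of this kind could then be summed over the phones of each word to obtain a per-word figure for each label; that aggregation is an assumption and is not shown.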
Specification