Apparatus and method for determining a likely word sequence from labels generated by an acoustic processor
Abstract
Continuous speech recognition is improved by use of a known vocabulary and context probabilities. First, the unknown utterance is analyzed as a sequence of phonemes, then each phoneme is labelled to form a string of labels. The shortest label interval which is recognized as a word is assigned a storage stack where similar-sounding candidate words are stored. Multiple stack decoding, and likelihood envelope criteria for word path extension decisions, are further features of the system.
17 Claims
1. In a speech recognition system having an acoustic processor which generates a string of acoustic labels in response to speech input and a decoder which matches words in a vocabulary against generated labels in a string, a method of forming at least one likely sequence of words for a speech input, the method comprising the steps of:
(a) generating a string of labels in response to a speech input;
(b) selecting words from a vocabulary as possible first words corresponding to labels at the beginning of the string;
(c) for a subject selected word,
(i) locating a most likely boundary label interval in the string whereat the subject selected word has the highest probability of ending; and
(ii) evaluating a respective likelihood of the subject selected word at each label interval of the string up to and including the most likely boundary label interval;
(d) repeating step (c) for each selected word as the subject selected word; and
(e) classifying a given selected word as extendible if the likelihood at the particular label interval corresponding to the most likely boundary label interval thereof is within a predefined range of the highest likelihood for any selected word at said particular label interval.
Dependent claims: 2, 3, 4, 5, 6, 7
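Steps (c) through (e) of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Candidate` type, the `end_prob` and `log_like` arrays, and the use of log-likelihoods with a margin `delta` as the "predefined range" are all assumptions made for illustration. The sketch also assumes a word can only be compared at an interval that its own likelihood profile reaches.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    word: str
    end_prob: List[float]  # assumed input: P(word ends at label interval t)
    log_like: List[float]  # assumed input: log-likelihood of the word through interval t


def boundary(c: Candidate) -> int:
    """Step (c)(i): the label interval at which the word most probably ends."""
    return max(range(len(c.end_prob)), key=lambda t: c.end_prob[t])


def classify_extendible(cands: List[Candidate], delta: float) -> Dict[str, bool]:
    """Steps (c)-(e): a word is extendible if its log-likelihood at its own
    boundary is within `delta` of the best candidate scored at that interval."""
    # Step (c)(ii): keep each likelihood profile only up to its boundary.
    profiles = {c.word: c.log_like[: boundary(c) + 1] for c in cands}
    result = {}
    for c in cands:
        b = boundary(c)
        # Only words whose profile reaches interval b can be compared there.
        best = max(p[b] for p in profiles.values() if len(p) > b)
        result[c.word] = profiles[c.word][b] >= best - delta
    return result


# Hypothetical candidates for a three-label string.
cands = [
    Candidate("the", [0.1, 0.7, 0.2], [-1.0, -2.0, -5.0]),
    Candidate("a", [0.8, 0.1, 0.1], [-1.5, -4.0, -6.0]),
    Candidate("then", [0.0, 0.2, 0.8], [-1.2, -2.5, -3.0]),
]
print(classify_extendible(cands, delta=0.4))
```

With these made-up numbers, "a" ends at the first interval but trails the best score there by more than `delta`, so it is the only candidate not classified as extendible.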
8. In a speech recognition system having an acoustic processor that generates labels selected from an alphabet thereof in response to speech input, a method of determining a likely word path from a plurality of word paths given a string of labels generated at successive intervals, the method comprising the steps of:
(a) assigning to each label interval a label stack;
(b) determining for a subject word path
(i) a boundary label interval at which the subject word path most likely ends; and
(ii) a likelihood at each label prior to and including the boundary label thereof;
(c) assigning the subject word path to the label stack corresponding to the boundary label thereof as an entry therein;
(d) repeating steps (b) and (c) for each word path;
(e) forming a likelihood envelope that includes a likelihood value at each label along the string of labels;
(f) setting each likelihood value in the envelope to an initial value;
(g) reducing each likelihood value in the likelihood envelope;
(h) examining the word path entries in all label stacks longest first and, where a label stack has more than one entry, examining the word paths based on decreasing likelihood wherein said examining includes the step of:
(i) classifying a word path as good if the likelihood at the label corresponding to the boundary label for a subject word path exceeds the reduced likelihood at the label corresponding to the boundary label; and
(ii) repeating step (h)(i) for each word path as the subject word path;
(j) after classifying a word path as good, up-dating the likelihood value for each label in the envelope as either (a) the current likelihood value in the envelope or (b) the likelihood value in the classified good word path, whichever is greater; and
(k) after all word paths have been classified, selecting the shortest good word path as the word path to be extended or, if there is more than one good word path having the shortest length, selecting the shortest word path having the highest likelihood value at the boundary label thereof.
Dependent claims: 9, 10, 11, 12, 13, 14, 15
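The stack and envelope procedure of claim 8 can be read as the following Python sketch. It follows one literal reading of steps (a) through (k) and should not be taken as the patentee's implementation. The `WordPath` type, the `floor` initial value, the margin `delta`, and the use of log-likelihoods are assumptions. "Longest" and "shortest" are taken to mean the latest and earliest boundary label, respectively.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class WordPath:
    words: Tuple[str, ...]
    boundary: int          # index of the boundary label
    log_like: List[float]  # log-likelihood at labels 0..boundary


def select_path_to_extend(paths: List[WordPath], n_labels: int, delta: float,
                          floor: float = -1e9) -> Optional[WordPath]:
    # Steps (a)-(d): one stack per label interval, keyed by boundary label.
    stacks: Dict[int, List[WordPath]] = {t: [] for t in range(n_labels)}
    for p in paths:
        stacks[p.boundary].append(p)

    # Steps (e)-(g): envelope set to an initial value, then reduced.
    envelope = [floor - delta] * n_labels

    good: List[WordPath] = []
    # Step (h): latest boundary first; within a stack, highest likelihood first.
    for t in range(n_labels - 1, -1, -1):
        for p in sorted(stacks[t], key=lambda q: q.log_like[t], reverse=True):
            # Step (h)(i): good if the path exceeds the envelope at its boundary.
            if p.log_like[t] > envelope[t]:
                good.append(p)
                # Step (j): raise the envelope to the good path where it is higher.
                for i in range(t + 1):
                    envelope[i] = max(envelope[i], p.log_like[i])

    if not good:
        return None
    # Step (k): earliest boundary wins; ties go to the higher likelihood.
    return min(good, key=lambda q: (q.boundary, -q.log_like[q.boundary]))


# Hypothetical word paths over a four-label string.
paths = [
    WordPath(("the", "cat"), 3, [-1.0, -2.0, -3.0, -4.0]),
    WordPath(("the",), 1, [-1.0, -1.5]),
    WordPath(("a",), 1, [-2.0, -3.0]),
]
chosen = select_path_to_extend(paths, n_labels=4, delta=1.0)
print(chosen.words if chosen else None)
```

In this example the two-word path is examined first and raises the envelope. The path "the" still exceeds the envelope at its boundary and is good, while "a" falls below it, so "the" is returned as the shortest good path.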
16. In a speech recognition system having an acoustic processor which generates a string of acoustic labels in response to speech input, a method of selecting a likely sequence of words corresponding to a label string, the method comprising the steps of:
(a) generating labels at successive time intervals in response to spoken input;
(b) identifying a plurality of word paths;
(c) associating with a subject word path a plurality of likelihoods, each ith likelihood thereof corresponding to the likelihood of the subject word path given the first i generated labels;
(d) repeating step (c) for each identified word path;
(e) for a given path, determining a boundary label which has the highest likelihood of all generated labels of corresponding to the end of the latest word in said given path;
(f) assigning the given path of step (e) to a stack corresponding to the determined boundary label;
(g) repeating steps (e) and (f) for each word path as the given path;
(j) forming a likelihood envelope with successive points therealong corresponding to successive label intervals including the step of initializing the envelope to a minimum reference value for each label interval; and
(k) reducing the likelihoods of the envelope by a predefined level;
(m) marking word paths as good or bad starting with the unmarked word path ending at the latest label interval and, if more than one word path ends at the latest label interval, starting with the word path having the highest likelihood value at the latest label interval wherein said marking includes the step of identifying a path as good if the likelihood value at the boundary label thereof exceeds the reduced likelihood corresponding to the boundary label and bad otherwise;
(n) repeating step (m) for successive unmarked word paths that end with successively earlier boundary labels;
(o) updating the envelope with the likelihoods of a word path when marked as good;
(p) when all word paths have been marked, selecting the good path ending with the earliest boundary label and, if more than one good path ends with the earliest boundary label, selecting the word path having the highest likelihood value at the earliest boundary label;
(q) extending the selected good path including the steps of:
(i) appending follower words to the selected good path to form new paths, each including the selected good path with a respective follower word appended thereto; and
(ii) removing the selected good path from the boundary label stack corresponding thereto;
(r) repeating steps (j) through (q) until no good word paths remain; and
(s) after no good word paths remain, selecting the complete path having the highest likelihood at the boundary label thereof as the word path corresponding to the generated labels.
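The extend-and-repeat loop of steps (j) through (s) can be sketched in Python with a deliberately simple toy model. Everything about the model is an assumption made for illustration: labels are characters, each vocabulary word has a fixed, non-empty label template (`templates`), the log-likelihood is minus the running mismatch count, the boundary label is simply where the concatenated templates end, and a path is "complete" when it spans the whole label string. One further departure is flagged: once a complete path exists, the sketch seeds the envelope from it, as claim 17 (f) describes, whereas claim 16 (j) as written initializes the envelope to a minimum reference value.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]


def profile(path: Path, templates: Dict[str, str], labels: str) -> List[float]:
    """Toy log-likelihood: minus the running count of label mismatches."""
    expected = "".join(templates[w] for w in path)
    scores, misses = [], 0
    for want, got in zip(expected, labels):
        misses += want != got
        scores.append(-float(misses))
    return scores


def decode(labels: str, templates: Dict[str, str], delta: float = 1.0,
           floor: float = -1e9) -> Optional[Path]:
    n = len(labels)
    stacks: Dict[int, List[Path]] = {t: [] for t in range(n)}
    complete: Optional[Path] = None

    def score(p: Path) -> List[float]:
        return profile(p, templates, labels)

    def add(p: Path) -> None:
        nonlocal complete
        length = sum(len(templates[w]) for w in p)
        if length > n:
            return  # toy rule: paths that overshoot the label string are dropped
        if length == n:
            if complete is None or score(p)[-1] > score(complete)[-1]:
                complete = p  # keep only the most likely complete path
        else:
            stacks[length - 1].append(p)  # steps (e)-(f): stack at boundary label

    for w in templates:  # step (b): single-word starting paths
        add((w,))
    while True:
        # Steps (j)-(k): form the envelope, then reduce it by delta.
        base = score(complete) if complete is not None else [floor] * n
        env = [v - delta for v in base]
        good: List[Path] = []
        # Steps (m)-(o): latest boundary first, best first within a stack.
        for t in range(n - 1, -1, -1):
            for p in sorted(stacks[t], key=lambda q: score(q)[t], reverse=True):
                ll = score(p)
                if ll[t] > env[t]:
                    good.append(p)
                    for i in range(t + 1):
                        env[i] = max(env[i], ll[i])
        if not good:
            return complete  # step (s)
        # Step (p): earliest boundary, ties broken by higher likelihood.
        chosen = min(good, key=lambda q: (len(score(q)), -score(q)[-1]))
        stacks[len(score(chosen)) - 1].remove(chosen)  # step (q)(ii)
        for w in templates:  # step (q)(i): append follower words
            add(chosen + (w,))


# Hypothetical vocabulary: word -> label template.
vocab = {"hi": "ab", "yo": "cd", "hey": "abx", "u": "d"}
print(decode("abcd", vocab))
```

The loop terminates because every extension is strictly longer than the path it replaces and no path may exceed the label string. A real decoder would use acoustic and language-model likelihoods and a probabilistic boundary estimate in place of the template matching used here.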
17. In a speech recognition system having an acoustic processor that generates labels selected from an alphabet thereof in response to speech input, apparatus for determining a likely word path from a plurality of word paths given a string of labels generated at successive intervals wherein word paths corresponding to sentences are recognized, the apparatus comprising:
(a) means for assigning to each label interval a label stack;
(b) means for determining for a subject word path
(i) a boundary label interval at which the subject word path most likely ends and
(ii) a likelihood at each label interval prior to and including the boundary label interval thereof;
(c) means for assigning the subject word path to the label stack corresponding to the boundary label thereof as an entry therein;
(d) said determining means and assigning means acting on each word path;
(e) means for maintaining a complete-path stack which contains the most likely word path, if any, corresponding to a sentence;
(f) means for forming a likelihood envelope as (i) the respective likelihoods for the word path contained in the complete-path stack or (ii), if there is no word path contained in the complete-path stack, a minimum reference likelihood at each label interval;
(g) means for reducing the likelihoods in the likelihood envelope;
(h) means for examining the word path entries in all label stacks longest first and, where a label stack has more than one entry, examining the word paths based on decreasing likelihood wherein said examining means includes:
(i) means for classifying a word path as good if the likelihood at the label corresponding to the boundary label for a subject word path exceeds the reduced likelihood at the label corresponding to the boundary label;
(ii) said classifying means acting on each word path as the subject word path;
(j) means for up-dating, after classifying a word path as good, the likelihood value for each label in the envelope as either (a) the current likelihood value in the envelope or (b) the likelihood value in the classified good word path, whichever is greater; and
(k) means for selecting, after all word paths have been classified, the shortest good word path as the word path to be examined or, if there is more than one good word path having the shortest length, selecting the shortest word path having the highest likelihood value at the boundary label thereof.
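Elements (e) through (g) of claim 17 differ from claim 8 mainly in where the envelope comes from. A minimal Python sketch of a complete-path stack is given below. The class name `CompletePathStack`, its methods, and the `floor` and `delta` parameters are hypothetical, and log-likelihoods are assumed.

```python
from typing import List, Optional, Sequence, Tuple


class CompletePathStack:
    """Holds at most one entry: the most likely sentence-length path so far."""

    def __init__(self) -> None:
        self.entry: Optional[Tuple[Tuple[str, ...], List[float]]] = None

    def offer(self, words: Sequence[str], log_like: Sequence[float]) -> bool:
        """Element (e): keep the path only if its final log-likelihood
        beats that of the current entry. Returns True if it was stored."""
        if self.entry is None or log_like[-1] > self.entry[1][-1]:
            self.entry = (tuple(words), list(log_like))
            return True
        return False

    def reduced_envelope(self, n_labels: int, delta: float,
                         floor: float = -1e9) -> List[float]:
        """Elements (f)-(g): take the stored path's likelihoods, or a minimum
        reference value if the stack is empty, then reduce each by delta."""
        base = self.entry[1] if self.entry is not None else [floor] * n_labels
        return [v - delta for v in base]


stack = CompletePathStack()
print(stack.reduced_envelope(3, delta=1.0, floor=-100.0))
stack.offer(("the", "cat"), [-1.0, -2.0, -3.0])
print(stack.reduced_envelope(3, delta=1.0))
```

Seeding the envelope from the best complete path means that a partial path is marked good only if it is still competitive with a full sentence hypothesis already found, which is what lets the search stop before every path has been extended.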
Specification