Syntactic continuous speech recognizer
First Claim
1. Apparatus for recognizing an utterance as one of a plurality of reference word sequences comprising:
means (205) for storing a set of signals each representative of the acoustic features of a reference word;
means (203) responsive to said utterance for generating a signal representative of the acoustic features of the utterance;
means (411) for generating a set of first signals defining the plurality of sequences, each sequence being a series of reference words;
means (427) for generating a succession of second signals identifying the successive word positions of the plurality of sequences;
means (209, 425, 427) responsive to the set of first signals, the utterance feature signals and the reference word feature signals for producing a set of third signals each representative of the correspondence between said utterance and one of said sequences; and
means (430) responsive to said third signals for selecting the sequence having the closest correspondence to said utterance;
characterized in that said third signal producing means comprises:
means (361) operative for each sequence in each word position identified by said second signals for storing a fourth signal representative of the word position utterance segment endpoint of said sequence;
means (351,383,411) operative for each sequence in each word position jointly responsive to said first signals and stored fourth signal of the preceding word position of the sequence for selecting the utterance feature signals beginning at the preceding word position utterance segment endpoint of the sequence and the current word position reference word feature signals of the sequence; and
means (209) jointly responsive to said sequence selected utterance feature signals and the sequence selected reference word feature signals for concurrently generating a fourth signal representative of the endpoint of the current word position utterance segment corresponding to the selected reference word, and a signal representative of the correspondence between the selected utterance segment feature signals from the preceding word position utterance segment endpoint for the sequence and the current word position selected reference word feature signals for the sequence.
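The last element of claim 1 generates two signals in one operation: the endpoint of the current word's utterance segment and the correspondence between that segment and the reference word. The claim does not name the matching technique, so the sketch below assumes a dynamic time warping (DTW) match with a free segment end. It is a minimal Python illustration under stated assumptions, not the patented circuit. Feature frames are assumed to be tuples of floats, the local distance is assumed to be Euclidean, and the endpoint is assumed to be the frame with the lowest length-normalized cost. The names `match_word`, `local_distance`, and `max_stretch` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def local_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_word(utterance: List[Frame], reference: List[Frame], start: int,
               max_stretch: float = 2.0) -> Tuple[int, float]:
    """Align one reference word against the utterance beginning at `start`.

    Returns (end, distance). `end` is the exclusive frame index where the
    matched segment stops (the new endpoint) and `distance` is the
    length-normalized alignment cost (the correspondence). Both come out of
    the same dynamic-programming pass.
    """
    m = len(reference)
    # Assumption: a word segment may be at most `max_stretch` times the
    # reference length.
    n = min(len(utterance) - start, int(max_stretch * m))
    if m == 0 or n <= 0:
        return start, math.inf
    inf = math.inf
    D = [[inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = local_distance(reference[i], utterance[start + j])
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            best_prev = min(
                D[i - 1][j] if i > 0 else inf,
                D[i][j - 1] if j > 0 else inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            D[i][j] = d + best_prev
    # Free endpoint: the utterance frame where the last reference frame ends
    # with the lowest normalized cost.
    best_j = min(range(n), key=lambda j: D[m - 1][j] / (m + j + 1))
    return start + best_j + 1, D[m - 1][best_j] / (m + best_j + 1)
```

For example, matching the reference `[(0.0,), (1.0,), (2.0,)]` against an utterance that begins with those same three frames yields an endpoint of 3 and a distance of 0.0. Both values come from the single table `D`, which is the sense in which they are generated "concurrently".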
Abstract
The accuracy of segmenting an utterance into words is improved by the use of sequence-defining signals. An utterance is recognized as one of a plurality of reference word sequences in an arrangement wherein a set of signals is generated to define the syntax, i.e., word arrangements, of the sequences. Each sequence corresponds to a selected series of reference words. A signal is generated to identify the successive word positions of the sequences. Responsive to the sequence defining signals, the utterance, and the reference words, a set of signals is produced, each representing the correspondence between said utterance and one of the sequences. The sequence having the closest correspondence to the utterance is selected. The sequence correspondence signal generation includes selecting, in each identified word position, the word position reference word for each sequence and the portion of the utterance corresponding thereto responsive to said sequence defining signals, and generating a signal representative of the acoustic correspondence between each sequence word position reference word and its selected utterance portion.
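The procedure in the abstract is a loop over word positions. In that loop, each sequence carries its own utterance segment endpoint and its own running correspondence score. The best-scoring sequence is chosen at the end. The sketch below is a minimal illustration under several assumptions. The sequences are given as explicit word lists, and any word matcher that returns `(end, distance)` can be plugged in. Sequences that stop short of the end of the utterance are rejected, which is an added rule the abstract does not state. The names `recognize`, `Matcher`, and `end_tolerance` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical matcher interface: (utterance, reference, start) -> (end, distance)
Matcher = Callable[[Sequence, Sequence, int], Tuple[int, float]]


def recognize(sequences: List[List[str]],
              references: Dict[str, Sequence],
              utterance: Sequence,
              matcher: Matcher,
              end_tolerance: int = 0) -> Tuple[int, float]:
    """Return (index of best sequence, its cumulative distance)."""
    n_seq = len(sequences)
    endpoint = [0] * n_seq   # per-sequence segment endpoint ("fourth signals")
    score = [0.0] * n_seq    # per-sequence cumulative correspondence ("third signals")
    for pos in range(max(len(s) for s in sequences)):  # successive word positions
        for k, seq in enumerate(sequences):
            if pos >= len(seq) or math.isinf(score[k]):
                continue  # sequence already finished or already ruled out
            # The segment starts where this sequence's previous word ended.
            end, dist = matcher(utterance, references[seq[pos]], endpoint[k])
            endpoint[k] = end
            score[k] += dist
    # Assumption: a valid sequence must account for (nearly) the whole utterance.
    for k in range(n_seq):
        if len(utterance) - endpoint[k] > end_tolerance:
            score[k] = math.inf
    best = min(range(n_seq), key=lambda k: score[k])
    return best, score[best]
```

The acoustic comparison is passed in as `matcher`, which keeps the syntax-driven control flow separate from the matching itself. A DTW matcher or a simple test stub can then be used without changing the loop.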
57 Citations
15 Claims
4. In a continuous speech recognizer that includes means for storing a set of first signals defining a plurality of predetermined sequences, each sequence being a series of reference words;
means for storing a set of acoustic feature signals for each reference word;
means for generating a succession of second signals identifying the successive word positions of said sequences;
means for receiving an utterance; and
means for generating a set of acoustic feature signals representative of said received utterance;
a method for recognizing the received utterance as one of the plurality of reference word sequences comprising the steps of:
producing a set of third signals each representative of the acoustic correspondence between said utterance and one of said sequences responsive to the set of first signals, the utterance acoustic feature signals and the reference word acoustic feature signals; and
selecting the sequence having the closest acoustic correspondence to said utterance responsive to said third signals;
characterized in that said third signal producing step comprises:
storing a fourth signal representing the utterance segment endpoint for each sequence of reference words in each word position identified by said second signals;
for each sequence in each successive word position identified by said second signals, selecting the utterance acoustic feature signals beginning at the preceding word position utterance segment endpoint for the sequence, and the acoustic feature signals of the current identified word position reference word for the sequence jointly responsive to the first signals and the stored fourth signal of the preceding word position for the sequence; and
concurrently generating a fourth signal representative of the endpoint of the current word position utterance segment corresponding to the selected reference word, and a signal representative of the acoustic correspondence between the current word position selected reference word for the sequence and the utterance segment from the preceding word position endpoint, said concurrent acoustic correspondence and fourth signal generation being jointly responsive to the current word position selected reference word acoustic feature signals for the sequence and the acoustic feature signals of the current word position selected utterance segment for the sequence.
7. Apparatus for recognizing an utterance as one of a plurality of predetermined reference word sequences comprising:
means for storing a set of signals each representative of the acoustic features of a reference word;
means responsive to said utterance for generating a signal representative of the acoustic features thereof;
means for generating a set of first signals defining the plurality of reference word sequences, each first signal including a first state code, a second state code, and a reference word code connected from the first state code to the second state code and each sequence being a selected plurality of state connected reference word codes;
means for generating a succession of second signals identifying the successive word positions of the plurality of predetermined sequences;
means for generating a set of final state coded signals each identifying the final word position of one of said predetermined sequences;
means jointly responsive to said first signals, said utterance feature signals and said reference word feature signals, for producing a set of third signals each representative of the acoustic correspondence between said utterance and one of said predetermined reference word sequences; and
means responsive to said third signals for selecting the sequence having the closest acoustic correspondence to said utterance;
said third signal producing means including:
means operative in each successive identified word position for storing a fourth signal representative of an utterance segment endpoint for each predetermined reference word sequence in said word position;
means operative in each successive identified word position responsive to said first signals and to the stored fourth signal of the sequence in the preceding word position for selecting the current identified word position reference word for each sequence and for selecting the utterance feature signals beginning at the utterance segment endpoint of the sequence in the preceding word position;
means jointly responsive to the feature signals of the selected reference word and the feature signals of the utterance segment beginning at the stored endpoint for the immediately preceding word position of the sequence for concurrently generating a fourth signal representative of the endpoint of the current identified word position utterance segment corresponding to the selected reference word, and a signal representative of the correspondence between the feature signals of the sequence selected reference word and the feature signals of the utterance segment from the immediately preceding word position endpoint; and
means for combining the current word position sequence reference word correspondence signal with the correspondence signal of the preceding word positions of the same sequence to form a cumulative correspondence signal for each sequence in each word position.
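In claim 7, each first signal is a triple: a first state code, a reference word code, and a second state code. Sequences are chains of such triples that end in a final state. The sketch below shows how a set of state-coded triples expands into the word sequences it defines, one word position per pass. It is a minimal illustration that assumes the initial state is coded `0` and that `max_positions` bounds the expansion so a grammar with loops still terminates. The names `expand_sequences`, `Arc`, `initial_state`, and `max_positions` are hypothetical.

```python
from typing import List, Set, Tuple

# (first state code, reference word code, second state code)
Arc = Tuple[int, str, int]


def expand_sequences(arcs: List[Arc], final_states: Set[int],
                     initial_state: int = 0,
                     max_positions: int = 10) -> List[List[str]]:
    """List every word sequence that the state-coded arcs allow."""
    complete: List[List[str]] = []
    partial = [(initial_state, [])]  # (current state, words so far)
    for _ in range(max_positions):   # one pass per word position
        next_partial = []
        for state, words in partial:
            for src, word, dst in arcs:
                if src == state:
                    extended = words + [word]
                    if dst in final_states:
                        complete.append(extended)
                    # Keep extending: a final state may still have outgoing arcs.
                    next_partial.append((dst, extended))
        partial = next_partial
        if not partial:
            break
    return complete
```

A recognizer can either expand the sequences up front, as here, or follow the arcs directly during matching. The cumulative correspondence in the last element of claim 7 is then the running sum of per-word distances along each chain.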
10. Apparatus for recognizing a continuous speech pattern as one of a plurality of predetermined reference word sequences comprising:
means for generating a set of first signals defining the plurality of predetermined sequences, each sequence defining signal including a first state signal, a second state signal and a reference word code linked from said first state to said second state and each sequence being a series of selected state connected reference word codes terminating in an end state;
means for generating a succession of second signals identifying the successive reference word positions of said sequences;
means for generating a signal representative of each sequence end state;
means responsive to each reference word for generating and storing a set of signals representative of the acoustic features of said reference word;
means responsive to said continuous speech pattern for generating signals representative of the acoustic features of said speech pattern;
means jointly responsive to said first signals, said reference word feature signal sets, and the feature signals of said continuous speech pattern for producing a set of third signals each representative of the correspondence between said continuous speech pattern and one of the reference word sequences; and
means responsive to said third signals for identifying the continuous speech pattern as the closest corresponding reference word sequence;
said third signal producing means comprising:
means operative in each word position for storing a fourth signal representing the word position speech pattern segment endpoint for each reference word sequence;
means operative for each sequence in each successive word position identified by said second signals jointly responsive to said first signals and the stored fourth signal of the preceding word position of the sequence for selecting the speech pattern feature signals beginning at the preceding word position speech pattern segment endpoint of the sequence, and for selecting the current identified word position reference word feature signals of each sequence;
means jointly responsive to the feature signal set of the sequence selected reference word and the feature signals of the continuous speech pattern segment beginning at the stored speech pattern endpoint of the immediately preceding word position of the sequence for concurrently generating a fourth signal representative of the endpoint of the current identified word position continuous speech pattern segment corresponding to the selected reference word and a signal representative of the correspondence between the feature signals of the sequence selected reference word and the feature signals of the continuous speech pattern segment from the immediately preceding word position speech pattern endpoint of the sequence, said segment generated fourth signal in the current word position being stored in said endpoint storing means as the preceding word position endpoint signal for the next occurring word position.
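The final element of claim 10 stores the endpoint produced in one word position as the "preceding word position" endpoint that is read in the next position. It also requires each sequence to terminate in an end state. The sketch below drives the matching directly from the state-coded arcs and does not list the sequences first. Each partial hypothesis carries its state, its words, its stored endpoint, and its cumulative distance. This is a minimal illustration under several assumptions. The initial state is coded `0`, the word matcher is pluggable and returns `(end, distance)`, and a hypothesis is accepted only when it enters an end state exactly at the last frame. The names `decode`, `Arc`, and `Matcher` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Arc = Tuple[int, str, int]  # (first state, reference word code, second state)
Matcher = Callable[[Sequence, Sequence, int], Tuple[int, float]]


def decode(arcs: List[Arc], end_states: Set[int],
           references: Dict[str, Sequence], utterance: Sequence,
           matcher: Matcher,
           max_positions: int = 10) -> Tuple[Optional[List[str]], float]:
    """Return (best word sequence, cumulative distance), or (None, inf)."""
    # Hypothesis: (state, words, stored endpoint, cumulative distance).
    hyps = [(0, [], 0, 0.0)]
    best_words: Optional[List[str]] = None
    best_score = math.inf
    for _ in range(max_positions):  # successive word positions
        new_hyps = []
        for state, words, start, score in hyps:
            for src, word, dst in arcs:
                if src != state:
                    continue
                # The endpoint stored in the preceding position is this segment's start.
                end, dist = matcher(utterance, references[word], start)
                total = score + dist
                if math.isinf(total):
                    continue
                # The new endpoint is stored for use in the next word position.
                new_hyps.append((dst, words + [word], end, total))
                if dst in end_states and end == len(utterance) and total < best_score:
                    best_words, best_score = words + [word], total
        if not new_hyps:
            break
        hyps = new_hyps
    return best_words, best_score
```

Unlike a fully expanded list of sequences, this form shares the work for any common prefix of the syntax. Every distinct path through the arcs still remains its own hypothesis, as the claims describe.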
13. In a continuous speech pattern recognition circuit that includes means for storing a set of first signals syntactically defining a plurality of predetermined reference word sequences, each sequence being a selected series of reference words and each first signal comprising a reference word code and a pair of state codes identifying the position of the reference word in said sequences;
means for generating a succession of second signals identifying the successive word positions of said sequences;
means for storing a set of acoustic feature signals for each reference word;
means for receiving a continuous speech pattern; and
means for generating a set of acoustic feature signals corresponding to said speech pattern;
a method for recognizing the speech pattern as one of said plurality of predetermined reference word sequences comprising the steps of:
producing a set of third signals each representative of the acoustic correspondence between said speech pattern and one of said reference word sequences jointly responsive to said first signals, the acoustic feature signal sets of said reference word and the acoustic feature signal set for said speech pattern; and
responsive to the third signals for the sequences, identifying the speech pattern as the reference word sequence having the closest acoustic correspondence to said utterance;
said third signal producing step comprising:
for every sequence in each word position, storing a fourth signal representative of the speech pattern segment endpoint corresponding to the sequence in said word position;
for each sequence in each successive word position identified by said second signals, selecting the speech pattern acoustic feature signals beginning at the preceding word position speech pattern endpoint for the sequence and selecting the acoustic feature signals of the current word position reference word of the sequence, jointly responsive to the first signals and the stored fourth signal of the preceding word position for the sequence; and
concurrently generating a fourth signal representative of the endpoint of the current word position speech segment corresponding to the selected reference word and a signal representative of the acoustic correspondence between the current word position selected reference word for the sequence and the speech pattern segment beginning at the preceding word position endpoint jointly responsive to the current word position selected reference word acoustic feature signals for the sequence and the acoustic feature signals of the selected speech pattern segment for the sequence.
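Claim 13 states that the pair of state codes in each first signal identifies the position of its reference word within the sequences. The sketch below makes that concrete. Starting from an assumed initial state `0`, it steps outward one word position at a time and records every position at which each arc can occur. This is the information needed to decide which reference words to match in a given word position. It is a minimal illustration under that assumption, with `max_positions` bounding grammars that contain loops. The names `arc_positions` and `Arc` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Arc = Tuple[int, str, int]  # (first state code, reference word code, second state code)


def arc_positions(arcs: List[Arc], initial_state: int = 0,
                  max_positions: int = 10) -> Dict[Arc, Set[int]]:
    """Map each arc to the 1-based word positions at which it can occur."""
    positions: Dict[Arc, Set[int]] = defaultdict(set)
    frontier = {initial_state}  # states reachable after pos - 1 words
    for pos in range(1, max_positions + 1):
        next_frontier = set()
        for arc in arcs:
            if arc[0] in frontier:  # arc leaves a state reached in the previous position
                positions[arc].add(pos)
                next_frontier.add(arc[2])
        if not next_frontier:
            break
        frontier = next_frontier
    return dict(positions)
```

In an acyclic syntax, each arc typically occupies one or a few positions. A looping arc accumulates a run of positions up to the `max_positions` bound.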
Specification