Continuous speech recognition system
Abstract
Recognition of continuous speech by comparison with prestored isolated words may be confused by the merging together of spoken adjacent words (coarticulation). Improved recognition is attained by generating overlap-words, e.g., words whose first phoneme is the end phoneme of the preceding word in a string of words. The reference candidate series of overlap-words is transformed under dynamic time warping so as to time-match the utterance series of overlap-words.
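The time-matching step mentioned in the abstract can be illustrated with a minimal dynamic time warping sketch. This is not the patented apparatus. It assumes each acoustic feature is a single number and uses absolute difference as the frame distance. The names `dtw_distance` and `closest_series` are hypothetical and chosen only for this illustration.

```python
from math import inf

def dtw_distance(ref, utt):
    """Minimum cumulative frame distance when warping ref onto utt.

    Assumption: features are scalars and the frame distance is |a - b|.
    """
    n, m = len(ref), len(utt)
    # cost[i][j]: best cumulative distance aligning ref[:i] with utt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - utt[j - 1])
            # a frame may be repeated on either side, or both may advance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def closest_series(candidates, references, utt):
    """Pick the candidate word series whose concatenated features warp best onto utt."""
    def features(words):
        return [f for w in words for f in references[w]]
    return min(candidates, key=lambda words: dtw_distance(features(words), utt))

# Hypothetical two-word vocabulary and a five-frame utterance
references = {"one": [1.0, 2.0], "two": [5.0, 4.0]}
utterance = [1.0, 2.0, 2.0, 5.0, 4.0]
print(closest_series([["two", "one"], ["one", "two"]], references, utterance))  # → ['one', 'two']
```

The repeated frame in the utterance is absorbed by the warp, so the series "one two" matches with zero cumulative distance.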
28 Claims
1. Apparatus for recognizing continuous speech comprising:
means (105) for storing signals representative of the acoustic features of a set of reference words;
means (103) responsive to an unknown utterance for producing a sequence of signals representative of the acoustic features of the utterance;
means (110,120,130,140,160) responsive to the reference word acoustic feature signals and the utterance acoustic feature signals for generating at least one reference word series as a candidate for said utterance;
Dependent claims: 3, 4, 6, 8, 10, 12, 13, 14, 16, 17, 20, 21, 25, 28
2. and means (170) responsive to said at least one reference word series candidate and said utterance acoustic feature signals for identifying said utterance as one of said reference word series candidates;
characterized in that said reference word series candidate generating means (110,120,130,140,160) comprises;
means (401,340) for generating a signal for identifying successive word position intervals;
means (307,309,311) operative in each successive word position interval for forming reference word series partial candidates including
means (307,309,311,320,345) responsive to each reference word series partial candidate of the preceding word position for determining for each reference word a plurality of utterance segments beginning within a predetermined range of the utterance segment endpoint of said preceding word position reference word series partial candidate and corresponding to the reference word feature signals, said range overlapping the preceding word position partial candidate utterance segment endpoint;
means (307,320) responsive to each reference word feature signals and the feature signals of the corresponding determined utterance segments for forming a signal representative of the similarity between said reference word and said utterance segments;
means (309,325) responsive to said similarity signals for selecting reference words having a prescribed similarity to their corresponding utterance segments; and
means (311) for combining said selected reference words with the reference word series partial candidates of the preceding word position to form at least one reference word series partial candidate for said word position interval.
5. characterized in that
said utterance identifying means (170) comprises means (335) responsive to the endpoints of all reference word series partial candidates of a word position interval being within a predetermined range of the utterance endpoint for generating a selection signal;
means (365,370,387,389,460) responsive to said selection signal for generating a sequence of feature signals corresponding to each reference word series candidate;
means (311) jointly responsive to each reference word series candidate feature signal sequence and the utterance feature signal sequence for producing a signal representative of the correspondence of the utterance to said reference word series candidate
and means (313) responsive to said correspondence signals for identifying the reference word series candidate having the closest correspondence to said utterance.
Dependent claims: 9
7. characterized in that
the word position reference word combining means (311) further comprises means (510) for generating a signal representative of the similarity between each reference word series partial candidate of the word position and the utterance portion corresponding thereto;
and said reference word series candidate identifying means (170) comprises means (503,505,510) for combining the similarity and correspondence signals for each reference word series candidate; and
means (311,313) responsive to said combined similarity and correspondence signals for determining the reference word series candidate most closely corresponding to the utterance.
11. A method for recognizing continuous speech comprising
storing signals representative of the acoustic features of a set of reference words;
producing a sequence of signals representative of the acoustic features of an unknown utterance;
generating at least one reference word series as a candidate for said utterance responsive to the reference word and utterance feature signals; and
identifying the utterance as one of said reference word series candidates;
characterized in that said reference word series candidate generating step comprises;
generating a signal identifying successive word position intervals for said utterance;
in each identified word position interval, forming reference word series partial candidates including
determining for each reference word series partial candidate of the preceding word position interval a plurality of utterance segments for each reference word beginning within a predetermined range of the utterance segment endpoint of the reference word series partial candidate of the preceding word position and corresponding to the reference word feature signals, said range overlapping the preceding word position partial candidate utterance segment;
forming a signal representative of the similarity between each reference word and the corresponding determined utterance segments responsive to the reference word feature signals and said determined utterance segment feature signals;
selecting reference words having a prescribed similarity to their corresponding utterance segments responsive to said similarity signals; and
combining each selected reference word with the reference word series partial candidates of the preceding word position to form at least one reference word series partial candidate for said word position interval.
Dependent claims: 15, 19, 23, 27
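The per-word-position step recited in claim 11 can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: features are scalars, the frame distance is an absolute difference, similarity is the cumulative distance divided by the combined length `n + j`, and only the best segment per candidate and reference word is kept rather than a plurality of segments. The names `best_segment`, `extend_candidates`, `overlap`, `gap`, and `threshold` are all hypothetical.

```python
from math import inf

def best_segment(ref, utt, begin):
    """Warp ref onto utt[begin:] with a free end frame.

    Returns (average distance, end frame). Assumption: scalar features,
    |a - b| frame distance, normalization by n + j.
    """
    n, m = len(ref), len(utt) - begin
    if n == 0 or m <= 0:
        return inf, begin
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - utt[begin + j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # free endpoint: choose the segment length with the lowest average distance
    best_j = min(range(1, m + 1), key=lambda j: cost[n][j] / (n + j))
    return cost[n][best_j] / (n + best_j), begin + best_j

def extend_candidates(partials, references, utt, overlap=1, gap=1, threshold=0.5):
    """One word position: extend each partial candidate (words, end, total)."""
    extended = []
    for words, end, total in partials:
        # begin frames lie in a window that overlaps the previous endpoint
        lo = max(0, end - overlap)
        hi = min(len(utt) - 1, end + gap)
        for word, ref in references.items():
            score, new_end = min(
                (best_segment(ref, utt, b) for b in range(lo, hi + 1)),
                default=(inf, end),
            )
            # keep only words that are similar enough to their segment
            if score <= threshold:
                extended.append((words + [word], new_end, total + score))
    return extended

# Hypothetical vocabulary and utterance
references = {"a": [1.0, 1.2], "b": [5.0, 4.8]}
utterance = [1.0, 1.2, 5.0, 4.8]
step1 = extend_candidates([([], 0, 0.0)], references, utterance)
step2 = extend_candidates(step1, references, utterance)
print(step1)  # → [(['a'], 2, 0.0)]
```

Because the begin window reaches back before the previous endpoint, a second-position word may reuse the last frame of the first word. In this sketch that also lets a weaker candidate such as "a a" survive the threshold alongside "a b", which is why a final comparison over whole series is still needed.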
18. Apparatus for recognizing an utterance as a series of predetermined reference words comprising:
means for storing a set of signals each representative of the acoustic features of a predetermined reference word;
means responsive to the utterance for generating a sequence of signals representative of the acoustic features of the utterance;
means jointly responsive to the utterance acoustic feature signals and reference word feature signals for producing a set of reference word series candidates for the utterance; and
means responsive to the reference word series candidates for identifying the utterance;
said reference word series candidate producing means includes means for generating a signal for identifying successive word positions for the utterance;
means operative in each identified word position for generating reference word series partial candidates for said word position comprising
means operative for each reference word series partial candidate of the preceding word position responsive to the set of reference word feature signals and the utterance feature signals for determining an utterance segment best corresponding to the feature signals of each reference word and beginning within a predetermined range of the utterance portion endpoint of the preceding word position reference word series partial candidate, said range overlapping the preceding word position partial candidate utterance portion;
means for selecting reference words having a prescribed similarity to their corresponding utterance segments
and means for combining said selected reference words with the reference word series partial candidates of the preceding word position to form reference word series partial candidates for said word position.
Dependent claims: 22, 26
24. A method for recognizing an utterance as a series of predetermined reference words comprising the steps of
storing signals representative of the acoustic features of a set of predetermined reference words;
generating a sequence of signals representative of the acoustic features of the utterance;
producing at least one series of reference words as a candidate for the utterance responsive to the reference word feature signals and the utterance acoustic feature signals; and
identifying the utterance as one of said reference word series candidates;
wherein the reference word series candidate generating step comprises
identifying successive word positions for the utterance;
in each identified word position, generating reference word series partial candidate including
determining, for each reference word series partial candidate of the preceding word position and each reference word, an utterance segment in the current word position overlapping the utterance segment of the preceding word position reference word series partial candidate that best corresponds to the reference word acoustic feature signals; and
combining reference words having a prescribed similarity to their corresponding utterance segments with the partial reference word series candidates of the preceding word position to form reference word series partial candidates for the current word position.
Specification