Continuous speech pattern recognizer
Abstract
This speech recognizer concatenates isolated reference words into a string for comparison with an unknown string of connected words. The invention uses a level-building (LB) algorithm, where a "level" denotes a word position in the sequence. A constrained-endpoint dynamic-time-warp algorithm, in which the slope of the warping function is restricted to between 1/2 and 2, finds the best alignment between an unknown continuous-word test pattern and a concatenated sequence of L reference patterns. Features of the LB algorithm include modification of the references, back-track decision logic, heuristic selection of multiple candidates, and syntax constraints. As a result, less processing is required than with two-level dynamic-program-matching and sampling algorithms.
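The slope restriction mentioned in the abstract can be illustrated with a small sketch. It is not the patented apparatus: it assumes scalar features, an absolute-difference frame distance, and a hypothetical function name `constrained_dtw`. The reference index may advance by 0, 1, or 2 per test frame, and two consecutive 0-advances are disallowed, which keeps the warping slope between 1/2 and 2.

```python
import math

def constrained_dtw(test, ref):
    """Accumulated distance of the best endpoint-constrained alignment.

    Assumptions (illustrative only): scalar features, |a - b| as the
    frame distance, path fixed to start at (0, 0) and end at the last
    frame of both sequences.
    """
    n_ref = len(ref)
    inf = math.inf
    # flat[m]: best cost reaching ref frame m by a 0-advance step
    # move[m]: best cost reaching ref frame m by a 1- or 2-advance step
    flat = [inf] * n_ref
    move = [inf] * n_ref
    move[0] = abs(test[0] - ref[0])
    for n in range(1, len(test)):
        new_flat = [inf] * n_ref
        new_move = [inf] * n_ref
        for m in range(n_ref):
            d = abs(test[n] - ref[m])
            # A 0-advance is allowed only after a 1- or 2-advance.
            new_flat[m] = d + move[m]
            best = inf
            for step in (1, 2):
                if m - step >= 0:
                    best = min(best, flat[m - step], move[m - step])
            new_move[m] = d + best
        flat, move = new_flat, new_move
    return min(flat[-1], move[-1])

# A test pattern spoken at half speed still aligns with zero distance.
print(constrained_dtw([1, 1, 2, 2, 3, 3], [1, 2, 3]))
```

When the test pattern is more than about twice as long, or less than about half as long, as the reference, no admissible path exists and the function returns infinity.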
50 Claims
1. Apparatus for recognizing a speech pattern as a string of predetermined reference words comprising:
- means for storing a set of signals representative of the time frame sequence of acoustic features of each reference word, said time frame sequence having a beginning frame and an ending frame;
- means for producing a set of signals representative of the time frame sequence of acoustic features of said speech pattern,
- means responsive to the speech pattern acoustic feature signals and the reference word acoustic feature signals for generating a plurality of reference word strings, and
- means for identifying the speech pattern as one of the generated reference word strings,
characterized in that said reference word string generating means comprises;
- means for generating a set of signals for identifying successive reference word levels,
- means for assigning a segment of said speech pattern to each successive level,
- means operative in each successive level for time registering the level speech pattern segment feature signals with the reference word feature signals to produce level time registration speech pattern segment endframe signals and time registration correspondence signals for said reference words, and
- means responsive to the time registration endframe and time registration correspondence signals of the levels for selecting reference word strings.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 45, 46, 47
11. A method for recognizing a speech pattern as a string of predetermined reference words comprising the steps of:
- storing a set of signals representative of the time frame sequence of acoustic features of each reference word, said sequence having a beginning frame and an ending frame;
- producing a set of signals representative of the time frame sequence of acoustic features of the speech pattern;
- generating at least one reference word string responsive to the speech pattern acoustic feature signals and the reference word acoustic feature signals; and
- identifying the speech pattern as one of the generated reference word strings
characterized in that said reference word string generation comprises;
- producing a set of signals identifying successive reference word levels;
- assigning a segment of the speech pattern to each successive level;
- for each level, time registering the level speech pattern segment feature signals with the feature signals of the reference words to produce level time registration speech pattern segment endframe signals and time registration correspondence signals for said reference words, and
- selecting reference word strings responsive to the time registration speech pattern endframe and time registration correspondence signals of the levels.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 48, 49, 50
21. A speech analyzer for recognizing a speech pattern as a string of predetermined reference words comprising:
- means for storing a set of signals representative of the time frame sequence of acoustic features of each reference word from a beginning frame to an ending frame,
- means for producing a set of signals representative of the time frame sequence of acoustic features of the speech pattern from a beginning frame to a final frame;
- means responsive to the feature signals of said reference words and said speech pattern for generating at least one reference word string; and
- means for identifying the speech pattern as one of said generated reference word strings,
said reference word string generating means comprising;
- means for generating a set of signals to identify successive levels of said reference words;
- means for assigning a segment of said speech pattern to each successive level,
- means operative at each successive level responsive to the reference word and speech pattern segment feature signals for dynamically time warping feature signals of each reference word with the feature signals of the speech pattern segment assigned to the level to produce signals representative of time registration path speech pattern endframes for said reference words and signals representative of the correspondence of the reference word and speech pattern segment feature signals on said time registration path; and
- means responsive to the time registration path speech pattern endframe and correspondence signals of the levels for selecting strings of reference words.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32
33. A method for recognizing a speech pattern as a string of predetermined reference words comprising the steps of:
- storing a set of signals representative of the time frame sequence of acoustic features of each reference word from a beginning frame to an ending frame;
- producing a set of signals representative of the time frame sequence of acoustic features of the speech from a beginning frame to a final frame;
- generating at least one reference word string responsive to the feature signals of the reference words and the feature signals of the speech pattern; and
- identifying the speech pattern as one of said generated reference word strings;
the reference word string generating step comprising;
- generating a set of signals to identify successive levels of said reference words,
- assigning a segment of the speech pattern to each successive reference word level,
- at each reference word level dynamically time warping the feature signals of each reference word with the feature signals of the speech pattern segment assigned to the level to produce signals representative of time registration path speech pattern endframes for said reference word and signals representative of the correspondence of the reference word and speech pattern segment feature signals along the time registration paths, and
- selecting strings of reference words responsive to the time registration path speech pattern endframe and correspondence signals of the levels.
Dependent claims: 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44
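The level-by-level procedure recited in the independent claims can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the claimed apparatus: features are scalars, the frame distance is an absolute difference, and the names `word_pass` and `level_building` are hypothetical. At each level every reference word is time-warped against the speech starting from the end frames left by the previous level; the best cost and originating word are stored per end frame, and the word string is recovered by back-tracking from the final frame.

```python
import math

INF = (math.inf, -1)  # (accumulated cost, start frame of the word)

def word_pass(test, ref, entry):
    """Warp one reference word against the speech at one level.

    entry[s] is the cost of the best string so far that ends just before
    test frame s. Returns exits[e] = (cost, s): the best path for this
    word that starts at frame s and ends just before frame e.
    """
    n_ref = len(ref)
    exits = [INF] * (len(test) + 1)
    flat = [INF] * n_ref  # reached by a 0-advance in the reference
    move = [INF] * n_ref  # reached by a 1- or 2-advance, or a word start
    for n in range(len(test)):
        new_flat, new_move = [INF] * n_ref, [INF] * n_ref
        for m in range(n_ref):
            d = abs(test[n] - ref[m])
            if m == 0:
                cands = [(entry[n], n)]  # the word may begin at frame n
            else:
                cands = [flat[m - 1], move[m - 1]]
                if m >= 2:
                    cands += [flat[m - 2], move[m - 2]]
            c, s = min(cands)
            new_move[m] = (c + d, s)
            c, s = move[m]  # no two consecutive 0-advances
            new_flat[m] = (c + d, s)
        flat, move = new_flat, new_move
        exits[n + 1] = min(flat[-1], move[-1])
    return exits

def level_building(test, refs, max_levels):
    """Return (cost, words) for the best string of up to max_levels words."""
    n_test = len(test)
    entry = [math.inf] * (n_test + 1)
    entry[0] = 0.0
    back = []  # back[level][e] = (word, start frame)
    best_cost, best_level = math.inf, None
    for level in range(max_levels):
        cur = [math.inf] * (n_test + 1)
        bp = [None] * (n_test + 1)
        for word, ref in refs.items():
            for e, (c, s) in enumerate(word_pass(test, ref, entry)):
                if c < cur[e]:
                    cur[e], bp[e] = c, (word, s)
        back.append(bp)
        if cur[n_test] < best_cost:
            best_cost, best_level = cur[n_test], level
        entry = cur  # end frames of this level seed the next level
    if best_level is None:
        return math.inf, []
    words, e = [], n_test
    for level in range(best_level, -1, -1):  # back-track through levels
        word, e = back[level][e]
        words.append(word)
    return best_cost, words[::-1]

refs = {"a": [1, 2, 3], "b": [7, 8, 9]}
print(level_building([1, 2, 3, 7, 8, 9], refs, max_levels=3))
```

The sketch omits the range restrictions, reference modification, multiple-candidate selection, and syntax constraints that the abstract lists as properties of the LB algorithm.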
Specification