Speech recognition system based on word state duration and/or weight
Abstract
A low cost, speaker independent, limited vocabulary, word recognizing microcomputer. The microcomputer divides each spoken word into a series of word states, determines the length of each state and classifies each state as fricative, vowel-like, or silent. The incoming speech pattern, in the form of two arrays (an array of classified word states and an array of associated word lengths), is then compared sequentially with a series of templates defining the limited vocabulary stored in the microcomputer's memory. Where the states match, an error score is generated based on the difference in lengths between the template lengths and the word state lengths. Provision is made for recognizing a spoken word as a template word even when the array of states representing the spoken word is not identical to an array of states in any of the template words. This permits recognition of the same word by the microcomputer even when the word is spoken in substantially different ways.
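For illustration only, the two-array pattern and the length-based error score described in the abstract can be sketched as follows. This is a minimal Python sketch, not the patented implementation: the labels "F", "V" and "S", the template words `word_a`, `word_b` and `word_c`, the length units, and the function names are all hypothetical. The sketch only handles templates whose state sequence is identical to the incoming one; the tolerant matching mentioned at the end of the abstract is not covered here.

```python
# Hypothetical state labels: "F" fricative, "V" vowel-like, "S" silent.

def length_error(in_lengths, tmpl_lengths):
    """Sum of absolute length differences over position-aligned states."""
    return sum(abs(a - b) for a, b in zip(in_lengths, tmpl_lengths))


def recognize(in_states, in_lengths, templates):
    """Return (word, error) for the lowest-error template whose state
    sequence equals the incoming one, or None if no template qualifies."""
    best = None
    for word, (t_states, t_lengths) in templates.items():
        if t_states != in_states:
            continue  # sketch only: non-identical sequences are not handled
        err = length_error(in_lengths, t_lengths)
        if best is None or err < best[1]:
            best = (word, err)
    return best


# Hypothetical vocabulary: each template is (state array, length array).
TEMPLATES = {
    "word_a": (["F", "V", "S", "F"], [12, 10, 4, 14]),
    "word_b": (["F", "V", "S", "F"], [6, 20, 2, 6]),
    "word_c": (["V"], [25]),
}

result = recognize(["F", "V", "S", "F"], [11, 12, 4, 13], TEMPLATES)
print(result)  # ('word_a', 4)
```

Two templates share the same state sequence here, so only the length array separates them: the incoming lengths differ from `word_a` by 4 units in total and from `word_b` by 22.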
33 Claims

1. Speech recognition apparatus comprising:
means for converting audio speech into electronic signals;
means for dividing the incoming speech up along a time line into an array of sequential word states based on the content of the speech, each word state having a time period;
means for classifying each word state as one of a plurality of classifications based on the content of the speech during the corresponding time period;
means for determining the duration of the time period corresponding to each word state within the array of incoming word states and for using the determined durations to provide an array of durational values corresponding to the word states of the incoming word state array;
means for providing a plurality of stored templates representing the vocabulary of the speech recognition apparatus;
each template being comprised of two arrays;
the first array being a sequence of stored word states each state being classified as one of said plurality of classifications;
the second array being a sequence of values indicating the duration of a corresponding stored word state;
first comparing means for comparing the classifications of incoming word states with said first array of each of said templates to locate matching states;
second comparing means for comparing the duration of each incoming word state with the duration of the corresponding stored word state only where the classifications of the word states match; and
means responsive to both of said comparing means for determining which of said templates is the closest match to said array of incoming word states and said array of durational values.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 26, 27

13. A method of recognizing speech comprising:
providing an array of classified states representing a spoken word;
providing a template state array composed of a sequence of classified states representing a stored vocabulary word;
sequentially comparing the classifications of the states of said template state array with the classifications of the states of said spoken word state array until a mismatch is found;
after a mismatch is found comparing the classification of the last compared spoken word state sequentially with the classification of the next state of the template state array until a match is found.
Dependent claims: 14, 15, 16
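As an illustration of the comparison order recited in claim 13, the following Python sketch walks both arrays in step and, on a mismatch, holds the spoken state while advancing through the template until a match is found. The function name `align`, the labels "F", "V" and "S", and the returned bookkeeping are hypothetical. Treating template states left over at the end as unmatched is an assumption of the sketch, and spoken states left over after the template is exhausted are simply not reported.

```python
def align(spoken, template):
    """Return (pairs, skipped): matched (spoken_idx, template_idx) pairs and
    the template indices passed over while searching for a match."""
    pairs, skipped = [], []
    s = t = 0
    while s < len(spoken) and t < len(template):
        if spoken[s] == template[t]:
            # Classifications match: record the pair and advance both arrays.
            pairs.append((s, t))
            s += 1
            t += 1
        else:
            # Mismatch: keep the same spoken state, try the next template state.
            skipped.append(t)
            t += 1
    # Assumption: template states remaining at the end count as unmatched.
    skipped.extend(range(t, len(template)))
    return pairs, skipped


pairs, skipped = align(["F", "V", "F"], ["F", "V", "S", "F"])
print(pairs)    # [(0, 0), (1, 1), (2, 3)]
print(skipped)  # [2]
```

In the usage example the speaker omitted the silent state, so the third spoken state is compared first with template state 2 (a mismatch) and then with template state 3 (a match).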

17. Speech recognition apparatus comprising:
means for converting incoming speech into a word state array, and for classifying each state in said array as one of a plurality of classifications;
a plurality of stored templates, each template including an array of classified states;
means for comparing the states of said word state array with the states of said stored template array and generating an error value, for increasing said error value based on a first speech parameter if the state classifications match, and for increasing said error value based on a second speech parameter if the state classifications do not match.

18. A method for speech recognition comprising:
dividing incoming speech into an array of classified speech states;
providing a template state array composed of classified states representing a stored vocabulary word;
providing a weight for at least some states of said template state array, said weight being related to the states' importance in recognition of the vocabulary word;
comparing the incoming speech state array with the template state array to determine which states of said template state array are missing from said incoming speech state array, and generating a measure of the degree of matching between the classifications of the states of the incoming speech state array and the template state array, said measure being a function of the weights corresponding to template states which are found to be missing in the incoming array.
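One way to picture the weighted measure of claim 18 is the Python sketch below, which sums the weights of the template states that are not found in the incoming array. The greedy left-to-right search used to decide which states are "missing", the plain sum as the "function of the weights", and all names and values are assumptions made for illustration.

```python
def missing_weight_score(spoken, template, weights):
    """Sum the weights of template states not found, in order, in `spoken`."""
    score = 0
    s = 0  # index of the next spoken state still waiting for a match
    for t, t_class in enumerate(template):
        if s < len(spoken) and spoken[s] == t_class:
            s += 1                 # template state found in the incoming array
        else:
            score += weights[t]    # template state treated as missing
    return score


TEMPLATE = ["F", "V", "S", "F"]
WEIGHTS = [10, 10, 2, 6]  # hypothetical: the silent state matters least

low = missing_weight_score(["F", "V", "F"], TEMPLATE, WEIGHTS)
high = missing_weight_score(["V", "S", "F"], TEMPLATE, WEIGHTS)
print(low, high)  # 2 10
```

With these hypothetical weights, dropping the silent state costs 2, while dropping the initial fricative costs 10, so an utterance missing an unimportant state still scores close to the template.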

19. A method for speech recognition comprising:
dividing incoming speech into an array of classified speech states;
measuring the length of each speech state;
providing a template representing a stored vocabulary word, said template including a first array of classified word states and a second array including a sequence of values indicating the length of corresponding word states;
comparing said incoming speech state array with said template state array; and
generating a measure of the degree of matching between said incoming speech state array and said template word state array, said measure being a function of the differences in length between said incoming speech states and matching template word states.
Dependent claims: 28, 29

20. Speech recognition apparatus comprising:
means for dividing a spoken word into equal time portions, and for classifying each portion as either fricative-like, vowel-like, or silent;
means for designating a group of time portions as an incoming state when a predetermined number of proximately located time portions have the same classifications, and for classifying the state in accordance with the predominant classification of the time portions which make up the state;
a stored template representing a vocabulary word including an array of states, each state being classified as either fricative-like, vowel-like, or silent;
means for sequentially comparing the spoken word array with the template array to determine whether the classifications of the states match;
means permitting recognition of the spoken word as the word represented by the template even where the number of states in the spoken word array is different from the number of states in the template array.
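The grouping of equal time portions into states, as recited in claim 20, might look like the following Python sketch. It assumes each time portion has already been labelled "F", "V" or "S" (hypothetical labels). The threshold `min_run`, the rule that short runs are absorbed into the current state, and the use of a frame count as the state length are assumptions of the sketch, not details taken from the claim.

```python
from collections import Counter
from itertools import groupby


def frames_to_states(frames, min_run=3):
    """Group per-portion labels into (state classes, state lengths)."""
    groups = []    # each entry: [opening_label, list_of_portion_labels]
    pending = []   # short runs seen before any state has been opened
    for label, run in groupby(frames):
        run = list(run)
        if len(run) >= min_run:
            if groups and groups[-1][0] == label:
                groups[-1][1].extend(run)   # same class resumes after a blip
            else:
                groups.append([label, pending + run])
                pending = []
        elif groups:
            groups[-1][1].extend(run)       # absorb short run into current state
        else:
            pending.extend(run)             # hold until a state opens
    states, lengths = [], []
    for _, members in groups:
        # The state takes the predominant classification of its portions.
        states.append(Counter(members).most_common(1)[0][0])
        lengths.append(len(members))
    return states, lengths


states, lengths = frames_to_states(list("FFFFVVVVVFVVVSSSSFFFF"))
print(states)   # ['F', 'V', 'S', 'F']
print(lengths)  # [4, 9, 4, 4]
```

The single "F" portion inside the vowel-like stretch is too short to open a state of its own, so it is absorbed and the state is still classified as vowel-like. Because the portions are of equal duration, the portion count stands in for the state's length.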

21. In a programmed computer system for recognizing human speech, a data structure for comparing incoming speech patterns with stored templates, comprising:
a. first means in said data structure responsive to incoming speech for dividing the speech into an array of states, for classifying each state as one of a plurality of classifications and for storing first coded signals representing the array of classified states;
b. second means in said data structure responsive to the length of each said incoming speech states for storing second coded signals representing an array of values, equal in number to the number of states in the incoming speech state array, said values indicating the length of each corresponding speech state;
c. third means in said data structure storing third coded signals indicative of recognition templates representing stored vocabulary words, each template including an array of classified states, an array of length values one value corresponding to each of the classified states of the template, and an array of weighting values, one value corresponding to each of the classified states of the template and said weighting values being assigned based on the importance of the particular state to recognition of the word represented by the template;
d. fourth means in said data structure for comparing the coded signals representing the classifications of the incoming speech states in the order of the state sequence with the classifications of the template states;
e. fifth means in said data structure for storing coded signals representing an error value indicating the degree of matching between the incoming speech state array and the template state array;
f. sixth means in said data structure for determining the absolute value of the difference between the length stored by said second means and the length stored by said third means and adding this value to the error value when the states being compared have the same classifications but different lengths; and
g. seventh means in said data structure for adding the weighting value stored by the third means to the error value when the states being compared have different classifications.
Dependent claims: 30
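Elements (c) through (g) of claim 21 describe a template with three parallel arrays and an error value that grows by a length difference or by a weight. The Python sketch below is one possible reading, offered for illustration only. The `Template` class, the position-by-position comparison, the rule that a template state with no incoming counterpart counts as a classification mismatch, and all names and numbers are assumptions; extra incoming states are ignored.

```python
from dataclasses import dataclass


@dataclass
class Template:
    word: str       # hypothetical vocabulary label
    states: list    # classified states
    lengths: list   # one length value per state
    weights: list   # one weighting value per state


def error_value(in_states, in_lengths, tmpl):
    """Accumulate the error value for one template, state by state."""
    error = 0
    for i, t_class in enumerate(tmpl.states):
        if i < len(in_states) and in_states[i] == t_class:
            # Same classification: add the absolute length difference.
            error += abs(in_lengths[i] - tmpl.lengths[i])
        else:
            # Different classification (or no incoming state): add the weight.
            error += tmpl.weights[i]
    return error


VOCAB = [
    Template("word_a", ["F", "V", "S", "F"], [12, 10, 4, 14], [8, 8, 2, 6]),
    Template("word_b", ["V", "S", "F"], [20, 5, 10], [9, 3, 5]),
]
IN_STATES, IN_LENGTHS = ["F", "V", "S", "F"], [11, 12, 4, 13]

errors = {t.word: error_value(IN_STATES, IN_LENGTHS, t) for t in VOCAB}
best = min(VOCAB, key=lambda t: errors[t.word])
print(errors, best.word)  # {'word_a': 4, 'word_b': 17} word_a
```

Choosing the template with the smallest error value mirrors the closest-match selection recited elsewhere in the claims, although the claim itself does not prescribe this particular comparison loop.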

22. Speech recognition apparatus comprising:
a circuit which converts a spoken word into an array of classified states;
a circuit storing a template state array composed of a sequence of classified states representing a stored vocabulary word;
a circuit which acts to sequentially compare the classifications of the states of the template state array with the classifications of the states of the spoken word state array until a mismatch is found, and, after a mismatch is found, to compare the classification of the last compared spoken word state sequentially with the classifications of the next states of the template state array until a match is found.

23. Speech recognition apparatus comprising:
a circuit which divides incoming speech into an array of classified speech states;
a circuit storing a template state array composed of classified states representing a stored vocabulary word;
a circuit storing a weighting array including a weighting value corresponding to states of said template state array, said weighting value being related to the corresponding states' importance in recognition of the vocabulary word;
a circuit which acts to compare the incoming speech state array with the template state array;
a circuit which generates a measure of the degree of matching between the classifications of the states of the incoming speech state array and the template state array, said measure being a function of the weights corresponding to template states which are found to be missing in the incoming array.

24. Speech recognition apparatus comprising:
a circuit which divides incoming speech into an array of classified speech states and measures the length of each speech state;
a circuit storing a template representing a vocabulary word of the speech recognition apparatus, said template including a first array of classified word states and a second array composed of a sequence of values indicating the length of corresponding stored word states;
a circuit which acts to compare the incoming speech state array with the template state array;
a circuit which generates a measure of the degree of matching between the incoming speech state array and the template word state array, said measure being a function of the differences in length between the incoming speech states and the corresponding template word states of states that have matching classifications.
Dependent claims: 31

25. A method for speech recognition comprising:
dividing a spoken word into equal time portions and classifying each as either fricative-like, vowel-like or silent;
designating a group of time portions as an incoming state when a predetermined number of proximately located time portions have the same classification, and classifying the states in accordance with the predominant classification of the time portions which make up the state;
providing a template representing a vocabulary word, said template including an array of states, each state being classified as either fricative-like, vowel-like, or silence;
comparing the spoken word state array sequentially with the template state array to determine whether the classifications of the states match; and
permitting recognition of the spoken word as the word represented by the template even where the number of states in the spoken word array is different from the number of states in the template array.

32. Speech recognition apparatus comprising:
means for converting audio speech into electronic signals;
means for dividing the incoming speech into an array of incoming word states, and for classifying each word state as one of a plurality of classifications;
means for determining the duration of each word state within the array of incoming word states and for using the determined durations to provide an array of durational values corresponding to the word states of the incoming word state array;
means for providing a plurality of stored templates representing the vocabulary of the speech recognition apparatus;
each template being comprised of three arrays;
the first array being a sequence of stored word states, each state being classified as one of said plurality of classifications;
the second array being a sequence of values indicating the duration of a corresponding word state;
the third array being a series of weighting values, one value being assigned to each stored word state based on the importance of the word state to recognition of the word;
means for sequentially comparing the classifications of the states of the incoming word state array with the classifications of the template state array to locate matching states;
means responsive to said second array and said comparing means for generating an error value representative of the degree of matching between the incoming word state and durational arrays and each of the templates, said error value generating means further including means for increasing the error value by the absolute value of the difference between the values of said durational arrays corresponding to the particular states being compared when the classifications of the states being compared are the same;
means for increasing the error value by an amount equal to the weighting value of said weighting array corresponding to the particular state of said template state array being compared when the classifications of the states being compared are not the same; and
means, using said error value, to determine which of said templates is the closest match to said array of incoming word states and said array of durational values.

33. A method of recognizing speech comprising:
providing an array of classified states representing a spoken word;
providing a template state array composed of a sequence of classified states representing a stored vocabulary word;
sequentially comparing the classifications of the states of said template state array with the classifications of the states of said spoken word state array until a mismatch is found;
after a mismatch is found, comparing the classification of the last compared spoken word state sequentially with the classification of the next state of the template state array until a match is found;
generating an error value representing the degree of match between the spoken word state array and the template state array;
increasing said error value by a preselected weighting function when the classification of the state in the spoken word array does not match a compared state in the template array.

Specification