Apparatus and method for recognizing spoken words
Abstract
A speech recognition technique is disclosed for recognizing words spoken at rates approaching those of continuous speech.
To avoid the computational cost of an exhaustive approach, in which every word of the stored vocabulary is correlated against each group of speech samples, this invention uses relatively long inter-string pauses to detect string boundaries and relatively short inter-segment pauses to determine speech segment boundaries. These segment boundaries serve as a limited set of candidate start and end points for words.
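The pause-based boundary detection described in the abstract can be illustrated with a minimal Python sketch. It assumes a per-frame voiced/silent decision is already available (for example, from an energy threshold) and uses two hypothetical frame-count thresholds, `min_seg_pause` and `min_string_pause`. The patent text here specifies neither the voicing method nor the threshold values. Treating silences shorter than `min_seg_pause` as part of the surrounding speech is also an assumption made for illustration.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame), end exclusive


def voiced_runs(voiced: List[bool]) -> List[Segment]:
    """Return the maximal runs of consecutive voiced frames."""
    runs: List[Segment] = []
    start: Optional[int] = None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(voiced)))
    return runs


def split_by_pauses(voiced: List[bool], min_seg_pause: int,
                    min_string_pause: int) -> List[List[Segment]]:
    """Group voiced runs into segments, and segments into strings.

    A gap of at least min_string_pause frames ends the current string (an
    inter-string pause). A gap of at least min_seg_pause frames ends the
    current segment (an inter-segment pause). Shorter gaps are bridged.
    """
    strings: List[List[Segment]] = []
    current: List[Segment] = []
    for start, end in voiced_runs(voiced):
        if current:
            gap = start - current[-1][1]
            if gap < min_seg_pause:
                # Too short to count as a pause: extend the previous segment.
                current[-1] = (current[-1][0], end)
                continue
            if gap >= min_string_pause:
                strings.append(current)
                current = []
        current.append((start, end))
    if current:
        strings.append(current)
    return strings


# '#' marks a voiced frame, '-' a silent one.
frames = [c == "#" for c in "-##-##---#------##-"]
print(split_by_pauses(frames, min_seg_pause=2, min_string_pause=5))
# [[(1, 6), (9, 10)], [(16, 18)]]
```

In this example, the one-frame gap at frame 3 is bridged, the three-frame gap ends a segment, and the six-frame gap ends the string.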
38 Claims
1. Apparatus for receiving input spoken vocabulary words during a training phase of operation and for subsequently recognizing strings of received input spoken command words, comprising:
feature extraction means for generating digital feature signals dependent on the features present in input words;
reference array formation means for forming and storing, for each vocabulary word, a time-dependent reference array dependent on the feature signals present during the vocabulary word as spoken during the training phase;
boundary detection means for detecting relatively long duration pauses between speech sounds as inter-string pauses from which string boundaries are determined, the string boundaries defining the beginning and end of a string of words, and for detecting relatively short duration pauses between speech sounds as inter-segment pauses from which speech segment boundaries are determined within a string of words;
candidate feature array formation means for forming and storing, for each command word candidate consisting of each segment and each sequence of up to a predetermined number of segments, a time-dependent candidate feature array dependent upon the feature signals present during said command word candidate;
array correlation means for comparing each candidate feature array with each reference array and for storing, for each command word candidate, the vocabulary word whose reference array results in the highest correlation and a record of the score of said highest correlation;
string matching means responsive to the stored correlation scores for determining the sequence of vocabulary words that yields the highest overall correlation score for the string; and
means for generating output indications of the determined sequence of vocabulary words.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
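The candidate formation and array correlation elements of claim 1 can be sketched in Python as follows. The claim does not say how arrays of different durations are compared. This sketch therefore assumes that each array is time-normalized to a fixed number of frames by uniform index mapping and scored with a normalized (cosine) correlation. The names `n_frames`, `max_segments`, `normalize_time`, and `correlate_candidates` are hypothetical. A candidate is any run of 1 to `max_segments` consecutive segments, and its array is assumed to include any short pauses between those segments.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]    # one feature vector per time frame
Segment = Tuple[int, int]  # (start_frame, end_frame), end exclusive


def normalize_time(frames: Sequence[Frame], n_frames: int) -> List[float]:
    """Map an array of any length onto n_frames frames and flatten it."""
    m = len(frames)
    flat: List[float] = []
    for k in range(n_frames):
        flat.extend(frames[(k * m) // n_frames])  # uniform index mapping
    return flat


def correlation(a: List[float], b: List[float]) -> float:
    """Normalized correlation (cosine similarity) of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def correlate_candidates(features: Sequence[Frame], segments: List[Segment],
                         references: Dict[str, Sequence[Frame]],
                         max_segments: int, n_frames: int = 16
                         ) -> Dict[Tuple[int, int], Tuple[str, float]]:
    """For every run of 1..max_segments consecutive segments, keep the
    vocabulary word with the highest correlation and that score."""
    refs = {w: normalize_time(a, n_frames) for w, a in references.items()}
    results: Dict[Tuple[int, int], Tuple[str, float]] = {}
    for i in range(len(segments)):
        for j in range(i, min(i + max_segments, len(segments))):
            cand = normalize_time(features[segments[i][0]:segments[j][1]], n_frames)
            results[(i, j)] = max(((w, correlation(cand, r)) for w, r in refs.items()),
                                  key=lambda pair: pair[1])
    return results


# Toy two-dimensional features: two segments separated by two silent frames.
features = [[1.0, 0.0]] * 4 + [[0.0, 0.0]] * 2 + [[0.0, 1.0]] * 4
segments = [(0, 4), (6, 10)]
references = {"yes": [[1.0, 0.0]] * 4, "no": [[0.0, 1.0]] * 4}
for span, (word, score) in correlate_candidates(features, segments, references, 2, 4).items():
    print(span, word, round(score, 3))
# (0, 0) yes 1.0
# (0, 1) yes 0.577
# (1, 1) no 1.0
```

Only the best word and its score are kept per candidate, matching the record the claim describes. With a vocabulary of V words and S segments, the correlation cost grows roughly as S × `max_segments` × V rather than with every possible start and end frame.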
11. Apparatus for receiving input spoken vocabulary words during a training phase of operation and for subsequently recognizing strings of received input spoken command words, comprising:
feature extraction means for generating digital feature signals dependent on the features present in input words;
reference array formation means for forming and storing, for each vocabulary word, a time-dependent reference array dependent on the feature signals present during the vocabulary word spoken during the training phase;
a feature buffer memory for storing feature signals which occur over a period of time;
candidate feature array formation means, responsive to the stored feature signals, for forming, for each command word candidate consisting of a portion of the speech of a received string, a time-dependent command word candidate feature array dependent on the feature signals present during said command word candidate;
a first-in-first-out feature array memory stack for storing said command word candidate feature arrays;
array correlation means responsive to feature arrays in said feature array memory stack for comparing, in order, each candidate feature array with each reference array to obtain the vocabulary word whose reference array results in the highest correlation;
a correlation result memory for storing a record of the vocabulary words which result in the highest correlation with each candidate feature array, and for storing a record of the score of said highest correlation;
string matching means responsive to the correlation scores and vocabulary words stored in said correlation result memory for determining the sequence of vocabulary words that yields the highest correlation score for the string; and
means responsive to said string matching means for generating output indications of the determined sequence of vocabulary words;
said array formation means, array correlation means, and string matching means being operative to function one at a time on a priority basis, with said array formation means having a higher operational priority than said array correlation means, and said array correlation means having a higher operational priority than said string matching means.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28.
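In claim 11, array formation, correlation, and string matching run one at a time on a priority basis, and candidate arrays pass from formation to correlation through a first-in-first-out memory. The following minimal Python sketch illustrates that ordering. It assumes non-preemptive scheduling in small units of work (one candidate array per step), which the claim does not specify. The stage names, the `run_by_priority` helper, and the placeholder strings standing in for feature arrays are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# (stage name, "has work?" check, "do one unit of work" action)
Stage = Tuple[str, Callable[[], bool], Callable[[], None]]


def run_by_priority(stages: List[Stage]) -> List[str]:
    """Run one unit of work at a time from the highest-priority stage that
    has work. Stages are listed from highest to lowest priority. Returns the
    order in which stages ran and stops when no stage has work left."""
    order: List[str] = []
    while True:
        for name, has_work, do_unit in stages:
            if has_work():
                do_unit()
                order.append(name)
                break
        else:  # no stage had work
            return order


# Hypothetical state: candidates whose frames are already in the feature buffer.
pending: Deque[str] = deque(["cand0", "cand1", "cand2"])
array_fifo: Deque[str] = deque()   # first-in-first-out feature array memory
correlated: List[str] = []         # stand-in for the correlation result memory
matched: List[List[str]] = []      # string matching output

stages: List[Stage] = [
    ("form", lambda: bool(pending), lambda: array_fifo.append(pending.popleft())),
    ("correlate", lambda: bool(array_fifo), lambda: correlated.append(array_fifo.popleft())),
    ("match", lambda: len(correlated) == 3 and not matched,
     lambda: matched.append(list(correlated))),
]
order = run_by_priority(stages)
print(order)
# ['form', 'form', 'form', 'correlate', 'correlate', 'correlate', 'match']
```

Because array formation always runs first whenever it has work, correlation and string matching use only processing time that array formation does not need. The FIFO ensures that arrays are correlated in the order in which they were formed.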
29. A method for receiving input spoken vocabulary words during a training phase of operation and for subsequently recognizing strings of received input spoken command words, comprising the steps of:
generating digital feature signals dependent on the features present in input words;
forming and storing, for each vocabulary word, a time-dependent reference array dependent on the feature signals present during the vocabulary word as spoken during the training phase;
detecting relatively long duration pauses between speech sounds as inter-string pauses from which string boundaries are determined, the string boundaries defining the beginning and end of a string of words, and detecting relatively short duration pauses between speech sounds as inter-segment pauses from which speech segment boundaries are determined within a string of words;
forming and storing, for each command word candidate consisting of each segment and each sequence of up to a predetermined number of segments, a time-dependent candidate feature array dependent upon the feature signals present during said command word candidate;
comparing each candidate feature array with each reference array and storing, for each command word candidate, the vocabulary word whose reference array results in the highest correlation and a record of the score of said highest correlation;
determining, using the stored correlation scores and words, the sequence of vocabulary words that yields the highest overall correlation score for the string; and
generating output indications of the determined sequence of vocabulary words.
Dependent claims: 30, 31, 32, 33, 34, 35, 36, 37, 38.
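The final string-matching step of claim 29 can be carried out as a dynamic-programming search over segment boundaries. This is one possible approach, not necessarily the one the patent uses. The claims do not say how candidate scores combine into an overall score, so this Python sketch assumes that each candidate's correlation score is weighted by the number of segments it spans and then summed. This keeps every segmentation of the string on the same scale. The input is a dictionary keyed by (first_segment, last_segment), holding the best word and score for each candidate as the claim describes. The function name and data layout are hypothetical, and the example scores are illustrative values, not data from the patent.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[int, int], Tuple[str, float]]  # (first_seg, last_seg) -> (word, score)


def best_sequence(scores: Scores, n_segments: int, max_segments: int) -> Tuple[List[str], float]:
    """Return the word sequence covering segments 0..n_segments-1 whose
    segment-weighted sum of correlation scores is highest."""
    neg_inf = float("-inf")
    best = [neg_inf] * (n_segments + 1)  # best[k]: best total for the first k segments
    back = [0] * (n_segments + 1)        # back[k]: first segment of the last word
    best[0] = 0.0
    for k in range(1, n_segments + 1):
        for span in range(1, min(max_segments, k) + 1):
            i = k - span
            entry = scores.get((i, k - 1))
            if entry is None or best[i] == neg_inf:
                continue
            total = best[i] + span * entry[1]
            if total > best[k]:
                best[k], back[k] = total, i
    if best[n_segments] == neg_inf:
        return [], neg_inf  # no complete segmentation has scores
    words: List[str] = []
    k = n_segments
    while k > 0:
        words.append(scores[(back[k], k - 1)][0])
        k = back[k]
    words.reverse()
    return words, best[n_segments]


# Illustrative scores only: "seven" split by a short pause into two segments.
example: Scores = {
    (0, 0): ("six", 0.6), (1, 1): ("ten", 0.5), (0, 1): ("seven", 0.9),
    (1, 2): ("eight", 0.4), (2, 2): ("two", 0.95),
}
words, total = best_sequence(example, n_segments=3, max_segments=2)
print(words, round(total, 2))
# ['seven', 'two'] 2.75
```

Weighting by span means that a word covering two segments is not penalized, relative to two one-segment words, merely because it contributes one term to the sum instead of two.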
Specification