Speech recognition system utilizing vocabulary model preselection
11 Assignments
0 Petitions
Abstract
Preliminary screening of vocabulary models is provided by successively applying two different high speed distance measuring functions which provide progressively increasing measurement accuracy. Both distance measuring functions utilize subsampled representations of the unknown speech segment and the vocabulary models. The initial screening function achieves very high speed by eliminating certain usual time warping constraints and by precalculating a table of distance values which can be used for all vocabulary models. The second screening function yields improved accuracy in spite of possible endpointing errors by comparing extra frames, preceding and following the presumed unknown word, with noise models appended to each vocabulary model.
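As a rough illustration of the subsampling that both screening functions rely on, the following Python sketch reduces a sequence of frames or states to a fixed number of evenly spaced elements. The even spacing rule and the names `subsample` and `target_len` are assumptions made for illustration; the abstract does not say how the subsampled representations are derived.

```python
def subsample(sequence, target_len):
    """Return target_len items spread evenly over sequence.

    Assumption: plain linear (evenly spaced) subsampling. Items are
    repeated when the input is shorter than target_len.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    n = len(sequence)
    if target_len == 1:
        return [sequence[n // 2]]
    # Map each coarse position k onto the nearest index of the fine sequence.
    return [sequence[round(k * (n - 1) / (target_len - 1))]
            for k in range(target_len)]
```

Because every model and every input segment is reduced to the same predetermined length, later comparisons can work on a matrix of fixed size regardless of how long the original word was.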
40 Citations
9 Claims
1. In a speech recognition system which compares an unknown speech segment represented by a fine sequence of frames with a vocabulary of models represented by respective fine sequences of states, said states being selected from a limited collection of predetermined states, thereby to determine the best matches;
- a computer implemented method of preselecting candidate models for accurate comparison, said method comprising;
for each model to be considered, subsampling the corresponding fine sequence of states to obtain a respective coarse sequence comprising a predetermined number of states;
subsampling said fine sequence of frames to obtain a coarse sequence comprising a predetermined number of frames, said predetermined numbers together defining a matrix having frame positions along one axis and state positions along another axis, there being a preselected region within said matrix which is examined by said method;
for each state in said limited collection, determining for each state position in said matrix the input frame which provides the best match with that state, irrespective of the frame determined in connection with any adjacent state position and considering and examining only frames which lie within said preselected region, a measure of the match being stored in a table;
calculating, using said table, for each model to be considered a value representing the overall match of said coarse sequence of frames with the respective coarse sequence of states;
preselecting for accurate comparison those models with the better overall match values as so calculated.
Dependent claims: 2, 3
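The table-based scoring recited in claim 1 can be sketched in Python as follows. This is an illustrative reading only, not the patented implementation: the squared Euclidean distance, the diagonal band used as the preselected region, and all names (`sq_dist`, `region`, `build_table`, `score_model`, `preselect`, `half_width`) are assumptions introduced for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (an assumed frame-to-state measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def region(pos, n_pos, n_frames, half_width):
    """Frame positions examined for one state position.

    Assumption: the preselected region is a band around the diagonal.
    """
    center = round(pos * (n_frames - 1) / max(n_pos - 1, 1))
    return range(max(0, center - half_width),
                 min(n_frames, center + half_width + 1))

def build_table(frames, state_vectors, n_pos, half_width):
    """table[s][j] = best match of state s against frames allowed at position j.

    Each entry is found independently of neighbouring state positions,
    so the table is built once and reused for every model.
    """
    return [[min(sq_dist(frames[i], vec)
                 for i in region(j, n_pos, len(frames), half_width))
             for j in range(n_pos)]
            for vec in state_vectors]

def score_model(table, coarse_states):
    """Sum table lookups over the model's coarse state sequence."""
    return sum(table[s][j] for j, s in enumerate(coarse_states))

def preselect(table, models, keep):
    """Return the names of the `keep` models with the lowest scores."""
    return sorted(models, key=lambda name: score_model(table, models[name]))[:keep]
```

Once the table exists, scoring a model costs one lookup and one addition per state position, which is why this kind of screening can be run over an entire vocabulary.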
4. In a speech recognition system which compares an unknown input speech segment with a vocabulary of models and in which input speech is encoded as a fine sequence of frames and means are provided for identifying the likely start and finish endpoints of words in said fine sequence, said models being represented by correspondingly fine sequences of states;
- a computer implemented method of preselecting candidate models for accurate comparison, said method comprising;
for each model to be considered, subsampling the corresponding fine sequence of frames to obtain a respective coarse sequence comprising a predetermined number of states;
subsampling said fine sequence of frames between said endpoints to obtain a coarse sequence comprising a predetermined number of frames, said predetermined numbers together defining a matrix having frame positions along one axis and state positions along another axis;
comparing a preselected number of frames preceding said start endpoint with a preselected noise model thereby to precalculate cost values for entry into said matrix at different frame position locations;
comparing a preselected number of frames following said finish endpoint with a preselected noise model thereby to precalculate cost values for exit from said matrix at different frame position locations;
for each model to be considered, determining a best match path across said matrix including the cost of entry to and exit from the matrix at different frame position locations, and scoring the model on the basis of that best path;
selecting, for accurate comparison with the input speech segment, those models with the best scores thusly obtained.
Dependent claims: 5
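One way to picture the best-path search of claim 4 is a small dynamic program over the frame-by-state matrix in which a path may enter and leave at different frame positions. The sketch below is a simplified assumption, not the patent's procedure: the allowed steps (stay on the state, advance one state, or advance both frame and state), the function name `best_path_score`, and the inputs `local`, `entry` and `exit_` are hypothetical. `entry[i]` and `exit_[i]` stand for the precalculated noise-model costs of entering or leaving at frame position `i`.

```python
def best_path_score(local, entry, exit_):
    """Lowest total cost of a connected path across the matrix.

    local[i][j]: cost of matching frame position i to state position j.
    entry[i]:    precalculated cost of entering the matrix at frame i.
    exit_[i]:    precalculated cost of leaving the matrix at frame i.
    """
    n_frames, n_states = len(local), len(local[0])
    inf = float("inf")
    best = [[inf] * n_states for _ in range(n_frames)]
    for i in range(n_frames):
        for j in range(n_states):
            if j == 0:
                # Either enter here or stay on the first state.
                prev = entry[i]
                if i > 0:
                    prev = min(prev, best[i - 1][0])
            else:
                # Advance the state on the same frame, stay on the
                # state, or advance frame and state together.
                prev = best[i][j - 1]
                if i > 0:
                    prev = min(prev, best[i - 1][j], best[i - 1][j - 1])
            best[i][j] = prev + local[i][j]
    # The path must finish on the last state, at any frame position.
    return min(best[i][n_states - 1] + exit_[i] for i in range(n_frames))
```

With variable entry and exit points, a slightly misplaced endpoint only adds the corresponding noise cost instead of forcing the path through frames that do not belong to the word.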
6. In a speech recognition system which compares an unknown speech segment represented by a fine sequence of frames with a vocabulary of models represented by respective fine sequences of states thereby to determine the best matches;
- a computer implemented method of preselecting candidate models for accurate comparison, said method comprising;
for each model to be considered, subsampling the corresponding fine sequence of states to obtain a respective coarse sequence comprising a predetermined number of states;
subsampling said fine sequence of frames to obtain a coarse sequence comprising a predetermined number of frames, said predetermined numbers together defining a matrix having frame positions along one axis and state positions along another axis, there being a preselected region within said matrix which is examined by said method;
determining for each state position in said matrix the input frame which provides the best match with that state, irrespective of the frame determined in connection with any adjacent state position and considering and examining only frames which lie within said preselected region, and providing a measure of the degree of match;
combining the measures for the several state positions thereby to obtain a value representing the overall match of said coarse sequence of frames with the respective coarse sequence of states;
preselecting for accurate comparison those models with the better overall match values as so calculated.
Dependent claims: 7
8. In a speech recognition system which compares an unknown speech segment represented by a fine sequence of frames with a vocabulary of models represented by respective fine sequences of states, said vocabulary being partitioned into acoustically similar groups of models with one model of each group being representative of the group thereby to determine the best matches;
- a computer implemented method of preselecting candidate models for accurate comparison, said method comprising;
for each model, subsampling the corresponding fine sequence of states to obtain a respective coarse sequence comprising a predetermined number of states;
subsampling said fine sequence of frames to obtain a coarse sequence comprising a predetermined number of frames, said predetermined numbers together defining a matrix having frame positions along one axis and state positions along another axis, there being a preselected region within said matrix which is examined by said method;
providing a first distance measuring function which determines for each state position in said matrix the input frame which provides the best match with that state, considering and examining only frames which lie within said preselected region, and provides a measure of the degree of match;
combining the measures for the several state positions thereby to obtain a first value representing the overall match of said coarse sequence of frames with the respective coarse sequence of states;
providing a second distance measuring function which determines a connected path across said matrix and calculates a second value representing the overall match of said coarse sequence of frames with the respective coarse sequence of states;
applying said first distance measuring function to the group representative models;
selecting the better scoring representative models and applying to the selected models said second distance measuring function thereby to identify a reduced number of better scoring groups;
applying said first distance measuring function to the members of said better scoring groups;
selecting the better scoring member models and applying to the selected member models said second distance measuring function thereby to preselect a reduced number of member models for accurate comparison with said unknown speech segment.
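The four-pass ordering in claim 8 (fast measure on representatives, path measure on the survivors, then the same two passes on group members) can be sketched as below. The function and parameter names (`best_n`, `hierarchical_preselect`, `n_fast`, `n_groups`, `n_members_fast`, `n_final`) and the fixed cut-off counts are assumptions for illustration; the claim does not state how many candidates survive each pass. `fast_score` and `path_score` stand in for the first and second distance measuring functions, with lower values meaning a better match.

```python
def best_n(candidates, score, n):
    """Keep the n candidates with the lowest score."""
    return sorted(candidates, key=score)[:n]

def hierarchical_preselect(groups, fast_score, path_score,
                           n_fast, n_groups, n_members_fast, n_final):
    """Two-level preselection over a vocabulary partitioned into groups.

    groups maps each representative model to the list of models in its
    group (assumed here to include the representative itself).
    """
    # Pass 1: fast measure on the group representatives only.
    reps = best_n(groups, fast_score, n_fast)
    # Pass 2: slower connected-path measure on the surviving representatives.
    good_reps = best_n(reps, path_score, n_groups)
    # Pass 3: fast measure on every member of the better scoring groups.
    members = [m for rep in good_reps for m in groups[rep]]
    members = best_n(members, fast_score, n_members_fast)
    # Pass 4: path measure on the surviving members.
    return best_n(members, path_score, n_final)
```

The more expensive path measure is only ever applied to candidates that already passed the fast measure, so its cost scales with the cut-off counts rather than with the vocabulary size.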
9. In a speech recognition system which compares an unknown speech segment represented by a fine sequence of frames with a vocabulary of models represented by respective fine sequences of states thereby to determine the best matches;
- a computer implemented method of selecting candidate models, said method comprising;
for each model to be considered, subsampling the corresponding fine sequence of states to obtain a respective coarse sequence comprising a predetermined number of states;
subsampling said fine sequence of frames to obtain a coarse sequence comprising a predetermined number of frames, said predetermined numbers together defining a matrix having frame positions along one axis and state positions along another axis, there being a preselected region within said matrix which is examined by said method;
determining for each state position in said matrix the input frame which provides the best match with that state, irrespective of the frame determined in connection with any adjacent state position and considering and examining only frames which lie within said preselected region, and providing a measure of the degree of match;
combining the measures for the several state positions thereby to obtain a value representing the overall match of said coarse sequence of frames with the respective coarse sequence of states;
selecting (for accurate comparison) those models with the better overall match values as so calculated;
and for only those models with the better overall match values, comparing the fine sequence of frames with the respective fine sequence of states thereby to identify at least one recognition candidate model.
Specification