Feature-based speech recognizer having probabilistic linguistic processor providing word matching based on the entire space of feature vectors
Abstract
A feature-based speech recognizer having a probabilistic linguistic processor provides word matching based on the entire space of feature vectors. In this manner, the errors and inaccuracies associated with the heretofore known feature-based speech recognizers, which provided word matching on less than the entire space of feature vectors, are overcome, thereby resulting in improved-accuracy speech recognition. The word matching may be on feature vectors computed either from segments or from landmarks or from both segments and landmarks. For word matching on segment-based feature vectors, acoustic likelihoods may be normalized by extra-acoustic likelihoods defined by at least one extra-acoustic ("not" or "anti") model. Context-dependent and context-independent acoustic models may be employed.
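The normalization mentioned in the abstract can be illustrated with a small sketch. It assumes, and this section does not state it, that segments on a hypothesized segmentation are scored by their linguistic-unit model, that every other segment of the segment space is scored by a single anti model, and that dividing by the anti-model likelihood of all segments (a constant across hypotheses) leaves a per-segment likelihood ratio. The function names, the segment representation, and all numbers are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame); hypothetical representation


def joint_log_likelihood(path: List[Segment],
                         segment_space: List[Segment],
                         log_p_unit: Dict[Segment, float],
                         log_p_anti: Dict[Segment, float]) -> float:
    """Assumed form of log P(XY|SW): on-path segments (X) use the unit model,
    all remaining segments of the segment space (Y) use the anti model."""
    on_path = sum(log_p_unit[s] for s in path)
    off_path = sum(log_p_anti[s] for s in segment_space if s not in path)
    return on_path + off_path


def normalized_score(path: List[Segment],
                     log_p_unit: Dict[Segment, float],
                     log_p_anti: Dict[Segment, float]) -> float:
    """Sum of log likelihood ratios over the on-path segments only."""
    return sum(log_p_unit[s] - log_p_anti[s] for s in path)


# Made-up log likelihoods for six segments spanning frames 0..10.
log_p_unit = {(0, 3): -1.0, (3, 7): -2.0, (7, 10): -1.5,
              (3, 10): -2.5, (0, 7): -4.0, (0, 10): -6.0}
log_p_anti = {(0, 3): -3.0, (3, 7): -2.5, (7, 10): -3.0,
              (3, 10): -2.0, (0, 7): -1.5, (0, 10): -1.0}
segment_space = list(log_p_unit)

path_a = [(0, 3), (3, 10)]
path_b = [(0, 3), (3, 7), (7, 10)]

# The anti-model score of the whole segment space is the same for every path.
constant = sum(log_p_anti.values())
for path in (path_a, path_b):
    full = joint_log_likelihood(path, segment_space, log_p_unit, log_p_anti)
    assert math.isclose(full - constant, normalized_score(path, log_p_unit, log_p_anti))
```

Under these assumptions the two scores differ only by a constant, so ranking hypotheses by the normalized score gives the same order as ranking them by the joint likelihood over the entire segment space, while touching only the segments on each path.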
25 Claims
1. A feature-based speech recognizer having a probabilistic linguistic processor providing word matching based on the entire space of segment-based feature vectors, comprising:
a segmenter responsive to acoustic evidence O in form of frames of speech-coded data representative of the speech to be recognized and operative (1) to parse said acoustic evidence into plural segments in such a way that each segment represents another way that said acoustic evidence may be partitioned framewise into segments, where all segments together define a segment space that accounts for all of the ways said frames of said acoustic evidence may be partitioned framewise, and (2) to combine said segments into plural segmentations S in such a way that each segmentation represents another way that said segments of said segment space may be combined segmentwise to account for all of said acoustic evidence;
a feature extractor, coupled to the segmenter and responsive to the frames of the acoustic evidence, and operative to extract, for each segment of a possible segmentation, a feature vector X having predetermined dimensions defined by linguistic units of an acoustic model that is representative of the presence of those linguistic units in the frames of the acoustic evidence underlying each such segment;
a classifier operable over the entire space of feature vectors, coupled to the feature extractor and to the segmenter, responsive to the extracted feature vectors, and to the segmentations S, and operative to classify the segment-based feature vectors X of every segmentation in terms of the different sequences of one or more predetermined linguistic units to provide a joint likelihood, P(XY|SW), that is a measure of how well each feature vector X of every segmentation fits all the sequences of one or more predetermined linguistic units, and in such a way as to take into account, for every segmentation, the feature vectors Y of all of the other segments of the segment space belonging to other segmentations; and
a probabilistic word matcher, coupled to the classifier operable over the entire space of feature vectors, and operative to search the feature vectors classified over the entire space of segment-based feature vectors in terms of the ways the different sequences of one or more linguistic units fit the competing segmentations to find that word string that best matches the acoustic evidence.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 19
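The two segmenter operations in claim 1 (building a segment space, then combining segments into segmentations that each cover all of the acoustic evidence) can be sketched as follows. This is a minimal illustration, assuming the segmenter works from a short list of candidate boundary frames; the boundary list, the type alias, and the function names are hypothetical and are not taken from the patent.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame), end exclusive; hypothetical


def all_segments(boundaries: List[int]) -> List[Segment]:
    """Segment space: every span between two candidate boundaries."""
    return [(boundaries[i], boundaries[j])
            for i in range(len(boundaries))
            for j in range(i + 1, len(boundaries))]


def all_segmentations(boundaries: List[int]) -> List[List[Segment]]:
    """Every way to chain segments end to end so that they cover all frames."""
    last = boundaries[-1]
    segments = all_segments(boundaries)

    def extend(start: int) -> List[List[Segment]]:
        if start == last:
            return [[]]  # reached the final frame: one complete (empty) tail
        paths = []
        for seg in segments:
            if seg[0] == start:
                for rest in extend(seg[1]):
                    paths.append([seg] + rest)
        return paths

    return extend(boundaries[0])


# Assumed candidate boundaries (frame indices) for a 10-frame utterance.
boundaries = [0, 3, 7, 10]
segment_space = all_segments(boundaries)
segmentations = all_segmentations(boundaries)
```

With four candidate boundaries the segment space holds six segments and there are four segmentations, one for each choice of which interior boundaries to keep. The number of segmentations doubles with every added interior boundary, which is why a practical recognizer would search the network rather than list every segmentation.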
12. A feature-based speech recognizer having a probabilistic linguistic processor providing word matching based on the entire space of landmark-based feature vectors, comprising:
a segmenter responsive to acoustic evidence O in form of frames of speech-coded data representative of the speech to be recognized and operative (1) to parse said acoustic evidence into plural segments in such a way that each segment represents another way that said acoustic evidence may be partitioned framewise into segments and landmarks, where all segments together define a segment space that accounts for all of the ways said frames of said acoustic evidence may be partitioned framewise, and (2) to combine said segments into plural segmentations S in such a way that each segmentation represents another way that said segments of said segment space may be combined segmentwise to account for all of said acoustic evidence;
a feature extractor, coupled to the segmenter and responsive to the frames of the acoustic evidence, and operative to extract, for each landmark of a possible segmentation, a feature vector Z having predetermined dimensions defined by linguistic units of an acoustic model that is representative of the presence of those linguistic units in the frames of the acoustic evidence underlying each such landmark;
a classifier operable over the entire space of feature vectors, coupled to the feature extractor and to the segmenter, responsive to the extracted feature vectors, and to the segmentations S, and operative to classify the landmark-based feature vectors Z of every segmentation in terms of the different sequences of one or more predetermined linguistic units to provide a likelihood, P(Z|SW), that is a measure of how well each feature vector Z of every landmark fits all the sequences of one or more predetermined linguistic units; and
a probabilistic word matcher, coupled to the classifier operable over the entire space of feature vectors, and operative to search the feature vectors classified over the entire space of landmark-based feature vectors in terms of the ways the different sequences of one or more linguistic units fit the competing segmentations to find that word string that best matches the acoustic evidence.
Dependent claims: 13, 14, 15, 16, 17, 18
20. A method for decoding speech from encoded speech data, comprising the steps of:
segmenting said encoded speech data into plural segments of a segment space and into plural segmentations each having at least one constitutive segments;
extracting for acoustic analysis from said data a network of feature vectors defined over said plural segmentations;
providing at least one model to be used during word matching; and
performing word matching for competing word hypotheses in each case over the entire network of feature vectors.
Dependent claims: 21, 22, 23, 24
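The search step of claim 20 can be illustrated with a generic dynamic-programming pass over a segment network. This is a sketch under stated assumptions, not the patent's word matcher: it assumes each segment already carries one label and one log score (for instance a likelihood normalized against an anti model, as the abstract describes, so that comparing paths reflects the whole network), and it ignores lexical and language-model constraints. The function name, labels, and scores are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Segment = Tuple[int, int]                   # (start_frame, end_frame); hypothetical
Scored = Tuple[str, float]                  # (unit label, log score); hypothetical
Path = List[Tuple[Segment, str]]


def best_path(boundaries: List[int],
              scores: Dict[Segment, Scored]) -> Optional[Tuple[float, Path]]:
    """Best-scoring chain of segments from the first to the last boundary.

    `boundaries` must be sorted in ascending order."""
    best: Dict[int, Tuple[float, Path]] = {boundaries[0]: (0.0, [])}
    for end in boundaries[1:]:
        candidates = []
        for start in boundaries:
            # Only extend from boundaries that are reachable and precede `end`.
            if start >= end or start not in best or (start, end) not in scores:
                continue
            label, seg_score = scores[(start, end)]
            total, path = best[start]
            candidates.append((total + seg_score, path + [((start, end), label)]))
        if candidates:
            best[end] = max(candidates, key=lambda c: c[0])
    return best.get(boundaries[-1])


# Made-up labels and log scores for a six-segment network over frames 0..10.
scores = {(0, 3): ("k", -1.0), (3, 7): ("ae", -2.0), (7, 10): ("t", -1.5),
          (0, 7): ("k", -4.0), (3, 10): ("ae", -2.5), (0, 10): ("sil", -6.0)}
result = best_path([0, 3, 7, 10], scores)
```

Each boundary keeps only its best incoming path, so the work grows with the number of segments in the network rather than with the number of segmentations.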
25. A method for decoding speech from encoded speech data, comprising the steps of:
segmenting said encoded speech data into plural segments of a segment space and into plural segmentations each having at least one constitutive segments;
providing at least one model to be used during word matching including at least one extra-linguistic model and including an acoustic model representative of predetermined linguistic units; and
performing word matching for competing word hypotheses using both said extra-linguistic and said acoustic models.
Specification