METHOD AND SYSTEM FOR ROBUST PATTERN MATCHING IN CONTINUOUS SPEECH
First Claim
1. A method for speech recognition in mismatched environments, the method comprising:
extracting time-frequency speech features from a series of reference speech elements in a first series of sampling windows;
aligning the extracted time-frequency speech features in response to reference speech elements from the series of speech elements that are not of equal time span duration;
constructing a common subspace for the aligned extracted time-frequency speech features;
determining a first set of coefficient vectors for the aligned extracted time-frequency speech features;
extracting a time-frequency feature image from a test speech stream spanned by a second sampling window;
approximating the extracted time-frequency feature image in the common subspace for the aligned extracted time-frequency speech features with a second coefficient vector;
computing a similarity measure between the first set of coefficient vectors and the second coefficient vector;
determining if the similarity measure is below a predefined threshold; and
wherein a match between the reference speech elements and a portion of the test speech stream spanned by the second sampling window is made in response to a similarity measure below a predefined threshold.
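The claimed steps can be illustrated with a minimal sketch. The claim does not specify how the common subspace is constructed; the sketch below assumes a PCA/SVD basis over the aligned reference feature vectors, which is one plausible realization. All function and variable names are illustrative, not from the patent.

```python
import numpy as np

def build_common_subspace(reference_features, k):
    """Construct a common subspace from aligned reference feature vectors
    of shape (n_refs, D), here via SVD/PCA (an assumed choice).
    Returns the mean, a (k, D) basis, and the first set of coefficient
    vectors obtained by projecting each reference onto the basis."""
    mean = reference_features.mean(axis=0)
    centered = reference_features - mean
    # Right singular vectors of the centered references span the subspace.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                 # (k, D) orthonormal basis
    coeffs = centered @ basis.T    # first set of coefficient vectors, (n_refs, k)
    return mean, basis, coeffs

def match_window(test_feature, mean, basis, ref_coeffs, threshold):
    """Approximate the test time-frequency feature image in the common
    subspace with a second coefficient vector, compute a distance-style
    similarity measure against the reference coefficient vectors, and
    declare a match when the measure falls below the threshold."""
    test_coeff = (test_feature - mean) @ basis.T          # second coefficient vector
    distances = np.linalg.norm(ref_coeffs - test_coeff, axis=1)
    best = distances.min()
    return best < threshold, best
```

Because the claim declares a match when the similarity measure is *below* the threshold, the measure behaves like a distance; Euclidean distance in coefficient space is used here as one such measure.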
Abstract
A method for speech recognition, the method includes: extracting time-frequency speech features from a series of reference speech elements in a first series of sampling windows; aligning reference speech elements that are not of equal time span duration; constructing a common subspace for the aligned speech features; determining a first set of coefficient vectors; extracting a time-frequency feature image from a test speech stream spanned by a second sampling window; approximating the extracted image in the common subspace for the aligned extracted time-frequency speech features with a second coefficient vector; computing a similarity measure between the first and the second coefficient vector; determining if the similarity measure is below a predefined threshold; and wherein a match between the reference speech elements and a portion of the test speech stream is made in response to a similarity measure below a predefined threshold.
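The alignment step handles reference elements of unequal time span duration. The abstract does not state the alignment technique; one plausible sketch, assumed here for illustration, resamples each element's time-frequency image to a common frame count by linear interpolation along the time axis before stacking the elements into a fixed-length feature vector.

```python
import numpy as np

def align_frames(spectrogram, target_frames):
    """Resample a (frames, bins) time-frequency image to target_frames rows
    by linear interpolation along the time axis, then flatten it into a
    fixed-length feature vector so unequal-duration elements become
    comparable. Names are illustrative, not from the patent."""
    n_frames, n_bins = spectrogram.shape
    src = np.linspace(0.0, 1.0, n_frames)       # original frame positions
    dst = np.linspace(0.0, 1.0, target_frames)  # common frame positions
    aligned = np.empty((target_frames, n_bins))
    for b in range(n_bins):
        aligned[:, b] = np.interp(dst, src, spectrogram[:, b])
    return aligned.ravel()
```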
6 Claims
1. A method for speech recognition in mismatched environments, the method comprising:
extracting time-frequency speech features from a series of reference speech elements in a first series of sampling windows;
aligning the extracted time-frequency speech features in response to reference speech elements from the series of speech elements that are not of equal time span duration;
constructing a common subspace for the aligned extracted time-frequency speech features;
determining a first set of coefficient vectors for the aligned extracted time-frequency speech features;
extracting a time-frequency feature image from a test speech stream spanned by a second sampling window;
approximating the extracted time-frequency feature image in the common subspace for the aligned extracted time-frequency speech features with a second coefficient vector;
computing a similarity measure between the first set of coefficient vectors and the second coefficient vector;
determining if the similarity measure is below a predefined threshold; and
wherein a match between the reference speech elements and a portion of the test speech stream spanned by the second sampling window is made in response to a similarity measure below a predefined threshold.
Claims 2, 3, 4, 5, and 6 are dependent claims.
Specification