Speech recognition by selecting and refining hot words
Abstract
Speech recognition is performed by receiving a speech signal that includes spoken phones. A dynamic time warping procedure is applied to the received speech signal to generate a time-warped signal. The time-warped signal is compared to a plurality of stored reference patterns to identify a set of stored reference patterns that are most similar to the time-warped signal. A candidate hot word is selected from a list using the identified set of stored reference patterns. The selection of the candidate hot word is then refined.
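The matching step in the abstract can be illustrated with a minimal sketch. It assumes each pattern is a sequence of equal-length feature vectors and uses the textbook dynamic time warping recurrence, in which alignment and comparison are computed together as one accumulated cost. The patent text describes warping first and comparing second, so this is a simplification, not the patented procedure. The names `dtw_distance` and `closest_reference` are hypothetical.

```python
import math

def dtw_distance(test, ref):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumption: frames are compared with Euclidean distance; the source
    does not say which local distance is used.
    """
    n, m = len(test), len(ref)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning test[:i] with ref[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(test[i - 1], ref[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # test frame repeated against same ref frame
                cost[i][j - 1],      # ref frame repeated against same test frame
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]

def closest_reference(test, references):
    """Score every stored reference; a smaller score means more similar."""
    scores = {label: dtw_distance(test, ref) for label, ref in references.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Toy one-dimensional "feature vectors", purely illustrative
references = {
    "hello": [[0.0], [1.0], [2.0]],
    "stop": [[5.0], [5.0], [5.0]],
}
test_pattern = [[0.0], [0.0], [1.0], [2.0]]
best, scores = closest_reference(test_pattern, references)
```

Here the test pattern is a slowed-down copy of the "hello" reference, so the warping absorbs the repeated first frame and that reference receives the smallest score.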
18 Claims
1. A computer-implemented method for performing speech recognition, the method comprising:
- generating, by a computer, an acoustic similarity matrix using a set of Gaussian Mixture Models (GMMs) and a signal classifier, wherein the acoustic similarity matrix includes similarity values between a first set of phones and a second set of phones;
- receiving, by the computer, a speech signal including one or more spoken phones;
- applying, by the computer, a dynamic time warping procedure to the received speech signal to generate a time-warped signal, wherein the time-warped signal is among a test pattern indicative of a locus of a set of characterization vectors obtained from the speech signal;
- comparing, by the computer, the time-warped signal to a plurality of stored reference patterns to determine a set of similarity values among the acoustic similarity matrix, the set of similarity values corresponding to the plurality of stored reference patterns, wherein each similarity value indicates a similarity level between the time-warped signal and each reference pattern, and an increase of the similarity value is indicative of an increase of dissimilarity between the time-warped signal and a reference pattern in the comparison;
- identifying, by the computer, a reference pattern among the plurality of stored reference patterns that has a smallest similarity value;
- selecting, by the computer, a candidate hot word from a list of candidate hot words that corresponds to the identified reference pattern;
- determining, by the computer, another hot word having a greater probability of occurrence than the candidate hot word; and
- refining, by the computer, the selection of the candidate hot word based on the said determining.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
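The last three steps of claim 1 (select a candidate, find a more probable hot word, refine the selection) can be sketched as follows. The claim does not say how the refinement weighs acoustic score against probability of occurrence, so the `margin` rule here is an assumption made only for illustration, and `refine_hot_word` is a hypothetical name. Scores follow the claim's convention that a smaller value means a closer match.

```python
def refine_hot_word(scores, occurrence_prob, margin=0.1):
    """Pick the best-scoring hot word, then optionally swap it.

    scores: hot word -> similarity value (smaller = more similar).
    occurrence_prob: hot word -> probability of occurrence.
    margin: ASSUMED tolerance; a more probable word replaces the
            candidate only if its score is within this distance of the best.
    """
    candidate = min(scores, key=scores.get)
    best_score = scores[candidate]
    refined = candidate
    for word, score in scores.items():
        close_enough = score - best_score <= margin
        more_probable = occurrence_prob.get(word, 0.0) > occurrence_prob.get(refined, 0.0)
        if close_enough and more_probable:
            refined = word
    return candidate, refined

# Illustrative values only
scores = {"call": 1.00, "cold": 1.05, "stop": 4.0}
occurrence_prob = {"call": 0.2, "cold": 0.5, "stop": 0.9}
candidate, refined = refine_hot_word(scores, occurrence_prob)
```

With these values "call" is the acoustic candidate, "cold" replaces it because it is nearly as close and more probable, and "stop" is ignored despite its high probability because its score is far from the best.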
9. A computer program product for performing speech recognition, the computer program product comprising a computer-readable storage medium having a computer-readable program stored therein, wherein the computer-readable program, when executed on a computing device including at least one processor, causes the at least one processor to:
- generate an acoustic similarity matrix using a set of Gaussian Mixture Models (GMMs) and a signal classifier, wherein the acoustic similarity matrix includes similarity values between a first set of phones and a second set of phones;
- receive a speech signal including one or more spoken phones;
- apply a dynamic time warping procedure to the received speech signal to generate a time-warped signal, wherein the time-warped signal is among a test pattern indicative of a locus of a set of characterization vectors obtained from the speech signal;
- compare the time-warped signal to a plurality of stored reference patterns to determine a set of similarity values among the acoustic similarity matrix, the set of similarity values corresponding to the plurality of stored reference patterns, wherein each similarity value indicates a similarity level between the time-warped signal and each reference pattern, and an increase of the similarity value is indicative of an increase of dissimilarity between the time-warped signal and a reference pattern in the comparison;
- identify a reference pattern among the plurality of stored reference patterns that has a smallest similarity value;
- select a candidate hot word from a list of candidate hot words that corresponds to the identified reference pattern;
- determine another hot word having a greater probability of occurrence than the candidate hot word; and
- refine the selection of the candidate hot word based on the said determination.
Dependent claims: 10, 11, 12, 13, 14
15. An apparatus for performing speech recognition, the apparatus comprising:
- at least one processor; and
- a memory coupled to the at least one processor, wherein the memory comprises program instructions which, when executed by the at least one processor, cause the at least one processor to:
- generate an acoustic similarity matrix using a set of Gaussian Mixture Models (GMMs) and a signal classifier, wherein the acoustic similarity matrix includes similarity values between a first set of phones and a second set of phones;
- receive a speech signal including one or more spoken phones;
- apply a dynamic time warping procedure to the received speech signal to generate a time-warped signal, wherein the time-warped signal is among a test pattern indicative of a locus of a set of characterization vectors obtained from the speech signal;
- compare the time-warped signal to a plurality of stored reference patterns to determine a set of similarity values among the acoustic similarity matrix, the set of similarity values corresponding to the plurality of stored reference patterns, wherein each similarity value indicates a similarity level between the time-warped signal and each reference pattern, and an increase of the similarity value is indicative of an increase of dissimilarity between the time-warped signal and a reference pattern in the comparison;
- identify a reference pattern among the plurality of stored reference patterns that has a smallest similarity value;
- select a candidate hot word from a list of candidate hot words that corresponds to the identified reference pattern;
- determine another hot word having a greater probability of occurrence than the candidate hot word; and
- refine the selection of the candidate hot word based on the said determination.
Dependent claims: 16, 17, 18
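The independent claims all begin by building an acoustic similarity matrix between two sets of phones from a set of GMMs. The claims do not state how the entries are computed, so the following is only one plausible sketch under stated assumptions: each phone has a diagonal-covariance GMM, and the entry for a pair of phones is the negative mean log-likelihood of one phone's frames under the other phone's model, which keeps the claims' convention that a smaller value means more similar. The signal classifier mentioned in the claims is not modelled here, and the function names are hypothetical.

```python
import math

def gmm_log_likelihood(frame, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    gmm: list of (weight, means, variances) tuples, one per mixture component.
    """
    terms = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for x, mu, var in zip(frame, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        terms.append(log_p)
    # log-sum-exp over components for numerical stability
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def acoustic_similarity_matrix(phone_gmms, phone_frames):
    """matrix[a][b] = negative mean log-likelihood of phone b's frames
    under phone a's GMM (ASSUMED definition; smaller = more similar)."""
    matrix = {}
    for a, gmm in phone_gmms.items():
        matrix[a] = {}
        for b, frames in phone_frames.items():
            total = sum(gmm_log_likelihood(f, gmm) for f in frames)
            matrix[a][b] = -total / len(frames)
    return matrix

# Toy single-component, one-dimensional models; values are illustrative only
phone_gmms = {
    "aa": [(1.0, [0.0], [1.0])],
    "iy": [(1.0, [3.0], [1.0])],
}
phone_frames = {
    "aa": [[0.1], [-0.2]],
    "iy": [[2.9], [3.2]],
}
matrix = acoustic_similarity_matrix(phone_gmms, phone_frames)
```

In this toy setup each phone scores lower (more similar) against its own model than against the other phone's model, which is the property a downstream matching step would rely on.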
Specification