Speech recognition by selecting and refining hot words

US 10,607,601 B2
Filed: 05/11/2017
Issued: 03/31/2020
Est. Priority Date: 05/11/2017
Status: Active Grant


1 Assignment

0 Petitions

Abstract

Speech recognition is performed by receiving a speech signal that includes spoken phones. A dynamic time warping procedure is applied to the received speech signal to generate a time-warped signal. The time-warped signal is compared to a plurality of stored reference patterns to identify a set of stored reference patterns that are most similar to the time-warped signal. A candidate hot word is selected from a list using the identified set of stored reference patterns. The selection of the candidate hot word is then refined.
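
The matching step in the abstract can be illustrated with classic dynamic time warping (DTW) template matching. This is a minimal, illustrative sketch, not the patented implementation. It assumes that each test or reference pattern is a list of feature (characterization) vectors and that the local distance is Euclidean. The function names `dtw` and `best_reference` are hypothetical. As in claim 1, the score is effectively a distance, so a larger value means greater dissimilarity, and the best reference pattern is the one with the smallest value. The returned warping path corresponds to the "locus of a warping function" mentioned in claim 8.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Local distance between two feature vectors (assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(test: List[Vector], ref: List[Vector]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW: accumulated distance (larger = more dissimilar) and the
    warping path as a list of (test_index, ref_index) pairs."""
    n, m = len(test), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(test[i - 1], ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stay on same reference frame
                                 cost[i][j - 1],      # stay on same test frame
                                 cost[i - 1][j - 1])  # advance both
    # Backtrack from (n, m) to (1, 1) to recover the warping path.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def best_reference(test: List[Vector],
                   references: Dict[str, List[Vector]]) -> Tuple[str, float]:
    """Return the reference pattern with the smallest DTW distance."""
    scores = {name: dtw(test, ref)[0] for name, ref in references.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Toy one-dimensional patterns (hypothetical data).
test_pattern = [[0.0], [0.0], [1.0], [2.0], [2.0]]
reference_patterns = {"hello": [[0.0], [1.0], [2.0]], "stop": [[5.0], [5.0]]}
print(best_reference(test_pattern, reference_patterns))  # ('hello', 0.0)
```

Here the test pattern is a time-stretched copy of the "hello" template, so warping absorbs the difference in speaking rate and the accumulated distance is zero.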

50 Citations

18 Claims

1. A computer-implemented method for performing speech recognition, the method comprising:
- generating, by a computer, an acoustic similarity matrix using a set of Gaussian Mixture Models (GMMs) and a signal classifier, wherein the acoustic similarity matrix includes similarity values between a first set of phones and a second set of phones;
  
  receiving, by the computer, a speech signal including one or more spoken phones;
  
  applying, by the computer, a dynamic time warping procedure to the received speech signal to generate a time-warped signal, wherein the time-warped signal is among a test pattern indicative of a locus of a set of characterization vectors obtained from the speech signal;
  
  comparing, by the computer, the time-warped signal to a plurality of stored reference patterns to determine a set of similarity values among the acoustic similarity matrix, the set of similarity values corresponding to the plurality of stored reference patterns, wherein each similarity value indicates a similarity level between the time-warped signal and each reference pattern, and an increase of the similarity value is indicative of an increase of dissimilarity between the time-warped signal and a reference pattern in the comparison;
  
  identifying, by the computer, a reference pattern among the plurality of stored reference patterns that has a smallest similarity value;
  
  selecting, by the computer, a candidate hot word from a list of candidate hot words that corresponds to the identified reference pattern;
  
  determining, by the computer, another hot word having a greater probability of occurrence than the candidate hot word; and
  
  refining, by the computer, the selection of the candidate hot word based on the said determining.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The computer-implemented method of claim 1 wherein the refining, by the computer, of the selection of the candidate hot word is performed using at least one of:
    - a set of n-gram information, a semantic analysis of a named entity, or a set of Gaussian components.
  - 3. The computer-implemented method of claim 2 wherein the refining, by the computer, of the selection of the candidate hot word is performed by evaluating a probability of occurrence of the identified candidate hot word.
  - 4. The computer-implemented method of claim 3 wherein the refining, by the computer, of the selection of the candidate hot word includes changing the identified candidate hot word to the another candidate hot word having the greater probability of occurrence than the identified candidate hot word.
  - 5. The computer-implemented method of claim 1 wherein the selecting, by the computer, of a candidate hot word from a list of candidate hot words further comprises generating the list of candidate hot words using the acoustic similarity matrix.
  - 6. The computer-implemented method of claim 1 wherein the selecting, by the computer, of a candidate hot word from a list of candidate hot words further comprises generating the list of candidate hot words from an electronically searchable dictionary.
  - 7. The computer-implemented method of claim 1, wherein the method is provided as a service in a cloud environment.
  - 8. The method of claim 1, wherein comparing the time-warped signal to the plurality of stored reference patterns comprises detecting a locus of a warping function to compare a reference pattern with the time-warped signal.
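
Claims 2 to 4 describe refining the candidate hot word by evaluating its probability of occurrence and switching to another hot word with a greater probability. The sketch that follows illustrates only the n-gram option from claim 2, under loudly stated assumptions: a maximum-likelihood bigram model P(word | previous word) estimated from a toy corpus, a small probability floor for unseen pairs instead of real smoothing, and a list of alternative hot words assumed to be supplied by an earlier acoustic step. The names `bigram_from_corpus` and `refine_hot_word` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical bigram model: P(word | previous_word), keyed by (previous_word, word).
Bigram = Dict[Tuple[str, str], float]


def bigram_from_corpus(sentences: List[str]) -> Bigram:
    """Maximum-likelihood bigram estimates from a toy text corpus."""
    pair_counts: Counter = Counter()
    prev_counts: Counter = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, word in zip(words, words[1:]):
            pair_counts[(prev, word)] += 1
            prev_counts[prev] += 1
    return {pair: count / prev_counts[pair[0]] for pair, count in pair_counts.items()}


def refine_hot_word(candidate: str, alternatives: Iterable[str], previous_word: str,
                    bigram: Bigram, floor: float = 1e-6) -> str:
    """Replace the candidate only with an alternative whose probability of
    occurrence (given the previous word) is strictly greater."""

    def prob(word: str) -> float:
        # Unseen bigrams get a small floor value (a simplification, not smoothing).
        return bigram.get((previous_word, word), floor)

    best = candidate
    for alt in alternatives:
        if prob(alt) > prob(best):
            best = alt
    return best


model = bigram_from_corpus(["call mom now", "call mom tonight", "call the office"])
print(refine_hot_word("bomb", ["mom"], "call", model))  # mom
```

In this toy example, an acoustically plausible but contextually unlikely candidate ("bomb" after "call") is replaced by a more probable alternative, while a candidate that is already the most probable is left unchanged.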

9. A computer program product for performing speech recognition, the computer program product comprising a computer-readable storage medium having a computer-readable program stored therein, wherein the computer-readable program, when executed on a computing device including at least one processor, causes the at least one processor to:
- generate an acoustic similarity matrix using a set of Gaussian Mixture Models (GMMs) and a signal classifier, wherein the acoustic similarity matrix includes similarity values between a first set of phones and a second set of phones;
  
  receive a speech signal including one or more spoken phones;
  
  apply a dynamic time warping procedure to the received speech signal to generate a time-warped signal, wherein the time-warped signal is among a test pattern indicative of a locus of a set of characterization vectors obtained from the speech signal;
  
  compare the time-warped signal to a plurality of stored reference patterns to determine a set of similarity values among the acoustic similarity matrix, the set of similarity values corresponding to the plurality of stored reference patterns, wherein each similarity value indicates a similarity level between the time-warped signal and each reference pattern, and an increase of the similarity value is indicative of an increase of dissimilarity between the time-warped signal and a reference pattern in the comparison;
  
  identify a reference pattern among the plurality of stored reference patterns that has a smallest similarity value;
  
  select a candidate hot word from a list of candidate hot words that corresponds to the identified reference pattern;
  
  determine another hot word having a greater probability of occurrence than the candidate hot word; and
  
  refine the selection of the candidate hot word based on the said determination.
- Dependent Claims (10, 11, 12, 13, 14)
  - 10. The computer program product of claim 9 wherein the refining is performed using at least one of:
    - a set of n-gram information, a semantic analysis of a named entity, or a set of Gaussian components.
  - 11. The computer program product of claim 10 wherein the refining is performed by evaluating a probability of occurrence of the identified candidate hot word.
  - 12. The computer program product of claim 11 wherein the computer-readable program, when executed on a computing device including at least one processor, causes the at least one processor to refine the selection by changing the identified candidate hot word to the another candidate hot word having the greater probability of occurrence than the identified candidate hot word.
  - 13. The computer program product of claim 9 wherein the selecting further comprises generating the list of candidate hot words using the acoustic similarity matrix.
  - 14. The computer program product of claim 9 wherein the selecting further comprises generating the list of candidate hot words from an electronically searchable dictionary.
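
Claims 5 and 13 generate the list of candidate hot words "using the acoustic similarity matrix" without saying how. One assumed way to do this, sketched here for illustration only, is a weighted phone edit distance: substituting phone a for phone b costs 1 - similarity(a, b), insertions and deletions cost 1, and every hot word in a pronunciation lexicon within a distance threshold becomes a candidate. The lexicon, the similarity values, the threshold, and the names `phone_distance` and `candidate_hot_words` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical phone similarity matrix: values in [0, 1], keyed by phone pair.
PhoneSim = Dict[Tuple[str, str], float]


def sub_cost(a: str, b: str, sim: PhoneSim) -> float:
    """Substitution cost derived from similarity: identical phones cost 0."""
    if a == b:
        return 0.0
    return 1.0 - sim.get((a, b), sim.get((b, a), 0.0))


def phone_distance(x: Sequence[str], y: Sequence[str], sim: PhoneSim,
                   indel: float = 1.0) -> float:
    """Weighted Levenshtein distance between two phone sequences."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + indel,                              # deletion
                          d[i][j - 1] + indel,                              # insertion
                          d[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1], sim))  # substitution
    return d[n][m]


def candidate_hot_words(phones: Sequence[str], lexicon: Dict[str, List[str]],
                        sim: PhoneSim, max_distance: float = 1.0) -> List[Tuple[str, float]]:
    """Hot words whose pronunciation is within max_distance, closest first."""
    scored = [(word, phone_distance(phones, pron, sim)) for word, pron in lexicon.items()]
    return sorted((ws for ws in scored if ws[1] <= max_distance), key=lambda ws: ws[1])


sim = {("m", "b"): 0.7, ("aa", "ao"): 0.8}
lexicon = {"mom": ["m", "aa", "m"], "bomb": ["b", "aa", "m"], "stop": ["s", "t", "aa", "p"]}
print(candidate_hot_words(["b", "aa", "m"], lexicon, sim))
```

With these toy values, "bomb" matches exactly, "mom" is kept as an acoustically confusable alternative (distance of about 0.3), and "stop" is filtered out.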

15. An apparatus for performing speech recognition, the apparatus comprising:
- at least one processor; and
  
  a memory coupled to the at least one processor, wherein the memory comprises program instructions which, when executed by the at least one processor, cause the at least one processor to:
  
  generate an acoustic similarity matrix using a set of Gaussian Mixture Models (GMMs) and a signal classifier, wherein the acoustic similarity matrix includes similarity values between a first set of phones and a second set of phones;
  
  receive a speech signal including one or more spoken phones;
  
  apply a dynamic time warping procedure to the received speech signal to generate a time-warped signal, wherein the time-warped signal is among a test pattern indicative of a locus of a set of characterization vectors obtained from the speech signal;
  
  compare the time-warped signal to a plurality of stored reference patterns to determine a set of similarity values among the acoustic similarity matrix, the set of similarity values corresponding to the plurality of stored reference patterns, wherein each similarity value indicates a similarity level between the time-warped signal and each reference pattern, and an increase of the similarity value is indicative of an increase of dissimilarity between the time-warped signal and a reference pattern in the comparison;
  
  identify a reference pattern among the plurality of stored reference patterns that has a smallest similarity value;
  
  select a candidate hot word from a list of candidate hot words that corresponds to the identified reference pattern;
  
  determine another hot word having a greater probability of occurrence than the candidate hot word; and
  
  refine the selection of the candidate hot word based on the said determination.
- Dependent Claims (16, 17, 18)
  - 16. The apparatus of claim 15 further configured for performing the refining using at least one of:
    - a set of n-gram information, a semantic analysis of a named entity, or a set of Gaussian components.
  - 17. The apparatus of claim 16 further configured for performing the refining by evaluating a probability of occurrence of the identified candidate hot word.
  - 18. The apparatus of claim 17 further configured for performing the refining by changing the identified candidate hot word to the another candidate hot word having the greater probability of occurrence than the identified candidate hot word.
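
The first step of claims 1, 9 and 15 generates the acoustic similarity matrix "using a set of Gaussian Mixture Models (GMMs) and a signal classifier", but the claims give no formula. The following is one assumed, illustrative approach rather than the patented one. Each phone is modeled by a diagonal-covariance GMM. The affinity between two GMMs is the weighted sum of Bhattacharyya coefficients between their components, and this affinity is normalized so that every phone has self-similarity 1. The signal classifier recited in the claims is not modeled. All names and toy parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A diagonal-covariance Gaussian component: (weight, means, variances).
Component = Tuple[float, Sequence[float], Sequence[float]]
GMM = List[Component]


def bhattacharyya_coeff(mu1: Sequence[float], var1: Sequence[float],
                        mu2: Sequence[float], var2: Sequence[float]) -> float:
    """Bhattacharyya coefficient exp(-D_B) between two diagonal Gaussians."""
    d_b = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        d_b += (m1 - m2) ** 2 / (4.0 * (v1 + v2))                     # mean term
        d_b += 0.5 * math.log(((v1 + v2) / 2.0) / math.sqrt(v1 * v2))  # variance term
    return math.exp(-d_b)


def gmm_affinity(a: GMM, b: GMM) -> float:
    """Weighted sum of component-pair Bhattacharyya coefficients."""
    return sum(wa * wb * bhattacharyya_coeff(ma, va, mb, vb)
               for wa, ma, va in a for wb, mb, vb in b)


def similarity_matrix(phone_gmms: Dict[str, GMM]) -> Dict[Tuple[str, str], float]:
    """Normalized affinities: 1 on the diagonal, at most 1 elsewhere."""
    self_aff = {p: gmm_affinity(g, g) for p, g in phone_gmms.items()}
    return {(p, q): gmm_affinity(gp, gq) / math.sqrt(self_aff[p] * self_aff[q])
            for p, gp in phone_gmms.items() for q, gq in phone_gmms.items()}


# Toy 2-D phone models (hypothetical parameters).
gmms = {
    "m": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "b": [(1.0, [0.5, 0.0], [1.0, 1.0])],
    "s": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 4.0], [1.0, 1.0])],
}
S = similarity_matrix(gmms)
print(round(S[("m", "b")], 4), round(S[("m", "s")], 4))
```

Because the Bhattacharyya coefficient is an inner product of square-root densities, the normalization behaves like a cosine similarity and keeps every entry in [0, 1]. Nearby phone models ("m" and "b") score close to 1, and distant ones ("m" and "s") score close to 0.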


Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Jin, Feng; Liu, Wen; Ma, Li Jun; Zhu, Peng Cheng P P; Qin, Yong; Shi, Qin; Zhang, Shi Lei
Primary Examiner(s)
Wozniak, James S

Application Number
US15/592,773
Publication Number
US 20180330717A1
Time in Patent Office
1,055 Days
Field of Search
704/231, 704/241, 704/251, 704/257
CPC Class Codes

G10L 15/12   using dynamic programming t...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/197   Probabilistic grammars, e.g...

G10L 2015/088   Word spotting
