Sparse representation features for speech recognition

US 8,484,023 B2
Filed: 09/24/2010
Issued: 07/09/2013
Est. Priority Date: 09/24/2010
Status: Active Grant

First Claim

1. A method, comprising:

obtaining a test vector and a training data set associated with a speech recognition system;

selecting a subset of the training data set;

mapping the test vector with the selected subset of the training data set as a linear combination that is weighted by a sparseness constraint such that a new test feature set is formed wherein the training data set is moved more closely to the test vector subject to the sparseness constraint; and

training, using a processor, an acoustic model on the new test feature set.

3 Assignments

0 Petitions

Abstract

Techniques are disclosed for generating and using sparse representation features to improve speech recognition performance. In particular, principles of the invention provide sparse representation exemplar-based recognition techniques. For example, a method comprises the following steps. A test vector and a training data set associated with a speech recognition system are obtained. A subset of the training data set is selected. The test vector is mapped with the selected subset of the training data set as a linear combination that is weighted by a sparseness constraint such that a new test feature set is formed wherein the training data set is moved more closely to the test vector subject to the sparseness constraint. An acoustic model is trained on the new test feature set. The acoustic model trained on the new test feature set may be used to decode user speech input to the speech recognition system.
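As a rough illustration of the mapping step described in the abstract, the sketch below represents a test vector as a sparse linear combination of selected training exemplars and returns that combination as the new feature. It follows the notation of the claims (y for the test vector, H for the selected subset, β for the weights). The solver is ISTA, an iterative soft-thresholding method for L1-penalized least squares, swapped in here for simplicity; it is not the approximate Bayesian compressive sensing method the claims name. The function names, the penalty weight `lam`, and the iteration count are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal operator of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_coefficients(y, H, lam=0.1, n_iter=500):
    """Approximately minimize 0.5*||y - H b||^2 + lam*||b||_1 with ISTA.

    y: test vector, shape (d,)
    H: selected training subset, one exemplar per column, shape (d, n)
    lam: hypothetical sparsity penalty (an assumption, not from the patent)
    """
    # Step size is 1/L, where L is the squared largest singular value of H.
    L = np.linalg.norm(H, 2) ** 2
    beta = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = H.T @ (H @ beta - y)           # gradient of the squared-error term
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

def sparse_feature(y, H, lam=0.1):
    """Return the new feature H @ beta together with the sparse weights beta."""
    beta = sparse_coefficients(y, H, lam)
    return H @ beta, beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.normal(size=(40, 200))            # 200 synthetic exemplars of dimension 40
    y = 0.7 * H[:, 3] + 0.3 * H[:, 17]        # test vector built from two exemplars
    feature, beta = sparse_feature(y, H, lam=0.05)
    print("nonzero weights:", np.count_nonzero(beta), "of", beta.size)
```

The feature `H @ beta` is built only from training exemplars, so it lies in their span; the L1 penalty keeps most weights at zero, so only a few exemplars contribute.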

25 Claims

1. A method, comprising:
- obtaining a test vector and a training data set associated with a speech recognition system;
  
  selecting a subset of the training data set;
  
  mapping the test vector with the selected subset of the training data set as a linear combination that is weighted by a sparseness constraint such that a new test feature set is formed wherein the training data set is moved more closely to the test vector subject to the sparseness constraint; and
  
  training, using a processor, an acoustic model on the new test feature set.
  - 2. The method of claim 1, further comprising using the acoustic model trained on the new test feature set to decode user speech input to the speech recognition system.
  - 3. The method of claim 1, wherein the selecting step further comprises selecting the subset of the training data set as the k nearest neighbors to the test vector in the training data set.
  - 4. The method of claim 1, wherein the selecting step further comprises selecting the subset of the training data set based on a trigram language model.
  - 5. The method of claim 1, wherein the selecting step further comprises selecting the subset of the training data set based on a unigram language model.
  - 6. The method of claim 1, wherein the selecting step further comprises selecting the subset of the training data set based on only acoustic information.
  - 7. The method of claim 6, wherein the acoustic information selecting step further comprises using acoustic information with unique phoneme identities.
  - 8. The method of claim 6, wherein the acoustic information comprises a given number of top scoring Gaussian Mixture Models.
  - 9. The method of claim 1, wherein the selecting step further comprises selecting the subset of the training data set based on Gaussian means.
  - 10. The method of claim 1, wherein the selecting step further comprises selecting the subset of the training data set based on random sampling.
  - 11. The method of claim 1, wherein the selecting step further comprises selecting the subset of the training data set based on cosine similarity sampling.
  - 12. The method of claim 1, wherein the mapping step further comprises solving an equation y = Hβ where y is the test vector, H is the selected subset of the training data set, and β is the sparseness constraint value.
  - 13. The method of claim 12, wherein β is computed using an approximate Bayesian compressive sensing method.
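Several of the dependent claims above differ only in how the subset H is chosen. The sketch below shows three of those choices under stated assumptions: k nearest neighbors by Euclidean distance (claim 3), uniform random sampling (claim 10), and a top-k ranking by cosine similarity, which is only one possible reading of "cosine similarity sampling" in claim 11. The function names, the column-per-exemplar layout, and the `seed` parameter are hypothetical and not taken from the patent.

```python
import numpy as np

def select_knn(y, train, k):
    """Return the k columns of `train` closest to y in Euclidean distance."""
    dists = np.linalg.norm(train - y[:, None], axis=0)
    idx = np.argsort(dists)[:k]
    return train[:, idx], idx

def select_random(train, k, seed=0):
    """Return k columns of `train` drawn uniformly without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(train.shape[1], size=k, replace=False)
    return train[:, idx], idx

def select_cosine(y, train, k):
    """Return the k columns of `train` with the highest cosine similarity to y."""
    eps = 1e-12  # guards against division by zero for all-zero vectors
    norms = np.linalg.norm(train, axis=0) * np.linalg.norm(y) + eps
    sims = (train.T @ y) / norms
    idx = np.argsort(-sims)[:k]
    return train[:, idx], idx

if __name__ == "__main__":
    # Four 2-dimensional training exemplars, one per column.
    train = np.array([[1.0, 0.0, 5.0, -1.0],
                      [0.0, 1.0, 5.0,  0.0]])
    y = np.array([1.0, 0.1])
    for name, (H, idx) in {
        "knn": select_knn(y, train, 2),
        "random": select_random(train, 2),
        "cosine": select_cosine(y, train, 2),
    }.items():
        print(name, "selected columns", idx.tolist())
```

Each function returns a matrix whose columns can serve as H in the equation of claim 12. Euclidean and cosine selection can disagree: a distant exemplar pointing in nearly the same direction as y ranks high by cosine similarity but low by distance.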

14. An apparatus, comprising:
- a memory; and
  
  a processor operatively coupled to the memory and configured to:
  
  obtain a test vector and a training data set associated with a speech recognition system;
  
  select a subset of the training data set;
  
  map the test vector with the selected subset of the training data set as a linear combination that is weighted by a sparseness constraint such that a new test feature set is formed wherein the training data set is moved more closely to the test vector subject to the sparseness constraint; and
  
  train an acoustic model on the new test feature set.
  - 15. The apparatus of claim 14, wherein the processor is further configured to use the acoustic model trained on the new test feature set to decode user speech input to the speech recognition system.
  - 16. The apparatus of claim 14, wherein the selecting step further comprises selecting the subset of the training data set as the k nearest neighbors to the test vector in the training data set.
  - 17. The apparatus of claim 14, wherein the selecting step further comprises selecting the subset of the training data set based on a trigram language model.
  - 18. The apparatus of claim 14, wherein the selecting step further comprises selecting the subset of the training data set based on a unigram language model.
  - 19. The apparatus of claim 14, wherein the selecting step further comprises selecting the subset of the training data set based on only acoustic information.
  - 20. The apparatus of claim 14, wherein the selecting step further comprises selecting the subset of the training data set based on Gaussian means.
  - 21. The apparatus of claim 14, wherein the selecting step further comprises selecting the subset of the training data set based on random sampling.
  - 22. The apparatus of claim 14, wherein the selecting step further comprises selecting the subset of the training data set based on cosine similarity sampling.
  - 23. The apparatus of claim 14, wherein the mapping step further comprises solving an equation y = Hβ where y is the test vector, H is the selected subset of the training data set, and β is the sparseness constraint value.
  - 24. The apparatus of claim 23, wherein β is computed using an approximate Bayesian compressive sensing method.

25. A non-transitory computer readable storage medium having tangibly embodied thereon computer readable program code which, when executed, causes a processor device to:
- obtain a test vector and a training data set associated with a speech recognition system;
  
  select a subset of the training data set;
  
  map the test vector with the selected subset of the training data set as a linear combination that is weighted by a sparseness constraint such that a new test feature set is formed wherein the training data set is moved more closely to the test vector subject to the sparseness constraint; and
  
  train an acoustic model on the new test feature set.

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Kanevsky, Dimitri; Nahamoo, David; Ramabhadran, Bhuvana; Sainath, Tara N.
Primary Examiner(s)
Lerner, Martin

Application Number

US12/889,845
Publication Number

US 20120078621A1
Time in Patent Office

1,019 Days
Field of Search

704/236, 704/238, 704/243, 704/256.2, 704/256.3
US Class Current

704/243
CPC Class Codes

G10L 15/02 Feature extraction for spee...
