Speech recognition using discriminant features

US 7,337,114 B2
Filed: 03/29/2001
Issued: 02/26/2008
Est. Priority Date: 03/29/2001
Status: Expired due to Term

Abstract

Methods and arrangements for representing the speech waveform in terms of a set of abstract, linguistic distinctions in order to derive a set of discriminative features for use in a speech recognizer. By combining the distinctive feature representation with an original waveform representation, it is possible to achieve a reduction in word error rate of 33% on an automatic speech recognition task.

17 Claims

1. A method of facilitating speech recognition, said method comprising the steps of:
- obtaining speech input data;
  
  building a model for each feature of an original set of linguistic features, wherein the model reflects whether or not each feature is present;
  
  ranking the linguistic features;
  
  rebuilding the model for each of a preselected number N of the ranked linguistic features; and
  
  compiling a confusion matrix for each feature of the original set of features subsequent to said step of building a model for each feature of an original set of features, wherein said compiling a confusion matrix comprises:
  
  computing a score for each feature based on the likelihood of its presence in a frame of the speech input data, and
  
  calculating mutual information between truth and labels for each feature;
  
  wherein the ranking comprises ranking the mutual information calculated in compiling the confusion matrix.
- - 2. The method according to claim 1, wherein said step of building a model for each of a preselected number N of the ranked features comprises building a model for the top N ranked features.
  - 3. The method according to claim 1, wherein said step of computing a score for each feature comprises computing a score as a log-likelihood ratio.
  - 4. The method according to claim 1, wherein said step of compiling a confusion matrix further comprises comparing each score of each feature with a threshold.
  - 5. The method according to claim 1, wherein said step of building a model for each feature of an original set of features comprises:
    - partitioning the speech input data in parallel, once for each feature; and
      
      producing an observation vector.
  - 6. The method according to claim 5, wherein said step of building a model for each feature of an original set of features comprises:
    - partitioning data in parallel from the observation vector, once for each feature; and
      
      producing final observations.
  - 7. The method according to claim 1, wherein said step of building a model for each of a preselected number N of the ranked features comprises:
    - partitioning the speech input data in parallel, once for each feature; and
      
      producing an observation vector.
  - 8. The method according to claim 7, wherein said step of building a model for each of a preselected number N of the ranked features comprises:
    - partitioning data in parallel from the observation vector, once for each feature; and
      
      producing final observations.
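
Claims 1 to 4 describe scoring each feature per frame, comparing the score with a threshold, compiling a confusion matrix, and ranking features by the mutual information between truth and labels. The following is a minimal sketch of that selection step under stated assumptions, not the patented implementation: the function names, the feature names `voiced` and `nasal`, the toy scores, and the threshold of 0.0 are all hypothetical, and each feature is treated as a binary present/absent decision.

```python
import math

def confusion_matrix(scores, truth, threshold=0.0):
    """Return 2x2 counts[t][l]: t is the true presence (0/1), l the hypothesized label (0/1)."""
    counts = [[0, 0], [0, 0]]
    for s, t in zip(scores, truth):
        label = 1 if s > threshold else 0  # threshold comparison, as in claim 4
        counts[t][label] += 1
    return counts

def mutual_information(counts):
    """Mutual information in bits between truth and label, from a 2x2 count table."""
    total = sum(sum(row) for row in counts)
    mi = 0.0
    for t in (0, 1):
        for l in (0, 1):
            n = counts[t][l]
            if n == 0:
                continue  # empty cells contribute nothing
            p_tl = n / total
            p_t = sum(counts[t]) / total
            p_l = (counts[0][l] + counts[1][l]) / total
            mi += p_tl * math.log2(p_tl / (p_t * p_l))
    return mi

def rank_features(scores_by_feature, truth_by_feature, n_keep, threshold=0.0):
    """Rank features by mutual information and keep the top n_keep (assumed reading of claim 2)."""
    mi = {
        name: mutual_information(
            confusion_matrix(scores_by_feature[name], truth_by_feature[name], threshold)
        )
        for name in scores_by_feature
    }
    ranked = sorted(mi, key=mi.get, reverse=True)
    return ranked[:n_keep], mi

# Hypothetical per-frame scores and ground truth for two illustrative features.
scores = {"voiced": [2.0, 1.5, -1.0, -2.0], "nasal": [1.0, -1.0, 1.0, -1.0]}
truth = {"voiced": [1, 1, 0, 0], "nasal": [1, 1, 0, 0]}
top_n, mi = rank_features(scores, truth, n_keep=1)
print(top_n)  # ['voiced']
```

In this toy data the labels for `voiced` match the truth on every frame (1 bit of mutual information), while the labels for `nasal` are independent of the truth (0 bits), so only `voiced` would be carried forward to the rebuilding step.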

9. An apparatus for facilitating speech recognition, said method comprising the steps of:
- an input medium which obtains speech input data;
  
  a first model builder which builds a model for each feature of an original set of linguistic features, wherein the model reflects whether or not each feature is present;
  
  a ranking arrangement which ranks the linguistic features;
  
  a second model builder which rebuilds the model for each of a preselected number N of the ranked linguistic features; and
  
  a matrix compiler which compiles a confusion matrix for each feature of the original set of features subsequent to said step of building a model for each feature of an original set of features, wherein said matrix compiler is adapted to:
  
  compute a score for each feature based on the likelihood of its presence in a frame of the speech input data, and
  
  calculate mutual information between truth and labels for each feature;
  
  wherein said ranking arrangement is adapted to rank the mutual information calculated in compiling the confusion matrix.
- - 10. The apparatus according to claim 9, wherein said second model builder is adapted to build a model for the top N ranked features.
  - 11. The apparatus according to claim 9, wherein said matrix compiler is adapted to compute a score as a log-likelihood ratio.
  - 12. The apparatus according to claim 9, wherein said matrix compiler is adapted to compare each score of each feature with a threshold.
  - 13. The apparatus according to claim 9, wherein said first model builder is adapted to:
    - partition the speech input data in parallel, once for each feature; and
      
      produce an observation vector.
  - 14. The apparatus according to claim 13, wherein said first model builder is adapted to:
    - partition data in parallel from the observation vector, once for each feature; and
      
      produce final observations.
  - 15. The apparatus according to claim 9, wherein said second model builder is adapted to:
    - partition the speech input data in parallel, once for each feature; and
      
      produce an observation vector.
  - 16. The apparatus according to claim 15, wherein said second model builder is adapted to:
    - partition data in parallel from the observation vector, once for each feature; and
      
      produce final observations.
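
Claims 11 and 13 refer to a log-likelihood-ratio score and to partitioning the speech input data once per feature to produce an observation vector. One possible reading is sketched below, offered as an illustration only: each feature splits the training frames into "present" and "absent" sets, a model is fit to each side, and the per-feature log-likelihood ratios are stacked into a vector. The single Gaussian per side, the scalar frames, and every function and feature name are assumptions made for brevity and are not taken from the patent.

```python
import math

def fit_gaussian(values):
    """Mean and (floored) variance of a list of scalars."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-6)

def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def train_feature_model(frames, present):
    """Partition frames by the feature's present/absent flag and fit one Gaussian per side."""
    pos = [x for x, p in zip(frames, present) if p]
    neg = [x for x, p in zip(frames, present) if not p]
    return fit_gaussian(pos), fit_gaussian(neg)

def llr(x, model):
    """Log-likelihood ratio: log p(x | present) - log p(x | absent)."""
    (m1, v1), (m0, v0) = model
    return log_gaussian(x, m1, v1) - log_gaussian(x, m0, v0)

def observation_vector(x, models):
    """Stack one log-likelihood ratio per feature, in sorted feature-name order."""
    return [llr(x, models[name]) for name in sorted(models)]

# Hypothetical scalar frames; the same frames are partitioned once per feature.
frames = [0.0, 0.2, 1.0, 1.2]
labels = {
    "voiced": [False, False, True, True],
    "nasal": [True, False, True, False],
}
models = {name: train_feature_model(frames, flags) for name, flags in labels.items()}
vec = observation_vector(1.1, models)  # one entry per feature
```

A positive entry means the frame is more likely under the feature's "present" model than under its "absent" model; comparing that entry with a threshold would give the label used when compiling the confusion matrix.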

17. A program storage device readable by computer, tangibly embodying a program of instructions executable by the computer to perform method steps for speech recognition, said method comprising the steps of:
- obtaining speech input data;
  
  building a model for each feature of an original set of linguistic features, wherein the model reflects whether or not each feature is present;
  
  ranking the linguistic features;
  
  rebuilding the model for each of a preselected number N of the ranked linguistic features; and
  
  compiling a confusion matrix for each feature of the original set of features subsequent to said step of building a model for each feature of an original set of features, wherein said compiling a confusion matrix comprises:
  
  computing a score for each feature based on the likelihood of its presence in a frame of the speech input data, and
  
  calculating mutual information between truth and labels for each feature;
  
  wherein the ranking comprises ranking the mutual information calculated in compiling the confusion matrix.

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
International Business Machines Corporation
Inventors
Eide, Ellen M.
Primary Examiner(s)
Edouard; Patrick N.
Assistant Examiner(s)
Wozniak; James S.

Application Number

US09/821,404
Publication Number

US 20030023436A1
Time in Patent Office

2,525 Days
Field of Search

704/254-257, 704/240, 704/250, 704/251, 704/242-245, 704/232, 704/236
US Class Current

704/243
CPC Class Codes

G10L 15/02 Feature extraction for spee...
