METHOD AND SYSTEM FOR ROBUST PATTERN MATCHING IN CONTINUOUS SPEECH
First Claim
1. A method for speech recognition in mismatched environments, the method comprising:
extracting time-frequency speech features from a series of reference speech elements in a first series of sampling windows;
aligning the extracted time-frequency speech features in response to reference speech elements from the series of speech elements that are not of equal time span duration;
constructing a common subspace for the aligned extracted time-frequency speech features;
determining a first set of coefficient vectors for the aligned extracted time-frequency speech features;
extracting a time-frequency feature image from a test speech stream spanned by a second sampling window;
approximating the extracted time-frequency feature image in the common subspace for the aligned extracted time-frequency speech features with a second coefficient vector;
computing a similarity measure between the first set of coefficient vectors and the second coefficient vector;
determining if the similarity measure is below a predefined threshold; and
wherein a match between the reference speech elements and a portion of the test speech stream spanned by the second sampling window is made in response to a similarity measure below a predefined threshold.
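The claimed steps can be illustrated with a minimal sketch. The claim does not specify how the common subspace is constructed; the sketch below assumes a PCA/SVD basis over the aligned reference feature vectors, which is one plausible realization. All function and variable names are illustrative, not from the patent.

```python
import numpy as np

def build_common_subspace(reference_features, k):
    """Construct a common subspace from aligned reference feature vectors
    of shape (n_refs, D), here via SVD/PCA (an assumed choice).
    Returns the mean, a (k, D) basis, and the first set of coefficient
    vectors obtained by projecting each reference onto the basis."""
    mean = reference_features.mean(axis=0)
    centered = reference_features - mean
    # Right singular vectors of the centered references span the subspace.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                 # (k, D) orthonormal basis
    coeffs = centered @ basis.T    # first set of coefficient vectors, (n_refs, k)
    return mean, basis, coeffs

def match_window(test_feature, mean, basis, ref_coeffs, threshold):
    """Approximate the test time-frequency feature image in the common
    subspace with a second coefficient vector, compute a distance-style
    similarity measure against the reference coefficient vectors, and
    declare a match when the measure falls below the threshold."""
    test_coeff = (test_feature - mean) @ basis.T          # second coefficient vector
    distances = np.linalg.norm(ref_coeffs - test_coeff, axis=1)
    best = distances.min()
    return best < threshold, best
```

Because the claim declares a match when the similarity measure is *below* the threshold, the measure behaves like a distance; Euclidean distance in coefficient space is used here as one such measure.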
Abstract
A method for speech recognition, the method includes: extracting time-frequency speech features from a series of reference speech elements in a first series of sampling windows; aligning reference speech elements that are not of equal time span duration; constructing a common subspace for the aligned speech features; determining a first set of coefficient vectors; extracting a time-frequency feature image from a test speech stream spanned by a second sampling window; approximating the extracted image in the common subspace for the aligned extracted time-frequency speech features with a second coefficient vector; computing a similarity measure between the first and the second coefficient vector; determining if the similarity measure is below a predefined threshold; and wherein a match between the reference speech elements and a portion of the test speech stream is made in response to a similarity measure below a predefined threshold.
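The alignment step handles reference elements of unequal time span duration. The abstract does not state the alignment technique; one plausible sketch, assumed here for illustration, resamples each element's time-frequency image to a common frame count by linear interpolation along the time axis before stacking the elements into a fixed-length feature vector.

```python
import numpy as np

def align_frames(spectrogram, target_frames):
    """Resample a (frames, bins) time-frequency image to target_frames rows
    by linear interpolation along the time axis, then flatten it into a
    fixed-length feature vector so unequal-duration elements become
    comparable. Names are illustrative, not from the patent."""
    n_frames, n_bins = spectrogram.shape
    src = np.linspace(0.0, 1.0, n_frames)       # original frame positions
    dst = np.linspace(0.0, 1.0, target_frames)  # common frame positions
    aligned = np.empty((target_frames, n_bins))
    for b in range(n_bins):
        aligned[:, b] = np.interp(dst, src, spectrogram[:, b])
    return aligned.ravel()
```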
6 Claims
1. A method for speech recognition in mismatched environments, the method comprising:
extracting time-frequency speech features from a series of reference speech elements in a first series of sampling windows;
aligning the extracted time-frequency speech features in response to reference speech elements from the series of speech elements that are not of equal time span duration;
constructing a common subspace for the aligned extracted time-frequency speech features;
determining a first set of coefficient vectors for the aligned extracted time-frequency speech features;
extracting a time-frequency feature image from a test speech stream spanned by a second sampling window;
approximating the extracted time-frequency feature image in the common subspace for the aligned extracted time-frequency speech features with a second coefficient vector;
computing a similarity measure between the first set of coefficient vectors and the second coefficient vector;
determining if the similarity measure is below a predefined threshold; and
wherein a match between the reference speech elements and a portion of the test speech stream spanned by the second sampling window is made in response to a similarity measure below a predefined threshold.
Claims 2, 3, 4, 5, and 6 are dependent claims.
Specification