Grammar confusability metric for speech recognition

US 7,844,456 B2
Filed: 03/09/2007
Issued: 11/30/2010
Est. Priority Date: 03/09/2007
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Architecture for testing an application grammar for the presence of confusable terms. A grammar confusability metric (GCM) is generated for describing a likelihood that a reference term will be confused by the speech recognizer with another term phrase currently allowed by active grammar rules. The GCM is used to flag processing of two phrases in the grammar that have different semantic meaning, but that the speech recognizer could have difficulty distinguishing reliably. A built-in acoustic model is analyzed and feature vectors generated that are close to the acoustic properties of the input term. The feature vectors are then sent for recognition. A statistically random sampling method is applied to explore the acoustic properties of feature vectors of the input term phrase spatially and temporally. The feature vectors are perturbed in the neighborhood of the time domain and the Gaussian mixture model to which the feature vectors belong.
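The sampling idea in the abstract can be illustrated with a small Monte Carlo sketch. This is a toy illustration under stated assumptions, not the patented implementation: the one-dimensional Gaussian "senones", the three-phrase grammar, and the names `SENONES`, `GRAMMAR`, `recognize`, and `confusability` are all hypothetical, and scoring the metric as the fraction of misrecognized samples is an assumed definition that the record above does not spell out.

```python
import math
import random

# Hypothetical toy acoustic model: each senone is a 1-D Gaussian (mean, std).
SENONES = {"s_a": (0.0, 1.0), "s_b": (0.5, 1.0), "s_c": (5.0, 1.0)}

# Hypothetical grammar: each phrase is a fixed-length senone sequence.
GRAMMAR = {
    "call bob": ["s_a", "s_c"],
    "call rob": ["s_b", "s_c"],
    "hang up": ["s_c", "s_a"],
}


def log_likelihood(frames, senone_ids):
    """Sum of per-frame Gaussian log densities for one candidate phrase."""
    total = 0.0
    for x, sid in zip(frames, senone_ids):
        mean, std = SENONES[sid]
        total += -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))
    return total


def recognize(frames):
    """Return the grammar phrase whose senone sequence best explains the frames."""
    return max(GRAMMAR, key=lambda phrase: log_likelihood(frames, GRAMMAR[phrase]))


def confusability(reference, iterations=2000, seed=0):
    """Assumed metric: fraction of sampled utterances recognized as another phrase."""
    rng = random.Random(seed)
    senone_ids = GRAMMAR[reference]
    confusions = 0
    for _ in range(iterations):
        # Randomly sample one feature value per senone of the reference phrase.
        frames = [rng.gauss(*SENONES[sid]) for sid in senone_ids]
        if recognize(frames) != reference:
            confusions += 1
    return confusions / iterations
```

In this toy setup, "call bob" and "call rob" differ only in a first senone whose means are close together, so `confusability("call bob")` should come out far higher than `confusability("hang up")`.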

32 Citations

19 Claims

1. A computer-implemented system that facilitates speech recognition, comprising:
- a vector component for generating feature vectors that approximate acoustical properties of an input term;
- a metric component for recognition processing of the feature vectors based on multiple iterations and generating multiple iteration confusability metrics respectively for each of the multiple iterations; and
- an aggregation component for aggregating the multiple iteration confusability metrics and generating an overall confusability metric based on the multiple iterations of recognition processing of the feature vectors.

2. The system of claim 1, wherein the aggregation component is part of the metric component.

3. The system of claim 1, wherein the metric component employs a Gaussian mixture model and hidden Markov model for processing of distributions associated with the feature vectors.

4. The system of claim 1, wherein the feature vectors include a senone that is perturbed according to a Gaussian mixture model.

5. The system of claim 1, wherein the feature vectors are perturbed in a time domain for variation of time duration of the input phrase.

6. The system of claim 1, wherein the feature vectors are perturbed in a spatial domain to find neighboring phonemes.

7. The system of claim 1, wherein the term is from an application grammar that is being tested for confusability of grammar terms.

8. The system of claim 1, further comprising a simulation component for initiating simulation processing of the feature vectors based on spatial and temporal domain perturbation.

9. The system of claim 1, further comprising an application interface for triggering an end simulation event and notifying an application that the overall confusability metric can be retrieved.

10. A computer-implemented method of performing speech recognition employing a computer programmed to perform the method, comprising:
- converting an input term into a set of senone IDs;
- randomly selecting feature vectors that are representative of distributions of the set of senone IDs;
- driving a recognition process using the feature vectors to output a result;
- perturbing the feature vectors in at least one of spatially or temporally for neighboring samples; and
- aggregating results from multiple iterations of the input term into an overall confusability metric.

11. The method of claim 10, further comprising increasing a number of the iterations based on an increase in potential confusability of the input text.

12. The method of claim 10, further comprising processing a greater number of perturbations based on an increase in potential confusability of the input text.

13. The method of claim 10, further comprising running a first process for homophones and a second process based on the first process being non-homophonic.

14. The method of claim 10, further comprising iteratively processing homophonic terms against a variable set of different pronunciations and non-homophonic terms against a fixed set of iterations.

15. The method of claim 10, further comprising perturbing the feature vectors both spatially and temporally for neighboring samples.

16. The method of claim 10, further comprising selecting the input term from an application grammar and editing the grammar based on the overall confusability metric.

17. The method of claim 10, further comprising controlling the recognition process into a simulation mode for simulation processing of feature vectors associated with a potentially confusing input term.

18. The method of claim 10, further comprising walking a tree of candidate senone IDs and perturbing the associated distributions at a mean and according to a variance about the mean.

19. A computer-implemented system, comprising:
- computer-implemented means for converting an input term into a set of senone IDs;
- computer-implemented means for randomly selecting feature vectors that are representative of distributions of the set of senone IDs;
- computer-implemented means for driving a recognition process using the feature vectors to output a result;
- computer-implemented means for perturbing the feature vectors in at least one of spatially or temporally for neighboring samples; and
- computer-implemented means for aggregating results from multiple iterations of the input term into an overall confusability metric.
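
The perturbation and aggregation steps recited in the claims can be sketched roughly as follows. This is only an illustration under assumptions: the helper names, the one-dimensional `(mean, std)` senone model, the uniform choice of frame counts, and the use of a plain average for aggregation are all hypothetical and are not taken from the patent record.

```python
import random


def perturb_temporally(senone_ids, rng, min_frames=1, max_frames=4):
    """Vary duration: repeat each senone for a random number of frames."""
    frames = []
    for sid in senone_ids:
        frames.extend([sid] * rng.randint(min_frames, max_frames))
    return frames


def perturb_spatially(frame_senones, model, rng, scale=1.0):
    """Draw one feature value per frame around the senone mean.

    `model` maps a senone ID to an assumed (mean, std) pair; `scale`
    widens or narrows the spread about the mean.
    """
    return [rng.gauss(model[sid][0], scale * model[sid][1]) for sid in frame_senones]


def aggregate(per_iteration_metrics):
    """Assumed aggregation: arithmetic mean of the per-iteration metrics."""
    return sum(per_iteration_metrics) / len(per_iteration_metrics)
```

A caller might generate one utterance per iteration with `perturb_spatially(perturb_temporally(ids, rng), model, rng)`, pass it to a recognizer, record a per-iteration score, and combine the scores with `aggregate`.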

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Cai, Qin; Hamaker, John
Primary Examiner(s): ALBERTALLI, BRIAN LOUIS
Application Number: US11/716,210
Publication Number: US 20080221896A1
Time in Patent Office: 1,362 Days
Field of Search: None
US Class Current: 704/243
CPC Class Codes:
G10L 15/01 Assessment or evaluation of...
G10L 15/19 Grammatical context, e.g. d...
