Testing and tuning of automatic speech recognition systems using synthetic inputs generated from its acoustic models

US 7,684,988 B2
Filed: 10/15/2004
Issued: 03/23/2010
Est. Priority Date: 10/15/2004
Status: Expired due to Fees

2 Assignments

0 Petitions

Accused Products

Abstract

A system and method for testing and tuning a speech recognition system by providing pronunciations to the speech recognizer. First, a text document is provided to the system and converted into a sequence of phonemes representative of the words in the text. The phonemes are then converted to model units, such as Hidden Markov Models. From the models, a probability is obtained for each model or state, and feature vectors are determined. The feature vector matching the most probable vector for each state is selected for each model. These ideal feature vectors are provided to the speech recognizer and processed. The end result is compared with the original text, and modifications to the system can be made based on the output text.
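
The abstract does not say how the recognizer's output is compared with the original text. A common metric in speech recognition evaluation is word error rate (WER): the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The Python sketch below assumes that metric, case-insensitive comparison, and simple whitespace tokenization; `word_error_rate` is a hypothetical helper, not something described in the patent.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    # Assumption: case-insensitive comparison on whitespace-separated words.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of an extra word
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical check: the input text vs. what the recognizer returned.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
# prints 0.16666666666666666 (one substitution in six words)
```

Because the synthetic vectors are chosen so that the intended text should score best, remaining errors in this setup are more plausibly traced to components other than acoustic mismatch (for example the lexicon or search settings); treat that as an interpretation, not a statement from the patent.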

Citations

11 Claims

1. A speech recognition testing system comprising:

a speech recognizer that provides an output based upon a sequence of feature vectors;

a pronunciation tool that provides a pronunciation of a provided text having at least one word, the pronunciation including a plurality of phonemes, the pronunciation tool comprising a pronunciation store that stores pronunciations for words and a text-to-speech synthesizer that generates phonemes from text, the pronunciation tool first accessing the pronunciation store to obtain the pronunciation for words identified in the text and if the pronunciation store does not include the pronunciation, then using the text-to-speech synthesizer to obtain the pronunciation for the text;

a model unit generator that generates a model for each of the plurality of phonemes from the provided pronunciation and selects a sequence of Hidden Markov Model states for a Hidden Markov Model (HMM) representative of each of the generated models, the selected sequence of HMM states being a sequence that the speech recognizer is to choose as a best sequence during recognition of speech that is recognized to generate the text, wherein generating a model includes selecting a plurality of candidate HMMs for at least one of the generated models;

a feature vector data store storing feature vectors;

a vector generator that generates the sequence of feature vectors to be provided to the speech recognizer from the provided pronunciation of the provided text, wherein at least one of the feature vectors is generated by selecting, for each state in the sequence of HMM states, from the feature vector data store, a feature vector that has a closest probability distribution match with a given mixture in a Markov state in one of the generated models, such that the selected feature vectors produce a best score for the text when the selected feature vectors are provided to the speech recognizer during recognition of the text;

formatting the selected feature vectors in a format used by the speech recognizer; and

testing the speech recognizer using the formatted, selected feature vectors.

2. The speech recognition system of claim 1 wherein the pronunciation tool that generates a sequence of phonemes for the pronunciation; and wherein the model unit generator identifies models for each phoneme in the sequence of phonemes.

3. The speech recognition system of claim 2 wherein the model unit generator accesses a database of models in generating models for each of the phonemes in the sequence of phonemes.

4. The speech recognition system of claim 3 wherein the database of feature vectors comprises an acoustic model of the speech recognizer.
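
Claims 1 through 3 describe a pronunciation tool that consults a pronunciation store first and falls back to a text-to-speech synthesizer, and a model unit generator that looks up a model for each phoneme in a database of models. The Python sketch below illustrates those two lookups under stated assumptions: the fallback is applied per word, the pronunciation store is a dictionary, a caller-supplied function stands in for the synthesizer's letter-to-sound stage, and each phoneme maps to one context-independent HMM whose states are named by strings. All names (`pronounce`, `state_sequence`, `spell_out`, the state IDs) are hypothetical, and the sketch does not model the claims' selection among several candidate HMMs.

```python
from typing import Callable, Dict, List

Phonemes = List[str]


def pronounce(text: str,
              lexicon: Dict[str, Phonemes],
              letter_to_sound: Callable[[str], Phonemes]) -> Phonemes:
    """Phonemes for `text`: pronunciation store first, synthesizer fallback."""
    phonemes: Phonemes = []
    for word in text.lower().split():
        if word in lexicon:
            phonemes.extend(lexicon[word])          # pronunciation store hit
        else:
            phonemes.extend(letter_to_sound(word))  # hypothetical TTS fallback
    return phonemes


def state_sequence(phonemes: Phonemes,
                   model_db: Dict[str, List[str]]) -> List[str]:
    """Concatenate the HMM state IDs of the model found for each phoneme."""
    states: List[str] = []
    for ph in phonemes:
        if ph not in model_db:
            raise KeyError(f"no model for phoneme {ph!r}")
        states.extend(model_db[ph])
    return states


def spell_out(word: str) -> Phonemes:
    """Toy letter-to-sound stand-in: treats each letter as a phoneme."""
    return [ch.upper() for ch in word]


# Toy data (hypothetical): a one-word lexicon and three states per phoneme model.
lexicon = {"hi": ["HH", "AY"]}
model_db = {ph: [f"{ph}_s{k}" for k in (1, 2, 3)] for ph in ["HH", "AY", "O", "K"]}

phones = pronounce("Hi ok", lexicon, spell_out)
print(phones)                                # ['HH', 'AY', 'O', 'K']
print(state_sequence(phones, model_db)[:3])  # ['HH_s1', 'HH_s2', 'HH_s3']
```

In a production recognizer the fallback would be the synthesizer's text-analysis front end, and model lookup would typically be context dependent (for example triphones) rather than one model per phoneme.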

5. A method of testing a speech recognition system, comprising:

receiving a text containing at least one word;

generating a pronunciation for the text with a pronunciation tool, including a plurality of phonemes, by first accessing a pronunciation data store to obtain phonemes indicating pronunciation of the at least one word, and if the pronunciation data store does not contain phonemes for the at least one word, then providing the received text to a text-to-speech synthesizer to obtain the phonemes indicating the pronunciation of the at least one word;

generating a model for each of the phonemes of the pronunciation and selecting a Hidden Markov Model sequence of states for a Hidden Markov Model (HMM) representative of each of the generated models, the selected sequence of HMM states being a sequence that the speech recognizer is to choose as a best sequence during recognition of speech that includes the at least one word, wherein generating a model includes selecting a plurality of candidate HMMs for at least one of the generated models;

generating a sequence of feature vectors for the pronunciation from the model, wherein at least one of the feature vectors is generated by selecting, for each state in the sequence of HMM states, from a feature vector data store, a feature vector that has a closest probability distribution match with a given mixture in a Markov state in one of the generated models, such that the selected feature vectors produce a best score for the text when the selected feature vectors are provided to the speech recognizer during recognition of the at least one word;

providing the sequence of vectors to the speech recognition system; and

outputting text from the speech recognition system, in response to the provided sequence of vectors, for testing evaluation.

6. The method of claim 5 wherein generating the model further comprises:

generating a sequence of model units for the sequence of phonemes.

7. The method of claim 6 wherein generating a sequence of model units for the sequence of phonemes further comprises:

accessing a database of models;

identifying a model in the database of models matching one phoneme in the sequence of phonemes; and

returning that model as the model.

8. The method of claim 6 further comprising:

obtaining at least one probability for each model unit in the sequence of model units.

9. The method of claim 8 wherein the model obtained is a Hidden Markov Model; and wherein a probability is obtained for each Markov state in the Hidden Markov Model.

10. The method of claim 9 wherein the probability for each Markov state is obtained from a probability distribution for the state.

11. The method of claim 10 wherein the database of feature vectors is an acoustic model of the speech recognition system.
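
Claims 1, 5, and 8 through 10 select, for each HMM state, the stored feature vector that most closely matches a mixture component of that state's probability distribution, and obtain a probability for each Markov state from its distribution. The Python sketch below illustrates one reading of that step. It assumes diagonal-covariance Gaussian mixtures, interprets "closest probability distribution match" as the highest log-density under the chosen component, and takes the highest-weight component as the "given mixture"; those choices, and every name in the code (`Gaussian`, `state_log_likelihood`, `select_vectors`), are assumptions rather than details from the patent. It emits one vector per state.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

Vector = Sequence[float]


@dataclass
class Gaussian:
    """One mixture component with a diagonal covariance."""
    weight: float
    mean: List[float]
    var: List[float]

    def log_density(self, x: Vector) -> float:
        """log N(x; mean, diag(var))."""
        return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
                   for xi, mu, v in zip(x, self.mean, self.var))


def state_log_likelihood(mixture: List[Gaussian], x: Vector) -> float:
    """log sum_k w_k N(x; mu_k, var_k), computed stably with log-sum-exp."""
    terms = [math.log(g.weight) + g.log_density(x) for g in mixture]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def select_vectors(states: List[str],
                   state_models: Dict[str, List[Gaussian]],
                   store: List[Vector]) -> List[Vector]:
    """For each state, pick the stored vector that best fits its dominant component."""
    chosen: List[Vector] = []
    for state in states:
        # Assumption: the "given mixture" is the component with the largest weight.
        component = max(state_models[state], key=lambda g: g.weight)
        chosen.append(max(store, key=component.log_density))
    return chosen


# Toy two-dimensional example (hypothetical values).
state_models = {
    "A": [Gaussian(0.7, [0.0, 0.0], [1.0, 1.0]), Gaussian(0.3, [5.0, 5.0], [1.0, 1.0])],
    "B": [Gaussian(1.0, [3.0, -1.0], [0.5, 0.5])],
}
store = [[0.1, -0.2], [2.9, -1.1], [5.0, 5.0]]
print(select_vectors(["A", "B", "A"], state_models, store))
# [[0.1, -0.2], [2.9, -1.1], [0.1, -0.2]]
```

Ranking candidates by `state_log_likelihood` over the whole mixture is an equally plausible reading. With diagonal covariances, maximizing a single component's log-density is equivalent to minimizing the variance-weighted squared distance to that component's mean.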

Specification

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Barquilla, Ricardo Lopez
Primary Examiner(s)
Dorvil; Richemond
Assistant Examiner(s)
He; Jialong

Application Number

US10/965,987
Publication Number

US 20060085187A1
Time in Patent Office

1,985 Days
Field of Search

704/258, 704/266, 704/270, 704/243, 704/251, 704/256, 704/256.1-256.8, 704/246
US Class Current

704/256.1
CPC Class Codes

G10L 13/08 Text analysis or generation...

G10L 15/01 Assessment or evaluation of...
