Testing and tuning of automatic speech recognition systems using synthetic inputs generated from its acoustic models

  • US 7,684,988 B2
  • Filed: 10/15/2004
  • Issued: 03/23/2010
  • Est. Priority Date: 10/15/2004
  • Status: Expired due to Fees
First Claim

1. A speech recognition testing system comprising:

  • a speech recognizer that provides an output based upon a sequence of feature vectors;

    a pronunciation tool that provides a pronunciation of a provided text having at least one word, the pronunciation including a plurality of phonemes, the pronunciation tool comprising a pronunciation store that stores pronunciations for words and a text-to-speech synthesizer that generates phonemes from text, the pronunciation tool first accessing the pronunciation store to obtain the pronunciation for words identified in the text and if the pronunciation store does not include the pronunciation, then using the text-to-speech synthesizer to obtain the pronunciation for the text;

    a model unit generator that generates a model for each of the plurality of phonemes from the provided pronunciation and selects a sequence of Hidden Markov Model states for a Hidden Markov Model (HMM) representative of each of the generated models, the selected sequence of HMM states being a sequence that the speech recognizer is to choose as a best sequence during recognition of speech that is recognized to generate the text, wherein generating a model includes selecting a plurality of candidate HMMs for at least one of the generated models;

    a feature vector data store storing feature vectors;

    a vector generator that generates the sequence of feature vectors to be provided to the speech recognizer from the provided pronunciation of the provided text, wherein at least one of the feature vectors is generated by selecting, for each state in the sequence of HMM states, from the feature vector data store, a feature vector that has a closest probability distribution match with a given mixture in a Markov state in one of the generated models, such that the selected feature vectors produce a best score for the text when the selected feature vectors are provided to the speech recognizer during recognition of the text;

    formatting the selected feature vectors in a format used by the speech recognizer; and

    testing the speech recognizer using the formatted, selected feature vectors.
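
As an informal illustration only, and not the patented implementation, two of the elements recited above can be sketched in Python: the pronunciation tool's store-first lookup with a synthesizer fallback, and the vector generator's per-state selection from a feature vector store. Everything named here is an assumption made for the sketch: the store contents, the `letter_to_sound` stand-in for the text-to-speech synthesizer, the representation of each HMM state as a single diagonal-covariance Gaussian, and the reading of "closest probability distribution match" as highest log density.

```python
import math

# Hypothetical pronunciation store: word -> phoneme list (illustrative entries only).
PRONUNCIATION_STORE = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def letter_to_sound(word):
    """Stand-in for the text-to-speech synthesizer: one pseudo-phoneme per letter."""
    return [ch.upper() for ch in word if ch.isalpha()]


def pronounce(text):
    """Consult the store first; use the synthesizer stand-in only on a miss."""
    phonemes = []
    for word in text.lower().split():
        stored = PRONUNCIATION_STORE.get(word)
        phonemes.extend(stored if stored is not None else letter_to_sound(word))
    return phonemes


def log_density(vector, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at `vector`."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(vector, mean, var)
    )


def select_vectors(state_sequence, vector_store):
    """For each state, given as (mean, variance), pick the best-scoring stored vector.

    Assumption: the best match is the stored vector with the highest log density.
    """
    return [
        max(vector_store, key=lambda vec: log_density(vec, mean, var))
        for mean, var in state_sequence
    ]
```

Under these assumptions, `pronounce("hello xyz")` takes the stored phonemes for the first word and falls back to the stand-in for the second, and `select_vectors` returns one stored vector per state, in state order. That sequence is what would then be formatted and passed to the recognizer under test.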
