Testing and tuning of automatic speech recognition systems using synthetic inputs generated from its acoustic models

US 20060085187A1
Filed: 10/15/2004
Published: 04/20/2006
Est. Priority Date: 10/15/2004
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A system and method for testing and tuning a speech recognition system by providing pronunciations to the speech recognizer. First, a text document is provided to the system and converted into a sequence of phonemes representative of the words in the text. The phonemes are then converted to model units, such as Hidden Markov Models. From the models, a probability is obtained for each model or state, and feature vectors are determined. For each model, the feature vector matching the most probable vector of each state is selected. These ideal feature vectors are provided to the speech recognizer and processed. The recognizer's output text is compared with the original text, and modifications to the system can be made based on the output text.
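The test loop described in the abstract can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the patented implementation: the two-word lexicon, the 3-state HMMs with one diagonal Gaussian per state, and the `recognize` function (a per-frame best-state classifier followed by greedy dictionary matching, standing in for a real Viterbi decoder) are all hypothetical. For a single Gaussian the most probable feature vector is its mean, so the synthetic "ideal" input is simply the sequence of state means.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical lexicon: word -> phoneme sequence.
LEXICON: Dict[str, List[str]] = {"yes": ["y", "eh", "s"], "no": ["n", "ow"]}

# Hypothetical acoustic model: each phoneme is a 3-state HMM whose states are
# diagonal Gaussians (mean, variance) over 2-dimensional feature vectors.
Gaussian = Tuple[List[float], List[float]]
PHONEMES = sorted({p for pron in LEXICON.values() for p in pron})
ACOUSTIC_MODEL: Dict[str, List[Gaussian]] = {
    ph: [([float(i), float(s)], [0.1, 0.1]) for s in range(3)]
    for i, ph in enumerate(PHONEMES)
}


def log_density(x: List[float], state: Gaussian) -> float:
    """Log-likelihood of feature vector x under one Gaussian state."""
    mean, var = state
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def synthesize(text: str) -> List[List[float]]:
    """Text -> phonemes -> HMM states -> one ideal vector per state.

    The most probable vector of a Gaussian is its mean, so each state
    contributes its mean vector.
    """
    vectors: List[List[float]] = []
    for word in text.split():
        for ph in LEXICON[word]:
            vectors.extend(list(mean) for mean, _ in ACOUSTIC_MODEL[ph])
    return vectors


def recognize(vectors: List[List[float]]) -> str:
    """Toy stand-in for a recognizer: best state per frame, then greedy lexicon match."""
    labels = [(ph, s) for ph, states in ACOUSTIC_MODEL.items() for s in range(len(states))]
    phones: List[str] = []
    prev = None
    for x in vectors:
        ph, s = max(labels, key=lambda lab: log_density(x, ACOUSTIC_MODEL[lab[0]][lab[1]]))
        if s == 0 or ph != prev:  # a new phoneme starts at its first state
            phones.append(ph)
        prev = ph
    by_pron = {tuple(pron): word for word, pron in LEXICON.items()}
    longest = max(len(pron) for pron in by_pron)
    words: List[str] = []
    i = 0
    while i < len(phones):
        for n in range(min(longest, len(phones) - i), 0, -1):
            word = by_pron.get(tuple(phones[i:i + n]))
            if word is not None:
                words.append(word)
                i += n
                break
        else:  # no dictionary word matches here: emit <unk>, skip one phoneme
            words.append("<unk>")
            i += 1
    return " ".join(words)


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level edit distance (substitutions + insertions + deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        diag, row[0] = row[0], i
        for j, h in enumerate(hyp, 1):
            diag, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, diag + (r != h))
    return row[-1]


if __name__ == "__main__":
    text = "yes no"
    output = recognize(synthesize(text))
    print(output, word_errors(text, output))
```

Running the script prints `yes no 0`. Because the input is the model's own most probable vectors, a nonzero error count in this toy could only come from the decoding side (the frame classifier or the greedy lexicon match), not from acoustic variability in recorded speech.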

211 Citations

25 Claims

1. A speech recognition testing system comprising:
- a speech recognizer configured to provide an output text based upon feature vectors;
  
  a pronunciation tool configured to provide a pronunciation for a provided text having at least one word; and
  
  a vector generator configured to generate a sequence of feature vectors from the provided pronunciation for the text.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The speech recognition system of claim 1 further comprising:
    - a model unit generator configured to generate models from the provided pronunciation; and
      
      wherein the vector generator generates the sequence of feature vectors based on the generated models.
  - 3. The speech recognition system of claim 2 further comprising:
    - a database of pronunciations configured to provide pronunciations to the pronunciation tool.
  - 4. The speech recognition system of claim 2 further comprising:
    - a text-to-speech synthesizer configured to provide a pronunciation for at least one word in the text to the pronunciation tool.
  - 5. The speech recognition system of claim 2 wherein the pronunciation tool is configured to generate a sequence of phonemes for the pronunciation;
    - and wherein the model unit generator identifies models for each phoneme in the sequence of phonemes.
  - 6. The speech recognition system of claim 5 wherein the model unit generator accesses a database of models in generating models for each of the phonemes in the sequence of phonemes.
  - 7. The speech recognition system of claim 6 wherein the models in the database of models include Hidden Markov Models.
  - 8. The speech recognition system of claim 1 wherein the vector generator obtains the feature vectors from a database of feature vectors.
  - 9. The speech recognition system of claim 8 wherein the database of feature vectors comprises an acoustic model of the speech recognizer.
  - 10. The speech recognition system of claim 9 wherein the vector generator is configured to identify, as the feature vector, a feature vector having a closest match to a distribution probability of the model.
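Claims 8 to 10 have the vector generator draw its vectors from a stored set, such as the recognizer's acoustic model, and pick the one that best matches the model's probability distribution. A minimal sketch, assuming each HMM state is a diagonal-covariance Gaussian mixture stored as hypothetical `(weight, mean, variance)` tuples, and reading "closest match" as "highest likelihood under the state's distribution" (an interpretation, not wording from the claims):

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical storage layout for one HMM state: a diagonal-covariance
# Gaussian mixture given as (weight, mean, variance) components.
Component = Tuple[float, Sequence[float], Sequence[float]]


def gaussian_log_density(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """log N(x; mean, diag(var))."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def state_log_likelihood(x: Sequence[float], state: List[Component]) -> float:
    """log p(x | state), combining the mixture components with log-sum-exp."""
    terms = [math.log(w) + gaussian_log_density(x, m, v) for w, m, v in state]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def select_feature_vector(state: List[Component], database: List[List[float]]) -> List[float]:
    """Return the stored vector that the state's distribution rates most probable."""
    return max(database, key=lambda x: state_log_likelihood(x, state))


if __name__ == "__main__":
    state = [(0.7, [0.0, 0.0], [1.0, 1.0]), (0.3, [5.0, 5.0], [1.0, 1.0])]
    database = [[4.9, 5.1], [0.2, -0.1], [2.5, 2.5]]
    print(select_feature_vector(state, database))
```

With the example state, whose heavier component sits at the origin, the call returns `[0.2, -0.1]` rather than the vector near the lighter component at `(5, 5)`. Claim 22 below describes a two-step variant (locate the highest-probability point, then take the nearest stored vector); for a single Gaussian with equal variances in every dimension, the two approaches select the same vector.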

11. A method of testing a speech recognition system, comprising:
- receiving a text containing at least one word;
  
  generating a pronunciation for the text with a pronunciation tool;
  
  generating a sequence of vectors for the pronunciation;
  
  providing the sequence of vectors to the speech recognition system;
  
  outputting text from the speech recognition system in response to the provided sequence of vectors.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
- - 12. The method of claim 11 wherein generating a pronunciation further comprises:
    - generating a sequence of phonemes for the pronunciation.
  - 13. The method of claim 12 wherein generating a pronunciation further comprises:
    - identifying the at least one word in the text in a database of pronunciations; and
      
      retrieving the identified pronunciation.
  - 14. The method of claim 12 wherein generating a pronunciation for the text comprises:
    - providing the text to a module comprising at least a set of letter to sound rules;
      
      generating a sequence of phonemes for the text on the module; and
      
      returning the sequence of phonemes to the pronunciation tool.
  - 15. The method of claim 14 wherein the module is a text-to-speech engine.
  - 16. The method of claim 12 wherein generating a model for the sequence of phonemes further comprises:
    - generating a sequence of model units for the sequence of phonemes.
  - 17. The method of claim 16 wherein generating a sequence of model units for the sequence of phonemes further comprises:
    - accessing a database of models;
      
      identifying a model in the database of models matching one phoneme in the sequence of phonemes; and
      
      returning that model as the model.
  - 18. The method of claim 16 further comprising:
    - obtaining at least one probability for each model unit in the sequence of model units.
  - 19. The method of claim 18 wherein the model obtained is a Hidden Markov Model;
    - and wherein a probability is obtained for each Markov state in the Hidden Markov Model.
  - 20. The method of claim 19 wherein the probability for each Markov state is a probability distribution for the state.
  - 21. The method of claim 18 wherein generating vectors comprises:
    - identifying feature vectors for each model unit in the sequence of model units; and
      
      for each model unit, selecting as the feature vector the vector matching the model unit having the closest match to a maximum of a probability function.
  - 22. The method of claim 21 wherein generating vectors further comprises:
    - determining a distribution point in the model unit having the highest probability; and
      
      selecting the feature vector having the closest match to the determined distribution point.
  - 23. The method of claim 21 wherein selecting the feature vector comprises:
    - accessing a database of feature vectors.
  - 24. The method of claim 23 wherein the database of feature vectors is an acoustic model of the speech recognition system.
  - 25. The method of claim 11 wherein providing the sequence of vectors provides the vectors directly to a component of the speech recognition system that follows a component that determines the feature vectors for the speech recognition system.
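The pronunciation and model-unit steps of claims 12 to 17 and 19 can be sketched as a dictionary lookup with a letter-to-sound fallback, followed by expansion of each phoneme into the states of its HMM. Everything named here is hypothetical: `PRONUNCIATIONS` stands in for the pronunciation database, `LETTER_TO_SOUND` is a deliberately crude one-phoneme-per-letter table standing in for a text-to-speech engine's rules, and the model database is reduced to a phoneme-to-state-count mapping.

```python
from typing import Dict, List, Tuple

# Hypothetical pronunciation database (claim 13): word -> phoneme sequence.
PRONUNCIATIONS: Dict[str, List[str]] = {
    "speech": ["s", "p", "iy", "ch"],
}

# Hypothetical letter-to-sound table (claim 14). Real TTS front ends use
# context-dependent rules or trained models, not one phoneme per letter.
LETTER_TO_SOUND: Dict[str, str] = {
    "a": "ae", "b": "b", "c": "k", "d": "d", "e": "eh", "f": "f", "g": "g",
    "h": "hh", "i": "ih", "k": "k", "l": "l", "m": "m", "n": "n", "o": "aa",
    "p": "p", "r": "r", "s": "s", "t": "t", "u": "ah", "v": "v", "z": "z",
}


def pronounce(text: str) -> List[str]:
    """Return a phoneme sequence for the text (claims 12 to 14).

    Dictionary words use their stored pronunciation; other words fall back
    to letter-to-sound rules. Letters missing from the table are skipped.
    """
    phonemes: List[str] = []
    for word in text.lower().split():
        if word in PRONUNCIATIONS:
            phonemes.extend(PRONUNCIATIONS[word])
        else:
            phonemes.extend(LETTER_TO_SOUND[ch] for ch in word if ch in LETTER_TO_SOUND)
    return phonemes


def model_units(phonemes: List[str], model_db: Dict[str, int]) -> List[Tuple[str, int]]:
    """Look up each phoneme's HMM and expand it into (phoneme, state) units.

    `model_db` maps a phoneme to its number of Markov states (claims 16, 17, 19).
    """
    units: List[Tuple[str, int]] = []
    for ph in phonemes:
        if ph not in model_db:
            raise KeyError(f"no acoustic model for phoneme {ph!r}")
        units.extend((ph, state) for state in range(model_db[ph]))
    return units


if __name__ == "__main__":
    phones = pronounce("speech test")
    print(phones)
    print(model_units(phones[:2], {"s": 3, "p": 3}))
```

For example, `pronounce("speech test")` takes "speech" from the dictionary and spells out "test" with the fallback rules, giving `['s', 'p', 'iy', 'ch', 't', 'eh', 's', 't']`. Raising on a phoneme with no model, rather than skipping it, keeps a gap in the acoustic model from silently shortening the synthetic input.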

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Barquilla, Ricardo Lopez

Granted Patent

US 7,684,988 B2
US Class Current

704/243
CPC Class Codes

G10L 13/08 Text analysis or generation...

G10L 15/01 Assessment or evaluation of...
