System and method for tuning and testing in a speech recognition system

US 7,440,895 B1
Filed: 12/01/2003
Issued: 10/21/2008
Est. Priority Date: 12/01/2003
Status: Active Grant

Abstract

Systems and methods for improving the performance of a speech recognition system. In some embodiments a tuner module and/or a tester module are configured to cooperate with a speech recognition system. The tester and tuner modules can be configured to cooperate with each other. In one embodiment, the tuner module may include a module for playing back a selected portion of a digital data audio file, a module for creating and/or editing a transcript of the selected portion, and/or a module for displaying information associated with a decoding of the selected portion, the decoding generated by a speech recognition engine. In other embodiments, the tester module can include an editor for creating and/or modifying a grammar, a module for receiving a selected portion of a digital audio file and its corresponding transcript, and a scoring module for producing scoring statistics of the decoding based at least in part on the transcript.

78 Citations

44 Claims

1. A method of testing a speech recognizer, the method comprising:
- receiving a plurality of digital audio data files, each audio file comprising audio recorded in response to a first prompt by a speech recognition application;
  
  receiving a grammar associated with the first prompt, the grammar comprising a plurality of concepts, each concept having a set of phrases organized under a single idea, the idea representing an expected response to the first prompt;
  
  producing a first recognition result for each audio data file based at least in part on the grammar using the speech recognizer;
  
  receiving a user-defined transcript of each audio file, and scoring the first recognition results for each audio data file based at least in part on the transcript of each audio file;
  
  modifying the grammar based on the scoring of the first recognition result of each audio data file;
  
  producing a second recognition result for each audio data file based on the modified grammar using the speech recognizer;
  
  using the user-defined transcript of each audio data file to score the second recognition result for each audio data file;
  
  comparing the scoring of the first recognition result with the second result for each audio data file; and
  
  outputting the first or second recognition result for each audio data file based on the comparison.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
- - 2. The method of claim 1, wherein the first decode result comprises concepts, phrases, words, and/or phonemes.
  - 3. The method of claim 1, wherein the first decode result comprises a confidence score.
  - 4. The method of claim 1, further comprising displaying a result of the scoring.
  - 5. The method of claim 4, further comprising displaying a result of the scoring on a user interface.
  - 6. The method of claim 5, wherein the user interface is a graphical user interface.
  - 7. The method of claim 1, further comprising creating and/or modifying a response file associated with the audio data file.
  - 8. The method of claim 1, wherein the response file comprises the audio file, a portion of the grammar associated with the audio file, the decode result, and/or the transcript.
  - 9. The method of claim 1, further comprising transmitting the first decode results to a tuner module for processing.
  - 10. The method of claim 1, further comprising:
    - modifying the grammar;
      
      producing a second decode result of each digital audio file based at least in part on the modified grammar; and
      
      scoring the second decode results based at least in part on the transcript of each audio file.
  - 11. The method of claim 10, further comprising comparing the scoring of the first decode results and the scoring of the second decode results.
  - 12. The method of claim 1, wherein each of the set of phrases comprises a word, a word block, a BNF construct, or a phoneme block.
  - 13. The method of claim 1, further comprising:
    - receiving a second plurality of digital audio data files, each audio file comprising audio recorded in response to a second prompt by the speech recognition application;
      
      receiving a second grammar associated with the second prompt, wherein the second grammar comprises a plurality of concepts, each concept having a set of phrases organized under a single idea, the idea representing an expected response to the second prompt;
      
      producing a second decode result for each audio file in the second plurality of digital audio data files based at least in part on the second grammar;
      
      receiving a transcript of each audio file in the second plurality of audio data files; and
      
      scoring the second decode results based at least in part on the transcripts of each of the second plurality of digital audio files.
  - 14. The method of claim 1, wherein scoring the decode results comprises generating statistics on the accuracy of the decode results with respect to each transcript, the statistics comprising word error rate, concept error rate, and average confidence scores for correct and incorrect results.
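
The statistics named in claim 14 can be illustrated with a minimal Python sketch. The claims do not define how each statistic is computed, so everything here is an assumption: word error rate is taken as word-level edit distance divided by the number of transcript words, a result counts as "correct" when its decoded concept matches the concept expected from the transcript, and the names `Scored`, `word_errors`, and `score` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean
from typing import Optional


def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Word-level edit distance: substitutions + insertions + deletions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


@dataclass
class Scored:
    """One audio file's transcript and decode (all field names are hypothetical)."""
    transcript: str        # user-defined transcript of the audio file
    decoded: str           # word string returned by the recognizer
    expected_concept: str  # concept the transcript maps to in the grammar
    decoded_concept: str   # concept returned by the recognizer
    confidence: float      # recognizer confidence score


def score(results: list[Scored]) -> dict[str, Optional[float]]:
    """Aggregate accuracy statistics over a set of scored decodes."""
    ref_words = sum(len(r.transcript.split()) for r in results)
    errors = sum(word_errors(r.transcript.split(), r.decoded.split())
                 for r in results)
    # Assumption: correctness is judged at the concept level.
    correct = [r.confidence for r in results
               if r.decoded_concept == r.expected_concept]
    incorrect = [r.confidence for r in results
                 if r.decoded_concept != r.expected_concept]
    return {
        "word_error_rate": errors / ref_words if ref_words else None,
        "concept_error_rate": len(incorrect) / len(results) if results else None,
        "avg_confidence_correct": mean(correct) if correct else None,
        "avg_confidence_incorrect": mean(incorrect) if incorrect else None,
    }
```

Reporting the average confidence separately for correct and incorrect results makes it easier to see whether a confidence threshold would separate the two groups.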

15. A system for testing a speech recognizer, the system comprising:
- an audio recorder module for receiving a plurality of digital audio data files, each data file comprising audio recorded in response to a first prompt of a speech recognition application;
  
  a grammar editor module configured to access and modify a grammar based on scoring of a recognition result, the grammar comprising a plurality of concepts, each concept having a set of phrases organized under a single idea, the idea representing an expected response to the first prompt;
  
  a speech recognition engine configured to output a first recognition result for each audio data file of the plurality of digital audio data files and the accessed grammar using the speech recognizer; and
  
  a scoring module configured to score the first recognition results based at least in part on a user-defined transcript of each audio data file of the plurality of audio data files, wherein said speech recognition engine is configured to output a second recognition result for each audio data file of the plurality of digital audio data files based on said modified grammar and wherein said scoring module is further configured to compare the scoring of the first recognition result with the second recognition result for each audio data file and output the first or second recognition result for each audio data file based on said comparison.
- Dependent Claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
- - 16. The system of claim 15, further comprising a user interface.
  - 17. The system of claim 15, wherein the user interface comprises a graphical user interface.
  - 18. The system of claim 17, wherein the graphical user interface is configured to display an output of the scoring module.
  - 19. The system of claim 17, wherein the graphical user interface is configured to display the digital audio input and the accessed grammar.
  - 20. The system of claim 15, wherein the recognition result comprises a confidence score.
  - 21. The system of claim 15, wherein the recognition result comprises a concept, phrase, word, or phoneme.
  - 22. The system of claim 15, wherein the recognition result comprises an indication of an acoustic model used by the speech recognizer in decoding each audio data file.
  - 23. The system of claim 22, wherein the recognition result comprises an acoustic model score.
  - 24. The system of claim 15, further comprising a response file for logically associating an audio data file, the transcript, the recognition result, and/or an output of the scoring module.
  - 25. The system of claim 15, wherein the speech recognition engine is configured to transmit the recognition result to a tuner module for processing.
  - 26. The system of claim 25, further comprising a tuner module configured to transmit digital audio input to the audio recorder module and grammar to the grammar editor module.
  - 27. The system of claim 15, further comprising a test module configured to initiate a testing cycle by processing and transmitting digital audio input and grammar to the speech recognition engine.
  - 28. The system of claim 27, wherein the speech recognition engine is configured to transmit the recognition result to a tuner module for processing.
  - 29. The system of claim 28, further comprising a tuner module configured to transmit digital audio data and grammar to the test module.
  - 30. The system of claim 15, wherein the system is configured to, iteratively, modify the grammar based on a previous scoring of recognition results using the grammar editor module, output a recognition result for each audio data file based on the modified grammar using the speech recognition engine, and use the user-defined transcript of each audio data file to score the modified grammar recognition results using the scoring module.
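
The two-pass cycle recited in claims 15 and 30 (decode with a grammar, score against the transcripts, modify the grammar, decode again, compare, and output one result per file) might be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the recognizer is an arbitrary callable rather than a real engine API, a grammar is reduced to a mapping from concept name to phrases, and the selection rule (keep the second result only when it is correct and the first is not) is one possible reading of "based on said comparison". All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical simplification: concept name -> set of phrases for that concept.
Grammar = dict[str, set[str]]


@dataclass(frozen=True)
class Result:
    concept: Optional[str]  # decoded concept, or None if nothing matched
    confidence: float


# Stand-in for a speech recognition engine: (audio path, grammar) -> Result.
Recognizer = Callable[[str, Grammar], Result]


def run_cycle(recognize: Recognizer, audio_files: list[str],
              grammar: Grammar) -> dict[str, Result]:
    """One testing cycle: decode every audio file against the grammar."""
    return {path: recognize(path, grammar) for path in audio_files}


def accuracy(results: dict[str, Result], expected: dict[str, str]) -> float:
    """Fraction of files whose decoded concept matches the transcript's concept."""
    hits = sum(results[path].concept == want for path, want in expected.items())
    return hits / len(expected) if expected else 0.0


def compare_and_select(first: dict[str, Result], second: dict[str, Result],
                       expected: dict[str, str]) -> dict[str, Result]:
    """Per file, output the second result only if it scores better than the first."""
    selected = {}
    for path, want in expected.items():
        first_ok = first[path].concept == want
        second_ok = second[path].concept == want
        selected[path] = second[path] if second_ok and not first_ok else first[path]
    return selected


def make_fake_recognizer(utterances: dict[str, str]) -> Recognizer:
    """Toy recognizer for demonstration: exact phrase lookup, fixed confidences."""
    def recognize(path: str, grammar: Grammar) -> Result:
        spoken = utterances[path]
        for concept, phrases in grammar.items():
            if spoken in phrases:
                return Result(concept, 0.9)
        return Result(None, 0.2)
    return recognize
```

In use, a caller would run `run_cycle` once with the original grammar and once with the edited grammar, then pass both result sets and the transcript-derived concepts to `compare_and_select`. Keeping the recognizer behind a plain callable lets the same loop be exercised with a toy stand-in such as `make_fake_recognizer` before a real engine is attached.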

31. A system for testing a speech recognizer, the system comprising:
- an audio data input module for receiving a plurality of digital audio data files, each audio data file comprising audio recorded in response to a first prompt from a speech recognition application;
  
  a grammar editor module configured to access and modify a grammar, the grammar comprising a plurality of concepts, each concept having a set of phrases organized under a single idea, the idea representing an expected response to the first prompt;
  
  a test module configured to initiate a first testing cycle, the testing cycle comprising transmitting the plurality of digital audio data files and the grammar to a speech recognition engine; and
  
  a scoring module configured to receive a first recognition result for each of the plurality of audio data files from the speech recognition engine, and further configured to score the first recognition results based at least in part on a user-defined transcript of the audio input, wherein said test module is configured to initiate a second testing cycle comprising transmitting the plurality of digital audio data files and the modified grammar to said speech recognition engine, and wherein said scoring module is configured to receive a recognition result for said second testing cycle for each of the plurality of audio data files based on the modified grammar, from the speech recognition engine and is further configured to compare the scoring of the first recognition result with the second recognition result for each audio data file and output the first or second recognition result for each audio data file based on said comparison.
- Dependent Claims (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44)
- - 32. The system of claim 31, further comprising a speech recognition engine configured to output a recognition result to the scoring module for each of the plurality of audio data files received from the test module.
  - 33. The system of claim 32, wherein the speech recognition engine is further configured to transmit the recognition results to a tuner module for processing.
  - 34. The system of claim 33, further comprising a tuner module configured to transmit digital audio data and grammar to the test module.
  - 35. The system of claim 32, further comprising a user interface.
  - 36. The system of claim 35, wherein the user interface comprises a graphical user interface.
  - 37. The system of claim 36, wherein the graphical user interface is configured to display an output from the scoring module.
  - 38. The system of claim 36, wherein the graphical user interface is configured to display a digital audio data file and the accessed grammar.
  - 39. The system of claim 32, wherein the recognition result comprises a confidence score.
  - 40. The system of claim 32, wherein the recognition result comprises a concept, phrase, word, or phoneme.
  - 41. The system of claim 40, wherein the recognition result comprises an acoustic model score.
  - 42. The system of claim 32, wherein the recognition result comprises an indication of an acoustic model used by the speech recognition engine in decoding an audio data file.
  - 43. The system of claim 32, further comprising a response file for logically associating an audio data file, the transcript, the recognition result, and/or an output of the module configured to output a recognition result.
  - 44. The system of claim 31, wherein the grammar editor module is further configured to modify the grammar based on the scoring of the recognition results, the test module is further configured to transmit the plurality of audio data files and the modified grammar to a speech recognition engine, and the scoring module is further configured to receive a recognition result based on the modified grammar from the speech recognition engine for each of the plurality of audio data files and to score the recognition results based at least in part on the user-defined transcript.
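
Two data structures recur throughout the claims: a grammar made of concepts, each grouping a set of phrases under a single idea, and a response file that logically associates an audio data file with its transcript, recognition result, and scoring output (claims 24 and 43). The Python sketch below shows one way these could be represented. The class names, field names, and the JSON serialization are assumptions made for illustration; the claims do not specify a file format.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A set of phrases organized under a single idea."""
    name: str
    phrases: list[str]


@dataclass
class Grammar:
    """Grammar associated with one prompt of the speech application."""
    prompt: str
    concepts: list[Concept] = field(default_factory=list)

    def concept_for(self, phrase: str) -> Optional[str]:
        """Return the concept whose phrase set contains the phrase, if any."""
        # Normalize case and whitespace before comparing (an assumption).
        wanted = " ".join(phrase.lower().split())
        for concept in self.concepts:
            if wanted in (p.lower() for p in concept.phrases):
                return concept.name
        return None


@dataclass
class ResponseFile:
    """Record tying one audio data file to its transcript, result, and scoring."""
    audio_path: str
    transcript: Optional[str] = None
    recognition_result: Optional[str] = None
    scoring_output: Optional[dict] = None

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ResponseFile":
        return cls(**json.loads(text))
```

Mapping a user-defined transcript through `concept_for` gives the expected concept for an audio file, which is what a scoring module would compare against the recognizer's decoded concept.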

Current Assignee: LumenVox, LLC.
Original Assignee: LumenVox, LLC.
Inventors: Bergman, Michael D.; Blake, James F. II; Miller, Edward S.; Danielson, Kyle N.; Auckland, Alexandra L.; Herold, Keith C.
Primary Examiner(s): Vo; Huyen X.
Application Number: US10/725,281
Time in Patent Office: 1,786 Days
Field of Search: 704/231, 704/243, 704/235, 704/244, 704/270, 704/9, 704/270.1, 704/260, 704/246, 704/252
US Class Current: 704/244
CPC Class Codes:
G10L 15/01   Assessment or evaluation of...
G10L 15/063   Training
G10L 15/193   Formal grammars, e.g. finit...
G10L 2015/0631   Creating reference template...
