Method and apparatus of specifying and performing speech recognition operations

US 7,720,683 B1
Filed: 06/10/2004
Issued: 05/18/2010
Est. Priority Date: 06/13/2003
Status: Active Grant

Abstract

A speech recognition technique is described that has the dual benefits of not requiring collection of recordings for training while using computational resources that are cost-compatible with consumer electronic products. Methods are described for improving the recognition accuracy of a recognizer by developer interaction with a design tool that iterates the recognition data during development of a recognition set of utterances and that allows controlling and minimizing the computational resources required to implement the recognizer in hardware.

748 Citations

31 Claims

1. A method of specifying a speech recognition operation comprising:
- receiving, on at least one computer, a recognition set from a user, the recognition set comprising one or more text words or phrases to be recognized;
  
  automatically generating a plurality of alternate phonetic representations of each word or phrase in the recognition set;
  
  displaying the phonetic representations to the user in a graphical user interface;
  
  generating a plurality of speech recognition parameters for the recognition set based on said phonetic representations;
  
  calculating, on at least one computer, an estimate of the resources used by a target system to recognize the words or phrases in the recognition set using the speech recognition parameters;
  
  displaying the estimate to the user in the graphical user interface;
  
  interactively modifying the phonetic representations, and in accordance therewith, modifying the speech recognition parameters, wherein the resources used by the target system are modified in accordance with the interactive modification of the phonetic representations; and
  
  redisplaying the estimate as the phonetic representations are modified.
  - 2. The method of claim 1 wherein the phonetic representations are displayed on a keyboard.
  - 3. The method of claim 1 further comprising performing a speech recognition operation on a local computer based on the speech recognition parameters.
  - 4. The method of claim 1 wherein the speech recognition parameters comprise a first and second set of recognition parameters, wherein the first set configures a speech recognition system to respond to portions of words or phrases in the recognition set and produce a first set of intermediate results, and the second set configures the speech recognition system to analyze the intermediate results and produce a final result.
  - 5. The method of claim 1 further comprising transferring the speech recognition parameters to the target system to configure the target system to perform a speech recognition operation.
  - 6. The method of claim 5 wherein the target system includes a likelihood estimator and the speech recognition parameters include an acoustic model transferred to the likelihood estimator.
  - 7. The method of claim 6 wherein the acoustic model includes neural network weights.
  - 8. The method of claim 5 wherein the target system includes a grammar analyzer and the speech recognition parameters include a grammar specification data file transferred to the grammar analyzer.
  - 9. The method of claim 8 wherein the grammar specification data file includes instructions for configuring a search algorithm on the target system to analyze acoustic information against all words or phrases in the recognition set over a given time interval.
  - 10. The method of claim 1 further comprising generating synthesized audio corresponding to the phonetic representations so that the user may interactively modify the phonetic representations and improve recognition accuracy.
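The edit-and-re-estimate loop recited in claim 1 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the patent: the toy lexicon, the `RecognitionSet` class, and the cost model (a fixed number of bytes per phoneme in each retained pronunciation). A real design tool would presumably use a model specific to the target hardware.

```python
from dataclasses import dataclass, field

# Hypothetical toy lexicon (not from the patent): word -> alternate phoneme sequences.
TOY_LEXICON = {
    "tomato": [["t", "ah", "m", "ey", "t", "ow"], ["t", "ah", "m", "aa", "t", "ow"]],
    "yes": [["y", "eh", "s"]],
    "no": [["n", "ow"]],
}

# Assumed cost model: every phoneme of every retained pronunciation costs a
# fixed number of bytes on the target system.
BYTES_PER_PHONEME = 4


@dataclass
class RecognitionSet:
    # word -> list of alternate pronunciations (each a list of phoneme symbols)
    pronunciations: dict = field(default_factory=dict)

    @classmethod
    def from_words(cls, words):
        """Generate alternate phonetic representations for each word."""
        return cls({w: [list(p) for p in TOY_LEXICON[w]] for w in words})

    def remove_alternate(self, word, index):
        """Developer edit: drop one alternate pronunciation of a word."""
        if len(self.pronunciations[word]) <= 1:
            raise ValueError("each word needs at least one pronunciation")
        del self.pronunciations[word][index]

    def estimate_bytes(self):
        """Recompute the resource estimate from the current pronunciations."""
        phonemes = sum(len(p) for alts in self.pronunciations.values() for p in alts)
        return phonemes * BYTES_PER_PHONEME


if __name__ == "__main__":
    rs = RecognitionSet.from_words(["tomato", "yes", "no"])
    print("initial estimate:", rs.estimate_bytes(), "bytes")
    rs.remove_alternate("tomato", 1)  # interactive modification
    print("after edit:", rs.estimate_bytes(), "bytes")  # estimate is redisplayed
```

The estimate is derived from the current pronunciations on every call, so any edit is reflected the next time it is displayed.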

11. A method of making a speech recognition device comprising:
- receiving, on at least one computer, a recognition set from a user, the recognition set comprising one or more text words or phrases to be recognized;
  
  automatically generating a plurality of alternate phonetic representations of each word or phrase in the recognition set;
  
  displaying the phonetic representations to the user in a graphical user interface;
  
  generating a plurality of speech recognition parameters for the recognition set based on said phonetic representations;
  
  calculating, on at least one computer, an estimate of the resources used by said speech recognition device to recognize the words or phrases in the recognition set using the speech recognition parameters;
  
  displaying the estimate to the user in the graphical user interface;
  
  interactively modifying the phonetic representations, and in accordance therewith, modifying the speech recognition parameters, wherein the resources used by the speech recognition device are modified in accordance with the interactive modification of the symbolic representations;
  
  redisplaying the estimate as the phonetic representations are modified; and
  
  storing the speech recognition parameters in a memory of the speech recognition device.
  - 12. The method of claim 11 wherein the speech recognition parameters comprise a first and second set of recognition parameters, wherein the first set configures a speech recognition system to respond to portions of words or phrases in the recognition set and produce a first set of intermediate results, and the second set configures the speech recognition system to analyze the intermediate results and produce a final result.
  - 13. The method of claim 11 wherein the speech recognition device includes a likelihood estimator and the speech recognition parameters include an acoustic model accessed by the likelihood estimator.
  - 14. The method of claim 13 wherein the acoustic model comprises neural network weights.
  - 15. The method of claim 11 wherein the speech recognition device includes a grammar analyzer and the speech recognition parameters include a grammar specification data file accessed by the grammar analyzer.
  - 16. The method of claim 15 wherein the grammar specification data file includes instructions for configuring a search algorithm on the speech recognition device to analyze acoustic information against all the words or phrases in the recognition set over a given time interval.
  - 17. The method of claim 11 further comprising generating synthesized audio corresponding to the phonetic representations so that the user may interactively modify the phonetic representations and improve recognition accuracy.
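Claims 4 and 12 describe two sets of parameters: one that yields intermediate results for portions of words, and one that analyzes those results to produce a final result. The sketch below shows one way such a split could look, assuming the intermediate results are per-frame phoneme log-likelihoods and the second stage is a simple left-to-right dynamic-programming alignment. The function names, the data layout, and the choice of alignment are illustrative assumptions, not the patent's stated implementation.

```python
import math


def align_score(frames, phones):
    """Second stage: best log-score for aligning `phones`, in order, to `frames`.

    `frames` is a list of dicts (phoneme -> log-likelihood), i.e. the assumed
    intermediate results of the first stage. Each phoneme must cover at least
    one frame, and a phoneme missing from a frame scores -inf.
    """
    neg = -math.inf
    n, m = len(frames), len(phones)
    if m == 0 or n < m:
        return neg
    prev = [neg] * m
    prev[0] = frames[0].get(phones[0], neg)
    for t in range(1, n):
        cur = [neg] * m
        for j in range(m):
            stay = prev[j]                        # remain in the same phoneme
            advance = prev[j - 1] if j > 0 else neg  # move on to the next phoneme
            best = max(stay, advance)
            if best > neg:
                cur[j] = best + frames[t].get(phones[j], neg)
        prev = cur
    return prev[m - 1]


def recognize(frames, grammar):
    """Pick the word whose best pronunciation aligns with the highest score."""
    best_word, best_score = None, -math.inf
    for word, alternates in grammar.items():
        for phones in alternates:
            score = align_score(frames, phones)
            if score > best_score:
                best_word, best_score = word, score
    return best_word, best_score


if __name__ == "__main__":
    # Hypothetical first-stage output for three frames of audio.
    frames = [
        {"y": -0.1, "n": -2.0},
        {"eh": -0.2, "ow": -1.5},
        {"s": -0.1, "ow": -3.0},
    ]
    grammar = {"yes": [["y", "eh", "s"]], "no": [["n", "ow"]]}
    print(recognize(frames, grammar))
```

In this arrangement the first-stage output is independent of the vocabulary, so changing the recognition set only changes the `grammar` passed to the second stage.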

18. A computer-readable storage medium including software for performing a method, the method comprising:
- receiving a recognition set from a user, the recognition set comprising one or more text words or phrases to be recognized;
  
  automatically generating a plurality of alternate phonetic representations of each word or phrase in the recognition set;
  
  displaying the phonetic representations to the user in a graphical user interface;
  
  generating a plurality of speech recognition parameters for the recognition set based on said phonetic representations;
  
  calculating an estimate of the resources used by a speech recognition device to recognize the words or phrases in the recognition set using the speech recognition parameters;
  
  displaying the estimate to the user in the graphical user interface;
  
  interactively modifying the phonetic representations, and in accordance therewith, modifying the speech recognition parameters, wherein the resources used by the speech recognition device are modified in accordance with the interactive modification of the symbolic representations; and
  
  redisplaying the estimate as the phonetic representations are modified.
  - 19. The method of claim 18 wherein the phonetic representations are displayed on a keyboard.
  - 20. The method of claim 18 further comprising generating synthesized audio corresponding to the phonetic representations.
  - 21. The method of claim 18 further comprising generating a plurality of alternate phonetic representations for a first word or phrase in the recognition set and corresponding speech recognition parameters for recognizing each of the plurality of alternate phonetic representations of the first word or phrase.
  - 22. The method of claim 18 wherein the speech recognition parameters include an acoustic model.
  - 23. The method of claim 18 wherein the speech recognition parameters include an acoustic model comprising instructions for programming a recognizer to respond to words or phrases in the recognition set at particular instances of time.
  - 24. The method of claim 18 wherein the speech recognition parameters include a grammar specification data file comprising instructions for programming a recognizer to analyze acoustic information against all the words or phrases in the recognition set over a given time interval.
  - 25. The method of claim 18 wherein the speech recognition parameters include instructions for determining when the end of speech is detected by the recognizer.
  - 26. The method of claim 18 wherein the speech recognition parameters include matching criteria for matching an input speech signal to the words or phrases in the recognition set.
  - 27. The method of claim 18 wherein the speech recognition parameters include matching sensitivity for modifying the recognition parameters to allow for an easier or more difficult match of surrounding out-of-vocabulary words.
  - 28. The method of claim 18 wherein the speech recognition parameters include out of vocabulary sensitivity for modifying sensitivity of an out-of-vocabulary determination.
  - 29. The method of claim 18 further comprising batch testing recognition on a local computer based on the speech recognition parameters.
  - 30. The method of claim 18 further comprising configuring a speech recognition system with the speech recognition parameters.
  - 31. The method of claim 30 further comprising storing the recognition parameters in a memory coupled to the speech recognition system.
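Claims 26 through 28 refer to matching criteria and an out-of-vocabulary sensitivity without fixing a formula. As a hedged illustration only, one common way to realize such a setting is to compare the best in-vocabulary score against a filler (out-of-vocabulary) score and require a margin that grows with the sensitivity value. The function `decide`, the constant `MAX_MARGIN`, and the linear mapping from sensitivity to margin are all assumptions introduced for this sketch.

```python
# Assumed scale: a sensitivity of 1.0 requires the best word to beat the
# filler model by MAX_MARGIN log-likelihood units.
MAX_MARGIN = 10.0


def decide(scores, oov_score, oov_sensitivity=0.5):
    """Return the best-matching word, or None if the input looks out of vocabulary.

    scores: dict mapping each in-vocabulary word to a log-likelihood.
    oov_score: log-likelihood from a hypothetical filler model.
    oov_sensitivity: value in [0, 1]; higher values reject more readily.
    """
    if not 0.0 <= oov_sensitivity <= 1.0:
        raise ValueError("oov_sensitivity must be between 0 and 1")
    if not scores:
        return None
    best_word = max(scores, key=scores.get)
    margin = scores[best_word] - oov_score
    required = MAX_MARGIN * oov_sensitivity
    return best_word if margin >= required else None


if __name__ == "__main__":
    scores = {"yes": -3.0, "no": -9.0}
    print(decide(scores, oov_score=-6.0, oov_sensitivity=0.2))  # lenient setting
    print(decide(scores, oov_score=-6.0, oov_sensitivity=0.8))  # strict setting
```

Raising the sensitivity trades false accepts for false rejects, which is the kind of trade-off a developer could tune alongside the resource estimate.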

Current Assignee: Sensory Incorporated
Original Assignee: Sensory Incorporated
Inventors: Savoie, Robert E.; Sutton, Stephen; Vermeulen, Pieter J.; Mozer, Forrest S.
Primary Examiner(s): Vo; Huyen X.
Application Number: US10/866,232
Time in Patent Office: 2,168 Days
Field of Search: 704/232, 704/270, 704/254, 704/235, 704/231, 704/242, 704/220, 704/257, 704/244, 704/243, 704/255, 704/272, 704/270.1, 345/171
US Class Current: 704/254
CPC Class Codes:
G10L 15/22   Procedures used during a sp...
G10L 15/285   Memory allocation or algori...
G10L 2015/0631   Creating reference template...
