Multiple pass speech recognition method and system

US 7,184,957 B2
Filed: 10/10/2002
Issued: 02/27/2007
Est. Priority Date: 09/25/2002
Status: Expired due to Fees

Abstract

A multiple pass speech recognition method includes a first pass and a second pass. The first pass recognizes an input speech signal to generate a first pass result. The second pass generates a first grammar having a portion set to match a first part of the input speech signal, based upon the context of the first pass result, and generates a second pass result. The method may further include a third pass grammar limiting the second part of the input speech signal to the second pass result. The third pass grammar includes a model corresponding to the first part of the input speech signal and varying within the second pass result. The third pass compares the first part of the input speech signal to the model while limiting the second part of the input speech signal to the second pass result.
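
The pass structure described in the abstract can be sketched as a small driver that re-runs recognition over the same buffered signal with a grammar derived from the previous result. This is only an illustrative outline: `multi_pass_recognize`, the `recognize` callable, and the two grammar factories are hypothetical names, and the signal and grammar types are placeholders rather than anything specified in the patent.

```python
from typing import Callable, List, Optional

Signal = str     # placeholder for buffered audio (assumption)
Grammar = dict   # placeholder for a recognition grammar (assumption)


def multi_pass_recognize(
    signal: Signal,
    recognize: Callable[[Signal, Optional[Grammar]], str],
    make_first_grammar: Callable[[str], Grammar],
    make_second_grammar: Optional[Callable[[str], Grammar]] = None,
) -> List[str]:
    """Run two or three passes over the same signal and return every pass result."""
    # First pass: initial recognition without a dynamically generated grammar.
    first = recognize(signal, None)
    # Second pass: the grammar is generated from the first pass result.
    second = recognize(signal, make_first_grammar(first))
    results = [first, second]
    # Optional third pass: the grammar is generated from the second pass result.
    if make_second_grammar is not None:
        results.append(recognize(signal, make_second_grammar(second)))
    return results
```

Each pass receives the original signal again, which is why the sketch passes `signal` unchanged to every call instead of chaining outputs.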

139 Citations

35 Claims

1. A method of recognizing speech, the method comprising:
- receiving an input speech signal;
  
  performing an initial recognition on the input speech signal to generate a first pass result;
  
  generating a first grammar based upon the first pass result, the first grammar having a portion set to match a first part of the input speech signal; and
  
  applying the first grammar to the input speech signal to generate a second pass result, wherein generating a first grammar comprises:
  
  determining a context of the first pass result;
  
  determining the portion of the first grammar to be set to match the first part of the input speech signal based upon the determined context of the first pass result; and
  
  generating the first grammar with the portion set to match the first part of the input speech signal.

2. The method of claim 1, wherein applying the first grammar comprises:
- setting the first part of the input speech signal as matched with the portion of the first grammar;
  
  recognizing a second part of the input speech signal;
  
  generating the second pass result based upon the recognized second part of the input speech signal.

3. The method of claim 2, wherein the second pass result is modified based upon location-based information.

4. The method of claim 2, wherein the first part of the input speech signal corresponds to a street address and the second part of the input speech signal corresponds to a city name.
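
A toy illustration of the second pass in claims 2 and 4 follows. It assumes the utterance is already available as a list of words (a stand-in for the audio), treats the leading street-address words as matched without scoring them, and recognizes only the trailing city name by string similarity. The `CITIES` list, the function name, and the use of `difflib` in place of acoustic matching are all assumptions made for the example.

```python
import difflib

# Hypothetical city vocabulary; a real system would load this from a grammar store.
CITIES = ["palo alto", "menlo park", "los altos"]


def apply_first_grammar(words, cities=CITIES):
    """Recognize only the trailing city; the leading words count as already matched.

    Every split point is tried. The prefix is never scored, which mimics a
    grammar portion that is set to match the first part of the input. The
    suffix is compared against each city by string similarity.
    """
    best_city, best_score = None, 0.0
    for split in range(1, len(words)):
        tail = " ".join(words[split:])
        for city in cities:
            score = difflib.SequenceMatcher(None, tail, city).ratio()
            if score > best_score:
                best_city, best_score = city, score
    return best_city
```

Calling `apply_first_grammar("ten university avenue pallo alto".split())` selects a city while ignoring what the street words actually are, which is the point of setting that portion of the grammar to match.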

5. A method of recognizing speech, the method comprising:
- receiving an input speech signal;
  
  performing an initial recognition on the input speech signal to generate a first pass result;
  
  generating a first grammar based upon the first pass result, the first grammar having a portion set to match a first part of the input speech signal;
  
  applying the first grammar to the input speech signal to generate a second pass result;
  
  generating a second grammar based upon the second pass result, the second grammar limiting the second part of the input speech signal to the second pass result and configured to recognize the first part of the input speech signal within the second pass result; and
  
  applying the second grammar to the input speech signal to generate a third pass result.

6. The method of claim 5, wherein generating a second grammar comprises:
- limiting the second part of the input speech signal to the second pass result; and
  
  generating a model corresponding to the first part of the input speech signal and varying within the second pass result.

7. The method of claim 6, wherein applying the second grammar comprises comparing the first part of the input speech signal to the model while the second part of the input speech signal is limited to the second pass result.

8. The method of claim 5, wherein the third pass result is modified based upon location-based information.
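
The third pass in claims 5 to 7 can be sketched in the same toy setting: the city is fixed to the second pass result, and only street names known for that city are allowed to vary. `STREETS_BY_CITY`, both function names, and the similarity scoring are hypothetical and stand in for the model and comparison the claims refer to.

```python
import difflib

# Hypothetical street lists keyed by city.
STREETS_BY_CITY = {
    "palo alto": ["university avenue", "hamilton avenue", "el camino real"],
    "menlo park": ["santa cruz avenue", "middlefield road"],
}


def make_second_grammar(city, streets_by_city=STREETS_BY_CITY):
    """Build phrases whose city part is fixed and whose street part varies within that city."""
    return [(street, city) for street in streets_by_city.get(city, [])]


def apply_second_grammar(utterance, grammar):
    """Return the (street, city) phrase most similar to the utterance, or None if the grammar is empty."""
    def score(entry):
        street, city = entry
        return difflib.SequenceMatcher(None, utterance, f"{street} {city}").ratio()

    return max(grammar, key=score) if grammar else None
```

Because the city is held constant, the candidate set shrinks from every street in the database to the streets of one city, which is the narrowing effect the second grammar is meant to provide.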

9. A computer program product for recognizing speech, the computer program product stored on a computer readable medium and adapted to perform a method comprising:
- receiving an input speech signal;
  
  performing an initial recognition on the input speech signal to generate a first pass result;
  
  generating a first grammar based upon the first pass result, the first grammar having a portion set to match a first part of the input speech signal; and
  
  applying the first grammar to the input speech signal to generate a second pass result, wherein generating a first grammar comprises:
  
  determining a context of the first pass result;
  
  determining the portion of the first grammar to be set to match the first part of the input speech signal based upon the determined context of the first pass result; and
  
  generating the first grammar with the portion set to match the first part of the input speech signal.

10. The computer program product of claim 9, wherein applying the first grammar comprises:
- setting the first part of the input speech signal as matched with the portion of the first grammar;
  
  recognizing a second part of the input speech signal; and
  
  generating the second pass result based upon the recognized second part of the input speech signal.

11. The computer program product of claim 10, wherein the second pass result is modified based upon location-based information.

12. The computer program product of claim 10, wherein the first part of the input speech signal corresponds to a street address and the second part of the input speech signal corresponds to a city name.

13. A computer program product for recognizing speech, the computer program product stored on a computer readable medium and adapted to perform a method comprising:
- receiving an input speech signal;
  
  performing an initial recognition on the input speech signal to generate a first pass result;
  
  generating a first grammar based upon the first pass result, the first grammar having a portion set to match a first part of the input speech signal; and
  
  applying the first grammar to the input speech signal to generate a second pass result;
  
  generating a second grammar based upon the second pass result, the second grammar limiting the second part of the input speech signal to the second pass result and configured to recognize the first part of the input speech signal within the second pass result; and
  
  applying the second grammar to the input speech signal to generate a third pass result.

14. The computer program product of claim 13, wherein generating a second grammar comprises:
- limiting the second part of the input speech signal to the second pass result; and
  
  generating a model corresponding to the first part of the input speech signal and varying within the second pass result.

15. The computer program product of claim 14, wherein applying the second grammar comprises comparing the first part of the input speech signal to the model while the second part of the input speech signal is limited to the second pass result.

16. The computer program product of claim 13, wherein the third pass result is modified based upon location-based information.

17. A speech recognition system using a multiple pass speech recognition method including at least a first pass and a second pass, the speech recognition system comprising:
- a speech recognition engine for performing an initial recognition on an input speech signal in the first pass to generate a first pass result and applying a first grammar to the input speech signal in the second pass to generate a second pass result;
  
  a grammar database for storing a plurality of grammar; and
  
  a dynamic grammar generator for generating the first grammar based upon the first pass result using the grammar stored in the grammar database, the first grammar having a portion set to match a first part of the input speech signal and configured to recognize a second part of the input speech signal, wherein the dynamic grammar generator determines a context of the first pass result and determines the portion of the first grammar to be set to match the first part of the input speech signal based upon the determined context of the first pass result.

18. The speech recognition system of claim 17, further comprising a processor coupled to the speech recognition engine and configured to modify the second pass result based upon location-based information.

19. The speech recognition system of claim 17, wherein the first part of the input speech signal corresponds to a street address and the second part of the input speech signal corresponds to a city name.

20. The speech recognition system of claim 17, wherein the speech recognition system is networked and includes a server and a client, the client comprising the speech buffer and the server comprising the speech recognition engine, the dynamic grammar generator, and the grammar database.

21. The speech recognition system of claim 17, wherein the speech recognition system is networked and includes a server and a client, the client comprising the speech buffer and the speech recognition engine, and the server comprising the dynamic grammar generator, and the grammar database.
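
One possible way to picture two of the components named in claim 17 is shown below: a grammar database and a dynamic grammar generator that reads from it. The class layout, the rule used to determine context (membership of the first pass result in a stored grammar), and the dictionary returned as a "grammar" are assumptions for illustration only. The speech recognition engine is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GrammarDatabase:
    """Holds stored grammars; each grammar here is simply a list of phrases."""
    grammars: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class DynamicGrammarGenerator:
    """Derives a pass-specific grammar from a previous result and the stored grammars."""
    database: GrammarDatabase

    def determine_context(self, first_pass_result: str) -> str:
        # Assumed rule: the context is the name of the stored grammar containing the result.
        for name, phrases in self.database.grammars.items():
            if first_pass_result in phrases:
                return name
        return "unknown"

    def first_grammar(self, first_pass_result: str) -> Dict[str, object]:
        context = self.determine_context(first_pass_result)
        # The part named by the context is set to match; all other stored
        # grammars remain open for recognition in the second pass.
        to_recognize = {
            name: phrases
            for name, phrases in self.database.grammars.items()
            if name != context
        }
        return {"matched_part": context, "recognize": to_recognize}
```

In the networked arrangements of claims 20 and 21, these two classes would sit on the server in both cases, with only the placement of the recognition engine differing.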

22. A speech recognition system using a multiple pass speech recognition method including at least a first pass and a second pass, the speech recognition system comprising:
- a speech recognition engine for performing an initial recognition on an input speech signal in the first pass to generate a first pass result and applying a first grammar to the input speech signal in the second pass to generate a second pass result;
  
  a grammar database for storing a plurality of grammar; and
  
  a dynamic grammar generator for generating the first grammar based upon the first pass result using the grammar stored in the grammar database, the first grammar having a portion set to match a first part of the input speech signal and configured to recognize a second part of the input speech signal, wherein the multiple pass speech recognition method further comprises a third pass, the dynamic grammar generator generating a second grammar based upon the second pass result, the second grammar limiting the second part of the speech to the second pass result and configured to recognize the first part of the input speech signal within the second pass result; and
  
  the speech recognition engine applying the second grammar to the input speech signal to generate a third pass result.

23. The speech recognition system of claim 22, wherein the dynamic grammar generator limits the second part of the input speech signal to the second pass result and generates a model corresponding to the first part of the input speech signal and varying within the second pass result as part of the second grammar.

24. The speech recognition system of claim 23, wherein the speech recognition engine applies the third pass grammar to the input speech signal in the third pass by comparing the first part of the input speech signal to the model while limiting the second part of the input speech signal to the second pass result.

25. The speech recognition system of claim 22, further comprising a processor coupled to the speech recognition engine configured to modify the third pass result based upon location-based information.

26. A method of recognizing speech, the method comprising:
- receiving an input speech signal;
  
  performing an initial recognition on the input speech signal to generate a first pass result;
  
  determining a level of the first pass result in a knowledge hierarchy; and
  
  generating a first grammar having a level higher in the knowledge hierarchy than the level of the first pass result, the second pass grammar having a portion set to match a first part of the input speech signal; and
  
  applying the first grammar to the input speech signal to generate a second pass result.

27. The method of claim 26, wherein generating a first grammar comprises:
- determining the portion of the first grammar to be set to match the first part of the input speech signal based upon the determined level of the first pass result; and
  
  generating the first grammar having the portion set to match the first part of the input speech signal.

28. The method of claim 26, wherein applying the first grammar comprises:
- setting the first part of the input speech signal as matched with the portion of the first grammar;
  
  recognizing a second part of the input speech signal; and
  
  generating the second pass result based upon the recognized second part of the input speech signal.

29. The method of claim 28, wherein the first part of the input speech signal corresponds to a street address and the second part of the input speech signal corresponds to a city name.

30. The method of claim 26, wherein the second pass result is modified based upon location-based information.

31. The method of claim 26, further comprising:
- generating a second grammar based upon the second pass result, the second grammar having a level lower in the knowledge hierarchy than both the level of the second pass result and the level of the first grammar, the second grammar limiting the second part of the input speech signal to the second pass result and configured to recognize the first part of the input speech signal within the second pass result; and
  
  applying the second grammar to the input speech signal to generate a third pass result.

32. The method of claim 31, wherein generating a second grammar comprises:
- limiting the second part of the input speech signal to the second pass result;
  
  generating a model corresponding to the first part of the input speech signal and varying within the second pass result.

33. The method of claim 31, wherein applying the second grammar comprises comparing the first part of the input speech signal to the model while the second part of the input speech signal is limited to the second pass result.

34. The method of claim 31, wherein the third pass result is modified based upon location-based information.
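
The level selection in claims 26 and 31 can be illustrated with a small ordered list. The three-level hierarchy below, the function names, and the choice of the directly adjacent level are assumptions; the claims only require a level that is higher (claim 26) or lower (claim 31), not necessarily the neighboring one.

```python
# Hypothetical knowledge hierarchy, ordered from the lowest level to the highest.
HIERARCHY = ["street", "city", "state"]


def level_of(category: str) -> int:
    """Return the position of a category in the hierarchy (0 is the lowest level)."""
    return HIERARCHY.index(category)


def first_grammar_level(first_pass_level: str) -> str:
    """Pick the level directly above the level of the first pass result."""
    i = level_of(first_pass_level)
    if i + 1 >= len(HIERARCHY):
        raise ValueError("no higher level available")
    return HIERARCHY[i + 1]


def second_grammar_level(second_pass_level: str, first_grammar_lvl: str) -> str:
    """Pick the level directly below both the second pass result and the first grammar."""
    i = min(level_of(second_pass_level), level_of(first_grammar_lvl))
    if i == 0:
        raise ValueError("no lower level available")
    return HIERARCHY[i - 1]
```

With a street-level first pass result, this sketch yields a city-level first grammar and then a street-level second grammar, matching the street address and city name example in claim 29.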

35. A server for use in a networked speech recognition system using a multiple pass speech recognition method including at least a first pass and a second pass for recognition of an input speech signal, the server comprising:
- a grammar database for storing a plurality of grammar; and
  
  a dynamic grammar generator for generating a first grammar based upon a result of the first pass using the grammar stored in the grammar database, the first grammar having a portion set to match a first part of the input speech signal and configured to recognize a second part of the input speech signal, wherein the dynamic grammar generator further generates a second grammar based upon a result of the second pass, the second grammar limiting the second part of the input speech signal to the result of the second pass and configured to recognize the first part of the input speech signal within the result of the second pass result.

Current Assignee: Toyota Infotechnology Center Co., Ltd. (Toyota Motor Corporation)
Original Assignee: Toyota Infotechnology Center Co., Ltd. (Toyota Motor Corporation)
Inventors: Endo, Norikazu; Brookes, John R.
Primary Examiner(s): ABEBE, DANIEL DEMELASH
Application Number: US10/269,269
Publication Number: US 20040059575A1
Time in Patent Office: 1,601 Days
Field of Search: 704/246, 704/275
US Class Current: 704/246
CPC Class Codes:
G10L 15/08 Speech classification or se...
G10L 15/183 using context dependencies,...
