Training speaker-dependent, phrase-based speech grammars using an unsupervised automated technique

US 7,778,830 B2
Filed: 05/19/2004
Issued: 08/17/2010
Est. Priority Date: 05/19/2004
Status: Expired due to Fees

Abstract

The present invention can include a method for tuning grammar option weights of a phrase-based, automatic speech recognition (ASR) grammar, where the grammar option weights affect which entries within the grammar are matched to spoken utterances. The tuning can occur in an unsupervised fashion, meaning no special training session or manual transcription of data from an ASR session is needed. The method can include the step of selecting a phrase-based grammar to use in a communication session with a user wherein different phrase-based grammars can be selected for different users. Feedback of ASR phrase processing operations can be recorded during the communication session. Each ASR phrase processing operation can match a spoken utterance against at least one entry within the selected phrase-based grammar. At least one of the grammar option weights can be automatically adjusted based upon the feedback to improve accuracy of the phrase-based grammar.
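The abstract describes grammar entries that carry option weights which are tuned from session feedback. The following is a minimal sketch of that idea, not the patented implementation: the `GrammarEntry` class, the `adjust_weight` helper, the context name, the multiplicative update rule, and the `step` and `floor` parameters are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GrammarEntry:
    """A phrase in the grammar with one option weight per processing context."""
    phrase: str
    # Hypothetical layout: context name -> weight. The source only says each
    # entry has a plurality of weights, one per speech processing context.
    weights: Dict[str, float] = field(default_factory=dict)


def adjust_weight(entry: GrammarEntry, context: str, positive: bool,
                  step: float = 0.1, floor: float = 0.01) -> float:
    """Raise the weight on positive feedback, lower it on negative feedback.

    The multiplicative rule and the floor are assumed; the source does not
    state an update formula.
    """
    current = entry.weights.get(context, 1.0)
    factor = 1.0 + step if positive else 1.0 - step
    entry.weights[context] = max(floor, current * factor)
    return entry.weights[context]


entry = GrammarEntry("check balance", {"main_menu": 1.0})
adjust_weight(entry, "main_menu", positive=True)    # weight goes up
adjust_weight(entry, "main_menu", positive=False)   # weight goes back down
```

A multiplicative step keeps weights positive without extra checks; an additive step or a probabilistic re-normalization would be equally plausible readings of the text.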

19 Claims

1. A computer-implemented method for performing speech recognition, the method comprising:
- operating at least one processor programmed to perform:
  
  initiating a communication session with a speaker, the communication session requiring automatic speech recognition (ASR);
  
  determining a characteristic of the speaker, the characteristic selected from a group consisting of a speaker identity and at least one voice characteristic for the speaker;
  
  identifying a speaker-dependent, phrase-based grammar to use in the communication session with the speaker, wherein different speaker-dependent, phrase-based grammars are used for different users based on at least one speaker-dependent feature independent of a gender of the users;
  
  recording feedback of ASR phrase processing operations during the communication session, wherein each ASR phrase processing operation seeks to match a spoken utterance against at least one entry within the identified speaker-dependent, phrase-based grammar, each entry of the at least one entry within said identified speaker-dependent, phrase-based grammar having a plurality of grammar option weights, each of the plurality of grammar option weights corresponding to a respective speech processing context, wherein the grammar option weights affect which entries are matched to the spoken utterances;
  
  automatically adjusting the grammar option weights based upon recorded feedback data for the communication session to improve accuracy of the identified speaker-dependent, phrase-based grammar.
- - 2. The method of claim 1, wherein the at least one processor is programmed to perform:
    - selecting one of a plurality of grammars as the grammar based upon an identity of a user that provides the utterances, wherein the selected grammar is utilized exclusively for ASR processing operations involving the user.
  - 3. The method of claim 2, wherein each of said adjusting steps occurs proximate in time and occurs responsive to the ending of the communication session.
  - 4. The method of claim 1, wherein the at least one processor is programmed to perform:
    - identifying vocal characteristics for a user that provides the utterances; and
      
      selecting one of a plurality of grammars as the grammar based upon the vocal characteristics, wherein the selected grammar is utilized by a plurality of different users, each user having the identified vocal characteristics.
  - 5. The method of claim 4, wherein said method is performed periodically in batch, where a batch adjusts grammar option weights for the grammar using feedback recorded during a plurality of communication sessions.
  - 6. The method of claim 1, said adjusting step further comprising:
    - when feedback for the ASR phrase processing operation is positive, adjusting the grammar option weight to increase a likelihood of matching entries in the grammar that are associated with the grammar option weight; and
      
      when feedback for the ASR phrase processing operation is negative, adjusting the grammar option weight to decrease a likelihood of matching entries in the grammar that are associated with the grammar option weight.
  - 7. The method of claim 6, wherein the feedback for at least one ASR phrase processing operation includes at least a portion of an n-best list of phrases, wherein said adjusting step adjusts a plurality of grammar option weights.
  - 8. The method of claim 7, wherein each entry in the n-best list is associated with a score, said method further comprising the steps of:
    - statistically analyzing the scores associated with ordered entries in the n-best list to determine a break point between entries; and
      
      for each entry up to the break point, adjusting a grammar option weight associated with the entry.
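Claim 8 refers to statistically analyzing n-best scores to find a break point, without naming a specific statistic. One possible reading, sketched below under that assumption, places the break at the largest drop between consecutive scores in the ordered list. The function name `find_break_point` and the largest-gap rule are illustrative choices, not taken from the patent.

```python
from typing import List


def find_break_point(scores: List[float]) -> int:
    """Return how many leading n-best entries fall before the break point.

    Assumes `scores` is ordered best-first. The break is placed at the
    largest drop between neighbouring scores (an assumed heuristic).
    """
    if len(scores) < 2:
        return len(scores)
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    return gaps.index(max(gaps)) + 1


nbest_scores = [0.91, 0.88, 0.52, 0.50]
cut = find_break_point(nbest_scores)
# Only the entries before the break would have their weights adjusted.
entries_to_adjust = nbest_scores[:cut]
```

Other statistics, such as a threshold on standard deviations below the top score, would fit the claim language as well.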

9. A machine-readable recording medium having stored thereon, a computer program having a plurality of code sections, said code sections being executable by a machine for causing the machine to perform the steps of:
- initiating a communication session with a speaker, the communication session requiring automatic speech recognition (ASR);
  
  determining a characteristic of the speaker, the characteristic selected from a group consisting of a speaker identity and at least one voice characteristic for the speaker;
  
  identifying a speaker-dependent, phrase-based grammar to use in the communication session with the speaker, wherein different speaker-dependent, phrase-based grammars are used for different users based on at least one speaker-dependent feature independent of a gender of the users;
  
  recording feedback of ASR phrase processing operations during the communication session, wherein each ASR phrase processing operation seeks to match a spoken utterance against at least one entry within the identified speaker-dependent, phrase-based grammar, each entry of the at least one entry within said identified speaker-dependent, phrase-based grammar having a plurality of grammar option weights, each of the plurality of grammar option weights corresponding to a respective speech processing context, wherein the grammar option weights affect which entries are matched to the spoken utterances; and
  
  automatically adjusting the grammar option weights based upon the recorded feedback data for the communication session to improve accuracy of the identified speaker-dependent, phrase-based grammar.
- - 10. The machine-readable recording medium of claim 9, wherein the feedback includes at least part of an n-best list of ASR matched entries associated with individual utterances processed during the communication session, each ASR matched entry having an associated likelihood score.
  - 11. The machine-readable recording medium of claim 10, further causing the machine to perform the steps of:
    - identifying when one of the individual utterances has been incorrectly matched based upon the feedback; and
      
      responsive to said identifying step, adjusting at least one parameter within the identified phrase-based grammar so that the likelihood score associated with the topmost entry in the n-best list is decreased when the ASR computer program next processes an utterance similar to the incorrectly identified utterance in a session involving the identified phrase-based grammar.
  - 12. The machine-readable recording medium of claim 11, further causing the machine to perform the steps of:
    - determining at least one entry in the n-best list having a likelihood score that is statistically close to the likelihood score associated with the topmost entry; and
      
      responsive to said determining step, adjusting at least one parameter within the identified phrase-based grammar so that each likelihood score associated with each entry determined to be statistically close to the topmost entry is decreased when the ASR computer program next processes an utterance similar to the incorrectly identified utterance in a session involving the identified phrase-based grammar.
  - 13. The machine-readable recording medium of claim 10, further causing the machine to perform the steps of:
    - identifying when one of the individual utterances has been correctly matched based upon the feedback; and
      
      responsive to said identifying step, adjusting a parameter within the identified phrase-based grammar so that the likelihood score associated with the topmost entry in the n-best list is increased when the ASR computer program next processes an utterance similar to the correctly identified phrase in a session involving the identified phrase-based grammar.
  - 14. The machine-readable recording medium of claim 13, further causing the machine to perform the steps of:
    - determining at least one entry in the n-best list having a likelihood score that is statistically close to the likelihood score associated with the topmost entry; and
      
      responsive to said determining step, adjusting at least one parameter within the identified phrase-based grammar so that each likelihood score associated with each entry determined to be statistically close to the topmost entry is increased when the ASR computer program next processes an utterance similar to the correctly identified utterance in a session involving the identified phrase-based grammar.
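Claims 11 through 14 describe raising or lowering the topmost n-best entry together with entries whose scores are statistically close to it. A hedged sketch follows: closeness is assumed to mean "within `k` population standard deviations of the top score", and the per-phrase weight dictionary, the `step` size, and both function names are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

NBest = List[Tuple[str, float]]  # (phrase, likelihood score), best first


def statistically_close(nbest: NBest, k: float = 0.5) -> List[str]:
    """Phrases (excluding the top one) scoring within k std devs of the top."""
    if len(nbest) < 2:
        return []
    scores = [score for _, score in nbest]
    spread = statistics.pstdev(scores)
    top = scores[0]
    return [phrase for phrase, score in nbest[1:] if top - score <= k * spread]


def apply_feedback(weights: Dict[str, float], nbest: NBest,
                   correct: bool, step: float = 0.1) -> Dict[str, float]:
    """Scale the top entry and its close neighbours up or down (assumed rule)."""
    targets = [nbest[0][0]] + statistically_close(nbest)
    factor = 1.0 + step if correct else 1.0 - step
    for phrase in targets:
        weights[phrase] = weights.get(phrase, 1.0) * factor
    return weights


nbest = [("pay bill", 0.90), ("pay bills", 0.88), ("play ball", 0.50)]
weights = apply_feedback({}, nbest, correct=False)
```

In this example only "pay bill" and "pay bills" are penalized after an incorrect match; "play ball" is far enough from the top score to be left unchanged.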

15. A computer-implemented system for performing speech recognition, the system comprising:
- at least one computer programmed to:
  
  initiate a communication session with a speaker, the communication session requiring automatic speech recognition (ASR); and
  
  determine a characteristic of the speaker, the characteristic selected from a group consisting of a speaker identity and at least one voice characteristic for the speaker; and
  
  an identification unit configured to identify a speaker-dependent phrase-based ASR grammar to use in the communication session, wherein different phrase-based grammars are used for different users based on at least one speaker-dependent feature independent of a gender of the users;
  
  an information collection unit configured to record feedback in real-time of ASR phrase processing operations during the communication session, wherein each ASR phrase processing operation seeks to match a spoken utterance against at least one entry within the identified speaker dependent, phrase-based grammar, each entry of the at least one entry within said identified speaker dependent, phrase-based grammar having a plurality of grammar option weights, each of the plurality of grammar option weights corresponding to a respective speech processing context, wherein the grammar option weights affect which entries are matched to the spoken utterances; and
  
  a logic unit configured to utilize said recorded feedback to automatically adjust the grammar option weights of the ASR grammar to improve accuracy of the identified speaker dependent, phrase-based grammar.
- - 16. The system of claim 15, wherein the feedback gathered by the information collection unit for each ASR processed phrase comprises:
    - a plurality of possible matching entries determined by the ASR system; and
      
      for each possible matching entry, a likelihood score that indicates the likelihood of the associated possible matching phrase being an accurate textual representation of an utterance.
  - 17. The system of claim 15, wherein the logic unit adjusts the ASR grammar to affect a plurality of possible matching entries responsive to a single ASR processed utterance.
  - 18. The system of claim 15, wherein when an utterance has been correctly processed, at least one parameter in the ASR grammar is adjusted to increase a likelihood that the ASR system processes phrases in a similar fashion in future ASR operations involving the ASR grammar, and when an utterance has been incorrectly processed, at least one parameter in the ASR grammar is adjusted to decrease a likelihood that the ASR system processes phrases in a similar fashion in future ASR operations involving the ASR grammar.
  - 19. The system of claim 15, wherein the at least one computer is further programmed to implement the identification unit, the information collection unit, and the logic unit.
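The system claims name an identification unit that picks a grammar for the speaker and an information collection unit that records feedback per processed phrase. The sketch below illustrates one way such components could be shaped. Everything in it is assumed: the `FeedbackRecord` fields, the lookup-by-identity-then-voice-profile order, the fallback grammar name, and the example identifiers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FeedbackRecord:
    """One ASR phrase processing operation as it might be logged in a session."""
    context: str                     # speech processing context (hypothetical name)
    nbest: List[Tuple[str, float]]   # possible matching entries with scores
    accepted: bool                   # assumed signal that the match was correct


def identify_grammar(speaker_id: Optional[str], voice_profile: Optional[str],
                     by_speaker: Dict[str, str], by_profile: Dict[str, str],
                     default: str = "default") -> str:
    """Pick a grammar by speaker identity, else by voice characteristic."""
    if speaker_id is not None and speaker_id in by_speaker:
        return by_speaker[speaker_id]
    if voice_profile is not None and voice_profile in by_profile:
        return by_profile[voice_profile]
    return default


by_speaker = {"alice": "grammar_alice"}
by_profile = {"fast_talker": "grammar_fast"}
grammar = identify_grammar("alice", None, by_speaker, by_profile)
session_log = [FeedbackRecord("main_menu", [("check balance", 0.93)], True)]
```

A grammar keyed by speaker identity corresponds to the single-user case in claim 2, while one keyed by a voice profile corresponds to the shared case in claim 4.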

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation, Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Davis, Brent L.; Jaiswal, Peeyush; Wang, Fang
Primary Examiner(s): Vo, Huyen X.
Application Number: US10/849,629
Publication Number: US 20050261901A1
Time in Patent Office: 2,281 Days
Field of Search: 704/255, 704/257, 704/275, 704/270, 704/231, 704/246, 704/250, 704/243, 704/270.1, 704/235, 704/4, 704/7
US Class Current: 704/235
CPC Class Codes:
G10L 15/183 using context dependencies,...
G10L 15/19 Grammatical context, e.g. d...
