Methods and Apparatus for Use in Speech Recognition Systems for Identifying Unknown Words and for Adding Previously Unknown Words to Vocabularies and Grammars of Speech Recognition Systems

US 20080270136A1
Filed: 06/05/2008
Published: 10/30/2008
Est. Priority Date: 11/30/2005
Status: Active Grant

Abstract

The present invention concerns methods and apparatus for identifying and assigning meaning to words not recognized by a vocabulary or grammar of a speech recognition system. In an embodiment of the invention, the word may be in an acoustic vocabulary of the speech recognition system, but may be unrecognized by an embedded grammar of a language model of the speech recognition system. In another embodiment of the invention, the word may not be recognized by any vocabulary associated with the speech recognition system. In embodiments of the invention, at least one hypothesis is generated for an utterance not recognized by the speech recognition system. If the at least one hypothesis meets at least one predetermined criterion, a word or more corresponding to the at least one hypothesis is added to the vocabulary of the speech recognition system. In other embodiments of the invention, before adding the word to the vocabulary of the speech recognition system, the at least one hypothesis may be presented to the user of the speech recognition system to determine if that is what the user intended when the user spoke.

81 Citations

21 Claims

1. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus of a computer system to perform speech recognition operations, the speech recognition operations comprising:
- detecting at least a target word known to an acoustic vocabulary but unknown to an embedded grammar of a language model of the speech recognition system;
  
  assigning a language model probability to the target word;
  
  calculating a sum of an acoustic and language model confidence score for the target word and words already included in the embedded grammar of the language model; and
  
  if the sum of the acoustic and language model probability for the target word is greater than the sum of the acoustic and language model probability for the words already included in the embedded grammar, adding the target word to the language model.
- Dependent claims (2, 3, 4):
- - 2. The signal-bearing medium of claim 1 where the operations further comprise:
    - after calculating the sum and prior to adding the target word to the embedded grammar of the language model, asking confirmation of the target word from a user of the speech recognition system; and
      
      receiving confirmation for the target word from the user of the speech recognition system.
  - 3. The signal-bearing medium of claim 2 wherein confirmation comprises confirmation of the spelling of the target word.
  - 4. The signal-bearing medium of claim 2 wherein confirmation comprises confirmation of the pronunciation of the target word.
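
The comparison recited in claims 1 and 2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Candidate` type, the function names, and the use of log-domain scores are assumptions made for illustration, and "the words already included in the embedded grammar" is read here as the best-scoring in-grammar competitor.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Set


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for one recognition candidate."""
    word: str
    acoustic_score: float  # assumed log-domain score from the acoustic model
    lm_score: float        # assumed log-domain score from the language model

    @property
    def combined(self) -> float:
        # Sum of the acoustic and language model scores, as in claim 1.
        return self.acoustic_score + self.lm_score


def should_add_to_grammar(target: Candidate,
                          in_grammar: Sequence[Candidate]) -> bool:
    """Return True if the target's summed score exceeds that of the
    best-scoring word already in the embedded grammar (an assumption
    about how the claim's comparison is carried out)."""
    best = max((c.combined for c in in_grammar), default=float("-inf"))
    return target.combined > best


def update_grammar(grammar: Set[str],
                   target: Candidate,
                   competitors: Sequence[Candidate],
                   confirm: Optional[Callable[[str], bool]] = None) -> bool:
    """Add the target word to the grammar if it wins the comparison and,
    when a confirm callback is supplied (claim 2), the user accepts it."""
    if not should_add_to_grammar(target, competitors):
        return False
    if confirm is not None and not confirm(target.word):
        return False
    grammar.add(target.word)
    return True
```

In this sketch the optional `confirm` callback stands in for the user confirmation step of claim 2; how spelling or pronunciation would be confirmed (claims 3 and 4) is left outside the example.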

5. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus of a computer system to perform speech recognition operations, the speech recognition operations comprising:
- detecting an utterance having a low acoustic score within an acoustic vocabulary of the speech recognition system indicating that the utterance may correspond to an out-of-vocabulary word;
  
  generating at least one new word hypothesis comprised of at least one of a phone- or syllable sequence using confidence scores derived from probabilities contained in a database of viable phone and syllable sequences; and
  
  if the at least one new word hypothesis meets a pre-determined criterion, adding a word corresponding to the at least one new word hypothesis to the vocabulary of the speech recognition system.
- Dependent claims (6, 7, 8):
- - 6. The signal-bearing medium of claim 5 wherein the pre-determined criterion corresponds to confirmation by a user of the speech recognition system wherein the operations further comprise:
    - prior to adding at least one word to the acoustic vocabulary of the speech recognition system, presenting the new word hypothesis to a user of the speech recognition system seeking confirmation that the new word hypothesis corresponds to at least one word intended by the user when the user spoke; and
      
      whereby the new word is added to the vocabulary of the speech recognition system only if confirmation is received from the user.
  - 7. The signal-bearing medium of claim 6 wherein the utterance corresponds to a multi-word command, and wherein the operations further comprise:
    - adding the command to an embedded grammar of a language model associated with the speech recognition system.
  - 8. The signal-bearing medium of claim 7 wherein the operations further comprise:
    - adding information received from a user of the speech recognition system to memory indicating at least one action to be performed when the command is detected by the speech recognition system.
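
The flow of claim 5 can be sketched as follows, under stated assumptions. The bigram table below is a hypothetical stand-in for the "database of viable phone and syllable sequences"; the probabilities, the floor value, the thresholds, and the use of a geometric mean as the confidence score are all invented for illustration and are not taken from the patent.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical phone-bigram probabilities (illustrative values only).
VIABLE_BIGRAMS: Dict[Tuple[str, str], float] = {
    ("<s>", "k"): 0.2,
    ("k", "ae"): 0.3,
    ("ae", "t"): 0.4,
    ("t", "</s>"): 0.5,
}
FLOOR = 1e-4  # assumed probability for bigrams absent from the table


def sequence_confidence(phones: Sequence[str]) -> float:
    """Geometric mean of bigram probabilities over the padded sequence."""
    padded = ["<s>", *phones, "</s>"]
    pairs = list(zip(padded, padded[1:]))
    log_sum = sum(math.log(VIABLE_BIGRAMS.get(p, FLOOR)) for p in pairs)
    return math.exp(log_sum / len(pairs))


def propose_new_word(best_in_vocab_score: float,
                     phone_hypotheses: Sequence[Sequence[str]],
                     oov_threshold: float,
                     min_confidence: float
                     ) -> Optional[Tuple[Tuple[str, ...], float]]:
    """Return (phone sequence, confidence) for a new-word hypothesis,
    or None if the utterance looks in-vocabulary or no hypothesis
    meets the confidence criterion."""
    # Step 1: a low acoustic score suggests an out-of-vocabulary word.
    if best_in_vocab_score >= oov_threshold:
        return None
    if not phone_hypotheses:
        return None
    # Step 2: rank phone-sequence hypotheses by confidence.
    best = max(phone_hypotheses, key=sequence_confidence)
    confidence = sequence_confidence(best)
    # Step 3: apply the pre-determined criterion (here, a threshold).
    if confidence < min_confidence:
        return None
    return tuple(best), confidence
```

A caller would add the returned sequence to the vocabulary, optionally after the user confirmation described in claim 6. A threshold is only one possible reading of the "pre-determined criterion"; claim 6 shows that user confirmation is another.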

9. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus of a computer system to perform speech recognition operations in a speech recognition system, the speech recognition operations comprising:
- detecting an utterance not recognized by at least a first one of an acoustic vocabulary, embedded grammar, and viable phone/syllable sequence library of the speech recognition system;
  
  generating at least one hypothesis for the utterance, wherein the hypothesis is based on information derived from a second one of an acoustic vocabulary, embedded grammar and viable phone/syllable sequence library of the speech recognition system;
  
  calculating a confidence score for the at least one hypothesis and for members of the first one of the acoustic vocabulary, embedded grammar and viable phone/syllable sequence library of the speech recognition system;
  
  comparing the confidence scores calculated for the at least one hypothesis and for members of the first one of the acoustic vocabulary, embedded grammar and viable phone/syllable sequence library of the speech recognition system; and
  
  adding information to the first one of an acoustic vocabulary, embedded grammar and viable phone/syllable sequence corresponding to the hypothesis if a pre-determined criterion based on the comparison is met.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
- - 10. The signal-bearing medium of claim 9 wherein the utterance corresponds to a phone sequence, and wherein the first one of the acoustic vocabulary, embedded grammar and viable phone/syllable sequence library corresponds to a particular viable phone/syllable sequence library.
  - 11. The signal-bearing medium of claim 9 wherein the utterance corresponds to a word, and wherein the first one of the acoustic vocabulary, embedded grammar and viable phone/syllable sequence library corresponds to a particular acoustic vocabulary.
  - 12. The signal-bearing medium of claim 9 wherein the utterance corresponds to a command, and wherein the first one of the acoustic vocabulary, embedded grammar and viable phone/syllable sequence library corresponds to a particular embedded grammar.
  - 13. The signal-bearing medium of claim 9 wherein the at least one criterion corresponds to confirmation by a user of the speech recognition system, wherein the operations further comprise:
    - prior to adding information corresponding to the at least one hypothesis to the first one of the acoustic vocabulary, embedded grammar and viable phone/syllable sequence library of the speech recognition system, seeking confirmation that the hypothesis corresponds to what the user intended when the user spoke; and
      
      whereby the information is added only if confirmation is received from the user of the speech recognition system.
  - 14. The signal-bearing medium of claim 9 wherein the operations further comprise:
    - using biometric information to assist in identifying the utterance as unrecognized by the first one of the acoustic vocabulary, embedded grammar and viable phone/syllable sequence library of the speech recognition system.
  - 15. The signal-bearing medium of claim 14 wherein the biometric information comprises speech biometric information.
  - 16. The signal-bearing medium of claim 14 wherein the biometric information comprises data derived from video information.
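
Claim 9 generalizes the same pattern to any pair of resources: an item missing from a "first" resource is hypothesized from a "second" one, scored against the first resource's existing members, and added if a criterion is met. The sketch below is one possible reading, not the claimed implementation. The resource names, the `score` callback, and the margin-based criterion are assumptions introduced for illustration.

```python
from typing import Callable, Dict, Optional, Set


def learn_from_second(resources: Dict[str, Set[str]],
                      first: str,
                      second: str,
                      score: Callable[[str], float],
                      margin: float = 0.0) -> Optional[str]:
    """Pick the best-scoring entry known to `second` but not to `first`,
    compare it with the best-scoring member of `first`, and add it to
    `first` if it wins by more than `margin` (an assumed criterion).
    Returns the added entry, or None if nothing was added."""
    target = resources[first]
    # Hypotheses come from the second resource only.
    candidates = [entry for entry in resources[second] if entry not in target]
    if not candidates:
        return None
    hypothesis = max(candidates, key=score)
    # Confidence of the strongest existing member of the first resource.
    best_known = max((score(entry) for entry in target),
                     default=float("-inf"))
    if score(hypothesis) - best_known > margin:
        target.add(hypothesis)
        return hypothesis
    return None
```

For example, with `first="embedded_grammar"` and `second="acoustic_vocabulary"`, the function covers the case in which a word is known acoustically but missing from the grammar. The confirmation step of claim 13 could be added as a further check before `target.add`.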

17. A speech recognition system comprising:
- a speech input for receiving speech from a user of the speech recognition system;
  
  an open set comprised of at least one open vocabulary and at least one open embedded grammar associated with a language model implemented in the speech recognition system;
  
  a hierarchical mapping system for identifying utterances not recognized by at least one of the open vocabulary and open embedded grammar of the speech recognition system; for generating hypotheses for the unrecognized utterances using confidence scores based at least in part on one of viable phone/syllable sequence information, acoustic vocabulary information and grammar information; and for adding information corresponding to the hypotheses to at least one of the open vocabulary and embedded grammar of the speech recognition system if a pre-determined criterion is met; and
  
  a confidence score system for generating confidence scores for use by the hierarchical mapping system.
- Dependent claims (18, 19, 20, 21):
- - 18. The speech recognition system of claim 17 further comprising:
    - a user behavior biometrics detector for generating data to assist the hierarchical mapping system in identifying utterances that a user expects not to be recognized by the speech recognition system.
  - 19. The speech recognition system of claim 17 further comprising:
    - a confirmation system for providing the hypotheses corresponding to the unrecognized utterances to a user of the speech recognition system, and for receiving confirmation from the user if the hypotheses correspond to what the user intended when the user spoke the unrecognized utterances.
  - 20. The speech recognition system of claim 17 further comprising:
    - a user input system for receiving data from the user of the speech recognition system, wherein the data is associated with the information corresponding to the hypotheses added to at least one of the open acoustic vocabulary and open embedded grammar of the speech recognition system when a pre-determined criterion is met.
  - 21. The speech recognition system of claim 17 wherein the data concerns at least one action to be performed.

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
International Business Machines Corporation
Inventors
Viswanathan, Mahesh; Kanevsky, Dimitri; Gopinath, Ramesh A.; Deligne, Sabine

Granted Patent

US 9,754,586 B2
Field of Search
US Class Current

704/257
CPC Class Codes

G10L 15/063   Training

G10L 15/183   using context dependencies,...

G10L 15/19   Grammatical context, e.g. d...

G10L 2015/0631   Creating reference template...
