System and method enabling acoustic barge-in

US 7,392,188 B2
Filed: 07/31/2003
Issued: 06/24/2008
Est. Priority Date: 07/31/2003
Status: Active Grant

Abstract

A system and method enabling acoustic barge-in during a voice prompt in a communication system. An acoustic prompt model is trained to represent the system prompt using the specific speech signal of the prompt. The acoustic prompt model is utilized in a speech recognizer in parallel with the recognizer's active vocabulary words to suppress the echo of the prompt within the recognizer. The speech recognizer may also use a silence model and traditional garbage models such as noise models and out-of-vocabulary word models to reduce the likelihood that noises and out-of-vocabulary words in the user utterance will be mapped erroneously onto active vocabulary words.
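
The parallel-model decision described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each model is a single reference feature vector and that "best match" means the smallest Euclidean distance; an actual recognizer would score sequences of frames against statistical acoustic models. The names `recognize`, `VOCABULARY_MODELS`, `PROMPT_MODEL`, and the toy vectors are hypothetical and do not come from the patent.

```python
import math

# Hypothetical toy "models": one reference feature vector per label.
# A real recognizer would use statistical acoustic models, not fixed vectors.
VOCABULARY_MODELS = {
    "stop": [0.9, 0.1, 0.0],
    "repeat": [0.1, 0.9, 0.2],
}
PROMPT_MODEL = {"<prompt>": [0.3, 0.3, 0.8]}  # stands in for the prompt echo


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features):
    """Return the matched command word, or None if the prompt model wins."""
    # Vocabulary and prompt models compete in parallel for the same input.
    candidates = {**VOCABULARY_MODELS, **PROMPT_MODEL}
    best = min(candidates, key=lambda label: distance(features, candidates[label]))
    if best in PROMPT_MODEL:
        return None  # best match is the prompt echo: ignore it
    return best  # best match is a command word: accept it
```

Under these assumptions, an input close to the prompt vector is silently dropped, while an input close to a command vector is returned as that command word.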

22 Claims

1. A method of suppressing speech recognition errors in a speech recognition system, said method comprising the steps of:
- receiving an input signal that comprises at least one user-generated command word and an echo from an outgoing system voice prompt, wherein at least one word of the outgoing system voice prompt is included in the echo received in the input signal;
  
  generating an acoustic model of the outgoing system voice prompt, said acoustic prompt model mathematically representing the words of the outgoing system voice prompt;
  
  supplying the input signal to a speech recognizer having an acoustic model of a target vocabulary, said acoustic target vocabulary model mathematically representing at least one user-generated command word;
  
  comparing the input signal to the acoustic prompt model and to the acoustic target vocabulary model;
  
  determining which of the acoustic prompt model and the acoustic target vocabulary model provides a best match for the input signal during the comparing step;
  
  accepting the best match if the acoustic target vocabulary model provides the best match; and
  
  ignoring the best match if the acoustic prompt model provides the best match.
  - 2. The method of claim 1, wherein the step of generating an acoustic model of the outgoing system voice prompt is performed in advance of the comparing step and includes the steps of:
    - determining phonetic units utilized in the words of the outgoing system voice prompt;
      
      storing the phonetic units in a phonetic unit database accessible by the speech recognizer;
      
      providing the speech recognizer with an orthographic text of the outgoing system voice prompt prior to playing the prompt; and
      
      building the prompt model by the speech recognizer, said speech recognizer selecting and concatenating appropriate phonetic units based on the orthographic text of the outgoing system voice prompt.
  - 3. The method of claim 2, wherein a plurality of outgoing system voice prompts are stored in a system prompt database accessible by a prompt server that plays selected prompts, and phonetic units associated with the words of the plurality of outgoing system voice prompts are stored in the phonetic unit database, and wherein the method further comprises, prior to supplying the input signal to the speech recognizer, the steps of:
    - instructing the prompt server to select and play a selected outgoing system voice prompt;
      
      informing the speech recognizer which outgoing system voice prompt is going to be played; and
      
      retrieving by the speech recognizer, phonetic units from the phonetic unit database that are appropriate for an acoustic prompt model corresponding to the selected outgoing system voice prompt.
  - 4. The method of claim 1, wherein the step of generating an acoustic model of the outgoing system voice prompt includes the steps of:
    - sending the speech signal of the outgoing system voice prompt to the speech recognizer; and
      
      generating the acoustic prompt model from the speech signal immediately before the comparing step.
  - 5. The method of claim 1, wherein the step of generating an acoustic model of the outgoing system voice prompt includes generating the acoustic prompt model at an attenuation level of approximately 20 dB relative to the outgoing system voice prompt.
  - 6. The method of claim 1, further comprising the steps of:
    - comparing the input signal to a silence model, at least one out-of-vocabulary word model, and at least one noise model;
      
      determining whether one of the silence, out-of-vocabulary, or noise models provides the best match during the comparing step; and
      
      ignoring the best match if one of the silence, out-of-vocabulary, or noise models provides the best match.
  - 7. The method of claim 6, wherein the step of comparing the input signal to a silence model, at least one out-of-vocabulary word model, and at least one noise model includes comparing the input signal to a noise model that represents background babble.
  - 8. The method of claim 6, wherein the step of comparing the input signal to a silence model, at least one out-of-vocabulary word model, and at least one noise model includes comparing the input signal to a noise model that represents background car noise.
  - 9. The method of claim 1, wherein the step of supplying the input signal to the speech recognizer includes supplying to a simple connected word recognition grammar, the input signal in parallel with the acoustic target vocabulary model and the acoustic prompt model.
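
The text-based model building recited in claim 2 (selecting and concatenating phonetic units according to the orthographic text of the prompt) might look roughly like the sketch below. The lexicon, the unit names, and the placeholder "unit models" are all hypothetical; the claims do not specify a phoneme set or a storage format, and a real phonetic unit database would hold trained acoustic parameters rather than strings.

```python
# Hypothetical pronunciation lexicon: word -> phonetic units.
LEXICON = {
    "please": ["p", "l", "iy", "z"],
    "say": ["s", "ey"],
    "a": ["ax"],
    "name": ["n", "ey", "m"],
}

# Hypothetical phonetic unit database: unit -> stored unit model.
# Each "model" is a placeholder string here.
PHONETIC_UNIT_DB = {u: f"model({u})" for units in LEXICON.values() for u in units}


def build_prompt_model(prompt_text):
    """Concatenate unit models for every word of the prompt, in order."""
    model = []
    for word in prompt_text.lower().split():
        for unit in LEXICON[word]:  # raises KeyError for unknown words
            model.append(PHONETIC_UNIT_DB[unit])
    return model
```

Calling `build_prompt_model("Please say a name")` yields a ten-element sequence of unit models, one per phonetic unit, in the order the prompt is spoken.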

10. A method of suppressing speech recognition errors and improving word accuracy in a speech recognition system that enables a user of a communication device to interrupt an outgoing system voice prompt with user-generated command words that halt the outgoing voice prompt and initiate desired actions, said method comprising the steps of:
- generating an acoustic model of the outgoing system voice prompt, said acoustic prompt model mathematically representing the words of the outgoing system voice prompt;
  
  storing the acoustic prompt model in a speech recognizer;
  
  storing an acoustic target vocabulary model in the speech recognizer, said acoustic target vocabulary model including models of a plurality of user-generated command words;
  
  supplying an input signal to a comparer in the speech recognizer, said input signal including at least one user-generated command word and an echo from the outgoing system voice prompt, wherein at least one word of the outgoing system voice prompt is included in the echo in the input signal;
  
  comparing the input signal to the acoustic target vocabulary model and the acoustic prompt model to identify which model provides a best match for the input signal;
  
  ignoring the best match if the acoustic prompt model provides the best match;
  
  accepting the best match if the acoustic target vocabulary model provides the best match;
  
  supplying to an action table, any command word corresponding to the best match provided by the acoustic target vocabulary model;
  
  identifying from the action table, an action corresponding to the supplied command word;
  
  halting the outgoing system voice prompt; and
  
  initiating the identified action.
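
The final steps of claim 10 (action table lookup, halting the prompt, initiating the action) can be sketched as follows. `PromptPlayer`, `ACTION_TABLE`, the action names, and `handle_best_match` are hypothetical illustrations; the claim does not prescribe any particular data structure or interface.

```python
class PromptPlayer:
    """Hypothetical stand-in for the component that plays the voice prompt."""

    def __init__(self):
        self.playing = True

    def halt(self):
        self.playing = False


# Hypothetical action table: command word -> action name.
ACTION_TABLE = {
    "stop": "cancel_call",
    "repeat": "replay_menu",
}


def handle_best_match(best_match, player, initiated):
    """Process one recognition result.

    best_match is a command word, or None when the prompt model gave the
    best match and the result is to be ignored.
    """
    if best_match is None:
        return  # ignored: the prompt keeps playing
    action = ACTION_TABLE[best_match]  # identify the action
    player.halt()                      # halt the outgoing prompt
    initiated.append(action)           # initiate the identified action
```

In this sketch an ignored match leaves the prompt playing, so only an accepted command word interrupts it.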

11. A speech recognizer for recognizing input command words while suppressing speech recognition errors, said speech recognizer comprising:
- means for receiving an input signal that comprises incoming user input speech and an echo from an outgoing system voice prompt, wherein at least one word of the outgoing system voice prompt is included in the echo received in the input signal;
  
  an acoustic vocabulary model that mathematically represents at least one command word;
  
  an acoustic prompt model that mathematically represents the words of the outgoing system voice prompt; and
  
  a comparer that receives the input signal and compares the input signal to the acoustic vocabulary model and to the acoustic prompt model to determine which model provides a best match for the input signal, said comparer accepting the best match if the acoustic target vocabulary model provides the best match, and ignoring the best match if the acoustic prompt model provides the best match.
  - 12. The speech recognizer of claim 11, further comprising means for generating the acoustic prompt model from a known text.
  - 13. The speech recognizer of claim 11, further comprising means for generating the acoustic prompt model from the speech signal of the outgoing system voice prompt prior to playing the prompt.
  - 14. The speech recognizer of claim 11, further comprising means for generating the acoustic prompt model at an attenuation level of approximately 20 dB relative to the outgoing system voice prompt.
  - 15. The speech recognizer of claim 11, further comprising a silence model, at least one out-of-vocabulary word model, and at least one noise model connected to the comparer in parallel with the acoustic vocabulary model and the acoustic prompt model, wherein the comparer also determines whether the best match is provided by the silence model, the at least one out-of-vocabulary word model, or the at least one noise model, and if so, ignores the best match.
  - 16. The speech recognizer of claim 15, wherein the at least one noise model includes a noise model that represents background babble.
  - 17. The speech recognizer of claim 15, wherein the at least one noise model includes a noise model that represents background car noise.
  - 18. The speech recognizer of claim 11, wherein the comparer includes a comparison function selected from a group consisting of:
    - an arbitrary grammar;
      
      a simple connected word recognition grammar; and
      
      a language model.

19. A speech recognition system for suppressing speech recognition errors and improving word accuracy, said system enabling a user of a communication device to interrupt an outgoing system voice prompt with user-generated command words that halt the outgoing system voice prompt and initiate desired actions, said system comprising:
- means for generating an acoustic model of the outgoing system voice prompt, said acoustic prompt model mathematically representing the words of the outgoing system voice prompt;
  
  an acoustic vocabulary model comprising mathematical models of a plurality of user-generated command words;
  
  a comparer for receiving an input signal and comparing the input signal to the acoustic vocabulary model and to the acoustic prompt model to determine which model provides a best match for the input signal, said input signal including at least one user-generated command word and an echo from the outgoing system voice prompt, wherein at least one word of the outgoing system voice prompt is included in the echo in the input signal, said comparer accepting the best match if the acoustic target vocabulary model provides the best match, and ignoring the best match if the acoustic prompt model provides the best match; and
  
  an action table that receives a command word from the comparer upon a determination by the comparer that the acoustic target vocabulary model provides the best match, said action table associating the received command word with a corresponding action, and notifying an associated network to initiate the corresponding action, and to halt the outgoing system voice prompt.
  - 20. The speech recognition system of claim 19, wherein the means for generating the acoustic prompt model includes means for generating the acoustic prompt model from a known text.
  - 21. The speech recognition system of claim 19, wherein the means for generating the acoustic prompt model includes means for generating the acoustic prompt model from the speech signal of the outgoing system voice prompt prior to playing the prompt.
  - 22. The speech recognition system of claim 19, wherein the means for generating the acoustic prompt model includes means for generating the acoustic prompt model at an attenuation level of approximately 20 dB relative to the outgoing system voice prompt.
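
Claims 5, 14, and 22 recite an attenuation level of approximately 20 dB. As a worked illustration of what that figure means for signal amplitude, the sketch below scales samples using the standard amplitude convention (dB = 20·log10 of the amplitude ratio), under which 20 dB corresponds to a factor of 0.1. Applying the attenuation to the prompt samples before building the model is an assumption made here for illustration; the claims do not state how the attenuated model is produced. The function name `attenuate` is hypothetical.

```python
def attenuate(samples, attenuation_db=20.0):
    """Scale amplitude samples down by attenuation_db decibels."""
    # Amplitude ratio for a given attenuation: 10^(-dB/20); 20 dB -> 0.1.
    gain = 10 ** (-attenuation_db / 20.0)
    return [s * gain for s in samples]
```

With the default of 20 dB, a sample of amplitude 1.0 becomes approximately 0.1, which models an echo much quieter than the prompt as played.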

Current Assignee: Telefonaktiebolaget LM Ericsson
Original Assignee: Telefonaktiebolaget LM Ericsson
Inventors: Junkawitsch, Jochen; Reinhard, Klaus; Bruckner, Raymond; Dobler, Stefan
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US10/631,985
Publication Number: US 20050027527A1
Time in Patent Office: 1,790 Days
Field of Search: 704/251
US Class Current: 704/251
CPC Class Codes:
G10L 15/20   Speech recognition techniqu...
G10L 15/22   Procedures used during a sp...
G10L 2021/02082   the noise being echo, rever...
