Continuous speech recognition and voice response system and method to enable conversational dialogues with microprocessors

US 5,615,296 A
Filed: 11/12/1993
Issued: 03/25/1997
Est. Priority Date: 11/12/1993
Status: Expired due to Term

Abstract

A continuous speech recognition and voice response system provides a natural sounding and effective interactive, speech-driven dialogue from a data processing system. A concatenation of words into phrases and sentences improves recognition and mimics natural language processing. The system uses speaker-independent, continuous speech to initiate the playback of audio files. The system employs high-speed context switching to modify the active vocabulary and applies high-speed context switching to modify or activate audio WAV voice response files. The system uses dialogue history to activate selected contexts, Backus-Naur Form (BNF) grammars and WAV files, and provides phrase- or sentence-long dialogue prompts to improve accuracy. The system also provides audio prompts to improve accuracy and provides speech-activated buttons to navigate between menus.
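
As a rough illustration of how a BNF-style grammar can restrict which word sequences count as a valid phrase within one context, the sketch below uses a toy grammar and a naive recursive matcher. The grammar contents, the rule names, and the `match` and `accepts` helpers are hypothetical and are not taken from the patent.

```python
# Toy BNF-style grammar for one hypothetical context.
# Keys are non-terminals; each alternative is a sequence of symbols.
GRAMMAR = {
    "<command>": [["<verb>", "<object>"], ["help"]],
    "<verb>": [["show"], ["transfer"]],
    "<object>": [["balance"], ["my", "balance"], ["funds"]],
}


def match(symbol, words, pos):
    """Return the set of positions reachable after matching `symbol` at `pos`."""
    if symbol not in GRAMMAR:  # terminal word
        return {pos + 1} if pos < len(words) and words[pos] == symbol else set()
    ends = set()
    for alternative in GRAMMAR[symbol]:
        positions = {pos}
        for part in alternative:
            positions = {e for p in positions for e in match(part, words, p)}
        ends |= positions
    return ends


def accepts(phrase, start="<command>"):
    """True if the whole phrase can be derived from the start symbol."""
    words = phrase.lower().split()
    return len(words) in match(start, words, 0)
```

Under these assumptions, `accepts("show my balance")` is true while `accepts("balance show")` is false, so only word sequences the active grammar allows are treated as valid phrases.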

350 Citations

40 Claims

1. In a data processing system, a method to playback context related voice response signals and display prompts, comprising the steps of:
- storing in a first partition of a memory, a first plurality of words having a first context, each of said first plurality of words including a second plurality of phonemes;
  
  storing in said first partition a third plurality of voice response words having said first context;
  
  storing in a second partition of said memory, a fifth plurality of words having a second context, each of said fifth plurality of words including a sixth plurality of phonemes;
  
  storing in said second partition a seventh plurality of voice response words having said second context;
  
  storing in a third partition of said memory, a ninth plurality of phoneme pattern matching data units for sharing by both contexts;
  
  storing a first pointer map in said memory including a first plurality of pointers, each of which points to a respective one of said second plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
  
  storing a second pointer map in said memory including a second plurality of pointers, each of which points to a respective one of said sixth plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
  
  selecting said first context and said first pointer map in response to a first input;
  
  recognizing a first spoken utterance containing one of said first plurality of words and in response thereto, outputting one of said third plurality of voice response words;
  
  recognizing said first spoken utterance and in response thereto, outputting one of said fourth plurality of display prompts;
  
  instantaneously selecting said second context and said second pointer map in response to a second input without loading new pattern matching data units in said memory;
  
  recognizing a second spoken utterance containing one of said fifth plurality of words and in response thereto, outputting one of said seventh plurality of voice response words; and
  
  recognizing said second spoken utterance and in response thereto, outputting one of said eighth plurality of display prompts.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The method of claim 1, wherein said second input is a context switching voice utterance.
  - 3. The method of claim 2, wherein said one of said third plurality of voice response words is a voice prompt to prompt a user to speak said context switching voice utterance.
  - 4. The method of claim 1, wherein said first input is a second occurring voice utterance that is combined with a first occurring voice utterance in a dialogue history.
  - 5. The method of claim 1, wherein said first input is a voice utterance that activates a speech-activated display button.
  - 6. The method of claim 1, wherein said recognizing steps include concatenation of words into phrases.
  - 7. The method of claim 1, wherein said recognizing steps include concatenation of words into sentences.
  - 8. The method of claim 1, wherein said recognizing steps include recognition of speaker-independent, continuous speech.
  - 9. The method of claim 2, wherein said one of said fourth plurality of display prompts direct a user to speak said context switching voice utterance.
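
The storage layout recited in claim 1 can be pictured with a minimal sketch: each context keeps its own words and voice responses, one shared store holds the phoneme pattern matching data, and a per-context pointer map links word phonemes to shared entries. All names, words, phoneme symbols, and file names below are hypothetical, and acoustic matching is omitted (the recognized word is passed in directly).

```python
# Shared store of phoneme pattern matching data, built once.
# The values are placeholders; a real system would hold acoustic models here.
SHARED_PATTERNS = {
    ph: f"pattern<{ph}>"
    for ph in ["y", "eh", "s", "n", "ow", "p", "l", "ey", "t", "aa"]
}

# Each context stores its words (as phoneme sequences) and voice responses.
CONTEXTS = {
    "confirm": {
        "words": {"yes": ["y", "eh", "s"], "no": ["n", "ow"]},
        "responses": {"yes": "confirmed.wav", "no": "cancelled.wav"},
    },
    "player": {
        "words": {"play": ["p", "l", "ey"], "stop": ["s", "t", "aa", "p"]},
        "responses": {"play": "playing.wav", "stop": "stopped.wav"},
    },
}


def build_pointer_map(context):
    """Map each (word, phoneme index) to the shared pattern object it uses."""
    return {
        (word, i): SHARED_PATTERNS[ph]
        for word, phonemes in context["words"].items()
        for i, ph in enumerate(phonemes)
    }


POINTER_MAPS = {name: build_pointer_map(ctx) for name, ctx in CONTEXTS.items()}


class Recognizer:
    def __init__(self, context_name):
        self.select(context_name)

    def select(self, context_name):
        # Switching only rebinds two references; SHARED_PATTERNS is untouched.
        self.context = CONTEXTS[context_name]
        self.pointer_map = POINTER_MAPS[context_name]

    def respond(self, word):
        """Return the voice response for a word in the active context, else None."""
        if word not in self.context["words"]:
            return None
        return self.context["responses"][word]
```

In this sketch, calling `select("player")` changes which words are active without copying or reloading any pattern data, which is the property the claim describes as selecting a context "without loading new pattern matching data units".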

10. In a data processing system, a method to playback context related voice response signals, comprising the steps of:
- storing in a first partition of a memory, a first plurality of words having a first context, each of said first plurality of words including a second plurality of phonemes;
  
  storing in said first partition a third plurality of voice response words having said first context;
  
  storing in a second partition of said memory, a fifth plurality of words having a second context, each of said fifth plurality of words including a sixth plurality of phonemes;
  
  storing in said second partition a seventh plurality of voice response words having said second context;
  
  storing in a third partition of said memory, a ninth plurality of phoneme pattern matching data units for sharing by both contexts;
  
  storing a first pointer map in said memory including a first plurality of pointers, each of which points to a respective one of said second plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
  
  selecting said first context and said first pointer map in response to a first input;
  
  recognizing a first spoken utterance containing one of said first plurality of words using speaker-independent, continuous speech recognition, and in response thereto, outputting one of said third plurality of voice response words;
  
  instantaneously selecting said second context and said second pointer map in response to a second input without loading new pattern matching data units in said memory; and
  
  recognizing a second spoken utterance containing one of said fifth plurality of words using speaker-independent, continuous speech recognition, and in response thereto, outputting one of said seventh plurality of voice response words.
- Dependent claims (11, 12, 13, 14, 15, 16)
- - 11. The method of claim 10, wherein said second input is a context switching voice utterance.
  - 12. The method of claim 11, wherein said one of said third plurality of voice response words is a voice prompt to prompt a user to speak said context switching voice utterance.
  - 13. The method of claim 10, wherein said first input is a second occurring voice utterance that is combined with a first occurring voice utterance in a dialogue history.
  - 14. The method of claim 10, wherein said first input is a voice utterance that activates a speech-activated display button.
  - 15. The method of claim 10, wherein said recognizing steps include concatenation of words into phrases.
  - 16. The method of claim 10, wherein said recognizing steps include concatenation of words into sentences.

17. In a data processing system, a method to playback context related voice response signals, comprising the steps of:
- storing in a memory, a first plurality of words having a first context, each of said first plurality of words including a second plurality of phonemes;
  
  storing in said memory a third plurality of voice response words having said first context;
  
  storing in said memory, a fifth plurality of words having a second context, each of said fifth plurality of words including a sixth plurality of phonemes;
  
  storing in said memory a seventh plurality of voice response words having said second context;
  
  storing in said memory, a ninth plurality of phoneme pattern matching data units for sharing by both contexts;
  
  storing a first pointer map in said memory including a first plurality of pointers, each of which points to a respective one of said second plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
  
  storing a second pointer map in said memory including a second plurality of pointers, each of which points to a respective one of said sixth plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
  
  selecting said first context and said first pointer map in response to a first input;
  
  recognizing a first spoken utterance containing one of said first plurality of words using speaker-independent, continuous speech recognition, and in response thereto, outputting one of said third plurality of voice response words;
  
  instantaneously selecting said second context and said second pointer map in response to a second input without loading new pattern matching data units in said memory; and
  
  recognizing a second spoken utterance containing one of said fifth plurality of words using speaker-independent, continuous speech recognition, and in response thereto, outputting one of said seventh plurality of voice response words.
- Dependent claims (18, 19, 20, 21, 22, 23)
- - 18. The method of claim 17, wherein said second input is a context switching voice utterance.
  - 19. The method of claim 18, wherein said one of said third plurality of voice response words is a voice prompt to prompt a user to speak said context switching voice utterance.
  - 20. The method of claim 17, wherein said first input is a second occurring voice utterance that is combined with a first occurring voice utterance in a dialogue history.
  - 21. The method of claim 17, wherein said first input is a voice utterance that activates a speech-activated display button.
  - 22. The method of claim 17, wherein said recognizing steps include concatenation of words into phrases.
  - 23. The method of claim 17, wherein said recognizing steps include concatenation of words into sentences.

24. A data processing system to playback context related voice response signals, comprising:
- means for storing in a memory, a first plurality of words having a first context, each of said first plurality of words including a second plurality of phonemes;
  
  means for storing in said memory a third plurality of voice response words having said first context;
  
  means for storing in said memory, a fifth plurality of words having a second context, each of said fifth plurality of words including a sixth plurality of phonemes;
  
  means for storing in said memory a seventh plurality of voice response words having said second context;
  
  means for storing in said memory, a ninth plurality of phoneme pattern matching data units for sharing by both contexts;
  
  means for storing a first pointer map in said memory including a first plurality of pointers, each of which points to a respective one of said second plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
  
  means for storing a second pointer map in said memory including a second plurality of pointers, each of which points to a respective one of said sixth plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
  
  means for selecting said first context and first pointer map in response to a first input;
  
  means for recognizing a first spoken utterance containing one of said first plurality of words using speaker-independent, continuous speech recognition, and in response thereto, outputting one of said third plurality of voice response words;
  
  means for selecting said second pointer map in response to a second input without loading new pattern matching data units in said memory; and
  
  means for recognizing a second spoken utterance containing one of said fifth plurality of words using speaker-independent, continuous speech recognition, and in response thereto, outputting one of said seventh plurality of voice response words.

25. A data processing system to playback context related voice response signals and display prompts, comprising:
- means for storing in a first partition of a memory, a first plurality of words having a first context, each of said first plurality of words including a second plurality of phonemes;
  
  means for storing in said first partition a third plurality of voice response words having said first context;
  
  means for storing in said first partition a fourth plurality of display prompts having said first context;
  
  means for storing in a second partition of said memory, a fifth plurality of words having a second context, each of said fifth plurality of words including a sixth plurality of phonemes;
  
  means for storing in said second partition a seventh plurality of voice response words having said second context;
  
  means for storing in said second partition an eighth plurality of display prompts having said first context;
  
  means for storing in a third partition of said memory, a ninth plurality of phoneme pattern matching data units for sharing by both contexts;
  
  means for storing a first pointer map in said memory including a first plurality of pointers, each of which points to a respective one of said second plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
  
  means for selecting said first context and said first pointer map in response to a first input;
  
  means for recognizing a first spoken utterance containing one of said first plurality of words and in response thereto, outputting one of said third plurality of voice response words;
  
  means for recognizing said first spoken utterance and in response thereto, outputting one of said fourth plurality of display prompts;
  
  means for selecting said second context and said second pointer map in response to a second input without loading new pattern matching data units in said memory;
  
  means for recognizing a second spoken utterance containing one of said fifth plurality of words and in response thereto, outputting one of said seventh plurality of voice response words; and
  
  means for recognizing said second spoken utterance and in response thereto, outputting one of said eighth plurality of display prompts.

26. In a data processing system to playback context related voice response signals and display prompts, a speech recognition system comprising:
- speech input means for generating a series of vector quantization values (VQ) indicative of speech input;
  
  means for generating multiple word grammars as contexts, each context related to a user application;
  
  a speech recognition unit for matching word sequences in a context to the series of VQ values in the speech input;
  
  means coupling the speech input means, a processor, and an output device to a memory;
  
  a plurality of the contexts stored in the memory, each context containing a plurality of words represented by phonemes;
  
  a plurality of voice response words stored in the memory for each context;
  
  a plurality of voice prompts stored in the memory for each plurality of voice response words related to a context;
  
  a phoneme pattern matching unit stored in the memory for sharing by all contexts in the speech recognition unit;
  
  a plurality of pointer maps stored in the memory, each pointer map coupled to a different context, each pointer in a map connecting a phoneme in a word to a phoneme in the pattern matching unit;
  
  stored program instructions in the memory for operating the processor to match the VQ values in the speech input to the words stored in the context and generating character strings representative of the speech input as an output in the output device;
  
  means for instantaneously switching the speech recognition unit from one context to another context without changing the phoneme pattern matching units;
  
  means for recognizing a first spoken utterance containing one of a plurality of words related to a first context and in response thereto, outputting one of said voice response words related to said first context;
  
  means for recognizing said first spoken utterance and in response thereto, outputting one of said display prompts related to said outputted voice response word;
  
  means for selecting said second context and said second pointer map in response to a second input;
  
  means for recognizing a second spoken utterance containing one of a plurality of words related to a second context and in response thereto, outputting one of a plurality of voice response words included in said second context; and
  
  means for recognizing said second spoken utterance and in response thereto, outputting one of said display prompts related to said voice response word related to the second context.
- Dependent claims (27, 28, 29, 30, 31, 33, 40)
- - 27. The system of claim 26 further comprising a communications adapter for coupling the system to multiple stations in a network, each station including the speech recognition unit.
  - 28. The system of claim 27 wherein the phonemes are Hidden Markov Models.
  - 29. The system of claim 28 further including a recognition server to enable a user to request the services of a speech recognition unit for an application running on the processor or running on a different processor on the network.
  - 30. The system of claim 29 including means for continuous speech independent recognition.
  - 31. The system of claim 30 including means to reduce the storage capacity of the memory.
  - 33. The method of claim 31 wherein each phoneme is represented by a triphone.
  - 40. The system of claim 26 further including means for combining spoken utterances related to a context in a dialogue history.
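
Claim 26 refers to a series of vector quantization (VQ) values indicative of the speech input. A minimal sketch of that idea, assuming a tiny two-dimensional codebook and Euclidean distance (both hypothetical, chosen only for illustration), maps each feature frame to the index of its nearest codebook entry:

```python
import math

# Hypothetical 2-D codebook; real front ends use higher-dimensional features.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def quantize(frame, codebook=CODEBOOK):
    """Return the index of the codebook entry nearest to one feature frame."""
    return min(range(len(codebook)), key=lambda i: math.dist(frame, codebook[i]))


def vq_series(frames, codebook=CODEBOOK):
    """Convert a sequence of feature frames into a series of VQ values."""
    return [quantize(frame, codebook) for frame in frames]
```

The resulting list of integer indices is the kind of compact symbol stream that a recognizer could then match against the phoneme patterns of the active context.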

32. In a data processing system to playback context related voice response signals and display prompts, a continuous speech recognition system for recognizing input speech in different contexts, comprising a front end for receiving the speech related to the contexts, a speech recognition unit including a processor and a memory for generating character strings representative of the speech input, a method of speech recognition comprising the steps of:
- storing in the memory a plurality of contexts of words, each word represented by at least one phoneme;
  
  storing in the memory a plurality of voice prompts for each plurality of voice response words related to a context;
  
  storing in the memory a plurality of voice prompts for each plurality of voice response words related to a context;
  
  storing in the memory a plurality of phoneme pattern matching units for sharing by all contexts;
  
  storing in the memory pointer maps, each pointer map connecting phonemes in the words to phonemes in the phoneme pattern matching units;
  
  storing in the memory program instructions for operating the processor;
  
  operating the processor using the stored program instruction to match the speech input to the words in the context using the shared phoneme pattern matching units;
  
  generating character strings as an output from the system from the matched speech input and words in the context;
  
  instantaneously switching the system to another context stored in the memory without changing the phoneme pattern matching units;
  
  recognizing a first spoken utterance containing one of a plurality of words related to a first context and in response thereto, outputting one of said voice response words related to said first context;
  
  recognizing said first spoken utterance and in response thereto, outputting one of said display prompts related to said outputted voice response word;
  
  recognizing a second spoken utterance containing one of a plurality of words related to a second context and in response thereto, outputting one of a plurality of voice response words related to said second context; and
  
  recognizing said second spoken utterance and in response thereto, outputting one of said display prompts related to said voice response word related to the said second context.
- Dependent claims (34, 35, 36, 37, 38, 39)
- - 34. The method of claim 32 further comprising the step of:
    - generating a series of vector quantization values representative of the speech input.
  - 35. The method of claim 34 further comprising the step of:
    - operating a selection device to instantaneously switch the system from one context to another.
  - 36. The method of claim 35 further comprising the step of:
    - storing at least one user application in a recognition server for a plurality of speech recognition units in a network to enable the applications to be recognized concurrently in the network.
  - 37. The method of claim 36 wherein the speech recognition unit is speaker independent.
  - 38. The method of claim 36 wherein the different contexts form a vocabulary of words in which certain words are anticipated by the speech recognition unit.
  - 39. The method of claim 32 wherein said the second spoken utterance is combined with the first spoken utterance in a dialogue history.
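
Claims 39 and 40 mention combining spoken utterances in a dialogue history. One possible reading, sketched below under the assumption that the last two utterances together select the next context, uses a lookup table keyed by utterance pairs. The `TRANSITIONS` table, the context names, and the `DialogueHistory` class are hypothetical and not drawn from the patent.

```python
# Hypothetical table: (previous utterance, current utterance) -> next context.
TRANSITIONS = {
    ("accounts", "checking"): "checking_menu",
    ("accounts", "savings"): "savings_menu",
    ("help", "checking"): "checking_help",
}


class DialogueHistory:
    def __init__(self):
        self.utterances = []

    def add(self, utterance):
        """Record one recognized utterance."""
        self.utterances.append(utterance)

    def next_context(self, default="main_menu"):
        """Pick a context from the last two utterances, or fall back to default."""
        if len(self.utterances) < 2:
            return default
        key = tuple(self.utterances[-2:])
        return TRANSITIONS.get(key, default)
```

With this table, the word "checking" leads to a different context depending on whether the preceding utterance was "accounts" or "help", which shows why the history, and not only the latest word, can drive context selection.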

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Stanford, Vincent M.; Sherwin, Elton B. Jr.; Castellucci, Frank V.; Williamson, Ora J.
Primary Examiner(s): Sheikh, Ayaz R.
Assistant Examiner(s): Chowdhury, Indrinal
Application Number: US08/152,654
Time in Patent Office: 1,229 Days
Field of Search: 395/2.1, 395/2.8, 395/2.81, 395/2.84
US Class Current: 704/270.1
CPC Class Codes:
- G10L 15/18   using natural language mode...
- G10L 15/22   Procedures used during a sp...
- G10L 2015/022   Demisyllables, biphones or ...
- G10L 2015/223   Execution procedure of a sp...
