Continuous speech recognition and voice response system and method to enable conversational dialogues with microprocessors
3 Assignments
Abstract
A continuous speech recognition and voice response system provides a natural-sounding and effective interactive, speech-driven dialogue from a data processing system. Concatenating words into phrases and sentences improves recognition and mimics natural language processing. The system uses speaker-independent, continuous speech recognition to initiate the playback of audio files. It employs high-speed context switching both to modify the active vocabulary and to modify or activate audio WAV voice response files. The system uses dialogue history to activate selected contexts, Backus-Naur Form (BNF) grammars and WAV files, and provides phrase- or sentence-long dialogue prompts to improve accuracy. The system also provides audio prompts to improve accuracy and speech-activated buttons to navigate between menus.
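The context switching described in the abstract can be pictured with a small sketch. This is an illustration only, not the patented implementation: the `Context` and `Dialogue` classes, the phrases, and the WAV file names are all hypothetical, and recognition itself is assumed to have already produced a text phrase.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """One dialogue state: an active vocabulary plus its voice responses."""
    name: str
    # Recognized phrase -> (audio file to play, context to activate next).
    responses: dict = field(default_factory=dict)


class Dialogue:
    def __init__(self, contexts, start):
        self.contexts = {c.name: c for c in contexts}
        self.active = self.contexts[start]
        self.history = [start]  # record of every context visited so far

    def hear(self, phrase):
        """Return the WAV file for a recognized phrase, or None if the
        phrase is not in the active vocabulary."""
        entry = self.active.responses.get(phrase)
        if entry is None:
            return None
        wav_file, next_context = entry
        self.active = self.contexts[next_context]  # the context switch
        self.history.append(next_context)
        return wav_file


# Hypothetical two-menu dialogue; phrases and file names are made up.
main_menu = Context("main", {"check balance": ("ask_account.wav", "accounts")})
accounts = Context("accounts", {
    "savings": ("savings_balance.wav", "main"),
    "go back": ("main_menu.wav", "main"),
})
dialogue = Dialogue([main_menu, accounts], start="main")
```

Here "savings" is ignored while the main menu is active and is accepted only after "check balance" has switched the active vocabulary, which is the behaviour the abstract attributes to context switching.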
350 Citations
40 Claims
1. In a data processing system, a method to playback context related voice response signals and display prompts, comprising the steps of:
storing in a first partition of a memory, a first plurality of words having a first context, each of said first plurality of words including a second plurality of phonemes;
storing in said first partition a third plurality of voice response words having said first context;
storing in a second partition of said memory, a fifth plurality of words having a second context, each of said fifth plurality of words including a sixth plurality of phonemes;
storing in said second partition a seventh plurality of voice response words having said second context;
storing in a third partition of said memory, a ninth plurality of phoneme pattern matching data units for sharing by both contexts;
storing a first pointer map in said memory including a first plurality of pointers, each of which points to a respective one of said second plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
storing a second pointer map in said memory including a second plurality of pointers, each of which points to a respective one of said sixth plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
selecting said first context and said first pointer map in response to a first input;
recognizing a first spoken utterance containing one of said first plurality of words and in response thereto, outputting one of said third plurality of voice response words;
recognizing said first spoken utterance and in response thereto, outputting one of said fourth plurality of display prompts;
instantaneously selecting said second context and said second pointer map in response to a second input without loading new pattern matching data units in said memory;
recognizing a second spoken utterance containing one of said fifth plurality of words and in response thereto, outputting one of said seventh plurality of voice response words; and
recognizing said second spoken utterance and in response thereto, outputting one of said eighth plurality of display prompts.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
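The storage arrangement in claim 1 (per-context words, one shared set of pattern matching data units, and a pointer map per context) can be sketched as a data structure. This is a minimal illustration under stated assumptions: `PatternUnit`, `build_pointer_map`, `Recognizer`, the phoneme labels, and the example words are all hypothetical, and the units hold no real acoustic data.

```python
class PatternUnit:
    """Stand-in for one phoneme pattern matching data unit (assumption:
    a real unit would hold acoustic model parameters)."""

    def __init__(self, phoneme):
        self.phoneme = phoneme


# The shared partition: created once, used by every context.
SHARED_UNITS = {p: PatternUnit(p) for p in ("K", "AO", "L", "S", "EY", "V")}


def build_pointer_map(vocabulary):
    """For each word, list references to the shared units for its phonemes.

    The map stores references into SHARED_UNITS, never copies of the units.
    """
    return {
        word: [SHARED_UNITS[p] for p in phonemes]
        for word, phonemes in vocabulary.items()
    }


class Recognizer:
    def __init__(self, pointer_maps):
        self.pointer_maps = pointer_maps
        self.active = None

    def select_context(self, name):
        # Switching rebinds one reference; no units are loaded or copied.
        self.active = self.pointer_maps[name]

    def units_for(self, word):
        return self.active[word]


# Two hypothetical contexts whose words share the phoneme "L".
vocabularies = {
    "phone": {"call": ["K", "AO", "L"]},
    "shop": {"sale": ["S", "EY", "L"], "save": ["S", "EY", "V"]},
}
recognizer = Recognizer(
    {name: build_pointer_map(vocab) for name, vocab in vocabularies.items()}
)
recognizer.select_context("phone")
```

Because both pointer maps refer to the same unit objects, selecting the second context changes only which map is active. That is one way to read the claim language "without loading new pattern matching data units in said memory".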
10. In a data processing system, a method to playback context related voice response signals, comprising the steps of:
storing in a first partition of a memory, a first plurality of words having a first context, each of said first plurality of words including a second plurality of phonemes;
storing in said first partition a third plurality of voice response words having said first context;
storing in a second partition of said memory, a fifth plurality of words having a second context, each of said fifth plurality of words including a sixth plurality of phonemes;
storing in said second partition a seventh plurality of voice response words having said second context;
storing in a third partition of said memory, a ninth plurality of phoneme pattern matching data units for sharing by both contexts;
storing a first pointer map in said memory including a first plurality of pointers, each of which points to a respective one of said second plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
selecting said first context and said first pointer map in response to a first input;
recognizing a first spoken utterance containing one of said first plurality of words using speaker-independent, continuous speech recognition, and in response thereto, outputting one of said third plurality of voice response words;
instantaneously selecting said second context and said second pointer map in response to a second input without loading new pattern matching data units in said memory; and
recognizing a second spoken utterance containing one of said fifth plurality of words using speaker-independent, continuous speech recognition, and in response thereto, outputting one of said seventh plurality of voice response words.
Dependent claims: 11, 12, 13, 14, 15, 16
17. In a data processing system, a method to playback context related voice response signals, comprising the steps of:
storing in a memory, a first plurality of words having a first context, each of said first plurality of words including a second plurality of phonemes;
storing in said memory a third plurality of voice response words having said first context;
storing in said memory, a fifth plurality of words having a second context, each of said fifth plurality of words including a sixth plurality of phonemes;
storing in said memory a seventh plurality of voice response words having said second context;
storing in said memory, a ninth plurality of phoneme pattern matching data units for sharing by both contexts;
storing a first pointer map in said memory including a first plurality of pointers, each of which points to a respective one of said second plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
storing a second pointer map in said memory including a second plurality of pointers, each of which points to a respective one of said sixth plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
selecting said first context and said first pointer map in response to a first input;
recognizing a first spoken utterance containing one of said first plurality of words using speaker-independent, continuous speech recognition, and in response thereto, outputting one of said third plurality of voice response words;
instantaneously selecting said second context and said second pointer map in response to a second input without loading new pattern matching data units in said memory; and
recognizing a second spoken utterance containing one of said fifth plurality of words using speaker-independent, continuous speech recognition, and in response thereto, outputting one of said seventh plurality of voice response words.
Dependent claims: 18, 19, 20, 21, 22, 23
24. A data processing system to playback context related voice response signals, comprising:
means for storing in a memory, a first plurality of words having a first context, each of said first plurality of words including a second plurality of phonemes;
means for storing in said memory a third plurality of voice response words having said first context;
means for storing in said memory, a fifth plurality of words having a second context, each of said fifth plurality of words including a sixth plurality of phonemes;
means for storing in said memory a seventh plurality of voice response words having said second context;
means for storing in said memory, a ninth plurality of phoneme pattern matching data units for sharing by both contexts;
means for storing a first pointer map in said memory including a first plurality of pointers, each of which points to a respective one of said second plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
means for storing a second pointer map in said memory including a second plurality of pointers, each of which points to a respective one of said sixth plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
means for selecting said first context and first pointer map in response to a first input;
means for recognizing a first spoken utterance containing one of said first plurality of words using speaker-independent, continuous speech recognition, and in response thereto, outputting one of said third plurality of voice response words;
means for selecting said second pointer map in response to a second input without loading new pattern matching data units in said memory; and
means for recognizing a second spoken utterance containing one of said fifth plurality of words using speaker-independent, continuous speech recognition, and in response thereto, outputting one of said seventh plurality of voice response words.
25. A data processing system to playback context related voice response signals and display prompts, comprising:
means for storing in a first partition of a memory, a first plurality of words having a first context, each of said first plurality of words including a second plurality of phonemes;
means for storing in said first partition a third plurality of voice response words having said first context;
means for storing in said first partition a fourth plurality of display prompts having said first context;
means for storing in a second partition of said memory, a fifth plurality of words having a second context, each of said fifth plurality of words including a sixth plurality of phonemes;
means for storing in said second partition a seventh plurality of voice response words having said second context;
means for storing in said second partition an eighth plurality of display prompts having said first context;
means for storing in a third partition of said memory, a ninth plurality of phoneme pattern matching data units for sharing by both contexts;
means for storing a first pointer map in said memory including a first plurality of pointers, each of which points to a respective one of said second plurality of phonemes and to a respective one of said ninth plurality of pattern matching data units;
means for selecting said first context and said first pointer map in response to a first input;
means for recognizing a first spoken utterance containing one of said first plurality of words and in response thereto, outputting one of said third plurality of voice response words;
means for recognizing said first spoken utterance and in response thereto, outputting one of said fourth plurality of display prompts;
means for selecting said second context and said second pointer map in response to a second input without loading new pattern matching data units in said memory;
means for recognizing a second spoken utterance containing one of said fifth plurality of words and in response thereto, outputting one of said seventh plurality of voice response words; and
means for recognizing said second spoken utterance and in response thereto, outputting one of said eighth plurality of display prompts.
26. In a data processing system to playback context related voice response signals and display prompts, a speech recognition system comprising:
speech input means for generating a series of vector quantization values (VQ) indicative of speech input;
means for generating multiple word grammars as contexts, each context related to a user application;
a speech recognition unit for matching word sequences in a context to the series of VQ values in the speech input means coupling the speech input means, a processor, and an output device to a memory;
a plurality of the contexts stored in the memory, each context containing a plurality of words represented by phonemes;
a plurality of voice response words stored in the memory for each context;
a plurality of voice prompts stored in the memory for each plurality of voice response words related to a context;
a phoneme pattern matching unit stored in the memory for sharing by all contexts in the speech recognition unit;
a plurality of pointer maps stored in the memory, each pointer map coupled to a different context, each pointer in a map connecting a phoneme in a word to a phoneme in the pattern matching unit;
stored program instructions in the memory for operating the processor to match the VQ values in the speech input to the words stored in the context and generating character strings representative of the speech input as an output in the output device;
means for instantaneously switching the speech recognition unit from one context to another context without changing the phoneme pattern matching units;
means for recognizing a first spoken utterance containing one of a plurality of words related to a first context and in response thereto, outputting one of said voice response words related to said first context;
means for recognizing said first spoken utterance and in response thereto, outputting one of said display prompts related to said outputted voice response word;
means for selecting said second context and said second pointer map in response to a second input;
means for recognizing a second spoken utterance containing one of a plurality of words related to a second context and in response thereto, outputting one of a plurality of voice response words included in said second context; and
means for recognizing said second spoken utterance and in response thereto, outputting one of said display prompts related to said voice response word related to the second context.
Dependent claims: 27, 28, 29, 30, 31, 33, 40
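Claim 26 recites a series of vector quantization (VQ) values but does not say how they are produced. A common approach, assumed here purely for illustration, replaces each acoustic feature vector with the index of its nearest codebook entry. The `quantize` function, the two-dimensional features, and the three-entry codebook are hypothetical.

```python
import math


def quantize(frames, codebook):
    """Replace each feature vector with the index of its nearest codeword
    (Euclidean distance). This is generic nearest-neighbour VQ, not a
    procedure taken from the claims."""
    return [
        min(range(len(codebook)), key=lambda i: math.dist(frame, codebook[i]))
        for frame in frames
    ]


# Toy data: real front ends use many more dimensions and codewords.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.4), (1.1, 0.8)]
vq_values = quantize(frames, codebook)  # [0, 1, 2, 1]
```

The resulting integer sequence is the kind of compact input that a recognition unit could then match against the phoneme patterns of the active context.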
32. In a data processing system to playback context related voice response signals and display prompts, a continuous speech recognition system for recognizing input speech in different contexts, comprising a front end for receiving the speech related to the contexts, a speech recognition unit including a processor and a memory for generating character strings representative of the speech input, a method of speech recognition comprising the steps of:
storing in the memory a plurality of contexts of words, each word represented by at least one phoneme;
storing in the memory a plurality of voice prompts for each plurality of voice response words related to a context;
storing in the memory a plurality of phoneme pattern matching units for sharing by all contexts;
storing in the memory pointer maps, each pointer map connecting phonemes in the words to phonemes in the phoneme pattern matching units;
storing in the memory program instructions for operating the processor;
operating the processor using the stored program instruction to match the speech input to the words in the context using the shared phoneme pattern matching units;
generating character strings as an output from the system from the matched speech input and words in the context;
instantaneously switching the system to another context stored in the memory without changing the phoneme pattern matching units;
recognizing a first spoken utterance containing one of a plurality of words related to a first context and in response thereto, outputting one of said voice response words related to said first context;
recognizing said first spoken utterance and in response thereto, outputting one of said display prompts related to said outputted voice response word;
recognizing a second spoken utterance containing one of a plurality of words related to a second context and in response thereto, outputting one of a plurality of voice response words related to said second context; and
recognizing said second spoken utterance and in response thereto, outputting one of said display prompts related to said voice response word related to the said second context.
Dependent claims: 34, 35, 36, 37, 38, 39
Specification