Method for context driven speech recognition and processing
First Claim
1. A method for electronically recognizing and processing speech comprising:
creating a first set of grammar rules;
loading the first set of grammar rules into a speech recognizer;
receiving a first transmitted audio stream containing a first utterance of speech to be recognized;
running a language script in the speech recognizer;
comparing language in the first transmitted audio stream to language in the first set of grammar rules to determine whether the language in the first transmitted audio stream matches language in the first set of grammar rules;
producing a textual representation of the language of the first transmitted audio stream, using language of one of the grammar rules of the first set of grammar rules, to create consumable data when a match between the language of the first transmitted audio stream and the language of one of the first set of grammar rules is found;
transmitting the consumable data to a processor;
determining which grammar rule has language that most likely matches the language of the first transmitted audio stream, when multiple possible matches are found;
producing a textual representation of the language of the first transmitted audio stream using language of a best matched grammar rule of the first set of grammar rules to create consumable data;
transmitting the consumable data to the processor;
creating a subsequent set of grammar rules when no match is found between the language of the first transmitted audio stream and the language of the first set of grammar rules;
repeating the loading, comparing and determining steps with language of the subsequent set of grammar rules and language of the first transmitted audio stream until a match is found;
producing a textual representation of the language of the first transmitted audio stream using language of a best matched grammar rule of the subsequent set of grammar rules, to create consumable data;
transmitting the consumable data to the processor; and
creating a separate set of grammar rules to recognize and process a second transmitted audio stream containing a second utterance separate and distinct from the first utterance in the first audio stream where the separate set of grammar rules is based on the consumable data transmitted from the first transmitted audio stream and repeating the recognizing and processing speech steps, whereby, results of a current recognition event impacts future recognition events by producing consumable data from the loaded set of grammar rules to determine the next appropriate set of rules.
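The flow of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the audio stream has already been decoded to candidate text, that each grammar rule is a plain phrase, and that "most likely match" is approximated with a string-similarity score from Python's difflib. The names `score`, `best_match`, `recognize`, `next_rules`, the 0.8 threshold, and the sample phrases are all hypothetical. The subsequent rule sets are supplied up front here for simplicity, whereas the claim creates them when no match is found.

```python
from difflib import SequenceMatcher


def score(heard, rule):
    """Similarity in [0, 1] between decoded text and a rule phrase (assumed metric)."""
    return SequenceMatcher(None, heard.lower(), rule.lower()).ratio()


def best_match(heard, rules, threshold=0.8):
    """Return the best-scoring rule at or above the threshold, or None if none qualifies."""
    candidates = [(score(heard, rule), rule) for rule in rules]
    candidates = [c for c in candidates if c[0] >= threshold]
    if not candidates:
        return None
    return max(candidates)[1]


def recognize(heard, rule_sets, threshold=0.8):
    """Try each rule set in order until one matches.

    Returns (consumable_data, index_of_matching_set), or (None, None).
    """
    for index, rules in enumerate(rule_sets):
        rule = best_match(heard, rules, threshold)
        if rule is not None:
            # The textual representation uses the language of the matched rule.
            return {"text": rule}, index
    return None, None


def next_rules(consumable, transitions, default):
    """Pick the rule set for the next utterance from the current result."""
    return transitions.get(consumable["text"], default)


# Hypothetical rule sets for a banking dialogue.
first_rules = ["check balance", "transfer funds"]
fallback_rules = ["help", "operator"]
transitions = {"transfer funds": ["to savings", "to checking"]}

result, used_set = recognize("transfer fund", [first_rules, fallback_rules])
fallback_result, fallback_set = recognize("operator", [first_rules, fallback_rules])
second_rules = next_rules(result, transitions, first_rules)
```

In this sketch the slightly misheard "transfer fund" resolves to the rule "transfer funds" in the first set, "operator" is only found after falling through to the second set, and the first result selects the rule set used for the second utterance.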
Abstract
The invention is a system and method for recognizing speech vocalizations using context-specific grammars and vocabularies. The system and method increase the accuracy of recognized utterances by eliminating all language encodings irrelevant to the current context and by identifying appropriate context transitions. They create a context-dependent speech recognition system with multiple supported contexts, each with its own grammar and vocabulary, and each identifying the context transitions it allows. The system and method also include programmatic integration between the context-dependent speech recognition system and other systems that make use of the recognized speech.
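One way to picture the context model described in the abstract is as a small state machine in which each context carries its own vocabulary and its allowed transitions. The sketch below is only illustrative. It assumes utterances arrive as already-decoded text and uses exact phrase matching. The `Context` and `ContextRecognizer` classes and the sample contexts are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Context:
    """A recognition context: its vocabulary and the transitions it allows."""
    name: str
    phrases: Set[str]
    # Maps a recognized phrase to the name of the context it switches to.
    transitions: Dict[str, str] = field(default_factory=dict)


class ContextRecognizer:
    """Accepts only phrases valid in the current context and follows transitions."""

    def __init__(self, contexts, start):
        self.contexts = {c.name: c for c in contexts}
        self.current = start

    def hear(self, text) -> Optional[str]:
        ctx = self.contexts[self.current]
        phrase = text.strip().lower()
        if phrase not in ctx.phrases:
            return None  # irrelevant to the current context, so rejected
        # Stay in the same context unless the phrase names a transition.
        self.current = ctx.transitions.get(phrase, self.current)
        return phrase


# Hypothetical contexts for a voice-driven mail and music application.
main = Context("main", {"open mail", "play music"}, {"open mail": "mail"})
mail = Context("mail", {"read next", "delete", "go back"}, {"go back": "main"})
rec = ContextRecognizer([main, mail], "main")

rejected = rec.hear("delete")      # not valid in "main"
opened = rec.hear("Open Mail")     # valid, and switches to "mail"
deleted = rec.hear("delete")       # now valid
```

Because "delete" is absent from the vocabulary of the starting context, it is rejected there and accepted only after the transition into the mail context, which is the narrowing effect the abstract attributes to context-specific grammars.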
8 Citations
9 Claims
9. A system for context driven speech recognition comprising:
a processor;
a user interface electronically connected to the processor;
a voice input device electronically connected to the user interface;
a speech recognizer electronically connected to the processor;
a memory electronically connected to the processor;
an application framework stored in the memory;
a logic function stored in the memory; and
a configuration interface stored in the memory,
wherein a first set of grammar rules, stored in the memory, is loaded into the speech recognizer;
the user interface receives a first transmitted audio stream containing a first utterance of speech to be recognized;
the speech recognizer runs a language script to compare language in the first transmitted audio stream to language in the first set of grammar rules to determine whether the language in the first transmitted audio stream matches language in the first set of grammar rules;
the logic function produces a textual representation of the language of the first transmitted audio stream, using language of one of the grammar rules of the first set of grammar rules, to create consumable data when a match between the language of the first transmitted audio stream and the language of one of the first set of grammar rules is found, and transmits the consumable data to the processor;
the processor determines which grammar rule of the first set of grammar rules has language that best matches the language of the first transmitted audio stream, when multiple matches are found;
the application framework produces a textual representation of the language of the first transmitted audio stream using language of the best matched grammar rule of the first set of grammar rules to create consumable data and transmits the consumable data to the processor;
the logic function creates a subsequent set of grammar rules when no match is found between the language of the first transmitted audio stream and the language of the first set of grammar rules to recognize and process a second transmitted audio stream containing a second utterance separate and distinct from the first utterance in the first audio stream where the separate set of grammar rules is based on the consumable data transmitted from the first transmitted audio stream and repeating the recognizing and processing speech steps, whereby, results of a current recognition event impacts future recognition events by producing consumable data from the loaded set of grammar rules to determine the next appropriate set of rules.
Specification