Method for Context Driven Speech Recognition and Processing
First Claim
1. A method for electronically recognizing and processing speech comprising:
creating a first set of grammar rules;
loading the first set of grammar rules into a speech recognizer;
receiving a first transmitted audio stream;
running a language script in the speech recognizer;
comparing language in the first transmitted audio stream to language in the first set of grammar rules to determine whether the language in the first transmitted audio stream matches language in the first set of grammar rules;
producing a textual representation of the language of the first transmitted audio stream, using language of one of the grammar rules of the first set of grammar rules, to create consumable data when a match between the language of the first transmitted audio stream and the language of one of the first set of grammar rules is found;
transmitting the consumable data to a processor;
determining which grammar rule has language that most likely matches the language of the first transmitted audio stream, when multiple possible matches are found;
producing a textual representation of the language of the first transmitted audio stream using language of a best matched grammar rule of the first set of grammar rules to create consumable data;
transmitting the consumable data to the processor;
creating a subsequent set of grammar rules when no match is found between the language of the first transmitted audio stream and the language of the first set of grammar rules;
repeating the loading, comparing and determining steps with language of the subsequent set of grammar rules and language of the first transmitted audio stream until a match is found;
producing a textual representation of the language of the first transmitted audio stream using language of a best matched grammar rule of the subsequent set of grammar rules, to create consumable data;
transmitting the consumable data to the processor;
creating a separate set of grammar rules; and
repeating the method for electronically recognizing and processing speech for a second transmitted audio stream based on the consumable data transmitted from the first transmitted audio stream.
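The control flow recited in the claim (try a first grammar set, pick the best rule when several match, fall back to a subsequent set when none match) can be illustrated with a minimal sketch. This is not the patented implementation: plain text stands in for the audio stream, `difflib.SequenceMatcher` stands in for the speech recognizer's scoring, and the names `score`, `recognize`, `threshold`, and the sample grammar sets are all hypothetical.

```python
from difflib import SequenceMatcher


def score(utterance, phrase):
    """Similarity in [0, 1] between an utterance and a rule phrase (stand-in scorer)."""
    return SequenceMatcher(None, utterance.lower(), phrase.lower()).ratio()


def recognize(utterance, grammar_sets, threshold=0.8):
    """Try each grammar set in order; return consumable data, or None if nothing matches.

    grammar_sets is a list of dicts mapping a rule name to the phrase it accepts.
    """
    for set_index, rules in enumerate(grammar_sets):
        # Score every rule in the currently loaded set.
        scored = [(score(utterance, phrase), name, phrase)
                  for name, phrase in rules.items()]
        # Keep only the rules that count as possible matches.
        candidates = [c for c in scored if c[0] >= threshold]
        if candidates:
            # Several rules may match; keep the highest-scoring one.
            _, name, phrase = max(candidates)
            # The textual representation uses the language of the matched rule.
            return {"rule": name, "text": phrase, "grammar_set": set_index}
        # No match: move on to the subsequent grammar set.
    return None


# Hypothetical grammar sets for illustration only.
first_set = {"yes": "yes please", "no": "no thanks"}
subsequent_set = {"balance": "check my balance", "transfer": "transfer funds"}

result = recognize("check my balanse", [first_set, subsequent_set])
print(result)
```

In this example the misspelled utterance matches nothing in the first set, so the sketch falls through to the subsequent set and returns the text of the closest rule there. The fixed `threshold` is an arbitrary choice made for the sketch.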
Abstract
The invention is a system and method to recognize speech vocalizations using context-specific grammars and vocabularies. The system and method allow increased accuracy of recognized utterances by eliminating all language encodings irrelevant to the current context and allowing identification of appropriate context transitions. The system and method create a context-dependent speech recognition system with multiple supported contexts, each with a specific grammar and vocabulary, and each identifying the potential context transitions allowed. The system and method also include programmatic integration between the context-dependent speech recognition system and other systems to make use of the recognized speech.
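The idea of per-context grammars with allowed context transitions can be sketched as a small state machine. The sketch below is only an illustration under stated assumptions: recognized phrases arrive as text rather than audio, matching is exact, and the names `Context`, `ContextRecognizer`, `hear`, and the sample contexts are hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Context:
    name: str
    # Maps each phrase accepted in this context to the context to switch to,
    # or to None when the phrase keeps the current context.
    grammar: dict


class ContextRecognizer:
    def __init__(self, contexts, start):
        self.contexts = {c.name: c for c in contexts}
        self.current = start

    def hear(self, phrase):
        """Return the normalized phrase if the current context accepts it, else None."""
        grammar = self.contexts[self.current].grammar
        key = phrase.strip().lower()
        if key not in grammar:
            # Phrases belonging to other contexts are not considered at all.
            return None
        next_context = grammar[key]
        if next_context is not None:
            # Follow the transition this context allows for the phrase.
            self.current = next_context
        return key


# Hypothetical contexts for illustration only.
main = Context("main", {"open mail": "mail", "help": None})
mail = Context("mail", {"read next": None, "go back": "main"})
recognizer = ContextRecognizer([main, mail], start="main")

rejected = recognizer.hear("read next")   # not in the "main" grammar
accepted = recognizer.hear("Open Mail")   # switches to the "mail" context
followup = recognizer.hear("read next")   # now accepted
```

Because only the active context's phrases are candidates, "read next" is rejected until "open mail" has moved the recognizer into the mail context. That is the accuracy argument the abstract makes, shown in miniature.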
11 Citations
11 Claims
1. A method for electronically recognizing and processing speech, as set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for context driven speech recognition comprising:
a processor;
a user interface electronically connected to the processor;
a voice input device electronically connected to the user interface;
a speech recognizer electronically connected to the processor;
a memory electronically connected to the processor;
a grammar function stored in the memory;
an application framework stored in the memory;
a logic function stored in the memory; and
a configuration interface stored in the memory,
wherein a first set of grammar rules, stored in the memory, is loaded into the speech recognizer;
the user interface receives a first transmitted audio stream;
the speech recognizer runs a language script to compare language in the first transmitted audio stream to language in the first set of grammar rules to determine whether the language in the first transmitted audio stream matches language in the first set of grammar rules;
the logic function produces a textual representation of the language of the first transmitted audio stream, using language of one of the grammar rules of the first set of grammar rules, to create consumable data when a match between the language of the first transmitted audio stream and the language of one of the first set of grammar rules is found, and transmits the consumable data to the processor;
the processor determines which grammar rule of the first set of grammar rules has language that best matches the language of the first transmitted audio stream, when multiple matches are found;
the application framework produces a textual representation of the language of the first transmitted audio stream using language of the best matched grammar rule of the first set of grammar rules to create consumable data and transmits the consumable data to the processor;
the logic function creates a subsequent set of grammar rules when no match is found between the language of the first transmitted audio stream and the language of the first set of grammar rules.
Specification