Method for Context Driven Speech Recognition and Processing

US 20160210968A1
Filed: 01/16/2015
Published: 07/21/2016
Est. Priority Date: 01/16/2015
Status: Active Grant

Abstract

The invention is a system and method to recognize speech vocalizations using context-specific grammars and vocabularies. The system and method increase the accuracy of recognized utterances by eliminating all language encodings irrelevant to the current context and by allowing identification of appropriate context transitions. The system and method create a context-dependent speech recognition system with multiple supported contexts, each with a specific grammar and vocabulary, and each identifying the potential context transitions allowed. The system and method also include programmatic integration between the context-dependent speech recognition system and other systems to make use of the recognized speech.
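The abstract describes a set of contexts, each with its own grammar and vocabulary and a list of permitted transitions to other contexts. A minimal Python sketch of that bookkeeping follows, assuming the speech engine has already produced a text hypothesis. The `Context` and `ContextDrivenRecognizer` classes and the example phrases are hypothetical illustrations, not taken from the patent. A real grammar-based recognizer applies the restriction during decoding rather than filtering text afterwards; here, rejecting phrases outside the active context simply stands in for "eliminating all language encodings irrelevant to the current context."

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical context: its own phrase vocabulary and its allowed transitions."""
    name: str
    phrases: set[str]
    # Maps a phrase in this context to the name of the context it switches to.
    transitions: dict[str, str] = field(default_factory=dict)


class ContextDrivenRecognizer:
    """Accepts only phrases from the active context (illustrative sketch)."""

    def __init__(self, contexts: list[Context], start: str) -> None:
        self.contexts = {c.name: c for c in contexts}
        self.active = self.contexts[start]

    def recognize(self, hypothesis: str) -> str | None:
        text = hypothesis.strip().lower()
        # Phrases irrelevant to the active context are rejected, not force-matched.
        if text not in self.active.phrases:
            return None
        # Switch context only along a transition the active context defines.
        target = self.active.transitions.get(text)
        if target is not None:
            self.active = self.contexts[target]
        return text


# Usage with made-up contexts.
menu = Context("main_menu", {"open radio", "status report"}, {"open radio": "radio"})
radio = Context("radio", {"channel one", "channel two", "back"}, {"back": "main_menu"})
recognizer = ContextDrivenRecognizer([menu, radio], start="main_menu")

print(recognizer.recognize("channel one"))  # None: not in the main_menu vocabulary
print(recognizer.recognize("Open radio"))   # open radio (context switches to "radio")
print(recognizer.recognize("channel one"))  # channel one
```

Keeping transitions on the context that owns them means each context declares only the moves it permits, which mirrors the abstract's "each identifying the potential context transition allowed."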

11 Citations

11 Claims

1. A method for electronically recognizing and processing speech comprising:
- creating a first set of grammar rules;
  
  loading the first set of grammar rules into a speech recognizer;
  
  receiving a first transmitted audio stream;
  
  running a language script in the speech recognizer;
  
  comparing language in the first transmitted audio stream to language in the first set of grammar rules to determine whether the language in the first transmitted audio stream matches language in the first set of grammar rules;
  
  producing a textual representation of the language of the first transmitted audio stream, using language of one of the grammar rules of the first set of grammar rules, to create consumable data when a match between the language of the first transmitted audio stream and the language of one of the first set of grammar rules is found;
  
  transmitting the consumable data to a processor;
  
  determining which grammar rule has language that most likely matches the language of the first transmitted audio stream, when multiple possible matches are found;
  
  producing a textual representation of the language of the first transmitted audio stream using language of a best matched grammar rule of the first set of grammar rules to create consumable data;
  
  transmitting the consumable data to the processor;
  
  creating a subsequent set of grammar rules when no match is found between the language of the first transmitted audio stream and the language of the first set of grammar rules;
  
  repeating the loading, comparing and determining steps with language of the subsequent set of grammar rules and language of the first transmitted audio stream until a match is found;
  
  producing a textual representation of the language of the first transmitted audio stream using language of a best matched grammar rule of the subsequent set of grammar rules, to create consumable data;
  
  transmitting the consumable data to the processor;
  
  creating a separate set of grammar rules; and
  
  repeating the method for electronically recognizing and processing speech for a second transmitted audio stream based on the consumable data transmitted from the first transmitted audio stream.

2. The method of claim 1, wherein choosing the first grammar rule is based on a priori knowledge of the language most likely to be used in the first transmitted audio stream.

3. The method of claim 1, wherein determining which grammar rule has language that most likely matches the language of the first transmitted audio stream, when multiple possible matches are found, is accomplished by examining grammar weights and confidence scores produced by the speech recognizer.

4. The method of claim 3, wherein the grammar weights are defined a priori in the first set of grammar rules.

5. The method of claim 1, wherein the grammar rules define words that are expected to be contained in vocal utterances.

6. The method of claim 1, wherein the transmitted audio streams are spoken language.

7. The method of claim 1, wherein creating a subsequent set of grammar rules is based on the audio stream language most likely to be used next, given the context of previously transmitted audio streams.

8. The method of claim 1, wherein the grammar rules are written using a grammar specification language such as grXML or the like.

9. The method of claim 1, wherein the first and subsequent sets of grammar rules are subsets of an entire grammar.

10. The method of claim 1, wherein the size and number of the first and subsequent sets of grammar rules are limited by the context of language in the transmitted audio streams.

11. A system for context driven speech recognition comprising:
- a processor;
  
  a user interface electronically connected to the processor;
  
  a voice input device electronically connected to the user interface;
  
  a speech recognizer electronically connected to the processor;
  
  a memory electronically connected to the processor;
  
  a grammar function stored in the memory;
  
  an application framework stored in the memory;
  
  a logic function stored in the memory; and
  
  a configuration interface stored in the memory, wherein a first set of grammar rules, stored in the memory, is loaded into the speech recognizer;
  
  the user interface receives a first transmitted audio stream;
  
  the speech recognizer runs a language script to compare language in the first transmitted audio stream to language in the first set of grammar rules to determine whether the language in the first transmitted audio stream matches language in the first set of grammar rules;
  
  the logic function produces a textual representation of the language of the first transmitted audio stream, using language of one of the grammar rules of the first set of grammar rules, to create consumable data when a match between the language of the first transmitted audio stream and the language of one of the first set of grammar rules is found, and transmits the consumable data to the processor;
  
  the processor determines which grammar rule of the first set of grammar rules has language that best matches the language of the first transmitted audio stream, when multiple matches are found;
  
  the application framework produces a textual representation of the language of the first transmitted audio stream using language of the best matched grammar rule of the first set of grammar rules to create consumable data and transmits the consumable data to the processor;
  
  the logic function creates a subsequent set of grammar rules when no match is found between the language of the first transmitted audio stream and the language of the first set of grammar rules.
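Claims 1, 3, 4, and 7 together describe a loop: match the recognizer's output against the loaded grammar set, resolve multiple matches using grammar weights and confidence scores, and fall back to a subsequent grammar set when nothing matches. The Python sketch below illustrates that loop under stated assumptions: the recognizer is stubbed out as a list of text hypotheses with confidences, a rule "matches" only on exact (case-insensitive) phrase equality, and weight and confidence are combined by simple multiplication. None of these choices, and none of the names (`GrammarRule`, `Hypothesis`, `best_match`, `recognize`), come from the patent. The claim repeats "until a match is found"; the sketch instead returns `None` when the supplied grammar sets run out, so it always terminates.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class GrammarRule:
    """Hypothetical rule: a phrase the grammar accepts plus an a priori weight."""
    name: str
    phrase: str
    weight: float = 1.0


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical recognizer output: candidate text and a confidence in [0, 1]."""
    text: str
    confidence: float


def best_match(
    rules: Sequence[GrammarRule], hypotheses: Sequence[Hypothesis]
) -> tuple[GrammarRule, float] | None:
    """Return the matching rule with the highest weight * confidence, or None."""
    best: tuple[GrammarRule, float] | None = None
    for hyp in hypotheses:
        for rule in rules:
            # Assumption: a rule matches only when the phrase is identical.
            if hyp.text.strip().lower() == rule.phrase.lower():
                score = rule.weight * hyp.confidence
                if best is None or score > best[1]:
                    best = (rule, score)
    return best


def recognize(
    hypotheses: Iterable[Hypothesis],
    grammar_sets: Iterable[Sequence[GrammarRule]],
) -> dict[str, object] | None:
    """Try each grammar set in turn; the first set that matches yields consumable data."""
    candidates = list(hypotheses)  # scanned once per grammar set
    for rules in grammar_sets:  # a generator can build subsequent sets lazily
        match = best_match(rules, candidates)
        if match is not None:
            rule, score = match
            # The text comes from the grammar rule, not from the raw hypothesis.
            return {"rule": rule.name, "text": rule.phrase, "score": score}
    return None  # every supplied grammar set was tried without a match


# Usage with illustrative, made-up rules and scores.
first_set = [
    GrammarRule("status", "status report", weight=1.0),
    GrammarRule("radio", "open radio", weight=0.8),
]
fallback_set = [GrammarRule("help", "help", weight=1.0)]

two_matches = [Hypothesis("open radio", 0.9), Hypothesis("status report", 0.6)]
print(recognize(two_matches, [first_set, fallback_set])["rule"])  # radio: 0.8 * 0.9 > 1.0 * 0.6
print(recognize([Hypothesis("help", 0.7)], [first_set, fallback_set])["rule"])  # help
```

The returned `rule` name could feed the choice of grammar set for a second audio stream, in the spirit of the final step of claim 1 and of claim 7; that mapping is left to the caller in this sketch.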

Current Assignee
US Department of The Navy (U.S. Department Of Defense)
Original Assignee
The United States of America as represented by the Secretary of the Navy
Inventors
Ouakil, Lisa; Ouakil, Abdelhamid; Smith, Peter; Mouri, Ouns

Granted Patent

US 9,460,721 B2
US Class Current

1/1
CPC Class Codes

G06F 40/253   Grammatical analysis; Style...

G06F 40/279   Recognition of textual enti...

G06F 40/30   Semantic analysis

G10L 15/183   using context dependencies,...

G10L 2015/228   of application context