Dynamic language model for speech recognition
Abstract
A method of speech recognition which determines acoustic features in a sound sample; recognizes words comprising the acoustic features based on a language model, which determines the possible sequences of words that may be recognized; and selects an appropriate response based on the words recognized. Information about what words may be recognized, under which conditions those words may be recognized, and what response is appropriate when the words are recognized, is stored, in a preferred embodiment, in a data structure called a speech rule. These speech rules are partitioned according to the context in which they are active. When speech is detected, concurrent with acoustic feature extraction, the current state of the computer system is used to determine which rules are active and how they are to be combined in order to generate a language model for word recognition. A language model is dynamically generated and used to find the best interpretation of the acoustic features as a word sequence. This word sequence is then matched against active rules in order to determine the appropriate response. Rules that match all or part of the word sequence contribute data structures representing the "meaning" of the word sequence, and these data structures are used by the rule actions in order to generate an appropriate response to the spoken utterance.
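The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `SpeechRule` class, the helper functions, and the context names `finder` and `mail` are all hypothetical, and the language model is reduced to a lookup table of whole phrases, so partial matches and "meaning" data structures are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Phrase = Tuple[str, ...]

@dataclass
class SpeechRule:
    """Hypothetical stand-in for the abstract's 'speech rule'."""
    name: str
    context: str                       # context in which the rule is active
    phrases: List[Phrase]              # word sequences the rule can recognize
    action: Callable[[Phrase], str]    # response to perform on a match

def active_rules(rules: List[SpeechRule], current_context: str) -> List[SpeechRule]:
    """Select the rules whose context matches the current system state."""
    return [r for r in rules if r.context == current_context]

def build_language_model(rules: List[SpeechRule]) -> Dict[Phrase, SpeechRule]:
    """Combine the active rules into one table of recognizable phrases."""
    model: Dict[Phrase, SpeechRule] = {}
    for rule in rules:
        for phrase in rule.phrases:
            model.setdefault(phrase, rule)  # assumption: the first rule wins on a clash
    return model

def respond(words: List[str], model: Dict[Phrase, SpeechRule]) -> Optional[str]:
    """Match the recognized words against the model and run the rule's action."""
    rule = model.get(tuple(words))
    return rule.action(tuple(words)) if rule else None

# Hypothetical rules partitioned by context.
RULES = [
    SpeechRule("open", "finder", [("open", "document"), ("open", "folder")],
               lambda w: f"opening {w[1]}"),
    SpeechRule("reply", "mail", [("reply", "to", "sender")],
               lambda w: "replying"),
]

model = build_language_model(active_rules(RULES, "finder"))
result = respond(["open", "folder"], model)            # rule is active in this context
inactive = respond(["reply", "to", "sender"], model)   # rule belongs to another context
```

Because the model is rebuilt from only the rules active in the current context, a phrase owned by an inactive rule is simply not recognizable, which is the narrowing effect the abstract attributes to dynamic generation.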
7 Claims
1. A method of speech recognition in a speech recognition system comprising the following steps:
a. determining acoustic features in a sound sample;
b. upon commencing said determination of said acoustic features, determining possible combinations of words which may be recognized by said speech recognition system and storing said possible combinations of words as a current language model, said current language model being generated from a plurality of speech rules each comprising a language model and an associated action, each said language model in each of said plurality of speech rules including a plurality of states, words defining transitions between said plurality of states, and terminal states;
c. upon the completion of said generation of said current language model, recognizing words comprising said acoustic features by traversing states in said current language model until reaching said terminal states in said current language model; and
d. subsequent to said step of recognizing words, determining a matched speech rule from said plurality of speech rules used to create said current language model and said words and performing said action associated with said matched speech rule.
Dependent claims: 2, 3, 4.
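Claim 1 describes each rule's language model as states joined by word transitions and ending in terminal states. The sketch below is one possible reading of that structure, not the claimed method itself: `RuleModel`, `recognize`, and the two example rules are hypothetical, state 0 is assumed to be every rule's start state, and acoustic scoring is omitted by feeding in words that are already decoded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

@dataclass
class RuleModel:
    """Hypothetical per-rule language model plus its associated action."""
    transitions: Dict[Tuple[int, str], int]  # (state, word) -> next state
    terminals: Set[int]                      # states that complete the rule
    action: Callable[[], str]                # action performed on a match

def recognize(words: List[str], rules: Dict[str, RuleModel]) -> Optional[str]:
    """Traverse all rule models in parallel and return the matched rule's name."""
    active = {(name, 0) for name in rules}   # assumption: state 0 starts every rule
    for word in words:
        # Keep only the rules that have a transition on this word.
        active = {
            (name, rules[name].transitions[(state, word)])
            for name, state in active
            if (state, word) in rules[name].transitions
        }
    for name, state in sorted(active):
        if state in rules[name].terminals:   # reached a terminal state
            return name
    return None

# Hypothetical rules; together they stand in for the current language model.
RULES = {
    "open_file": RuleModel({(0, "open"): 1, (1, "file"): 2}, {2},
                           lambda: "file opened"),
    "close_window": RuleModel({(0, "close"): 1, (1, "window"): 2}, {2},
                              lambda: "window closed"),
}

matched = recognize(["close", "window"], RULES)
outcome = RULES[matched].action() if matched else None
incomplete = recognize(["open"], RULES)      # stops short of a terminal state
```

Tracking `(rule name, state)` pairs during traversal makes step d straightforward: the rule that supplied the terminal state is the matched rule, and its action can be run directly.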
5. A method of speech recognition in a speech recognition system comprising the following steps:
a. determining acoustic features in a sound sample;
b. upon commencing said determination of said acoustic features, determining possible combinations of words which may be recognized by said speech recognition system based upon an operating context of said speech recognition system and storing said possible combinations of words as a current language model; and
c. upon the completion of said generation of said current language model, providing said current language model to a recognizer which recognizes words comprising said acoustic features.
6. A method of speech recognition in a speech recognition system comprising the following steps:
a. determining acoustic features in a sound sample which may include human speech comprising sequences of words;
b. upon commencing said determination of said acoustic features, determining possible combinations of words which may be recognized by said speech recognition system based upon a current operating context of said speech recognition system and storing said possible combinations of words as a current language model;
c. upon the completion of said generation of said current language model, providing said current language model to a recognizer which recognizes words comprising said acoustic features; and
d. interpreting said words and performing an action specified by said words which are received from said recognizer.
7. An apparatus for speech recognition in a speech recognition system comprising:
a. means for determining acoustic features in a sound sample which may include human speech comprising sequences of words;
b. means commencing simultaneously with said determination means for determining possible combinations of words which may be recognized by said speech recognition system based upon a current operating context of said speech recognition system;
c. means for storing said possible combinations of words as a current language model;
d. means operative upon the completion of said generation and storing of said current language model for providing said current language model to a recognizer which recognizes words comprising said acoustic features; and
e. means for interpreting said words and for performing actions specified by said words which are received from said recognizer.
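Claims 5 through 7 turn on timing: language model generation begins when feature extraction begins, and the recognizer receives the model only once generation is complete. The claims do not prescribe a mechanism, so the following is only a sketch of one way to arrange that ordering, using a Python thread. The function names, the toy feature computation, and the `editor` and `browser` contexts are all hypothetical.

```python
import threading
from typing import Dict, List

def extract_features(samples: List[float]) -> List[float]:
    """Placeholder features: mean absolute amplitude per frame of up to 4 samples."""
    frames = [samples[i:i + 4] for i in range(0, len(samples), 4)]
    return [sum(abs(s) for s in frame) / len(frame) for frame in frames]

def generate_language_model(context: str, out: Dict[str, List[str]]) -> None:
    """Build the current language model from the operating context (hypothetical data)."""
    vocabulary = {"editor": ["save", "undo"], "browser": ["back", "reload"]}
    out["model"] = vocabulary.get(context, [])

def run_pipeline(samples: List[float], context: str) -> Dict[str, list]:
    result: Dict[str, List[str]] = {}
    builder = threading.Thread(target=generate_language_model,
                               args=(context, result))
    builder.start()                       # model generation starts with feature extraction
    features = extract_features(samples)  # runs while the model is being built
    builder.join()                        # the recognizer waits for the completed model
    return {"features": features, "model": result["model"]}

out = run_pipeline([1.0, -1.0, 1.0, -1.0, 0.0, 0.0], "editor")
```

The `join()` call is what enforces "upon the completion of said generation": nothing downstream sees the model until the builder thread has finished writing it.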
Specification