ENHANCEMENT TO VITERBI SPEECH PROCESSING ALGORITHM FOR HYBRID SPEECH MODELS THAT CONSERVES MEMORY
First Claim
1. A speech processing method comprising:
- generating a search space for a speech recognition decoder from an N-gram language model having N greater than two and including at least one embedded grammar that is utilized in a plurality of contexts within the search space;
within the search space, for each node representing the embedded grammar, associating a grammar identifier with the node that is uniquely associated with the embedded grammar, said node being referred to as a grammar node, wherein the same grammar identifier is used for each of the plurality of contexts, said grammar identifier referencing a recursive transition network corresponding to the embedded grammar;
when decoding speech based on the generated search space, using a unidirectional decoding algorithm to determine probabilities for nodes of the search space other than those nodes that represent the embedded grammar; and
when encountering a grammar node associated with the identifier for the embedded grammar, determining an incoming probability for nodes preceding the grammar node, calculating an outgoing probability for the grammar node using the recursive transition network referenced by the grammar identifier, returning to a point in the search space immediately following the grammar node, and continuing to decode speech using the unidirectional decoding algorithm for nodes subsequent to the grammar node that are not other grammar nodes, where a probability used by the unidirectional decoding algorithm for a next node following the grammar node is the outgoing probability.
Abstract
The present invention discloses a method for semantically processing speech for speech recognition purposes. The method can reduce the amount of memory required for a Viterbi search of an N-gram language model having a value of N greater than two and also having at least one embedded grammar that appears in multiple contexts to approximately the memory size of a bigram model search space with respect to the embedded grammar. The method also reduces CPU requirements. The reductions can be achieved by representing the embedded grammar as a recursive transition network (RTN), where only one instance of the recursive transition network is used for all of the contexts. Other than the embedded grammars, a Hidden Markov Model (HMM) strategy can be used for the search space.
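The memory claim in the abstract can be illustrated with a back-of-the-envelope comparison (the numbers below are assumed for illustration and do not come from the patent): naive N-gram expansion replicates the grammar's states in every context, whereas a shared RTN stores the states once plus a small call/return overhead per context.

```python
# Illustrative state-count comparison (assumed model, not from the patent).

def expanded_states(grammar_states: int, contexts: int) -> int:
    """Naive N-gram expansion: one full copy of the grammar per context."""
    return grammar_states * contexts


def shared_rtn_states(grammar_states: int, contexts: int) -> int:
    """Shared RTN: one copy plus one call/return node per context."""
    return grammar_states + contexts


# e.g. a 500-state grammar used in 40 contexts:
# 500 * 40 = 20000 states naively, versus 500 + 40 = 540 with one shared RTN.
```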
20 Claims
1. A speech processing method comprising:
generating a search space for a speech recognition decoder from an N-gram language model having N greater than two and including at least one embedded grammar that is utilized in a plurality of contexts within the search space;

within the search space, for each node representing the embedded grammar, associating a grammar identifier with the node that is uniquely associated with the embedded grammar, said node being referred to as a grammar node, wherein the same grammar identifier is used for each of the plurality of contexts, said grammar identifier referencing a recursive transition network corresponding to the embedded grammar;

when decoding speech based on the generated search space, using a unidirectional decoding algorithm to determine probabilities for nodes of the search space other than those nodes that represent the embedded grammar; and

when encountering a grammar node associated with the identifier for the embedded grammar, determining an incoming probability for nodes preceding the grammar node, calculating an outgoing probability for the grammar node using the recursive transition network referenced by the grammar identifier, returning to a point in the search space immediately following the grammar node, and continuing to decode speech using the unidirectional decoding algorithm for nodes subsequent to the grammar node that are not other grammar nodes, where a probability used by the unidirectional decoding algorithm for a next node following the grammar node is the outgoing probability.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 19
11. A speech recognition method comprising:
composing a finite state machine search space for a speech recognition decoder that is based upon an N-gram language model that includes at least one embedded grammar, wherein said N-gram model has a value of N greater than two, wherein said finite state machine search space includes statistical language model (SLM) nodes and grammar nodes, each grammar node representing a state associated with one of the at least one embedded grammar;

a decoding algorithm processing the statistical language model nodes using a Hidden Markov Model (HMM) based strategy; and

a decoding algorithm processing the grammar nodes using a Recursive Transition Network (RTN) based strategy, wherein only one instance of each of the embedded grammars is needed regardless of a number of contexts in which each of the embedded grammars is utilized.

Dependent claims: 12, 13, 14, 15, 16, 17, 18
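The RTN-based strategy of claim 11 can be sketched with a minimal recursive-transition-network evaluator (the subnetwork names, arcs, and probabilities below are invented for illustration): each grammar node names a subnetwork, and evaluating a subnetwork may recursively invoke other subnetworks, so a single definition serves every context.

```python
# Hypothetical minimal RTN: each subnetwork is a list of
# (symbol-or-subnetwork, probability) arcs. A label that names another
# subnetwork triggers a recursive call -- the "recursive transition".
NETWORKS = {
    "DIGIT": [("0", 0.5), ("1", 0.5)],
    "DATE": [("DIGIT", 1.0)],          # DATE invokes DIGIT recursively
}


def best_path_prob(net: str) -> float:
    """Viterbi-style best (maximum) probability through a subnetwork."""
    best = 0.0
    for label, p in NETWORKS[net]:
        # Terminal symbols contribute probability 1.0; subnetwork labels
        # contribute the best probability of the invoked subnetwork.
        sub = best_path_prob(label) if label in NETWORKS else 1.0
        best = max(best, p * sub)
    return best
```

Because `NETWORKS["DIGIT"]` is stored once and merely invoked wherever it is referenced, the structure mirrors the claim's point that only one instance of each embedded grammar is needed regardless of how many contexts use it.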
20. A method for semantically processing speech for speech recognition purposes comprising:
reducing an amount of memory required for a Viterbi search of an N-gram language model having a value of N greater than two and also having at least one embedded grammar that appears in a plurality of contexts to a memory size of approximately a bigram model search space with respect to the embedded grammar by representing the embedded grammar as a recursive transition network, where only one instance of the recursive transition network is used for the plurality of contexts.
Specification