ENHANCEMENT TO VITERBI SPEECH PROCESSING ALGORITHM FOR HYBRID SPEECH MODELS THAT CONSERVES MEMORY
First Claim
1. A speech processing method comprising:
- generating a search space for a speech recognition decoder from an N-gram language model having N greater than two and including at least one embedded grammar that is utilized in a plurality of contexts within the search space;
within the search space, for each node representing the embedded grammar, associating a grammar identifier with the node that is uniquely associated with the embedded grammar, said node being referred to as a grammar node, wherein the same grammar identifier is used for each of the plurality of contexts, said grammar identifier referencing a recursive transition network corresponding to the embedded grammar;
when decoding speech based on the generated search space, using a unidirectional decoding algorithm to determine probabilities for nodes of the search space other than those nodes that represent the embedded grammar; and
when encountering a grammar node associated with the identifier for the embedded grammar, determining an incoming probability for nodes preceding the grammar node, calculating an outgoing probability for the grammar node using the recursive transition network referenced by the grammar identifier, returning to a point in the search space immediately following the grammar node, and continuing to decode speech using the unidirectional decoding algorithm for nodes subsequent to the grammar node that are not other grammar nodes, where a probability used by the unidirectional decoding algorithm for a next node following the grammar node is the outgoing probability.
Abstract
The present invention discloses a method for semantically processing speech for speech recognition purposes. The method can reduce the amount of memory required for a Viterbi search of an N-gram language model having a value of N greater than two and also having at least one embedded grammar that appears in multiple contexts to approximately the memory size of a bigram model search space with respect to the embedded grammar. The method also reduces CPU requirements. The reductions can be achieved by representing the embedded grammar as a recursive transition network (RTN), where only one instance of the recursive transition network is used for all of the contexts. Other than the embedded grammars, a Hidden Markov Model (HMM) strategy can be used for the search space.
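The memory claim in the abstract can be illustrated with a back-of-the-envelope comparison (the numbers below are assumed for illustration and do not come from the patent): naive N-gram expansion replicates the grammar's states in every context, whereas a shared RTN stores the states once plus a small call/return overhead per context.

```python
# Illustrative state-count comparison (assumed model, not from the patent).

def expanded_states(grammar_states: int, contexts: int) -> int:
    """Naive N-gram expansion: one full copy of the grammar per context."""
    return grammar_states * contexts


def shared_rtn_states(grammar_states: int, contexts: int) -> int:
    """Shared RTN: one copy plus one call/return node per context."""
    return grammar_states + contexts


# e.g. a 500-state grammar used in 40 contexts:
# 500 * 40 = 20000 states naively, versus 500 + 40 = 540 with one shared RTN.
```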
20 Claims
1. A speech processing method comprising:
generating a search space for a speech recognition decoder from an N-gram language model having N greater than two and including at least one embedded grammar that is utilized in a plurality of contexts within the search space;

within the search space, for each node representing the embedded grammar, associating a grammar identifier with the node that is uniquely associated with the embedded grammar, said node being referred to as a grammar node, wherein the same grammar identifier is used for each of the plurality of contexts, said grammar identifier referencing a recursive transition network corresponding to the embedded grammar;

when decoding speech based on the generated search space, using a unidirectional decoding algorithm to determine probabilities for nodes of the search space other than those nodes that represent the embedded grammar; and

when encountering a grammar node associated with the identifier for the embedded grammar, determining an incoming probability for nodes preceding the grammar node, calculating an outgoing probability for the grammar node using the recursive transition network referenced by the grammar identifier, returning to a point in the search space immediately following the grammar node, and continuing to decode speech using the unidirectional decoding algorithm for nodes subsequent to the grammar node that are not other grammar nodes, where a probability used by the unidirectional decoding algorithm for a next node following the grammar node is the outgoing probability.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 19
11. A speech recognition method comprising:
composing a finite state machine search space for a speech recognition decoder that is based upon an N-gram language model that includes at least one embedded grammar, wherein said N-gram model has a value of N greater than two, wherein said finite state machine search space includes statistical language model (SLM) nodes and grammar nodes, each grammar node representing a state associated with one of the at least one embedded grammar;

a decoding algorithm processing the statistical language model nodes using a Hidden Markov Model (HMM) based strategy; and

a decoding algorithm processing the grammar nodes using a Recursive Transition Network (RTN) based strategy, wherein only one instance of each of the embedded grammars is needed regardless of a number of contexts in which each of the embedded grammars is utilized.

Dependent claims: 12, 13, 14, 15, 16, 17, 18
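The RTN-based strategy of claim 11 can be sketched with a minimal recursive-transition-network evaluator (the subnetwork names, arcs, and probabilities below are invented for illustration): each grammar node names a subnetwork, and evaluating a subnetwork may recursively invoke other subnetworks, so a single definition serves every context.

```python
# Hypothetical minimal RTN: each subnetwork is a list of
# (symbol-or-subnetwork, probability) arcs. A label that names another
# subnetwork triggers a recursive call -- the "recursive transition".
NETWORKS = {
    "DIGIT": [("0", 0.5), ("1", 0.5)],
    "DATE": [("DIGIT", 1.0)],          # DATE invokes DIGIT recursively
}


def best_path_prob(net: str) -> float:
    """Viterbi-style best (maximum) probability through a subnetwork."""
    best = 0.0
    for label, p in NETWORKS[net]:
        # Terminal symbols contribute probability 1.0; subnetwork labels
        # contribute the best probability of the invoked subnetwork.
        sub = best_path_prob(label) if label in NETWORKS else 1.0
        best = max(best, p * sub)
    return best
```

Because `NETWORKS["DIGIT"]` is stored once and merely invoked wherever it is referenced, the structure mirrors the claim's point that only one instance of each embedded grammar is needed regardless of how many contexts use it.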
20. A method for semantically processing speech for speech recognition purposes comprising:
reducing an amount of memory required for a Viterbi search of an N-gram language model having a value of N greater than two and also having at least one embedded grammar that appears in a plurality of contexts to a memory size of approximately a bigram model search space with respect to the embedded grammar by representing the embedded grammar as a recursive transition network, where only one instance of the recursive transition network is used for the plurality of contexts.
Specification