Enhancement to Viterbi speech processing algorithm for hybrid speech models that conserves memory
Abstract
The present invention discloses a method for semantically processing speech for speech recognition purposes. The method can reduce the amount of memory required for a Viterbi search of an N-gram language model having a value of N greater than two, and having at least one embedded grammar that appears in multiple contexts, to approximately the memory size of a bigram model search space with respect to the embedded grammar. The method also reduces CPU requirements. These reductions can be accomplished by representing the embedded grammar as a recursive transition network (RTN), where only one instance of the recursive transition network is used for all of the contexts. Apart from the embedded grammars, a Hidden Markov Model (HMM) strategy can be used for the search space.
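The memory argument in the abstract is simple arithmetic: a naive N-gram (N > 2) expansion duplicates the embedded grammar once per context, while the RTN representation stores one instance plus a small grammar node per context. A minimal sketch, using hypothetical sizes chosen only to illustrate the scaling (none of these figures come from the patent):

```python
# Hypothetical sizes, for illustration only; real figures depend on
# the grammar and the acoustic model.
STATES_PER_GRAMMAR = 5_000   # states in one expanded embedded grammar
BYTES_PER_STATE = 64         # rough per-state bookkeeping cost
CONTEXTS = 40                # distinct N-gram contexts using the grammar

# Naive expansion: one full grammar copy per context.
naive_bytes = CONTEXTS * STATES_PER_GRAMMAR * BYTES_PER_STATE

# Shared-RTN representation: one RTN instance, plus one lightweight
# grammar node (holding only a grammar identifier) per context.
GRAMMAR_NODE_BYTES = 64
shared_bytes = (STATES_PER_GRAMMAR * BYTES_PER_STATE
                + CONTEXTS * GRAMMAR_NODE_BYTES)

print(naive_bytes // 1024)   # 12500 (KiB)
print(shared_bytes // 1024)  # 315 (KiB)
```

The shared representation grows with the number of contexts only by the cost of the lightweight grammar nodes, which is why the footprint stays near that of a bigram search space.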
18 Claims
1. A speech processing method comprising:
generating, with at least one computer system, a search space from an N-gram language model with N greater than two, wherein the search space comprises a plurality of nodes including at least one grammar node that represents, within the search space, an embedded grammar that is utilized in a plurality of contexts; and
associating a grammar identifier that is uniquely associated with the embedded grammar with the at least one grammar node, wherein the same grammar identifier is used for each of the plurality of contexts, said grammar identifier referencing a recursive transition network corresponding to the embedded grammar.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
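The data structure claim 1 describes can be sketched as a registry keyed by grammar identifier: each context adds a lightweight grammar node carrying only the identifier, while the RTN itself is stored exactly once. The class and method names below are illustrative, not from the patent:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GrammarNode:
    history: tuple   # N-gram word history (the context)
    grammar_id: str  # unique per embedded grammar, shared across contexts

class SearchSpace:
    """Toy search-space builder: one RTN per embedded grammar,
    referenced by identifier from every context in which it occurs."""

    def __init__(self):
        self.nodes = []
        self.rtns = {}   # grammar_id -> RTN, stored once

    def register_grammar(self, grammar_id, rtn):
        self.rtns[grammar_id] = rtn

    def add_grammar_context(self, history, grammar_id):
        # A new lightweight node per context, but no new RTN copy.
        self.nodes.append(GrammarNode(history, grammar_id))

space = SearchSpace()
space.register_grammar("city_names", rtn={"states": ["s0", "s1"]})  # placeholder RTN
space.add_grammar_context(("fly", "to"), "city_names")
space.add_grammar_context(("weather", "in"), "city_names")
print(len(space.rtns))   # 1 -- a single RTN serves both contexts
```

Because both grammar nodes carry the same identifier, any later decoding step can resolve them to the single registered RTN instance.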
11. A speech recognition decoder comprising:
at least one processor programmed to:
generate a finite state machine search space from an N-gram language model with N greater than two, wherein the N-gram language model includes at least one embedded grammar, wherein said finite state machine search space includes statistical language model (SLM) nodes and grammar nodes, each grammar node representing a state associated with one of the at least one embedded grammar;
process the SLM nodes using a Hidden Markov Model (HMM) based strategy; and
process the grammar nodes using a Recursive Transition Network (RTN) based strategy, wherein the finite state machine search space includes a single grammar node for each of the embedded grammars regardless of a number of contexts in which each of the embedded grammars is represented in the N-gram language model.
Dependent claims: 12, 13, 14, 15, 16, 17.
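The hybrid processing claim 11 describes amounts to a per-node dispatch during decoding: SLM nodes take the HMM-based Viterbi path, while grammar nodes are routed to the single shared RTN for their identifier. A minimal sketch with stand-in update functions (all names here are illustrative assumptions, not the patent's implementation):

```python
class SLMNode:
    """Statistical-language-model node (toy stand-in)."""

class GrammarNode:
    """Grammar node: carries only the identifier of its embedded grammar."""
    def __init__(self, grammar_id):
        self.grammar_id = grammar_id

def hmm_viterbi_step(node, frame):
    # Stand-in for the HMM-based Viterbi update on an SLM node.
    return ("hmm", node)

def rtn_advance(rtn, node, frame):
    # Stand-in for advancing the single shared recursive transition network.
    return ("rtn", rtn, node)

def process_node(node, frame, rtns):
    # Grammar nodes route to the shared RTN for their identifier;
    # every other (SLM) node takes the HMM-based path.
    if isinstance(node, GrammarNode):
        return rtn_advance(rtns[node.grammar_id], node, frame)
    return hmm_viterbi_step(node, frame)

shared_rtns = {"city_names": "RTN<city_names>"}   # one instance per grammar
tag, rtn, node = process_node(GrammarNode("city_names"), frame=0, rtns=shared_rtns)
print(tag)  # rtn
```

The dispatch keeps the two strategies cleanly separated, which is what lets the search space hold a single grammar node per embedded grammar regardless of context count.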
18. A method for semantically processing speech for speech recognition, the method comprising:
representing, with at least one computer system, an embedded grammar as a recursive transition network, wherein the embedded grammar is used in a plurality of contexts in an N-gram language model with N greater than two, wherein a single instance of the recursive transition network is used for the plurality of contexts.
Specification