Pattern matching for large vocabulary speech recognition with packed distribution and localized trellis access

US 20050159952A1
Filed: 03/19/2003
Published: 07/21/2005
Est. Priority Date: 04/22/2002
Status: Abandoned Application

Abstract

A method is provided for improving pattern matching in a speech recognition system having a plurality of acoustic models (20). Similarity measures for acoustic feature vectors (54) are determined in groups that are then buffered into cache memory (59). To further reduce computational processing, the acoustic data may be partitioned amongst a plurality of processing nodes (66, 67, 68). In addition, a priori knowledge of the spoken order may be used to establish the access order (124) used to copy records from the main speech parameter table (120, 200) into a sub-table (130, 204). The sub-table is processed such that the entries are in contiguous memory locations (206) and sorted according to the processing order (208). The speech processing algorithm is then directed to operate upon the sub-table (210) which causes the processor to load the sub-table into high speed cache memory (104, 212).

46 Claims

1. A method for improving pattern matching in a speech recognition system having a plurality of acoustic models, comprising:
- (a) receiving continuous speech input;
  
  (b) generating a sequence of acoustic feature vectors that represent temporal and spectral behavior of the speech input;
  
  (c) loading a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a memory workspace accessible to a processor;
  
  (d) loading an acoustic model from the plurality of acoustic models into the memory workspace; and
  
  (e) determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the acoustic model.
  - 2. The method of claim 1 further comprises loading a next acoustic model from the plurality of acoustic models into the memory workspace, and determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to said next acoustic model until similarity measures for the first group of acoustic feature vectors are determined in relation to each of the plurality of acoustic models.
  - 3. The method of claim 2 further comprises removing the acoustic model from the memory workspace prior to retrieving the next acoustic model from the plurality of acoustic models.
  - 4. The method of claim 2 further comprises storing the similarity measures for the first group of acoustic feature vectors in an output memory space.
  - 5. The method of claim 2 further comprises updating a search space based on the similarity measures for the first group of acoustic feature vectors;
    - and subsequently performing a searching operation on the search space.
  - 6. The method of claim 2 further comprises loading a second group of acoustic feature vectors from the sequence of acoustic feature vectors into the memory workspace;
    - and determining similarity measures for the second group of acoustic feature vectors in relation to each of the plurality of acoustic models.
  - 7. The method of claim 1 wherein the acoustic model is further defined as a Hidden Markov Model having a plurality of states, such that probability values for transitioning amongst the plurality of states are expressed in terms of Gaussian data.
  - 8. The method of claim 7 wherein the step of determining a similarity measure further comprises performing a Gaussian computation.
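
The loop order described in claims 1, 2 and 6 can be illustrated with a minimal sketch. It is not taken from the specification: it assumes each acoustic model is reduced to a single diagonal-covariance Gaussian (a real Hidden Markov Model state would typically hold a mixture), and the names `log_gaussian`, `score_group`, `score_sequence` and `group_size` are hypothetical. The point it shows is that each model is loaded once per group of feature vectors rather than once per vector.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of x under a diagonal-covariance Gaussian."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total

def score_group(group, models):
    """Score every vector of one group against each model, one model at a time.

    The outer loop runs over models, so a model is brought into the working
    set once per group of vectors instead of once per vector.
    """
    scores = [[0.0] * len(models) for _ in group]
    for m, model in enumerate(models):        # claim 1(d) / claim 2: load one model
        mean, var = model["mean"], model["var"]
        for t, x in enumerate(group):         # claim 1(e): score the whole group
            scores[t][m] = log_gaussian(x, mean, var)
    return scores

def score_sequence(frames, models, group_size=10):
    """Process the feature sequence group by group (claim 6)."""
    out = []
    for start in range(0, len(frames), group_size):
        out.extend(score_group(frames[start:start + group_size], models))
    return out
```

The result is one row per feature vector and one column per model; the scores do not depend on `group_size`, only the memory access pattern does.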

9. An architectural arrangement for a speech recognition system having a plurality of acoustic models residing in a data store, comprising:
- an acoustic front-end node receptive of continuous speech input, the acoustic front-end node operable to generate a sequence of acoustic feature vectors that represent temporal and spectral behavior of the speech input;
  
  a first pattern matching node having a first data processor and a first memory space accessible to the first data processor, the first pattern matching node adapted to receive a first group of acoustic feature vectors from the sequence of acoustic feature vectors into the first memory space, the first pattern matching node further operable to load a first acoustic model in the first memory space from the data store and to determine a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the first acoustic model using the first data processor; and
  
  a second pattern matching node having a second data processor and a second memory space accessible to the second data processor, the second pattern matching node adapted to receive the first group of acoustic feature vectors into the second memory space, the second pattern matching node further operable to load a second acoustic model in the second memory space from the data store and to determine a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the second acoustic model using the second data processor.

10. A method for improving pattern matching in a speech recognition system having a plurality of acoustic models, comprising:
- receiving continuous speech input;
  
  generating a sequence of acoustic feature vectors that represent temporal and spectral behavior of the speech input;
  
  retrieving a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a first memory workspace accessible to a first processor;
  
  retrieving a first acoustic model from the plurality of acoustic models into the first memory workspace;
  
  retrieving a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a second memory workspace accessible to a second processor;
  
  retrieving a second acoustic model from the plurality of acoustic models into the second memory workspace; and
  
  determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the first acoustic model by the first processor contemporaneously with determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the second acoustic model by the second processor.
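
A rough sketch of the dispatch pattern in claims 9 and 10 follows, assuming single-Gaussian models as a simplification. The names `match_node` and `match_on_nodes` are hypothetical. Threads stand in for the separate processors here; in CPython they do not give true CPU parallelism, so a real arrangement would use separate processes or machines. The sketch only shows that each node receives its own copy of the same group and a different model.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def log_gaussian(x, mean, var):
    """Log density of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        for xi, mi, vi in zip(x, mean, var)
    )

def match_node(group, model):
    """One pattern matching node: score its copy of the group against one model."""
    local_group = [list(x) for x in group]    # node-local copy of the feature vectors
    return [log_gaussian(x, model["mean"], model["var"]) for x in local_group]

def match_on_nodes(group, models, max_workers=2):
    """Submit one task per model and return one list of scores per model."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(match_node, group, model) for model in models]
        return [f.result() for f in futures]  # results stay in model order
```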

11. A method for improving the decoding process in a speech recognition system, comprising:
- generating a search space that is comprised of observed acoustic data, the search space having an active search space;
  
  partitioning the active search space amongst a plurality of processing nodes; and
  
  performing a searching operation on the active search space allocated to each processing node, such that searching operations occur concurrently on at least two of the plurality of processing nodes.
  - 12. The method of claim 11 further comprises defining the active search space as a plurality of lexical trees and distributing the plurality of lexical trees amongst the plurality of processing nodes.
  - 13. The method of claim 12 further comprises maintaining link data indicative of links between the lexical trees at each of the plurality of processing nodes and communicating changes in the link data amongst the plurality of processing nodes.
  - 14. The method of claim 11 wherein the step of partitioning the active search space further comprises allocating the active search space amongst the plurality of the processing nodes based on available processing power associated with each processing node.
  - 16. The method of claim 11 wherein the step of performing a searching operation on the observed acoustic data further comprises defining the search operation as at least one of a Viterbi search algorithm, a stack decoding algorithm, a multi-pass search algorithm and a forward-backward search algorithm.

15. A method for improving the decoding process in a speech recognition system, comprising:
- generating a search space that is comprised of observed acoustic data, the search space having an active search space;
  
  partitioning the active search space amongst a plurality of processing nodes; and
  
  performing a searching operation on the active search space allocated to each processing node, such that searching operations occur concurrently on at least two of the plurality of processing nodes;
  
  wherein the step of partitioning the active search space further comprises segmenting the active search space in a manner that minimizes links and allocating segmented active search space amongst the plurality of the processing nodes in proportion to processing power associated with each processing node.
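
The proportional allocation in claims 14 and 15 can be sketched as a greedy assignment. This is an illustration under stated assumptions, not the applicants' method: it assumes each lexical tree comes with a hypothetical cost estimate (`tree_costs`) and each node with a relative power figure (`node_power`), and it makes no attempt at the link-minimizing segmentation that claim 15 also recites.

```python
def partition_trees(tree_costs, node_power):
    """Assign lexical trees to nodes so that load tracks processing power.

    tree_costs: {tree_name: estimated search cost}   (hypothetical estimate)
    node_power: {node_name: relative processing power}
    """
    assignment = {node: [] for node in node_power}
    load = {node: 0.0 for node in node_power}
    # Place the most expensive trees first; ties are broken by name.
    for tree, cost in sorted(tree_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        # Pick the node whose load relative to its power stays lowest.
        best = min(node_power, key=lambda n: ((load[n] + cost) / node_power[n], n))
        assignment[best].append(tree)
        load[best] += cost
    return assignment
```

With costs `{"a": 4, "b": 2, "c": 2, "d": 1}` and powers `{"n1": 2.0, "n2": 1.0}`, node `n1` ends up with trees `a` and `c` (load 6) and node `n2` with `b` and `d` (load 3), matching the 2:1 power ratio.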

17. A distributed architectural arrangement for a speech recognition system, the speech recognition system operable to generate a search space defined by a plurality of lexical trees, comprising:
- a first searching node having a first data processor and a first memory space accessible to the first data processor, the first searching node adapted to receive similarity measures that correlate speech input to a plurality of acoustic models and operable to evaluate a first lexical tree based on the similarity measures;
  
  a second searching node having a second data processor and a second memory space accessible to the second data processor, the second searching node adapted to receive the similarity measures and operable to evaluate a second lexical tree based on the similarity measures; and
  
  a communication link interconnecting the first and second searching nodes.
  - 18. The distributed architectural arrangement of claim 17 wherein the plurality of lexical trees are interconnected by one or more links and each of the searching nodes maintains link data indicative of the links amongst the plurality of lexical trees.
  - 19. The distributed architectural arrangement of claim 18 wherein the evaluation of the first lexical tree by the first searching node results in changes to the link data, such that the first searching node is further operable to communicate the changes to the link data across the communication link to the second searching node.

20. A distributed architectural arrangement for a speech recognition system, the speech recognition system operable to generate a search space defined by a plurality of lexical trees, comprising:
- a first searching node having a first data processor and a first memory space accessible to the first data processor, the first searching node adapted to receive similarity measures that correlate speech input to a plurality of acoustic models and operable to evaluate a first lexical tree based on the similarity measures;
  
  a second searching node having a second data processor and a second memory space accessible to the second data processor, the second searching node adapted to receive the similarity measures and operable to evaluate a second lexical tree based on the similarity measures; and
  
  a communication link interconnecting the first and second searching nodes;
  
  wherein the plurality of lexical trees are interconnected by one or more links and each of the searching nodes maintains link data indicative of the links amongst the plurality of lexical trees; and
  
  wherein the first searching node initiates communication of the changes to the link data prior to completing the evaluation of the first lexical tree.

21. A distributed architectural arrangement for a speech recognition system, the speech recognition system operable to generate a search space defined by a plurality of lexical trees, comprising:
- a first searching node having a first data processor and a first memory space accessible to the first data processor, the first searching node adapted to receive similarity measures that correlate speech input to a plurality of acoustic models and operable to evaluate a first lexical tree based on the similarity measures;
  
  a second searching node having a second data processor and a second memory space accessible to the second data processor, the second searching node adapted to receive the similarity measures and operable to evaluate a second lexical tree based on the similarity measures;
  
  a communication link interconnecting the first and second searching nodes; and
  
  a pattern matching node adapted to receive acoustic feature vector data indicative of the speech input and operable to determine similarity measures for the acoustic feature vector data in relation to the plurality of acoustic models, the pattern matching node further operable to communicate similarity measures over an unreliable second communication link to each of the first searching node and the second searching node.
  - 22. The distributed architectural arrangement of claim 21 wherein at least one of the first searching node and the second searching node is operable to request retransmission of similarity measures from the pattern matching node upon detecting an error in the transmission of the similarity measures from the pattern matching node.
  - 23. The distributed architectural arrangement of claim 22 wherein at least one of the first searching node and the second searching node is operable to recompute similarity measures upon detecting an error in the transmission of the similarity measures from the pattern matching node.

24. A distributed architectural arrangement for a speech recognition system, the speech recognition system operable to generate a search space defined by a plurality of lexical trees, comprising:
- a first searching node having a first data processor and a first memory space accessible to the first data processor, the first searching node adapted to receive similarity measures that correlate speech input to a plurality of acoustic models and operable to evaluate a first lexical tree based on the similarity measures;
  
  a second searching node having a second data processor and a second memory space accessible to the second data processor, the second searching node adapted to receive the similarity measures and operable to evaluate a second lexical tree based on the similarity measures; and
  
  a communication link interconnecting the first and second searching nodes;
  
  wherein the plurality of lexical trees are interconnected by one or more links and each of the searching nodes maintains link data indicative of the links amongst the plurality of lexical trees; and
  
  wherein at least one of the first searching node and the second searching node is operable to reduce the search space by performing histogram pruning.
  - 25. The distributed architectural arrangement of claim 24 wherein each searching node is operable to compute a histogram associated with its processing and communicate statistics indicative of the histogram to the other searching node.
  - 26. The distributed architectural arrangement of claim 24 wherein the histogram statistics are further defined as a maximum score value, a mean score value and a number of active nodes associated with the searching node.
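
The statistics named in claim 26 and a generic form of histogram pruning can be sketched as below. How the applicants combine the exchanged statistics is not stated in the claims, so everything here is an assumption: `HistStats`, `local_stats`, `merge_stats` and `histogram_prune` are hypothetical names, and using the merged maximum score as the common reference for the histogram bins is one plausible choice rather than the documented one.

```python
from dataclasses import dataclass

@dataclass
class HistStats:
    max_score: float
    mean_score: float
    active: int

def local_stats(scores):
    """Summarize one searching node's active hypotheses (claim 26 statistics)."""
    return HistStats(max(scores), sum(scores) / len(scores), len(scores))

def merge_stats(stats):
    """Combine per-node statistics into global ones."""
    total = sum(s.active for s in stats)
    mean = sum(s.mean_score * s.active for s in stats) / total
    return HistStats(max(s.max_score for s in stats), mean, total)

def histogram_prune(scores, global_max, budget, bin_width=1.0, num_bins=32):
    """Keep roughly the `budget` best scores without sorting them.

    Scores are binned by distance below global_max; whole bins are kept,
    best first, until the next bin would exceed the budget. The first bin
    is always kept, so the result can slightly exceed the budget.
    """
    counts = [0] * num_bins
    for s in scores:
        counts[min(int((global_max - s) / bin_width), num_bins - 1)] += 1
    kept = 0
    for b, c in enumerate(counts):
        if kept + c > budget and b > 0:
            threshold = global_max - b * bin_width
            return [s for s in scores if s > threshold]
        kept += c
    return list(scores)
```

Because every node bins against the same `global_max`, the thresholds computed on different nodes refer to the same score scale.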

27. A method for processing speech data utilizing high speed cache memory having an associated cache mechanism for transfer of data from system memory into cache memory, comprising:
- providing a main table of speech data in system memory;
  
  providing a list that establishes a processing order of a subset of said speech data;
  
  copying said subset of said speech data into a sub-table that is processed such that entries in said sub-table occupy contiguous memory locations;
  
  using a speech processing algorithm to operate upon said sub-table; and
  
  employing the cache mechanism associated with said high speed cache memory to transfer said sub-table into said high speed cache memory, thereby allowing said speech processing algorithm to access said subset of speech data at cache memory access rates.
  - 28. The method of claim 27 wherein said main table stores speech parameters.
  - 29. The method of claim 27 wherein said main table stores Gaussian parameters.
  - 30. The method of claim 27 wherein said list that establishes a processing order is developed from a speech utterance having a temporal sequence.
  - 31. The method of claim 27 wherein said sub-table resides in system memory.
  - 32. The method of claim 27 wherein said copying step is performed such that said entries in said sub-table are sorted in an order defined by said processing order established by said list.
  - 33. The method of claim 27 wherein said speech processing algorithm is a multi-pass process that includes one pass that establishes said list.
  - 34. The method of claim 27 wherein said speech processing algorithm is a multi-pass recognition process.
  - 35. The method of claim 27 wherein said speech processing algorithm is an acoustic model adaptation process.
  - 36. The method of claim 27 wherein said speech processing algorithm is a lattice rescoring process.
  - 37. The method of claim 27 wherein said speech processing algorithm is a speech model training process.
  - 38. The method of claim 27 wherein said speech processing algorithm is a constrained search on a word/phone graph.
  - 39. The method of claim 27 wherein said speech processing algorithm is a constrained search on a focused language model.
  - 40. The method of claim 27 wherein said speech processing algorithm is a Viterbi or Baum-Welch local distance computation.
  - 41. The method of claim 27 wherein said speech processing algorithm is a trellis expansion algorithm.
  - 42. The method of claim 27 wherein said speech processing algorithm is a beam search algorithm.
  - 43. The method of claim 27 wherein said list that establishes a processing order is developed from language constraints from which a temporal order of access can be devised.
  - 44. The method of claim 27 wherein said list that establishes a processing order is developed from search space constraints from which a temporal order of access can be devised.
  - 45. The method of claim 27 wherein said speech processing algorithm is a multi-pass process that includes one pass that outputs language constraints that are used to establish said list.
  - 46. The method of claim 27 wherein said speech processing algorithm is a multi-pass process that includes one pass that outputs search space constraints that are used to establish said list.
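
The packing step of claims 27 and 32 can be sketched as follows. This is a minimal illustration under assumptions: the main table is modeled as a dictionary of fixed-length parameter records, and `build_sub_table`, `record_view`, `slot_of` and `record_len` are hypothetical names. Python cannot control the hardware cache directly; the sketch only shows the layout, in which the needed records are copied into one contiguous buffer in processing order so that a cache mechanism can fetch them as consecutive lines.

```python
from array import array

def build_sub_table(main_table, access_order, record_len):
    """Copy the records named in access_order into one contiguous buffer.

    Records are stored in first-use order; a record that appears again
    later in access_order reuses the slot from its first appearance.
    """
    sub_table = array("d")                    # contiguous block of doubles
    slot_of = {}
    for rec_id in access_order:
        if rec_id in slot_of:
            continue
        record = main_table[rec_id]
        if len(record) != record_len:
            raise ValueError(f"record {rec_id!r} has unexpected length")
        slot_of[rec_id] = len(slot_of)
        sub_table.extend(record)
    return sub_table, slot_of

def record_view(sub_table, slot, record_len):
    """Return a zero-copy view of one packed record."""
    start = slot * record_len
    return memoryview(sub_table)[start:start + record_len]
```

A speech processing algorithm would then read records through `slot_of` and `record_view` instead of the main table, so its accesses walk through the buffer mostly in increasing address order.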

Current Assignee: Sovereign Peak Ventures, LLC (Dominion Harbor Enterprises, LLC)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Rigazio, Luca; Nguyen, Patrick
Application Number: US10/512,354
Publication Number: US 20050159952A1
US Class Current: 704/243
CPC Class Codes:
G10L 15/08   Speech classification or se...
G10L 15/10   using distance or distortio...
G10L 15/285   Memory allocation or algori...
G10L 15/30   Distributed recognition, e....
G10L 15/34   Adaptation of a single reco...