Pattern matching for large vocabulary speech recognition with packed distribution and localized trellis access
Abstract
A method is provided for improving pattern matching in a speech recognition system having a plurality of acoustic models (20). Similarity measures for acoustic feature vectors (54) are determined in groups that are then buffered into cache memory (59). To further reduce computational processing, the acoustic data may be partitioned amongst a plurality of processing nodes (66, 67, 68). In addition, a priori knowledge of the spoken order may be used to establish the access order (124) used to copy records from the main speech parameter table (120, 200) into a sub-table (130, 204). The sub-table is processed such that the entries are in contiguous memory locations (206) and sorted according to the processing order (208). The speech processing algorithm is then directed to operate upon the sub-table (210) which causes the processor to load the sub-table into high speed cache memory (104, 212).
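The group-wise scoring described in the abstract can be sketched as a loop ordering: one group of feature vectors and one acoustic model are held in the workspace at a time, and every vector in the group is scored against that model before the next model is loaded. This is only an illustrative sketch. The `DiagGaussian` class, the use of a log-likelihood as the similarity measure, and the `group_size` parameter are assumptions made for the example and are not taken from the patent text.

```python
import math


class DiagGaussian:
    """Hypothetical acoustic model: a single diagonal-covariance Gaussian."""

    def __init__(self, mean, var):
        self.mean = list(mean)
        self.var = list(var)
        # Constant part of the log-density, computed once per model.
        self.log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in self.var)

    def log_likelihood(self, x):
        # Assumed similarity measure: log-likelihood of the frame under the model.
        return self.log_norm - 0.5 * sum(
            (xi - m) ** 2 / v for xi, m, v in zip(x, self.mean, self.var)
        )


def score_in_groups(frames, models, group_size):
    """Return scores[frame_index][model_index], computed group by group."""
    scores = [[0.0] * len(models) for _ in frames]
    for start in range(0, len(frames), group_size):
        group = frames[start:start + group_size]      # load a group of feature vectors
        for m_idx, model in enumerate(models):        # load one acoustic model
            for offset, x in enumerate(group):        # score every vector in the group
                scores[start + offset][m_idx] = model.log_likelihood(x)
    return scores
```

With this ordering each model is touched once per group rather than once per frame, which is the property that makes buffering the group worthwhile. The resulting scores are the same as those from frame-by-frame evaluation.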
23 Citations
46 Claims
1. A method for improving pattern matching in a speech recognition system having a plurality of acoustic models, comprising:
(a) receiving continuous speech input;
(b) generating a sequence of acoustic feature vectors that represent temporal and spectral behavior of the speech input;
(c) loading a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a memory workspace accessible to a processor;
(d) loading an acoustic model from the plurality of acoustic models into the memory workspace; and
(e) determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the acoustic model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An architectural arrangement for a speech recognition system having a plurality of acoustic models residing in a data store, comprising:
an acoustic front-end node receptive of continuous speech input, the acoustic front-end node operable to generate a sequence of acoustic feature vectors that represent temporal and spectral behavior of the speech input;
a first pattern matching node having a first data processor and a first memory space accessible to the first data processor, the first pattern matching node adapted to receive a first group of acoustic feature vectors from the sequence of acoustic feature vectors into the first memory space, the first pattern matching node further operable to load a first acoustic model in the first memory space from the data store and to determine a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the first acoustic model using the first data processor; and
a second pattern matching node having a second data processor and a second memory space accessible to the second data processor, the second pattern matching node adapted to receive the first group of acoustic feature vectors into the second memory space, the second pattern matching node further operable to load a second acoustic model in the second memory space from the data store and to determine a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the second acoustic model using the second data processor.
10. A method for improving pattern matching in a speech recognition system having a plurality of acoustic models, comprising:
receiving continuous speech input;
generating a sequence of acoustic feature vectors that represent temporal and spectral behavior of the speech input;
retrieving a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a first memory workspace accessible to a first processor;
retrieving a first acoustic model from the plurality of acoustic models into the first memory workspace;
retrieving a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a second memory workspace accessible to a second processor;
retrieving a second acoustic model from the plurality of acoustic models into the second memory workspace; and
determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the first acoustic model by the first processor contemporaneously with determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the second acoustic model by the second processor.
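A rough sketch of the two-processor arrangement in claim 10 follows. Two workers each receive their own copy of the same group of feature vectors, each loads a different acoustic model, and both score the group at the same time. The names `similarity`, `score_group`, and `score_contemporaneously` are hypothetical, the negative squared distance is an assumed stand-in for the similarity measure, and Python threads stand in for the separate processors (they illustrate the data flow rather than true hardware parallelism).

```python
from concurrent.futures import ThreadPoolExecutor


def similarity(frame, model_mean):
    """Assumed similarity measure: negative squared Euclidean distance."""
    return -sum((x - m) ** 2 for x, m in zip(frame, model_mean))


def score_group(group, model_mean):
    # Each worker copies the group into its own workspace before scoring,
    # mirroring the separate memory workspaces in the claim.
    workspace = [list(frame) for frame in group]
    return [similarity(frame, model_mean) for frame in workspace]


def score_contemporaneously(group, first_model, second_model):
    """Score the same group against two models on two workers."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(score_group, group, first_model)
        second = pool.submit(score_group, group, second_model)
        return first.result(), second.result()
```

For example, `score_contemporaneously([[0.0, 0.0], [1.0, 1.0]], [0.0, 0.0], [1.0, 1.0])` yields one list of scores per model, each with one entry per frame in the group.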
11. A method for improving the decoding process in a speech recognition system, comprising:
generating a search space that is comprised of observed acoustic data, the search space having an active search space;
partitioning the active search space amongst a plurality of processing nodes; and
performing a searching operation on the active search space allocated to each processing node, such that searching operations occur concurrently on at least two of the plurality of processing nodes.
Dependent claims: 12, 13, 14, 16
15. A method for improving the decoding process in a speech recognition system, comprising:
generating a search space that is comprised of observed acoustic data, the search space having an active search space;
partitioning the active search space amongst a plurality of processing nodes; and
performing a searching operation on the active search space allocated to each processing node, such that searching operations occur concurrently on at least two of the plurality of processing nodes;
wherein the step of partitioning the active search space further comprises segmenting the active search space in a manner that minimizes links and allocating segmented active search space amongst the plurality of the processing nodes in proportion to processing power associated with each processing node.
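One way to picture the proportional allocation in claim 15 is a greedy assignment of search-space segments to nodes. The sketch below assumes that each segment is a whole lexical tree (so that links within a tree never cross a node boundary) and that a segment's cost is its count of active states. Both assumptions, along with the `allocate` function and its parameters, are illustrative and do not come from the claim itself.

```python
def allocate(tree_sizes, node_power):
    """Assign whole trees to nodes so that load tracks processing power.

    tree_sizes: mapping of tree name -> number of active states (assumed cost).
    node_power: mapping of node name -> relative processing power.
    Returns a mapping of node name -> list of tree names.
    """
    load = {node: 0 for node in node_power}
    assignment = {node: [] for node in node_power}
    # Place the largest trees first; ties are broken by name for determinism.
    for tree, size in sorted(tree_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        # Pick the node whose power-normalized load would be lowest afterwards.
        target = min(
            node_power,
            key=lambda node: ((load[node] + size) / node_power[node], node),
        )
        assignment[target].append(tree)
        load[target] += size
    return assignment
```

With three equally sized trees and a node that is twice as powerful as the other, the faster node ends up with two trees and the slower node with one.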
17. A distributed architectural arrangement for a speech recognition system, the speech recognition system operable to generate a search space defined by a plurality of lexical trees, comprising:
a first searching node having a first data processor and a first memory space accessible to the first data processor, the first searching node adapted to receive similarity measures that correlate speech input to a plurality of acoustic models and operable to evaluate a first lexical tree based on the similarity measures;
a second searching node having a second data processor and a second memory space accessible to the second data processor, the second searching node adapted to receive the similarity measures and operable to evaluate a second lexical tree based on the similarity measures; and
a communication link interconnecting the first and second searching nodes.
Dependent claims: 18, 19
20. A distributed architectural arrangement for a speech recognition system, the speech recognition system operable to generate a search space defined by a plurality of lexical trees, comprising:
a first searching node having a first data processor and a first memory space accessible to the first data processor, the first searching node adapted to receive similarity measures that correlate speech input to a plurality of acoustic models and operable to evaluate a first lexical tree based on the similarity measures;
a second searching node having a second data processor and a second memory space accessible to the second data processor, the second searching node adapted to receive the similarity measures and operable to evaluate a second lexical tree based on the similarity measures; and
a communication link interconnecting the first and second searching nodes;
wherein the plurality of lexical trees are interconnected by one or more links and each of the searching nodes maintains link data indicative of the links amongst the plurality of lexical trees; and
wherein the first searching node initiates communication of the changes to the link data prior to completing the evaluation of the first lexical tree.
21. A distributed architectural arrangement for a speech recognition system, the speech recognition system operable to generate a search space defined by a plurality of lexical trees, comprising:
a first searching node having a first data processor and a first memory space accessible to the first data processor, the first searching node adapted to receive similarity measures that correlate speech input to a plurality of acoustic models and operable to evaluate a first lexical tree based on the similarity measures;
a second searching node having a second data processor and a second memory space accessible to the second data processor, the second searching node adapted to receive the similarity measures and operable to evaluate a second lexical tree based on the similarity measures;
a communication link interconnecting the first and second searching nodes; and
a pattern matching node adapted to receive acoustic feature vector data indicative of the speech input and operable to determine similarity measures for the acoustic feature vector data in relation to the plurality of acoustic models, the pattern matching node further operable to communicate similarity measures over an unreliable second communication link to each of the first searching node and the second searching node.
Dependent claims: 22, 23
24. A distributed architectural arrangement for a speech recognition system, the speech recognition system operable to generate a search space defined by a plurality of lexical trees, comprising:
a first searching node having a first data processor and a first memory space accessible to the first data processor, the first searching node adapted to receive similarity measures that correlate speech input to a plurality of acoustic models and operable to evaluate a first lexical tree based on the similarity measures;
a second searching node having a second data processor and a second memory space accessible to the second data processor, the second searching node adapted to receive the similarity measures and operable to evaluate a second lexical tree based on the similarity measures; and
a communication link interconnecting the first and second searching nodes;
wherein the plurality of lexical trees are interconnected by one or more links and each of the searching nodes maintains link data indicative of the links amongst the plurality of lexical trees; and
wherein at least one of the first searching node and the second searching node is operable to reduce the search space by performing histogram pruning.
Dependent claims: 25, 26
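Histogram pruning, as named in claim 24, is commonly understood as capping the number of active hypotheses by binning their scores and choosing a cutoff bin, rather than fully sorting them. The sketch below follows that general idea. The function name, the `num_bins` default, and the choice to always keep the best-scoring bin are assumptions for illustration, not details from the patent.

```python
def histogram_prune(scores, max_active, num_bins=64):
    """Keep roughly the max_active best hypotheses without sorting.

    scores: mapping of hypothesis id -> score (higher is better).
    """
    if len(scores) <= max_active:
        return dict(scores)
    best = max(scores.values())
    worst = min(scores.values())
    if best == worst:
        return dict(scores)  # all scores equal: no cutoff can separate them
    width = (best - worst) / num_bins

    def bin_of(score):
        # Bin 0 holds the best scores; the worst score is clamped into the last bin.
        return min(int((best - score) / width), num_bins - 1)

    counts = [0] * num_bins
    for score in scores.values():
        counts[bin_of(score)] += 1

    kept, cutoff = 0, 0
    for b, count in enumerate(counts):
        # Stop before the bin that would push us over the limit
        # (bin 0 is always kept, so the result may slightly exceed the cap).
        if b > 0 and kept + count > max_active:
            break
        kept += count
        cutoff = b
    return {h: s for h, s in scores.items() if bin_of(s) <= cutoff}
```

Because the cutoff falls on a bin boundary, the number of survivors is approximate: it can be below the cap, or above it when the best bin alone is overfull.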
27. A method for processing speech data utilizing high speed cache memory having an associated cache mechanism for transfer of data from system memory into cache memory, comprising:
providing a main table of speech data in system memory;
providing a list that establishes a processing order of a subset of said speech data;
copying said subset of said speech data into a sub-table that is processed such that entries in said sub-table occupy contiguous memory locations;
using a speech processing algorithm to operate upon said sub-table; and
employing the cache mechanism associated with said high speed cache memory to transfer said sub-table into said high speed cache memory, thereby allowing said speech processing algorithm to access said subset of speech data at cache memory access rates.
Dependent claims: 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46
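The sub-table construction in claim 27 can be illustrated as a data-layout step: records named in the processing-order list are copied out of the main table into one contiguous buffer, in the order they will be visited. The sketch below uses Python's `array` module for the contiguous buffer. The function names, the fixed `record_len`, and the dictionary-based main table are assumptions for the example. Python cannot control the hardware cache, so this shows only the layout that a cache mechanism would then benefit from.

```python
from array import array


def build_sub_table(main_table, access_order, record_len):
    """Copy the records named in access_order into one contiguous buffer.

    main_table: mapping of record key -> list of floats (the main speech table).
    access_order: keys in the order the processing algorithm will visit them.
    """
    sub_table = array("d")  # contiguous block of doubles
    for key in access_order:
        record = main_table[key]
        if len(record) != record_len:
            raise ValueError(f"record {key!r} has unexpected length")
        sub_table.extend(record)
    return sub_table


def record_at(sub_table, position, record_len):
    """Return the record at the given position in processing order."""
    start = position * record_len
    return sub_table[start:start + record_len]
```

A processing loop would then walk positions `0, 1, 2, ...` of the sub-table, touching memory strictly in sequence instead of jumping around the main table.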
Specification