Assigning and processing states and arcs of a speech recognition model in parallel processors
First Claim
1. In a speech recognition system, a method for recognizing a variety of speech inputs using a language model having a plurality of active states, said method comprising the steps of:
- partitioning said plurality of active states to create one or more active state subsets, each of said subsets including a number of active states;
assigning each of said active state subsets to one or a plurality of microprocessors included in a multiprocessor shared memory machine;
determining active arcs associated with said plurality of active states;
assigning said active arcs to a particular processor based on said assignment of active state subsets;
performing a likelihood calculation for each of said active arcs;
pruning said active arcs based on said likelihood calculation such that said arcs having a likelihood calculation within a computed range are included in an active arc subset;
determining whether a likelihood calculation associated with an active arc has previously been performed;
storing the result of said likelihood calculation associated with an active arc in a memory in parallel; and
producing a textual representation of said speech input based on the processing of said active states and active arcs by said plurality of microprocessors.
4 Assignments
0 Petitions
Accused Products
Abstract
A continuous, speaker independent, speech recognition method and system recognizes a variety of vocabulary input signals. A language model, which is an implicit description of a graph consisting of a plurality of states and arcs, is input into the system. An input speech signal, corresponding to a plurality of speech frames, is received and processed using a shared memory multipurpose machine having a plurality of microprocessors. Threads are created and assigned to processors, and active state subsets and active arc subsets are created and assigned to specific threads and associated microprocessors. Active state subsets and active arc subsets are processed in parallel to produce a textual representation of the speech signal. Embodiments of the invention include a two-level Viterbi search algorithm to match the input speech signals to context dependent units, an on-demand composition of finite state transducers to map context dependent units to sentences, and a determination whether a particular likelihood calculation needs to be performed or recalled from memory. The on-demand composition of finite state transducers is accomplished by multi-threading the calculation in accordance with the parallel processing feature of the system.
-
Citations
13 Claims
-
1. In a speech recognition system, a method for recognizing a variety of speech inputs using a language model having a plurality of active states, said method comprising the steps of:
-
partitioning said plurality of active states to create one or more active state subsets, each of said subsets including a number of active states;
assigning each of said active state subsets to one or a plurality of microprocessors included in a multiprocessor shared memory machine;
determining active arcs associated with said plurality of active states;
assigning said active arcs to a particular processor based on said assignment of active state subsets;
performing a likelihood calculation for each of said active arcs;
pruning said active arcs based on said likelihood calculation such that said arcs having a likelihood calculation within a computed range are included in an active arc subset;
determining whether a likelihood calculation associated with an active arc has previously been performed;
storing the result of said likelihood calculation associated with an active arc in a memory in parallel; and
producing a textual representation of said speech input based on the processing of said active states and active arcs by said plurality of microprocessors. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. In a speech recognition system having a shared memory multiprocessor machine, a method for recognizing a variety of speech inputs, comprising:
-
receiving a speech signal having a plurality of speech frames;
receiving a language model having a plurality of states and a plurality of arcs;
creating at least one processing thread for each processor in said multiprocessor machine;
creating a plurality of active state subsets from said plurality of states, and a plurality of active arc subsets from said plurality of arcs;
assigning each of said plurality of active state subsets to a different processing thread, and each of said plurality of active arc subsets to a different processing thread;
processing said plurality of active state subsets and said plurality of active arc subsets in parallel using said multiprocessor machine; and
producing a textual representation of said speech signal based on the processing of said plurality of active state subsets and said active arc subsets. - View Dependent Claims (9, 10, 11, 12, 13)
updating said plurality of active arcs subsets based on said plurality of active state subsets;
evaluating each of said active arc subsets;
pruning each of said active arc subsets;
updating said plurality of active state subsets based on said step of pruning; and
determining transitions out of newly active states within said plurality of active state subsets.
-
-
10. The method of claim 9, wherein said evaluating includes:
-
calculating a likelihood cost for each active arc within each of said active arc subsets and storing the likelihood cost in a memory in said shared memory machine;
calculating a maximum likelihood and a minimum cost for each of said active arc subsets; and
calculating a global minimum cost for said active arc subsets.
-
-
11. The method of claim 10, wherein said calculating a likelihood cost for each active arc includes:
-
determining whether said likelihood cost has been previously calculated; and
retrieving said likelihood cost from said shared memory, if so determined.
-
-
12. The method of claim 9, wherein said pruning includes:
-
excluding, from each of said active arc subsets, each active arc whose likelihood cost falls outside a predetermined range; and
including, within each of said active arc subsets, each active arc whose likelihood cost falls inside the predetermined range.
-
-
13. The method of claim 9, wherein said determining includes:
on-demand composition of automata using synchronous access to a hash table, said hash table mapping tuples of states to state numbers in the composed automation.
Specification