System and apparatus for recognizing speech
Abstract
A continuous, speaker-independent speech recognition method and system for recognizing a variety of vocabulary input signals. A language model, which is an implicit description of a graph consisting of a plurality of states and arcs, is input into the system. An input speech signal, corresponding to a plurality of speech frames, is received and processed using a shared memory multiprocessor machine having a plurality of microprocessors working in parallel to produce a textual representation of the speech signal.
45 Citations
26 Claims
-
1. A computer-readable medium having stored thereon instructions for producing a textual representation of a speech signal, the instructions, when executed by a shared memory multiprocessor computer, cause the computer to:
-
create at least one processing thread for each processor;
create a plurality of active state subsets from a plurality of states, and a plurality of active arc subsets from a plurality of arcs;
assign each of the plurality of active state subsets to a different processing thread, and each of the plurality of said active arc subsets to a different processing thread;
process the plurality of active state subsets and the plurality of active arc subsets in parallel; and
produce a textual representation of the speech signal.
Dependent claims: 2, 3, 4, 5, 6
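The partition-and-assign steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the round-robin split, the helper names `partition` and `run_one_thread_per_subset`, and the placeholder work function are all assumptions made for illustration. Note also that CPython threads share one interpreter lock, so this shows the structure of the assignment rather than a real multiprocessor speedup.

```python
import threading

def partition(items, n_subsets):
    """Split items round-robin into n_subsets lists (hypothetical strategy)."""
    return [items[i::n_subsets] for i in range(n_subsets)]

def run_one_thread_per_subset(subsets, work):
    """Give every subset its own thread and collect one result per subset."""
    results = [None] * len(subsets)

    def runner(index, subset):
        # Each thread writes only to its own slot, so no lock is needed here.
        results[index] = work(subset)

    threads = [
        threading.Thread(target=runner, args=(i, subset))
        for i, subset in enumerate(subsets)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

# Toy data: 10 active states, 6 active arcs, 4 processors (all assumed values).
n_processors = 4
active_states = list(range(10))
active_arcs = [(s, s + 1) for s in range(6)]

# Placeholder work: count the items; a decoder would expand states or score arcs.
state_results = run_one_thread_per_subset(partition(active_states, n_processors), len)
arc_results = run_one_thread_per_subset(partition(active_arcs, n_processors), len)
```

Here the state subsets and the arc subsets are each handed to their own set of threads, one thread per subset.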
update the plurality of active arc subsets based on the plurality of active state subsets;
evaluate each of the active arc subsets;
prune each of the active arc subsets;
update the plurality of active state subsets based on said pruning; and
determine transitions out of newly active states within the plurality of active state subsets.
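The five per-frame steps listed above (update arcs, evaluate, prune, update states, find transitions out of newly active states) can be sketched sequentially for a single subset. This is only an illustration under stated assumptions: arcs are hypothetical `(source, destination, label)` tuples, the arc cost is the source state's cost plus a per-label frame cost, and pruning keeps arcs within a fixed beam of the best cost. None of these details come from the claims.

```python
def decode_frame(arcs, state_costs, frame_cost, beam):
    """Run one frame of a toy decoder.

    arcs: list of (source, destination, label) tuples (assumed layout).
    state_costs: dict mapping each active state to its accumulated cost.
    frame_cost: function mapping an arc label to its cost for this frame.
    beam: maximum allowed distance from the best cost (assumed pruning rule).
    """
    # 1. Update the active arcs based on the active states.
    active_arcs = [arc for arc in arcs if arc[0] in state_costs]

    # 2. Evaluate each active arc.
    scored = [(state_costs[src] + frame_cost(label), dst)
              for src, dst, label in active_arcs]
    if not scored:
        return {}, []

    # 3. Prune arcs that fall outside the beam.
    best = min(cost for cost, _ in scored)
    survivors = [(cost, dst) for cost, dst in scored if cost <= best + beam]

    # 4. Update the active states from the surviving arcs (keep the best cost).
    new_costs = {}
    for cost, dst in survivors:
        if dst not in new_costs or cost < new_costs[dst]:
            new_costs[dst] = cost

    # 5. Determine transitions out of newly active states.
    newly_active = set(new_costs) - set(state_costs)
    next_arcs = [arc for arc in arcs if arc[0] in newly_active]
    return new_costs, next_arcs

toy_arcs = [(0, 1, "a"), (0, 2, "b"), (1, 3, "a"), (2, 3, "b")]
toy_frame_cost = {"a": 1.0, "b": 5.0}.get
new_costs, next_arcs = decode_frame(toy_arcs, {0: 0.0}, toy_frame_cost, beam=2.0)
```

In the toy run, the arc labelled "b" is pruned, so only state 1 becomes active and only its outgoing arc is queued for the next frame.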
-
-
3. The computer-readable medium of claim 2, having stored thereon instructions that, when executed by the computer, further cause the computer to:
-
calculate a likelihood cost for each active arc within each of the active arc subsets and store the likelihood cost in the shared memory;
calculate a maximum likelihood and a minimum cost for each of the active arc subsets; and
calculate a global minimum cost for the active arc subsets.
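One way to picture the per-subset and global calculations in claim 3 is a two-stage reduction: each thread finds the extremes of its own subset, then a single pass combines them. The sketch below assumes that cost is the negative log of likelihood, so the maximum likelihood of a subset is `exp(-minimum cost)`. That relationship, the function names, and the list used as shared memory are assumptions, not details stated in the claims.

```python
import math
import threading

def subset_extremes(arc_subsets, cost_of):
    """Return one (minimum cost, maximum likelihood) pair per arc subset."""
    extremes = [(math.inf, 0.0)] * len(arc_subsets)  # stands in for shared memory

    def worker(index, subset):
        min_cost = math.inf
        for arc in subset:
            min_cost = min(min_cost, cost_of(arc))
        # Assumed relationship: likelihood = exp(-cost).
        extremes[index] = (min_cost, math.exp(-min_cost))

    threads = [
        threading.Thread(target=worker, args=(i, subset))
        for i, subset in enumerate(arc_subsets)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return extremes

def global_min_cost(extremes):
    """Combine the per-subset minima into a single global minimum cost."""
    return min((min_cost for min_cost, _ in extremes), default=math.inf)

# Toy subsets in which each "arc" is simply its own cost.
toy_extremes = subset_extremes([[3.0, 1.5], [2.0], []], cost_of=lambda arc: arc)
toy_global = global_min_cost(toy_extremes)
```

Because each worker writes only its own slot, the per-subset stage needs no locking; only the final reduction reads across subsets.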
-
-
4. The computer-readable medium of claim 3, having stored thereon instructions that, when executed by the computer, further cause the computer to:
-
determine whether the likelihood cost has been previously calculated; and
retrieve the likelihood cost from the shared memory, if so determined.
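The check-then-retrieve behaviour of claim 4 resembles memoization in a cache shared by all threads. The following sketch assumes the cache is keyed by a hypothetical `(frame_index, model_id)` pair and is guarded by a single lock; the class name, the key layout, and the locking strategy are illustrative choices only.

```python
import threading

class SharedCostCache:
    """Cache likelihood costs so each is computed at most once across threads."""

    def __init__(self, compute):
        self._compute = compute          # function(frame_index, model_id) -> cost
        self._costs = {}                 # stands in for the shared memory
        self._lock = threading.Lock()
        self.computations = 0            # counts actual cost computations

    def cost(self, frame_index, model_id):
        key = (frame_index, model_id)
        with self._lock:
            # Determine whether the cost has been previously calculated.
            if key in self._costs:
                return self._costs[key]
            # Not cached yet: calculate it and store it for other threads.
            value = self._compute(frame_index, model_id)
            self._costs[key] = value
            self.computations += 1
            return value

# Toy cost function (assumed): any deterministic function of the key works.
cache = SharedCostCache(lambda frame_index, model_id: frame_index * 10 + model_id)
first = cache.cost(1, 2)
second = cache.cost(1, 2)   # served from the cache, no recomputation
```

Holding the lock during the computation keeps the sketch simple but serializes cost evaluation; a finer-grained scheme would be needed to let different costs be computed concurrently.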
-
-
5. The computer-readable medium of claim 2, having stored thereon instructions that, when executed by the computer, further cause the computer to:
-
exclude, from each of the active arc subsets, each active arc whose likelihood cost falls outside a predetermined range; and
include, within each of the active arc subsets, each active arc whose likelihood cost falls inside the predetermined range.
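The exclude and include steps of claim 5 amount to filtering each subset against a cost range. The sketch below assumes the range is defined as a fixed beam width above the global minimum cost; the claim says only "a predetermined range", so that definition, like the function name `prune_subset`, is an assumption.

```python
def prune_subset(arc_costs, global_min, beam_width):
    """Keep arcs whose cost lies within beam_width of global_min; drop the rest.

    arc_costs: dict mapping an arc identifier to its likelihood cost.
    """
    upper_bound = global_min + beam_width
    return {arc: cost for arc, cost in arc_costs.items() if cost <= upper_bound}

# Toy subset: arc "c" lies outside the assumed range [1.0, 3.0] and is excluded.
pruned = prune_subset({"a": 1.0, "b": 2.5, "c": 9.0}, global_min=1.0, beam_width=2.0)
```

Since the filter depends only on the subset and two shared scalars, each thread can prune its own subset independently once the global minimum is known.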
-
-
6. The computer-readable medium of claim 2, having stored thereon instructions that, when executed by the computer, further cause the computer to compose automata using synchronous access to a hash table, the hash table mapping tuples of states to state numbers in the composed automaton.
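The hash table in claim 6 can be pictured as a dictionary from tuples of component states to state numbers in the composed automaton. The sketch below reads "synchronous access" as lock-protected access, which is an interpretation rather than something the claim spells out; the class name and the numbering scheme (next unused integer) are likewise assumptions.

```python
import threading

class ComposedStateTable:
    """Map tuples of component states to state numbers in a composed automaton."""

    def __init__(self):
        self._numbers = {}
        self._lock = threading.Lock()

    def state_number(self, state_tuple):
        """Return the number for state_tuple, allocating a new one if unseen."""
        with self._lock:
            number = self._numbers.get(state_tuple)
            if number is None:
                number = len(self._numbers)   # next unused state number
                self._numbers[state_tuple] = number
            return number

    def __len__(self):
        with self._lock:
            return len(self._numbers)

table = ComposedStateTable()
start = table.state_number((0, 0))       # first tuple seen gets number 0
other = table.state_number((1, 2))       # a new tuple gets the next number
start_again = table.state_number((0, 0)) # the same tuple maps to the same number
```

The lock ensures that two threads reaching the same tuple at the same time cannot allocate two different numbers for it.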
-
7. A speech recognition system for recognizing a variety of speech inputs, comprising:
-
a language model having a plurality of states and a plurality of arcs;
an input device responsive to a speech signal to produce a plurality of speech frames;
a shared memory multiprocessor computer coupled to the input device and responsive to the plurality of speech frames to:
create at least one processing thread for each processor;
create a plurality of active state subsets from the plurality of states, and a plurality of active arc subsets from the plurality of arcs;
assign each of the plurality of active state subsets to a different processing thread, and each of the plurality of said active arc subsets to a different processing thread;
process the plurality of active state subsets and the plurality of active arc subsets in parallel; and
produce a textual representation of the speech signal; and
an output device, coupled to the computer, to receive the textual representation.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
digitally sample the analog signal; and
provide a digital representation of the analog signal to the input device.
-
-
13. The speech recognition system of claim 7, wherein the computer is further responsive to the plurality of speech frames to:
-
update the plurality of active arc subsets based on the plurality of active state subsets;
evaluate each of the active arc subsets;
prune each of the active arc subsets;
update the plurality of active state subsets based on said pruning; and
determine transitions out of newly active states within the plurality of active state subsets.
-
-
14. The speech recognition system of claim 13, wherein the computer is further responsive to the plurality of speech frames to:
-
calculate a likelihood cost for each active arc within each of the active arc subsets and store the likelihood cost in the shared memory;
calculate a maximum likelihood and a minimum cost for each of the active arc subsets; and
calculate a global minimum cost for the active arc subsets.
-
-
15. The speech recognition system of claim 14, wherein the computer is further responsive to the plurality of speech frames to:
-
determine whether the likelihood cost has been previously calculated; and
retrieve the likelihood cost from the shared memory, if so determined.
-
-
16. The speech recognition system of claim 13, wherein the computer is further responsive to the plurality of speech frames to:
-
exclude, for each of the active arc subsets, each active arc whose likelihood cost falls outside a predetermined range; and
include, within each of the active arc subsets, each active arc whose likelihood cost falls inside the predetermined range.
-
-
17. The speech recognition system of claim 13, wherein the computer is further responsive to the plurality of speech frames to compose automata using synchronous access to a hash table, the hash table mapping tuples of states to state numbers in the composed automaton.
-
18. A speech recognition apparatus for recognizing a variety of speech inputs, comprising:
-
an input means responsive to a speech signal to produce a plurality of speech frames;
a processing means, coupled to the input means and having at least 2 processors and a shared memory, responsive to the plurality of speech frames to:
create at least one processing thread for each processor;
create a plurality of active state subsets from a plurality of states, and a plurality of active arc subsets from a plurality of arcs;
assign each of the plurality of active state subsets to a different processing thread, and each of the plurality of said active arc subsets to a different processing thread;
process the plurality of active state subsets and the plurality of active arc subsets in parallel; and
produce a textual representation of the speech signal; and
an output means, coupled to the processing means, to receive the textual representation of the speech signal.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26
update the plurality of active arc subsets based on the plurality of active state subsets;
evaluate each of the active arc subsets;
prune each of the active arc subsets;
update the plurality of active state subsets based on said pruning; and
determine transitions out of newly active states within the plurality of active state subsets.
-
-
23. The speech recognition apparatus of claim 22, wherein the processing means is further responsive to the plurality of speech frames to:
-
calculate a likelihood cost for each active arc within each of the active arc subsets and store the likelihood cost in the shared memory;
calculate a maximum likelihood and a minimum cost for each of the active arc subsets; and
calculate a global minimum cost for the active arc subsets.
-
-
24. The speech recognition apparatus of claim 23, wherein the processing means is further responsive to the plurality of speech frames to:
-
determine whether the likelihood cost has been previously calculated; and
retrieve the likelihood cost from the shared memory, if so determined.
-
-
25. The speech recognition apparatus of claim 22, wherein the processing means is further responsive to the plurality of speech frames to:
-
exclude, from each of the active arc subsets, each active arc whose likelihood cost falls outside a predetermined range; and
include, within each of the active arc subsets, each active arc whose likelihood cost falls inside the predetermined range.
-
-
26. The speech recognition apparatus of claim 22, wherein the processing means is further responsive to the plurality of speech frames to compose automata using synchronous access to a hash table, the hash table mapping tuples of states to state numbers in the composed automaton.
Specification