System and apparatus for recognizing speech

US 6,374,212 B2
Filed: 03/13/2001
Issued: 04/16/2002
Est. Priority Date: 09/30/1997
Status: Expired due to Term

Abstract

A continuous, speaker-independent speech recognition method and system for recognizing a variety of vocabulary input signals. A language model, which is an implicit description of a graph consisting of a plurality of states and arcs, is input into the system. An input speech signal, corresponding to a plurality of speech frames, is received and processed using a shared memory multiprocessor machine having a plurality of microprocessors working in parallel to produce a textual representation of the speech signal.

26 Claims

1. A computer-readable medium having stored thereon instructions for producing a textual representation of a speech signal, the instructions, when executed by a shared memory multiprocessor computer, cause the computer to:
- create at least one processing thread for each processor;
  
  create a plurality of active state subsets from a plurality of states, and a plurality of active arc subsets from a plurality of arcs;
  
  assign each of the plurality of active state subsets to a different processing thread, and each of the plurality of said active arc subsets to a different processing thread;
  
  process the plurality of active state subsets and the plurality of active arc subsets in parallel; and
  
  produce a textual representation of the speech signal.
2. The computer-readable medium of claim 1, having stored thereon instructions that, when executed by the computer, further cause the computer to:
3. The computer-readable medium of claim 2, having stored thereon instructions that, when executed by the computer, further cause the computer to:
- calculate a likelihood cost for each active arc within each of the active arc subsets and store the likelihood cost in the shared memory;
  
  calculate a maximum likelihood and a minimum cost for each of the active arc subsets; and
  
  calculate a global minimum cost for the active arc subsets.
4. The computer-readable medium of claim 3, having stored thereon instructions that, when executed by the computer, further cause the computer to:
- determine whether the likelihood cost has been previously calculated; and
  
  retrieve the likelihood cost from the shared memory, if so determined.
5. The computer-readable medium of claim 2, having stored thereon instructions that, when executed by the computer, further cause the computer to:
- exclude, from each of the active arc subsets, each active arc whose likelihood cost falls outside a predetermined range; and
  
  include, within each of the active arc subsets, each active arc whose likelihood cost falls inside the predetermined range.
6. The computer-readable medium of claim 2, having stored thereon instructions that, when executed by the computer, further cause the computer to compose automata using synchronous access to a hash table, the hash table mapping tuples of states to state numbers in the composed automaton.
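
The partitioning in claim 1 and the cost bookkeeping in claims 3 and 4 can be illustrated with a minimal sketch. It is not the patented implementation: the names `partition`, `SharedCostCache`, `evaluate_subset`, and `global_min_cost` are hypothetical, the round-robin split is an assumed partitioning policy, and an in-process dictionary guarded by a lock stands in for the shared memory. Only the minimum cost is tracked here; the maximum likelihood recited in claim 3 is omitted.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def partition(items, n):
    """Split items into n round-robin subsets, one per worker thread (assumed policy)."""
    return [items[i::n] for i in range(n)]


class SharedCostCache:
    """Likelihood costs stored where every thread can read them (stand-in for shared memory)."""

    def __init__(self, cost_fn):
        self._cost_fn = cost_fn          # hypothetical per-arc cost function
        self._costs = {}
        self._lock = threading.Lock()

    def cost(self, arc):
        with self._lock:
            if arc in self._costs:       # previously calculated: retrieve it
                return self._costs[arc]
        value = self._cost_fn(arc)       # compute outside the lock
        with self._lock:
            # Keep the first stored value if another thread got there first.
            return self._costs.setdefault(arc, value)


def evaluate_subset(arcs, cache):
    """Return the minimum cost within one active arc subset."""
    return min((cache.cost(a) for a in arcs), default=float("inf"))


def global_min_cost(active_arcs, cost_fn, n_threads=None):
    """Evaluate all subsets on a thread pool, then reduce the local minima."""
    n_threads = n_threads or os.cpu_count() or 1
    cache = SharedCostCache(cost_fn)
    subsets = partition(active_arcs, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        local_minima = list(pool.map(lambda s: evaluate_subset(s, cache), subsets))
    return min(local_minima), cache
```

In CPython the global interpreter lock generally prevents threads from speeding up CPU-bound work, so this sketch shows the structure of the computation rather than a performance gain.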

7. A speech recognition system for recognizing a variety of speech inputs, comprising:
- a language model having a plurality of states and a plurality of arcs;
  
  an input device responsive to a speech signal to produce a plurality of speech frames;
  
  a shared memory multiprocessor computer coupled to the input device and responsive to the plurality of speech frames to:
  
  create at least one processing thread for each processor;
  
  create a plurality of active state subsets from the plurality of states, and a plurality of active arc subsets from the plurality of arcs;
  
  assign each of the plurality of active state subsets to a different processing thread, and each of the plurality of said active arc subsets to a different processing thread;
  
  process the plurality of active state subsets and the plurality of active arc subsets in parallel; and
  
  produce a textual representation of the speech signal; and
  
  an output device, coupled to the computer, to receive the textual representation.
8. The speech recognition system of claim 7, wherein the language model is input to the input device.
9. The speech recognition system of claim 7, wherein the speech signal is a digital signal.
10. The speech recognition system of claim 7, wherein the speech signal is an analog signal.
11. The speech recognition system of claim 10, wherein the input device digitally samples the analog signal.
12. The speech recognition system of claim 11, further comprising an alternative receiving device, coupled to the input device, for receiving the speech signal, the alternative receiving device being responsive to the analog signal to:
13. The speech recognition system of claim 7, wherein the computer is further responsive to the plurality of speech frames to:
- update the plurality of active arc subsets based on the plurality of active state subsets;
  
  evaluate each of the active arc subsets;
  
  prune each of the active arc subsets;
  
  update the plurality of active state subsets based on said pruning;
  
  determine transitions out of newly active states within the plurality of active state subsets.
14. The speech recognition system of claim 13, wherein the computer is further responsive to the plurality of speech frames to:
- calculate a likelihood cost for each active arc within each of the active arc subsets and store the likelihood cost in the shared memory;
  
  calculate a maximum likelihood and a minimum cost for each of the active arc subsets; and
  
  calculate a global minimum cost for the active arc subsets.
15. The speech recognition system of claim 14, wherein the computer is further responsive to the plurality of speech frames to:
- determine whether the likelihood cost has been previously calculated; and
  
  retrieve the likelihood cost from the shared memory, if so determined.
16. The speech recognition system of claim 13, wherein the computer is further responsive to the plurality of speech frames to:
- exclude, for each of the active arc subsets, each active arc whose likelihood cost falls outside a predetermined range; and
  
  include, within each of the active arc subsets, each active arc whose likelihood cost falls inside the predetermined range.
17. The speech recognition system of claim 13, wherein the computer is further responsive to the plurality of speech frames to compose automata using synchronous access to a hash table, the hash table mapping tuples of states to state numbers in the composed automaton.
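
The per-frame steps of claim 13 and the range-based pruning of claim 16 can be sketched as a single-threaded beam search step. This is an illustration under stated assumptions: `beam_prune` and `decode_frame` are hypothetical names, the graph is a plain dictionary, and the "predetermined range" is modeled as a fixed beam width above the lowest cost, which the claims do not specify.

```python
def beam_prune(arc_costs, beam_width):
    """Keep arcs whose cost is within beam_width of the lowest cost (assumed range)."""
    if not arc_costs:
        return {}
    best = min(arc_costs.values())
    return {arc: c for arc, c in arc_costs.items() if c <= best + beam_width}


def decode_frame(active_states, arcs_from, arc_cost, frame, beam_width):
    """Advance the search by one speech frame.

    active_states: dict mapping state -> accumulated cost
    arcs_from:     dict mapping state -> list of (label, next_state)
    arc_cost:      function (label, frame) -> cost of taking that arc on this frame
    """
    # Update the active arcs from the active states and evaluate each one.
    arc_costs = {}
    for state, cost_so_far in active_states.items():
        for label, nxt in arcs_from.get(state, []):
            arc_costs[(state, label, nxt)] = cost_so_far + arc_cost(label, frame)

    # Prune arcs whose cost falls outside the range.
    survivors = beam_prune(arc_costs, beam_width)

    # Update the active states from the surviving arcs, keeping the best cost per state.
    next_states = {}
    for (_, _, nxt), cost in survivors.items():
        if cost < next_states.get(nxt, float("inf")):
            next_states[nxt] = cost
    return next_states
```

Calling `decode_frame` once per frame, feeding each result back in as `active_states`, covers the last step of claim 13: the arcs leaving newly active states are expanded at the start of the next call.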

18. A speech recognition apparatus for recognizing a variety of speech inputs, comprising:
- an input means responsive to a speech signal to produce a plurality of speech frames;
  
  a processing means, coupled to the input means and having at least 2 processors and a shared memory, responsive to the plurality of speech frames to:
  
  create at least one processing thread for each processor;
  
  create a plurality of active state subsets from a plurality of states, and a plurality of active arc subsets from a plurality of arcs;
  
  assign each of the plurality of active state subsets to a different processing thread, and each of the plurality of said active arc subsets to a different processing thread;
  
  process the plurality of active state subsets and the plurality of active arc subsets in parallel; and
  
  produce a textual representation of the speech signal; and
  
  an output means, coupled to the processing means, to receive the textual representation of the speech signal.
19. The speech recognition apparatus of claim 18, wherein a language model having a plurality of states and a plurality of arcs is input to the input means.
20. The speech recognition apparatus of claim 18, wherein the speech signal is a digital signal.
21. The speech recognition apparatus of claim 18, wherein the speech signal is an analog signal, the analog signal being digitally sampled by the input means.
22. The speech recognition apparatus of claim 18, wherein the processing means is further responsive to the plurality of speech frames to:
23. The speech recognition apparatus of claim 22, wherein the processing means is further responsive to the plurality of speech frames to:
- calculate a likelihood cost for each active arc within each of the active arc subsets and store the likelihood cost in the shared memory;
  
  calculate a maximum likelihood and a minimum cost for each of the active arc subsets; and
  
  calculate a global minimum cost for the active arc subsets.
24. The speech recognition apparatus of claim 23, wherein the processing means is further responsive to the plurality of speech frames to:
- determine whether the likelihood cost has been previously calculated; and
  
  retrieve the likelihood cost from the shared memory, if so determined.
25. The speech recognition apparatus of claim 22, wherein the processing means is further responsive to the plurality of speech frames to:
- exclude, from each of the active arc subsets, each active arc whose likelihood cost falls outside a predetermined range; and
  
  include, within each of the active arc subsets, each active arc whose likelihood cost falls inside the predetermined range.
26. The speech recognition apparatus of claim 22, wherein the processing means is further responsive to the plurality of speech frames to compose automata using synchronous access to a hash table, the hash table mapping tuples of states to state numbers in the composed automaton.
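
The hash table recited in claims 6, 17, and 26 can be sketched as follows. This is a simplified illustration, not the patented method: `StateTable` and `compose` are hypothetical names, "synchronous access" is interpreted here as lock-guarded access, and the composition shown simply pairs arcs with equal labels in two small automata rather than performing full transducer composition.

```python
import threading


class StateTable:
    """Lock-guarded map from tuples of component states to composed state numbers."""

    def __init__(self):
        self._ids = {}
        self._lock = threading.Lock()

    def state_number(self, state_tuple):
        """Return (number, is_new), allocating the next number for an unseen tuple."""
        with self._lock:
            if state_tuple in self._ids:
                return self._ids[state_tuple], False
            number = len(self._ids)
            self._ids[state_tuple] = number
            return number, True


def compose(arcs_a, arcs_b, start_a, start_b):
    """Compose two automata given as dict state -> list of (label, next_state).

    Returns the composed arcs, keyed by composed state number, and the table.
    """
    table = StateTable()
    table.state_number((start_a, start_b))   # the start pair gets number 0
    composed = {}
    stack = [(start_a, start_b)]
    while stack:
        qa, qb = stack.pop()
        src, _ = table.state_number((qa, qb))
        out = composed.setdefault(src, [])
        for label_a, next_a in arcs_a.get(qa, []):
            for label_b, next_b in arcs_b.get(qb, []):
                if label_a == label_b:       # arcs pair up when labels match
                    dst, is_new = table.state_number((next_a, next_b))
                    out.append((label_a, dst))
                    if is_new:               # explore each composed state once
                        stack.append((next_a, next_b))
    return composed, table
```

The traversal here runs on one thread; the lock only matters once several threads expand composed states concurrently, in which case each thread can call `state_number` and rely on getting the same number for the same tuple.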

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Phillips, Steven; Rogers, Anne
Primary Examiner(s): Korzuch, William
Assistant Examiner(s): Storm, Donald L.
Application Number: US09/804,041
Publication Number: US 20010011218A1
Time in Patent Office: 399 Days
Field of Search: 704/231, 704/242, 704/251, 704/256
US Class Current: 704/231
CPC Class Codes:
G10L 15/08 Speech classification or se...
G10L 15/34 Adaptation of a single reco...
