Automatic computation streaming partition for voice recognition on multiple processors with limited memory

US 8,442,829 B2
Filed: 02/02/2010
Issued: 05/14/2013
Est. Priority Date: 02/17/2009
Status: Active Grant

Abstract

Speech processing is disclosed for an apparatus having a main processing unit, a memory unit, and one or more co-processors. Memory maintenance and voice recognition result retrievals upon execution are performed with a first main processor thread. Voice detection and initial feature extraction on the raw data are performed with a first co-processor. A second co-processor thread receives feature data derived for one or more features extracted by the first co-processor thread and information for locating probability density functions needed for probability computation by a speech recognition model and computes a probability that the one or more features correspond to a known sub-unit of speech using the probability density functions and the feature data. At least a portion of a path probability that a sequence of sub-units of speech correspond to a known speech unit is computed with a third co-processor thread.
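
As an illustration of the second co-processor thread's role described above, the following is a minimal sketch of scoring one feature vector against per-state probability density functions. It assumes diagonal-covariance Gaussian mixtures, which the abstract does not specify; the names `log_gaussian_diag`, `log_mixture`, `score_states`, and the `pdf_table` layout are hypothetical and chosen only for this example.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture(x, weights, means, variances):
    """Log of a weighted sum of Gaussians, computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def score_states(x, pdf_table, state_ids):
    """Locate each state's density parameters and score the feature vector.

    pdf_table maps a state id to (weights, means, variances); this layout
    is an assumption standing in for the "information for locating
    probability density functions" that the thread receives.
    """
    return {s: log_mixture(x, *pdf_table[s]) for s in state_ids}
```

Picking the most probable state is then `max(scores, key=scores.get)` over the returned dictionary. Working in log probabilities keeps the later path computation a matter of additions.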

138 Citations

24 Claims

1. A computer speech processing system, comprising:
- a memory unit;
  
  a main processing unit coupled to the memory unit;
  
  one or more co-processor elements coupled to the memory unit and the main processing unit, wherein each of the one or more co-processor elements include a co-processor unit and a local memory associated with the co-processor unit;
  
  a first main processor thread stored in the memory unit and configured for execution by the main processor, wherein the first main processor thread is configured to cause the main processor to perform memory maintenance and voice recognition result retrievals upon execution;
  
  a first co-processor thread stored in the main memory or in one or more co-processor local memories and configured for execution by one or more of the co-processor units, wherein the first co-processor thread is configured to cause one or more of the co-processors to receive raw data representing sound detected by a microphone and perform voice detection and initial feature extraction on the raw data;
  
  a second co-processor thread stored in the main memory or in one or more co-processor local memories and configured for execution by one or more of the co-processor units, wherein the second co-processor thread is configured to cause one or more of the co-processor elements to receive feature data derived for one or more features extracted by the first co-processor thread and information for locating probability density functions needed for probability computation by a speech recognition model and compute a probability that one or more features correspond to a known sub-unit of speech using the probability density functions and the feature data; and
  
  a third co-processor thread stored in the main memory or in one or more co-processor local memories and configured for execution by one or more of the co-processor units, wherein the third co-processor thread is configured to cause one or more of the co-processor units to compute at least a portion of a path probability that a sequence of sub-units of speech correspond to a known speech unit.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
- - 2. The system of claim 1, wherein the second co-processor thread is further configured to receive code and data for retrieving a state and probability density in order to determine a most probable sub-unit of speech.
  - 3. The system of claim 2 wherein the second co-processor thread is executed with two co-processor elements, wherein a first co-processor computes the probability and a second co-processor determines the most probable state.
  - 4. The system of claim 1, wherein each local memory for each co-processor has sufficient memory available for the third co-processor thread to load all network probabilities for given network of speech sub-units for a given sub-unit probability.
  - 5. The system of claim 1, wherein the third co-processor thread is configured to load data for a speech sub-unit associated with a node in a network plus links to one or more neighboring nodes into a local store of a third co-processor element along with a complete model structure for the node and all state probabilities for a corresponding model.
  - 6. The system of claim 1, wherein the third co-processor thread is configured to compute one or more per node per frame portions of the path probability.
  - 7. The system of claim 6, further comprising a second main processor thread configured to collect the per node per frame portions probabilities from the third co-processor thread and keep track of a history of the per-node-per-frame portions and determine the path probability from the history of the per-node-per-frame portions computed by the third co-processor thread.
  - 8. The system of claim 1, wherein the third co-processor thread is configured to compute a complete per-frame path probability.
  - 9. The system of claim 8, further comprising a second main processor thread configured to keep track of a history of the per-frame path probability computed by the third co-processor thread.
  - 10. The system of claim 1 wherein the first main processor thread is further configured to receive the raw data and buffer the raw data for transfer to the first co-processor thread.
  - 11. The system of claim 1, wherein the first main processor thread is further configured to finish feature extraction from initial feature extraction performed by the first co-processor thread.
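
The split described in claims 6 through 9, where a co-processor thread computes per-node-per-frame portions and a main processor thread keeps their history, can be sketched as a Viterbi-style search. This is only an illustrative reading under assumptions: the claims do not name Viterbi decoding, and `frame_step`, `decode`, and the `links` layout are hypothetical names introduced here. Scores are log probabilities.

```python
def frame_step(prev_scores, emit_scores, links):
    """One frame of the search: best score and best predecessor per node.

    links[node] is a list of (predecessor, log transition probability).
    This is the part imagined as running on a co-processor.
    """
    scores, back = {}, {}
    for node, incoming in links.items():
        best_pred, best = max(
            ((p, prev_scores[p] + lt) for p, lt in incoming),
            key=lambda item: item[1],
        )
        scores[node] = best + emit_scores[node]
        back[node] = best_pred
    return scores, back


def decode(initial, frames, links):
    """Collect per-frame portions, keep their history, trace the best path.

    initial holds the scores for frame 0; frames holds the per-node
    emission scores for the frames after it. This is the part imagined
    as running on the main processor.
    """
    scores, history = dict(initial), []
    for emit_scores in frames:
        scores, back = frame_step(scores, emit_scores, links)
        history.append(back)
    node = max(scores, key=scores.get)
    best_score, path = scores[node], [node]
    for back in reversed(history):
        node = back[node]
        path.append(node)
    path.reverse()
    return best_score, path
```

Because `frame_step` needs only the previous frame's scores, one node's incoming links, and the current emission scores, its working set stays small, which is the property that matters when each co-processor has a limited local memory.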

12. A computer implemented method for speech processing in a computer speech apparatus having a main processing unit, a memory unit coupled to the main processing unit and one or more co-processors coupled to the memory unit and the main processing unit, wherein each co-processor element includes a co-processor unit and a local memory associated with the co-processor unit, the method comprising:
- a) performing memory maintenance and voice recognition result retrievals upon execution with a first main processor thread executed by the main processor;
  
  b) performing voice detection and initial feature extraction on the raw data with a first co-processor thread running on one or more of the co-processor elements;
  
  c) receiving feature data derived for one or more features extracted by the first co-processor thread and information for locating probability density functions needed for probability computation by a speech recognition model and computing a probability that the one or more features correspond to a known sub-unit of speech using the probability density functions and the feature data with a second co-processor thread configured to run on one or more of the co-processor elements; and
  
  d) computing at least a portion of a path probability that a sequence of sub-units of speech correspond to a known speech unit with a third co-processor thread configured to run on one or more of the co-processor elements.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
- - 13. The method of claim 12, wherein b) further includes retrieving a state and probability density in order to determine a most probable sub-unit of speech with the second co-processor thread.
  - 14. The method of claim 13, wherein b) includes executing the second co-processor thread with two co-processors, wherein a first co-processor computes the probability and a second co-processor determines the most probable state.
  - 15. The method of claim 12, the third co-processor thread loads all network probabilities for given network of speech sub-units for a given sub-unit probability into the local memories of one or more of the co-processor elements.
  - 16. The method of claim 12, wherein d) includes loading all network probabilities for given network of speech sub-units for a given sub-unit probability into a local memory of one or more of the co-processor elements with the third co-processor thread.
  - 17. The method of claim 12, wherein d) includes computing one or more per node per frame portions of the path probability with the third co-processor thread.
  - 18. The method of claim 17, further comprising collecting the per node per frame portions from the third co-processor thread and keeping track of a history of the per-node-per-frame portions and determine the path probability from the history of the per-node-per-frame portions computed by the third co-processor thread with a second main processor thread.
  - 19. The method of claim 12, wherein d) includes computing a complete per-frame path probability with the third co-processor thread.
  - 20. The method of claim 19, further comprising keeping track of a history of the per-frame path probability computed by the third co-processor thread with a second main processor thread.
  - 21. The method of claim 12, wherein a) further comprises receiving the raw data and buffering the raw data for transfer to the first co-processor thread with the first main processor thread.
  - 22. The method of claim 12, wherein a) further comprises finishing feature extraction with the first main processor thread from an initial feature extraction performed by the first co-processor thread.
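
The staged flow of steps a) through d), with the main thread buffering raw data and collecting results, can be modelled loosely with ordinary threads and bounded queues. This is a sketch under stated assumptions, not the patented implementation: Python threads do not model co-processor local memories, and `stage`, `run_pipeline`, `detect_and_extract`, and the energy threshold are hypothetical names and values used only for illustration.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(work, inbox, outbox):
    """Apply `work` to each item from inbox and pass results downstream."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        result = work(item)
        if result is not None:  # a stage may drop items, e.g. silence
            outbox.put(result)


def detect_and_extract(samples, threshold=0.1):
    """Toy voice detection plus feature extraction: mean energy gate."""
    energy = sum(s * s for s in samples) / len(samples)
    return None if energy < threshold else [energy]


def run_pipeline(raw_frames, stages, depth=4):
    """Chain stages with bounded queues; the caller acts as the main thread."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(stages)
    ]

    def feed():
        # Buffer raw data for transfer to the first stage.
        for frame in raw_frames:
            queues[0].put(frame)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed)
    for t in workers + [feeder]:
        t.start()
    results = []
    while True:  # retrieve results as they arrive
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in workers + [feeder]:
        t.join()
    return results
```

The bounded queues stand in for limited buffer space between stages: a fast producer blocks until the next stage has consumed its input, so no stage needs to hold more than `depth` items at a time.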

23. A computer apparatus for implementing computer speech processing in a computer speech processing apparatus having a main processing unit, a memory unit coupled to the main processing unit and one or more co-processors coupled to the memory unit and the main processing unit, wherein each co-processor element includes a co-processor unit and a local memory associated with the co-processor unit, the system comprising:
- a) means for performing memory maintenance and voice recognition result retrievals upon execution with a first main processor thread executed by the main processor;
  
  b) means for performing voice detection and initial feature extraction on the raw data with a first co-processor thread running on one or more of the co-processors;
  
  c) means for receiving feature data derived for one or more features extracted by the first co-processor thread and information for locating probability density functions needed for probability computation by a speech recognition model and computing a probability that the one or more features correspond to a known sub-unit of speech using the probability density functions and the feature data with a second co-processor thread configured to run on one or more of the co-processors; and
  
  d) means for computing at least a portion of a path probability that a sequence of sub-units of speech correspond to a known speech unit with a third co-processor thread configured to run on one or more of the co-processors.

24. A computer readable storage medium, having embodied therein computer readable instructions for implementing a computer speech processing method in a computer speech processing apparatus having a main processing unit, a memory unit coupled to the main processing unit and one or more co-processors coupled to the memory unit and the main processing unit, the method comprising:
- a) performing memory maintenance and voice recognition result retrievals upon execution with a first main processor executed by the main processor;
  
  b) performing voice detection and initial feature extraction on the raw data with a first co-processor thread running on one or more of the co-processors;
  
  c) receiving feature data derived for one or more features extracted by the first co-processor thread and information for locating probability density functions needed for probability computation by a speech recognition model and computing a probability that the one or more features correspond to a known sub-unit of speech using the probability density functions and the feature data with a second co-processor thread configured to run on one or more of the co-processors; and
  
  d) computing at least a portion of a path probability that a sequence of sub-units of speech correspond to a known speech unit with a third co-processor thread configured to run on one or more of the co-processors.

Current Assignee: Sony Interactive Entertainment Inc. (Sony Group Corp.)
Original Assignee: Sony Computer Entertainment Incorporated (Sony Group Corp.)
Inventors: Chen, Ruxin
Primary Examiner(s): Abebe, Daniel D
Application Number: US12/698,955
Publication Number: US 20100211391A1
Time in Patent Office: 1,197 Days
Field of Search: 704/251, 704/256, 369/25.01
US Class Current: 704/256
CPC Class Codes:
G10L 15/142 Hidden Markov Models [HMMs]
G10L 15/34 Adaptation of a single reco...
