Hardware-implemented scalable modular engine for low-power speech recognition

US 8,463,610 B1
Filed: 01/19/2009
Issued: 06/11/2013
Est. Priority Date: 01/18/2008
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

The present invention relates to a low-power speech recognition system. In one embodiment, the speech recognition system is implemented in hardware and includes a backend search engine that operates to recognize words based on senone scores provided by an acoustic scoring stage. The backend search engine includes a scoring engine, a transition engine, and a language model engine. For a frame of sampled speech, the scoring engine reads active acoustic unit models from external memory, updates the active acoustic unit models based on corresponding senone scores received from the acoustic scoring stage, and writes the active acoustic unit models back to the external memory. The scoring engine then enters a low-power state until processing for a next frame of sampled speech is to begin. The transition engine identifies any completed words, and the language model engine processes the completed words to identify words that are likely to follow in a subsequent frame.
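The per-frame flow described in the abstract can be sketched in software as follows. This is a minimal illustration, not the patented hardware design: the `UnitModel` fields, the log-score arithmetic, the beam-based pruning rule, and the dictionary standing in for the language model are all assumptions introduced here, and the step that creates or modifies models to be transitioned to is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class UnitModel:
    """Hypothetical active acoustic unit model (fields are assumptions)."""
    word: str
    senone_id: int
    score: float = 0.0          # accumulated log-score of the best path
    is_word_end: bool = False   # True if this model completes a word


def scoring_engine(external_memory, senone_scores):
    # Read each active model, add its senone score for this frame, write it back.
    for model in external_memory:
        model.score += senone_scores[model.senone_id]
    # In hardware, the scoring engine would enter a low-power state here.


def transition_engine(external_memory, beam):
    # Runs only after scoring has finished (it idles in low power until then).
    if not external_memory:
        return []
    best = max(m.score for m in external_memory)
    # Assumed pruning rule: drop models more than `beam` below the best score.
    external_memory[:] = [m for m in external_memory if m.score >= best - beam]
    return [m.word for m in external_memory if m.is_word_end]


def language_model_engine(completed_words, followers):
    # Map each completed word to the words likely to follow it.
    return {w: followers.get(w, []) for w in completed_words}


def process_frame(external_memory, senone_scores, followers, beam=10.0):
    scoring_engine(external_memory, senone_scores)
    completed = transition_engine(external_memory, beam)
    return language_model_engine(completed, followers)
```

Calling `process_frame` once per frame of sampled speech mirrors the ordering in the abstract: scoring completes and writes back before the transition step runs, and only completed words reach the language model step.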

55 Citations

20 Claims

1. An Application Specific Integration Circuit (ASIC) for use in a hardware-implemented backend search engine for a low-power speech recognition system, said ASIC comprising at least:
- a scoring engine that includes logic circuitry adapted to read a plurality of active acoustic unit models from external memory, update each of the plurality of active acoustic unit models based on one or more corresponding senone scores received from an acoustic scoring engine for a current frame of sampled speech, write the plurality of active acoustic unit models back to the external memory, and enter a low-power state after writing the plurality of active acoustic unit models back to the external memory until processing for a subsequent frame of sampled speech is to begin;
- a transition engine that includes logic circuitry adapted to process the plurality of active acoustic unit models after the plurality of active acoustic unit models have been updated and written back to the external memory by the scoring engine in order to prune unlikely active acoustic unit models for the current frame of sampled speech, create or modify active acoustic unit models likely to be transitioned to, and identify any completed words, wherein the transition engine is in a low-power state while the scoring engine is processing the plurality of active acoustic unit models; and
- a language model engine that includes logic circuitry adapted to process any completed words identified by the transition engine for the current frame of sampled speech to identify one or more words that are likely to follow in the subsequent frame of sampled speech.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. The ASIC of claim 1 wherein the language model engine operates in parallel with at least one of the scoring engine and the transition engine.
  - 3. The ASIC of claim 1 wherein for each completed word, the language model engine is adapted to identify one or more expected words that are likely to follow the completed word using an n-gram analysis, wherein as part of the n-gram analysis the language model engine performs a lookup for n-grams for the completed word in the external memory using a hashing technique.
  - 4. The ASIC of claim 1 wherein for each completed word, the language model engine is adapted to identify one or more expected words that are likely to follow the completed word using an n-gram analysis, wherein as part of the n-gram analysis the language model engine performs a lookup for trigrams for the completed word in the external memory using a hashing technique.
  - 5. The ASIC of claim 4 wherein trigrams in a language model used for speech recognition are stored in a single hash table in the external memory and lookup for the trigrams is performed using a single hash function that computes a hash value for the completed word and a preceding word in a word history of the completed word.
  - 6. The ASIC of claim 4 wherein trigrams in a language model used for speech recognition are stored in a plurality of hash tables in the external memory, each of the plurality of hash tables including a plurality of bins per hash value, and lookup for the trigrams for the completed word comprises:
    - for each hash table of the plurality of hash tables, computing a hash value for the hash table based on the completed word and a preceding word in a word history of the completed word using a corresponding hash function; and
    - finding the trigrams for the completed word in the plurality of hash tables using the hash values computed for the plurality of hash tables.
  - 7. The ASIC of claim 6 wherein the plurality of hash tables is two hash tables, each having two bins per hash value.
  - 8. The ASIC of claim 6 wherein the trigrams in the language model are stored in the plurality of hash tables according to a cuckoo hashing scheme such that trigrams for a single word pair are stored in one bin for one hash value in one of the plurality of hash tables.
  - 9. The ASIC of claim 6 wherein each of the hash functions for the plurality of hash tables is a Cyclic Redundancy Check (CRC) function.
  - 10. The ASIC of claim 9 wherein the language model engine stores one or more lookup tables in internal memory to assist in computations required for the CRC function for each of the hash functions of the plurality of hash tables.
  - 11. The ASIC of claim 10 wherein the one or more lookup tables for each of the hash functions of the plurality of hash tables are configurable in order to accommodate at least one of a group consisting of:
    - different languages, different language models having different vocabulary sizes, and hash tables of different sizes for a particular language model.
  - 12. The ASIC of claim 1 wherein the scoring engine comprises a single scoring pipeline for scoring both within-word active acoustic unit models and cross-word active acoustic unit models.
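The trigram lookup of claims 6 through 9 (two hash tables, two bins per hash value, CRC hash functions, cuckoo placement) can be illustrated with a small software sketch. Everything concrete below is an assumption made for illustration: the table size, the integer word IDs, the choice of `zlib.crc32` and `binascii.crc_hqx` as the two CRC functions, and the eviction policy. The claims do not specify any of these.

```python
import binascii
import zlib

NUM_SLOTS = 8        # hash values per table (assumed; tiny for illustration)
BINS_PER_SLOT = 2    # two bins per hash value, as in claims 7 and 16


def key_bytes(prev_word_id, word_id):
    """Pack the word pair (preceding word, completed word) into bytes."""
    return prev_word_id.to_bytes(4, "little") + word_id.to_bytes(4, "little")


# One CRC-based hash function per table (the specific CRCs are assumptions).
HASHES = (
    lambda key: zlib.crc32(key) % NUM_SLOTS,
    lambda key: binascii.crc_hqx(key, 0) % NUM_SLOTS,
)


def new_tables():
    return [[[None] * BINS_PER_SLOT for _ in range(NUM_SLOTS)] for _ in HASHES]


def lookup(tables, prev_word_id, word_id):
    """Probe one slot per table; all trigrams for the pair live in one bin."""
    key = key_bytes(prev_word_id, word_id)
    for table, hash_fn in zip(tables, HASHES):
        for entry in table[hash_fn(key)]:
            if entry is not None and entry[0] == key:
                return entry[1]
    return None


def insert(tables, prev_word_id, word_id, trigrams, max_kicks=32):
    """Insert a word pair that is not yet stored (language model built once)."""
    entry = (key_bytes(prev_word_id, word_id), trigrams)
    t = 0
    for _ in range(max_kicks):
        slot = tables[t][HASHES[t](entry[0])]
        for i, occupant in enumerate(slot):
            if occupant is None:
                slot[i] = entry
                return True
        # Both bins are full: evict one occupant and retry it in the other table.
        slot[0], entry = entry, slot[0]
        t = 1 - t
    return False  # a real table builder would rehash or resize here
```

The point of the scheme is that a lookup touches at most one slot in each table, so the number of external memory reads per completed word is bounded regardless of how full the tables are. Insertion cost only matters when the language model is loaded.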

13. An Application Specific Integration Circuit (ASIC) for use in a hardware-implemented backend search engine for a low-power speech recognition system, said ASIC comprising at least:
- a scoring engine that includes logic circuitry adapted to read a plurality of active acoustic unit models from external memory, update each of the plurality of active acoustic unit models based on one or more corresponding senone scores received from an acoustic scoring engine for a current frame of sampled speech, and write the plurality of active acoustic unit models back to the external memory;
- a transition engine that includes logic circuitry adapted to process the plurality of active acoustic unit models after the plurality of active acoustic unit models have been updated and written back to the external memory by the scoring engine in order to prune unlikely active acoustic unit models for the current frame of sampled speech, create or modify active acoustic unit models likely to be transitioned to, and identify any completed words, wherein the transition engine is in a low-power state while the scoring engine is processing the plurality of active acoustic unit models; and
- a language model engine that includes logic circuitry adapted to, for each completed word, identify one or more expected words that are likely to follow the completed word using an n-gram analysis, wherein as part of the n-gram analysis the language model engine performs a lookup for trigrams for the completed word in the external memory using a hashing technique.
- Dependent claims (14, 15, 16, 17, 18, 19, 20):
  - 14. The ASIC of claim 13 wherein trigrams in a language model used for speech recognition are stored in a single hash table in the external memory and lookup for the trigrams is performed using a single hash function that computes a hash value for the completed word and a preceding word in a word history of the completed word.
  - 15. The ASIC of claim 13 wherein trigrams in a language model used for speech recognition are stored in a plurality of hash tables in the external memory, each of the plurality of hash tables including a plurality of bins per hash value, and lookup for the trigrams for the completed word comprises:
    - for each hash table of the plurality of hash tables, computing a hash value for the hash table based on the completed word and a preceding word in a word history of the completed word using a corresponding hash function; and
    - finding the trigrams for the completed word in the plurality of hash tables using the hash values computed for the plurality of hash tables.
  - 16. The ASIC of claim 15 wherein the plurality of hash tables is two hash tables, each having two bins per hash value.
  - 17. The ASIC of claim 15 wherein the trigrams in the language model are stored in the plurality of hash tables according to a cuckoo hashing scheme such that trigrams for a single word pair are stored in one bin for one hash value in one of the plurality of hash tables.
  - 18. The ASIC of claim 15 wherein each of the hash functions for the plurality of hash tables is a Cyclic Redundancy Check (CRC) function.
  - 19. The ASIC of claim 18 wherein the language model engine stores one or more lookup tables in internal memory to assist in computations required for the CRC function for each of the hash functions of the plurality of hash tables.
  - 20. The ASIC of claim 19 wherein the one or more lookup tables for each of the hash functions of the plurality of hash tables are adaptable in order to accommodate at least one of a group consisting of:
    - different languages, different language models having different vocabulary sizes, and hash tables of different sizes for a particular language model.
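Claims 10, 11, 19, and 20 refer to lookup tables held in internal memory that assist the CRC computation and that can be changed to suit different language models or hash table sizes. A common software analogue is the byte-wise table-driven CRC, sketched below. The 32-bit reflected form, the two polynomials, and the function names are assumptions chosen for illustration; the claims do not state which CRC variant the hardware uses.

```python
def make_crc_table(poly):
    """Build a 256-entry lookup table for a reflected 32-bit CRC polynomial."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial when a 1 falls off.
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


def crc32_with_table(data, table):
    """Byte-at-a-time CRC: one table read per input byte."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ table[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


def crc_hash(data, table, num_slots):
    """Reduce the CRC to a slot index for a hash table with `num_slots` slots."""
    return crc32_with_table(data, table) % num_slots


# Swapping the table swaps the hash function without changing the datapath.
TABLE_A = make_crc_table(0xEDB88320)  # standard CRC-32 polynomial (reflected)
TABLE_B = make_crc_table(0x82F63B78)  # CRC-32C (Castagnoli) polynomial (reflected)
```

Under these assumptions, "configurable" amounts to reloading the 256-entry table and choosing a different `num_slots`: the per-byte update logic stays the same while the effective hash function and table size change.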

Current Assignee: Carnegie Mellon University
Original Assignee: Carnegie Mellon University
Inventors: Rutenbar, Rob A.; Bourke, Patrick J.
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Serrou, Abdelali
Application Number: US12/355,973
Time in Patent Office: 1,604 Days
Field of Search: 704/255, 704/257, 704/231, 704/235, 704/256, 704/270, 704/271
US Class Current: 704/257
CPC Class Codes:
- G06N 7/01   Probabilistic graphical mod...
- G10L 15/197   Probabilistic grammars, e.g...
- G10L 15/285   Memory allocation or algori...
