Speech recognition with hierarchical networks

US 9,093,061 B1
Filed: 03/29/2012
Issued: 07/28/2015
Est. Priority Date: 04/14/2011
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Provided are systems and methods for using hierarchical networks for recognition, such as speech recognition. Conventional automatic recognition systems may not be both efficient and flexible. Recognition systems are disclosed that may achieve efficiency and flexibility by employing hierarchical networks, prefix consolidation of networks, and future consolidation of networks. The disclosed networks may be associated with a network model and the associated network model may be modified during recognition to achieve greater flexibility.

28 Claims

1. A computer-implemented method for performing speech recognition, comprising:
- selecting, via at least one computer processor configured to execute specific instructions, a first set of word candidates from a plurality of word tokens, wherein the plurality of word tokens are associated with a language model and a word network of a hierarchy of networks;
  
  selecting, via the at least one computer processor, a first set of speech unit candidates from a plurality of speech unit tokens, wherein the plurality of speech unit tokens are associated with a speech unit model and a speech unit network of the hierarchy of networks, wherein a word token of the plurality of word tokens corresponds to one or more speech unit tokens of the plurality of speech unit tokens;
  
  receiving, via the at least one computer processor, audio input, wherein the audio input was captured via a microphone;
  
  selecting, via the at least one computer processor, a second set of speech unit candidates from the plurality of speech unit tokens using the audio input and the first set of speech unit candidates;
  
  recognizing, via the at least one computer processor, a word candidate in the first set of word candidates based at least partly on a correspondence of the word candidate to one or more speech unit candidates of the second set of speech unit candidates; and
  
  selecting, via the at least one computer processor, a second set of word candidates from the plurality of word tokens based at least partly on recognition of the word candidate.
- Dependent claims (2, 3, 25):
  - 2. The computer-implemented method of claim 1, wherein the plurality of word tokens comprises words of the English language and the language model comprises a trigram language model.
  - 3. The computer-implemented method of claim 1, wherein the plurality of speech unit tokens comprises phonemes and the speech unit model comprises a lexicon indicating the pronunciation of words using the phonemes.
  - 25. The computer-implemented method of claim 1, further comprising adding, via the at least one computer processor, at least one speech unit candidate to the speech unit network based at least partly on the second set of word candidates.
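
Claims 1 to 3 describe two linked levels of search: word candidates proposed by a language model, and speech unit candidates derived from a lexicon. The following is a minimal sketch under loudly stated assumptions, not the patented implementation. The lexicon and the successor table standing in for a trigram model (`LEXICON`, `TRIGRAM_SUCCESSORS`) are hypothetical toy data, and already-labelled phonemes replace real acoustic scoring of microphone audio, so "selecting speech unit candidates using the audio input" reduces to exact matching. There are no probabilities here, and the first completed word wins.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (cf. claim 3): word token -> phoneme pronunciation.
LEXICON: Dict[str, List[str]] = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "cap": ["K", "AE", "P"],
    "sat": ["S", "AE", "T"],
}

# Hypothetical stand-in for a trigram language model (cf. claim 2): the two
# previous words -> words allowed next. A real model would also assign probabilities.
TRIGRAM_SUCCESSORS: Dict[Tuple[str, str], List[str]] = {
    ("<s>", "<s>"): ["the"],
    ("<s>", "the"): ["cat", "cap"],
    ("the", "cat"): ["sat"],
    ("the", "cap"): ["sat"],
}


def decode(phones: List[str]) -> List[str]:
    """Greedy two-level search; each input phone stands in for scored audio."""
    history = ("<s>", "<s>")
    # First set of word candidates, from the word-level network.
    words = TRIGRAM_SUCCESSORS.get(history, [])
    # First set of speech unit candidates: (word, index of next expected phoneme).
    units = [(w, 0) for w in words]
    recognized: List[str] = []
    for phone in phones:
        # Second set of speech unit candidates: keep and advance those matching the input.
        units = [(w, i + 1) for (w, i) in units if LEXICON[w][i] == phone]
        finished = [w for (w, i) in units if i == len(LEXICON[w])]
        if finished:
            word = finished[0]  # toy tie-break: first completed word is recognized
            recognized.append(word)
            history = (history[1], word)
            # Second set of word candidates, selected using the recognized word.
            words = TRIGRAM_SUCCESSORS.get(history, [])
            units = [(w, 0) for w in words]
    return recognized


print(decode(["DH", "AH", "K", "AE", "T", "S", "AE", "T"]))  # ['the', 'cat', 'sat']
```

A practical decoder would keep many scored hypotheses (a beam) at both levels instead of committing to the first completed word.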

4. A computer-implemented method comprising:
- selecting, via at least one computer processor configured to execute specific instructions, a first set of first-level candidates from a plurality of first-level tokens, wherein the plurality of first-level tokens are associated with a first network of a hierarchy of networks;
  
  selecting, via the at least one computer processor, a first set of second-level candidates from a plurality of second-level tokens, wherein the plurality of second-level tokens are associated with a second network of the hierarchy of networks, wherein a first-level token of the plurality of first-level tokens corresponds to one or more second-level tokens of the plurality of second-level tokens;
  
  receiving input via the at least one computer processor, wherein the input was captured via a microphone;
  
  selecting, via the at least one computer processor, a second set of second-level candidates from the plurality of second-level tokens using the input and the first set of second-level candidates;
  
  recognizing, via the at least one computer processor, a candidate in the first set of first-level candidates using the second set of second-level candidates; and
  
  selecting, via the at least one computer processor, a second set of first-level candidates from the plurality of first-level tokens using the candidate.
- Dependent claims (5, 6, 7, 8, 9, 10, 11, 26):
  - 5. The computer-implemented method of claim 4, wherein the plurality of first-level tokens comprises words and the plurality of second-level tokens comprises speech units.
  - 6. The computer-implemented method of claim 4, further comprising:
    - prefix consolidating the first set of second-level candidates.
  - 7. The computer-implemented method of claim 6, wherein prefix consolidating the first set of second-level candidates comprises merging all duplicates among the first set of second-level candidates.
  - 8. The computer-implemented method of claim 4, wherein the plurality of first-level tokens is associated with a first network model.
  - 9. The computer-implemented method of claim 8, wherein the plurality of first-level tokens comprises speech units in context and the first network model comprises a mapping of speech units to the speech units in context.
  - 10. The computer-implemented method of claim 4, further comprising:
    - creating a first result network using the first set of first-level candidates; and
      
      creating a second result network using the first set of second-level candidates.
  - 11. The computer-implemented method of claim 10, wherein creating a first result network using the first set of first-level candidates comprises creating a graph with a node and a plurality of arcs leaving the node, wherein each of the plurality of arcs is associated with one candidate from the first set of first-level candidates.
  - 26. The computer-implemented method of claim 4, wherein recognizing the candidate in the first set of first-level candidates using the second set of second-level candidates comprises recognizing the candidate based at least partly on a correspondence of the candidate to one or more second-level candidates of the second set of second-level candidates.
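
Claims 6, 7, 10 and 11 describe a result network built as a node with one outgoing arc per candidate, and prefix consolidation as merging duplicate candidates. The following is a minimal sketch, assuming the second-level candidates are (first phoneme, word) pairs and that a merged arc simply records every word it now serves. The `Arc`/`Node` classes and the function names are hypothetical. After merging, words that begin with the same phoneme share one arc, which is the saving a tree-shaped (prefix-shared) lexicon provides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Arc:
    label: str  # the second-level candidate (here, a phoneme) carried by this arc
    words: Set[str] = field(default_factory=set)  # first-level tokens reached through it


@dataclass
class Node:
    arcs: List[Arc] = field(default_factory=list)  # arcs leaving this node


def build_result_node(candidates: List[Tuple[str, str]]) -> Node:
    """One node with one outgoing arc per (phoneme, word) candidate."""
    return Node(arcs=[Arc(phone, {word}) for phone, word in candidates])


def prefix_consolidate(node: Node) -> Node:
    """Merge every arc whose label duplicates an earlier arc's label."""
    merged: Dict[str, Arc] = {}
    for arc in node.arcs:
        if arc.label in merged:
            merged[arc.label].words |= arc.words  # duplicate: fold its words in
        else:
            merged[arc.label] = Arc(arc.label, set(arc.words))
    return Node(arcs=list(merged.values()))  # first-seen order is preserved


candidates = [("K", "cat"), ("K", "cap"), ("K", "can"), ("S", "sat")]
raw = build_result_node(candidates)
consolidated = prefix_consolidate(raw)
print(len(raw.arcs), len(consolidated.arcs))  # 4 2
```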

12. A computer readable, non-transitory storage medium storing computer executable instructions that, when executed by one or more computing systems, configure the one or more computing systems to collectively perform operations comprising:
- selecting, via at least one computer processor configured to execute specific instructions, a first set of first-level candidates from a plurality of first-level tokens, wherein the plurality of first-level tokens are associated with a first network of a hierarchy of networks;
  
  selecting, via the at least one computer processor, a first set of second-level candidates from a plurality of second-level tokens, wherein the plurality of second-level tokens are associated with a second network of the hierarchy of networks, wherein a first-level token of the plurality of first-level tokens corresponds to one or more second-level tokens of the plurality of second-level tokens;
  
  receiving input via the at least one computer processor, wherein the input was captured via a microphone;
  
  selecting, via the at least one computer processor, a second set of second-level candidates from the plurality of second-level tokens using the input and the first set of second-level candidates;
  
  recognizing, via the at least one computer processor, a candidate in the first set of first-level candidates using the second set of second-level candidates; and
  
  selecting, via the at least one computer processor, a second set of first-level candidates from the plurality of first-level tokens using the candidate.
- Dependent claims (13, 14, 15, 16, 17, 18, 27):
  - 13. The computer readable, non-transitory storage medium of claim 12, the operations further comprising:
    - selecting a first set of third-level candidates from a plurality of third-level tokens.
  - 14. The computer readable, non-transitory storage medium of claim 12, the operations further comprising:
    - creating a first result network using the first set of first-level candidates.
  - 15. The computer readable, non-transitory storage medium of claim 14, wherein creating a first result network using the first set of first-level candidates comprises creating a directed graph with a node and a plurality of arcs leaving the node, wherein each of the plurality of arcs is associated with one candidate from the first set of first-level candidates.
  - 16. The computer readable, non-transitory storage medium of claim 12, wherein each candidate of the second set of second-level candidates is associated with a probability.
  - 17. The computer readable, non-transitory storage medium of claim 12, the operations further comprising:
    - future consolidating the second set of first-level candidates.
  - 18. The computer readable, non-transitory storage medium of claim 17, the operations further comprising:
    - creating a first result network using the first set of first-level candidates and the second set of first-level candidates, wherein the first result network is a graph and wherein each of the first set of first-level candidates and each of the second set of first-level candidates is associated with an arc of the graph; and
      
      wherein future consolidating the second set of first-level candidates comprises merging a first arc and a second arc that have the same end time and correspond to the same state of an associated first-level model.
  - 27. The computer readable, non-transitory storage medium of claim 12, the operations further comprising adding, via the at least one computer processor, at least one second-level candidate to the second-level network based at least partly on the second set of first-level candidates.
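
Claims 17 and 18 describe future consolidation. Arcs that end at the same time and leave the first-level model in the same state have identical futures, so they can be merged before the search continues. The following is a minimal sketch, assuming frame indices for time, the previous word as the model state (as in a bigram model), and Viterbi-style merging that keeps only the highest-scoring arc. Summing the probabilities of the merged arcs would be an equally plausible reading of "merging". All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Arc:
    word: str        # first-level candidate on this arc
    start: int       # start frame (hypothetical time unit)
    end: int         # end frame
    state: str       # first-level model state reached (here: the last word)
    log_prob: float  # hypothetical score of the path ending with this arc


def future_consolidate(arcs: List[Arc]) -> List[Arc]:
    """Merge arcs sharing (end time, model state), keeping the best-scoring one."""
    best: Dict[Tuple[int, str], Arc] = {}
    for arc in arcs:
        key = (arc.end, arc.state)
        if key not in best or arc.log_prob > best[key].log_prob:
            best[key] = arc
    return list(best.values())


arcs = [
    Arc("cat", 0, 30, "cat", -5.0),
    Arc("cat", 2, 30, "cat", -4.0),  # same end and state as above, better score
    Arc("cap", 0, 30, "cap", -6.0),  # same end, different state: kept
    Arc("cat", 0, 32, "cat", -4.5),  # same state, different end: kept
]
merged = future_consolidate(arcs)
print([(a.word, a.start, a.end) for a in merged])
# [('cat', 2, 30), ('cap', 0, 30), ('cat', 0, 32)]
```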

19. A system comprising one or more computing devices, the system configured to:
- select, via at least one computer processor configured to execute specific instructions, a first set of first-level candidates from a plurality of first-level tokens, wherein the plurality of first-level tokens are associated with a first network of a hierarchy of networks;
  
  select, via the at least one computer processor, a first set of second-level candidates from a plurality of second-level tokens, wherein the plurality of second-level tokens are associated with a second network of the hierarchy of networks, wherein a first-level token of the plurality of first-level tokens corresponds to one or more second-level tokens of the plurality of second-level tokens;
  
  receive input via the at least one computer processor, wherein the input was captured via a microphone;
  
  select, via the at least one computer processor, a second set of second-level candidates from the plurality of second-level tokens using the input and the first set of second-level candidates;
  
  recognize, via the at least one computer processor, a candidate in the first set of first-level candidates using the second set of second-level candidates; and
  
  select, via the at least one computer processor, a second set of first-level candidates from the plurality of first-level tokens using the candidate.
- Dependent claims (20, 21, 22, 23, 24, 28):
  - 20. The system of claim 19 further configured to:
    - prefix consolidate the first set of first-level candidates;
      
      future consolidate the first set of first-level candidates;
      
      prefix consolidate the first set of second-level candidates;
      
      future consolidate the first set of second-level candidates;
      
      prefix consolidate the second set of second-level candidates;
      
      future consolidate the second set of second-level candidates;
      
      prefix consolidate the second set of first-level candidates; and
      
      future consolidate the second set of first-level candidates.
  - 21. The system of claim 19 further configured to:
    - create a first result network using the first set of first-level candidates; and
      
      create a second result network using the first set of second-level candidates.
  - 22. The system of claim 19, wherein the plurality of first-level tokens comprises phones-in-context and the plurality of second-level tokens comprises hidden Markov model states.
  - 23. The system of claim 19, wherein the plurality of first-level tokens is associated with a first network model, and the plurality of second-level tokens is associated with a second network model.
  - 24. The system of claim 23 further configured to:
    - select the second set of first-level candidates from the plurality of first-level tokens using the first network model.
  - 28. The system of claim 19, wherein recognizing the candidate in the first set of first-level candidates using the second set of second-level candidates comprises recognizing the candidate based at least partly on a correspondence of the candidate to one or more second-level candidates of the second set of second-level candidates.
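
Claims 9 and 22 place phones-in-context above hidden Markov model states in the hierarchy. The following is a minimal sketch of that two-step mapping. It assumes triphone context, the common `left-center+right` label notation, a hypothetical `sil` boundary context, and three HMM states per triphone. None of these choices comes from the claims, and the function names are hypothetical.

```python
from typing import List

BOUNDARY = "sil"  # hypothetical context used at word edges


def to_phones_in_context(phones: List[str]) -> List[str]:
    """Map each phone to a triphone label 'left-center+right'."""
    padded = [BOUNDARY] + phones + [BOUNDARY]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def to_hmm_states(triphone: str, n_states: int = 3) -> List[str]:
    """Expand one phone-in-context into n_states hypothetical HMM state labels."""
    return [f"{triphone}/s{k}" for k in range(n_states)]


triphones = to_phones_in_context(["K", "AE", "T"])
print(triphones)  # ['sil-K+AE', 'K-AE+T', 'AE-T+sil']
print(to_hmm_states(triphones[1]))  # ['K-AE+T/s0', 'K-AE+T/s1', 'K-AE+T/s2']
```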

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Canyon IP Holdings LLC (Intellectual Ventures LLC)
Inventors
Secker-Walker, Hugh, Basye, Kenneth J., Krishnamoorthy, Mahesh
Primary Examiner(s)
Chawan, Vijay B

Application Number

US13/434,315
Time in Patent Office

1,216 Days
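
The 1,216-day figure agrees with the filing and issue dates listed above. The following quick check uses Python's standard `datetime` module. It assumes the figure counts calendar days from filing to issue.

```python
from datetime import date

filed = date(2012, 3, 29)   # Filed: 03/29/2012
issued = date(2015, 7, 28)  # Issued: 07/28/2015
days_in_office = (issued - filed).days
print(days_in_office)  # 1216
```
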
Field of Search

704/231, 704/251, 704/256, 704/240, 704/254, 704/257, 704/243, 704/256.2, 704/232, 704/200.1
US Class Current

1/1
CPC Class Codes

G10L 15/00   Speech recognition G10L17/0...

G10L 15/005   Language recognition

G10L 15/063   Training

G10L 15/083   Recognition networks G10L15...

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/193   Formal grammars, e.g. finit...
