Speech recognition with hierarchical networks
Abstract
Provided are systems and methods for using hierarchical networks for recognition, such as speech recognition. Conventional automatic recognition systems may not be both efficient and flexible. Recognition systems are disclosed that may achieve efficiency and flexibility by employing hierarchical networks, prefix consolidation of networks, and future consolidation of networks. The disclosed networks may be associated with a network model and the associated network model may be modified during recognition to achieve greater flexibility.
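As a rough illustration of what prefix consolidation can mean for a word-to-speech-unit mapping, the sketch below merges pronunciations that share leading speech units into one tree, so the shared units are stored once. This is one common reading of the term and not necessarily the construction the patent discloses. The lexicon, the speech unit symbols, the `#words` marker, and the function names `build_prefix_tree` and `count_nodes` are all hypothetical.

```python
def build_prefix_tree(lexicon):
    """Merge pronunciations into a nested-dict tree keyed by speech unit.

    Hypothetical helper: words whose pronunciations start with the same
    units share the nodes for that common prefix.
    """
    root = {}
    for word, units in lexicon.items():
        node = root
        for unit in units:
            node = node.setdefault(unit, {})
        # "#words" is an assumed marker listing the words that end at this node.
        node.setdefault("#words", []).append(word)
    return root


def count_nodes(tree):
    """Count speech-unit nodes, ignoring the "#words" markers."""
    return sum(
        1 + count_nodes(child)
        for key, child in tree.items()
        if key != "#words"
    )


# Toy lexicon (assumed data, not from the patent).
lexicon = {
    "cat": ["k", "ae", "t"],
    "cab": ["k", "ae", "b"],
    "can": ["k", "ae", "n"],
    "dog": ["d", "ao", "g"],
}

tree = build_prefix_tree(lexicon)
unmerged = sum(len(units) for units in lexicon.values())  # one node per unit per word
merged = count_nodes(tree)                                # shared prefixes stored once
print(unmerged, merged)
```

In this toy lexicon the three words beginning with "k", "ae" share two nodes, so the consolidated tree needs fewer nodes than a network that spells out every pronunciation separately.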
28 Claims
1. A computer-implemented method for performing speech recognition, comprising:
selecting, via at least one computer processor configured to execute specific instructions, a first set of word candidates from a plurality of word tokens, wherein the plurality of word tokens are associated with a language model and a word network of a hierarchy of networks;
selecting, via the at least one computer processor, a first set of speech unit candidates from a plurality of speech unit tokens, wherein the plurality of speech unit tokens are associated with a speech unit model and a speech unit network of the hierarchy of networks, wherein a word token of the plurality of word tokens corresponds to one or more speech tokens of the plurality of speech tokens;
receiving, via the at least one computer processor, audio input, wherein the audio input was captured via a microphone;
selecting, via the at least one computer processor, a second set of speech unit candidates from the plurality of speech unit tokens using the audio input and the first set of speech unit candidates;
recognizing, via the at least one computer processor, a word candidate in the first set of word candidates based at least partly on a correspondence of the word candidate to one or more speech unit candidates of the second set of speech unit candidates; and
selecting, via the at least one computer processor, a second set of word candidates from the plurality of word tokens based at least partly on recognition of the word candidate.
Dependent claims: 2, 3, 25
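The steps of claim 1 can be sketched with a toy two-level decoder. This is a minimal illustration under strong assumptions, not the patented implementation: the "audio input" is a list of already-labelled speech units standing in for acoustic scoring, matching is exact rather than probabilistic, and `LEXICON`, `WORD_NETWORK`, and `decode` are hypothetical names and data.

```python
# Word level: which words may follow which (assumed stand-in for the
# language model and word network).
WORD_NETWORK = {"<s>": ["call", "play"], "call": ["mom", "me"]}

# Speech unit level: each word token corresponds to one or more speech units.
LEXICON = {
    "call": ["k", "ao", "l"],
    "play": ["p", "l", "ey"],
    "mom": ["m", "aa", "m"],
    "me": ["m", "iy"],
}


def decode(frames):
    """Return the words recognized from a sequence of observed speech units."""
    # First set of word candidates, taken from the word network.
    word_candidates = WORD_NETWORK["<s>"]
    # First set of speech unit candidates: position 0 of each candidate word.
    active = [(word, 0) for word in word_candidates]
    recognized = []
    for unit in frames:
        # Second set of speech unit candidates: those in the previous set
        # that are consistent with the incoming input.
        advanced = [(w, i + 1) for (w, i) in active if LEXICON[w][i] == unit]
        finished = [w for (w, i) in advanced if i == len(LEXICON[w])]
        if finished:
            # A word candidate is recognized from its speech unit candidates.
            word = finished[0]
            recognized.append(word)
            # Second set of word candidates, selected using the recognized word.
            word_candidates = WORD_NETWORK.get(word, [])
            active = [(w, 0) for w in word_candidates]
        else:
            active = advanced
    return recognized


print(decode(["k", "ao", "l", "m", "iy"]))
```

A practical recognizer would score candidates probabilistically and prune them, for example with a beam, rather than require exact matches; the sketch only shows how the two levels hand candidates to each other.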
4. A computer-implemented method comprising:
selecting, via at least one computer processor configured to execute specific instructions, a first set of first-level candidates from a plurality of first-level tokens, wherein the plurality of first-level tokens are associated with a first network of a hierarchy of networks;
selecting, via the at least one computer processor, a first set of second-level candidates from a plurality of second-level tokens, wherein the plurality of second-level tokens are associated with a second network of the hierarchy of networks, wherein a first-level token of the plurality of first-level tokens corresponds to one or more second-level tokens of the plurality of second-level tokens;
receiving input via the at least one computer processor, wherein the input was captured via a microphone;
selecting, via the at least one computer processor, a second set of second-level candidates from the plurality of second-level tokens using the input and the first set of second-level candidates;
recognizing, via the at least one computer processor, a candidate in the first set of first-level candidates using the second set of second-level candidates; and
selecting, via the at least one computer processor, a second set of first-level candidates from the plurality of first-level tokens using the candidate.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 26
12. A computer readable, non-transitory storage medium storing computer executable instructions that, when executed by one or more computing systems, configure the one or more computer systems to collectively perform operations comprising:
selecting, via at least one computer processor configured to execute specific instructions, a first set of first-level candidates from a plurality of first-level tokens, wherein the plurality of first-level tokens are associated with a first network of a hierarchy of networks;
selecting, via the at least one computer processor, a first set of second-level candidates from a plurality of second-level tokens, wherein the plurality of second-level tokens are associated with a second network of the hierarchy of networks, wherein a first-level token of the plurality of first-level tokens corresponds to one or more second-level tokens of the plurality of second-level tokens;
receiving input via the at least one computer processor, wherein the input was captured via a microphone;
selecting, via the at least one computer processor, a second set of second-level candidates from the plurality of second-level tokens using the input and the first set of second-level candidates;
recognizing, via the at least one computer processor, a candidate in the first set of first-level candidates using the second set of second-level candidates; and
selecting, via the at least one computer processor, a second set of first-level candidates from the plurality of first-level tokens using the candidate.
Dependent claims: 13, 14, 15, 16, 17, 18, 27
19. A system comprising one or more computing devices, the system configured to:
select, via at least one computer processor configured to execute specific instructions, a first set of first-level candidates from a plurality of first-level tokens, wherein the plurality of first-level tokens are associated with a first network of a hierarchy of networks;
select, via the at least one computer processor, a first set of second-level candidates from a plurality of second-level tokens, wherein the plurality of second-level tokens are associated with a second network of the hierarchy of networks, wherein a first-level token of the plurality of first-level tokens corresponds to one or more second-level tokens of the plurality of second-level tokens;
receive input via the at least one computer processor, wherein the input was captured via a microphone;
select, via the at least one computer processor, a second set of first-level candidates from the plurality of first-level tokens using the input and the first set of second-level candidates;
recognize, via the at least one computer processor, a candidate in the first set of second-level candidates using the second set of first-level candidates; and
select, via the at least one computer processor, a second set of second-level candidates from the plurality of second-level tokens using the candidate.
Dependent claims: 20, 21, 22, 23, 24, 28
Specification