Speech recognition with hierarchical networks
Abstract
Provided are systems and methods for using hierarchical networks for recognition, such as speech recognition. Conventional automatic recognition systems may not be both efficient and flexible. Recognition systems are disclosed that may achieve efficiency and flexibility by employing hierarchical networks, prefix consolidation of networks, and future consolidation of networks. The disclosed networks may be associated with a network model and the associated network model may be modified during recognition to achieve greater flexibility.
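The two-level arrangement described in the abstract can be pictured with a minimal sketch. It assumes a toy setup in which an upper-level word network holds a reference to a lower-level speech unit network, and the lower-level network looks up pronunciations through a model that can be replaced while recognition is in progress. All class names, fields, and the sample pronunciations are hypothetical illustrations, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SpeechUnitModel:
    """Hypothetical pronunciation lexicon: word -> sequence of speech units."""
    name: str
    pronunciations: Dict[str, Tuple[str, ...]]


@dataclass
class SpeechUnitNetwork:
    """Lower-level network; its associated model may be replaced at any time."""
    model: SpeechUnitModel

    def expand(self, word: str) -> Tuple[str, ...]:
        # Look the word up in whichever model is currently associated.
        return self.model.pronunciations[word]


@dataclass
class WordNetwork:
    """Upper-level network: a toy stand-in for a language model."""
    successors: Dict[str, List[str]]
    units: SpeechUnitNetwork  # reference to the lower level, not a copy

    def candidates(self, previous: str) -> Dict[str, Tuple[str, ...]]:
        # Each candidate next word is expanded through the lower-level network.
        return {w: self.units.expand(w) for w in self.successors.get(previous, [])}


general = SpeechUnitModel("general", {"tomato": ("t", "ah", "m", "ey", "t", "ow")})
british = SpeechUnitModel("british", {"tomato": ("t", "ah", "m", "aa", "t", "ow")})

units = SpeechUnitNetwork(general)
words = WordNetwork({"<s>": ["tomato"]}, units)

before = words.candidates("<s>")  # expanded with the general model
units.model = british             # modify the associated model
after = words.candidates("<s>")   # same word network, new pronunciations
```

Because the word network only holds a reference to the lower level, replacing the model changes later expansions without rebuilding the word network. That is one plausible reading of the flexibility the abstract refers to.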
24 Claims
1. A computer-implemented method for performing speech recognition, comprising:
as implemented by one or more computing devices configured to execute specific instructions,
receiving audio input;
recognizing a first word of an utterance using the audio input, a word network, and a speech unit network, wherein the word network is associated with a language model and the speech unit network is associated with a first speech unit model that comprises pronunciations of words using speech units; and
subsequent to recognizing the first word of the utterance:
determining information about an accent of a speaker of the audio input using the audio input;
obtaining a second speech unit model using the information;
associating the second speech unit model with the speech unit network;
recognizing a second word of the same utterance using the second speech unit model; and
generating speech recognition results comprising the first word, recognized using the first speech unit model, and the second word, recognized using the second speech unit model.
Dependent claims: 2, 3
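The order of steps in claim 1 can be illustrated with a short sketch. It is a toy under stated assumptions, not the patented implementation: the audio is assumed to be already segmented into per-word tuples of speech units, the word network and language model are omitted, and the accent detector is a caller-supplied stand-in. The names `SpeechUnitNetwork`, `recognize_utterance`, `detect_accent`, and `models_by_accent` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Units = Tuple[str, ...]
SpeechUnitModel = Dict[Units, str]  # pronunciation (speech units) -> word


class SpeechUnitNetwork:
    """Toy speech unit network whose associated model can be swapped."""

    def __init__(self, model: SpeechUnitModel) -> None:
        self.model = model

    def recognize(self, units: Units) -> str:
        return self.model.get(units, "<unk>")


def recognize_utterance(
    segments: Sequence[Units],
    network: SpeechUnitNetwork,
    detect_accent: Callable[[Sequence[Units]], str],
    models_by_accent: Dict[str, SpeechUnitModel],
) -> List[str]:
    if not segments:
        return []
    # Recognize the first word with the first (initial) speech unit model.
    results = [network.recognize(segments[0])]
    # Subsequently, determine accent information from the input heard so far.
    accent = detect_accent(segments[:1])
    # Obtain a second model and associate it with the same network;
    # keep the current model if no accent-specific one is available.
    network.model = models_by_accent.get(accent, network.model)
    # Recognize the remaining words of the same utterance with the second model.
    for segment in segments[1:]:
        results.append(network.recognize(segment))
    return results


general = {("p", "l", "iy", "z"): "please", ("t", "ah", "m", "ey", "t", "ow"): "tomato"}
british = {("p", "l", "iy", "z"): "please", ("t", "ah", "m", "aa", "t", "ow"): "tomato"}
utterance = [("p", "l", "iy", "z"), ("t", "ah", "m", "aa", "t", "ow")]

network = SpeechUnitNetwork(general)
words = recognize_utterance(utterance, network, lambda audio: "british", {"british": british})
print(words)  # ['please', 'tomato']
```

With the general model alone, the second segment would not match any pronunciation and would come back as `<unk>`. The result therefore combines a first word recognized with the first model and a second word recognized with the second, as the final step of the claim describes.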
4. A computer-implemented method, comprising:
as implemented by one or more computing devices configured to execute specific instructions,
receiving input;
recognizing a first token of an utterance using the input, a first network, and a second network, wherein the first network is associated with a first model and the second network is associated with a second model; and
subsequent to recognizing the first token of the utterance:
determining a characteristic using the input;
obtaining a substitute model using the characteristic;
associating the substitute model with the second network;
recognizing a second token of the same utterance using the substitute model; and
generating speech recognition results based at least partly on the first token, recognized using the second model, and the second token, recognized using the substitute model.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12
13. A computer readable, non-transitory storage medium having computer executable instructions for performing a method, comprising:
receiving audio input;
recognizing a first token of an utterance using the audio input, a first network, and a second network, wherein the first network is associated with a first model and the second network is associated with a second model; and
subsequent to recognizing the first token:
determining a characteristic of a speaker of the audio input using the audio input;
obtaining a substitute model using the characteristic;
associating the substitute model with the second network;
recognizing a second token of the same utterance using the substitute model; and
generating speech recognition results based at least partly on the first token, recognized using the second model, and the second token, recognized using the substitute model.
Dependent claims: 14, 15, 16, 17, 18
19. A system comprising a computing device configured to:
receive input;
recognize a first token of an utterance using the input, a first network, and a second network, wherein the first network is associated with a first model and the second network is associated with a second model; and
subsequent to recognizing the first token of the utterance:
determine a characteristic using the input;
obtain a substitute model using the characteristic;
associate the substitute model with the first network;
recognize a second token of the same utterance using the substitute model; and
generate speech recognition results based at least partly on the first token, recognized using the first model, and the second token, recognized using the substitute model.
Dependent claims: 20, 21, 22, 23, 24
Specification