Exemplar-Based Latent Perceptual Modeling for Automatic Speech Recognition
Abstract
Methods, systems, and computer-readable media related to selecting observation-specific training data (also referred to as “observation-specific exemplars”) from a general training corpus, and then creating, from the observation-specific training data, a focused, observation-specific acoustic model for recognizing the observation in an output domain are disclosed. In one aspect, a global speech recognition model is established based on an initial set of training data; a plurality of input speech segments to be recognized in an output domain are received; and for each of the plurality of input speech segments: a respective set of focused training data relevant to the input speech segment is identified in the global speech recognition model; a respective focused speech recognition model is generated based on the respective set of focused training data; and the respective focused speech recognition model is provided to a recognition device for recognizing the input speech segment in the output domain.
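The per-segment flow described in the abstract can be sketched roughly as follows. This is a toy illustration, not the patented implementation: the class names `GlobalModel` and `FocusedModel`, the use of cosine similarity to pick exemplars, the nearest-centroid recognizer, and the parameter `k` are all assumptions introduced here for illustration.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class GlobalModel:
    """Hypothetical global model: keeps every (feature_vector, label) exemplar."""

    def __init__(self, training_data):
        self.exemplars = list(training_data)

    def select_focused_data(self, segment, k=3):
        # Assumption: "relevant" means the k exemplars most similar to the segment.
        ranked = sorted(self.exemplars,
                        key=lambda ex: cosine(ex[0], segment),
                        reverse=True)
        return ranked[:k]


class FocusedModel:
    """Hypothetical focused model: one centroid per label in the focused data."""

    def __init__(self, focused_data):
        sums = {}
        counts = defaultdict(int)
        for vec, label in focused_data:
            if label not in sums:
                sums[label] = [0.0] * len(vec)
            sums[label] = [s + v for s, v in zip(sums[label], vec)]
            counts[label] += 1
        self.centroids = {label: [s / counts[label] for s in total]
                          for label, total in sums.items()}

    def recognize(self, segment):
        # Pick the label whose centroid is most similar to the segment.
        return max(self.centroids,
                   key=lambda label: cosine(self.centroids[label], segment))


def recognize_segments(global_model, segments, k=3):
    """Build a fresh focused model for each segment, then recognize with it."""
    results = []
    for segment in segments:
        focused_data = global_model.select_focused_data(segment, k)
        focused_model = FocusedModel(focused_data)
        results.append(focused_model.recognize(segment))
    return results


# Toy corpus with two labels in a 2-D feature space (illustrative values only).
corpus = [([1.0, 0.0], "a"), ([0.9, 0.1], "a"),
          ([0.0, 1.0], "b"), ([0.1, 0.9], "b")]
print(recognize_segments(GlobalModel(corpus), [[1.0, 0.05], [0.05, 1.0]], k=2))
# prints ['a', 'b']
```

The point of the sketch is the control flow: the focused model is rebuilt for every input segment from data selected for that segment, rather than being trained once and reused.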
93 Citations
30 Claims
1. A method for recognizing speech in an output domain, the method comprising:
at a device comprising one or more processors and memory:
establishing a global speech recognition model based on an initial set of training data;
receiving a plurality of input speech segments to be recognized in the output domain; and
for each of the plurality of input speech segments:
identifying in the global speech recognition model a respective set of focused training data relevant to the input speech segment;
generating a respective focused speech recognition model based on the respective set of focused training data; and
providing the respective focused speech recognition model to a recognition device for recognizing the input speech segment in the output domain.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A method for recognizing speech in an output domain, the method comprising:
at a client device comprising one or more processors and memory:
receiving a speech input from a user;
for each of a plurality of input speech segments in the speech input:
receiving a respective focused speech recognition model, wherein the respective focused speech recognition model is generated based on a respective set of focused training data relevant to the input speech segment, wherein the respective set of focused training data is selected for the input speech segment in a global speech recognition model, and wherein the global speech recognition model is generated based on a set of global training data; and
recognizing the input speech segment using the respective focused speech recognition model.
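The client-side steps of claim 12 can be sketched as a simple loop. This is only an illustration under stated assumptions: the fixed-length segmentation, the function names, and the stub `stub_fetch_focused_model` (which stands in for whatever component actually supplies the focused model) are hypothetical and do not come from the patent.

```python
def split_into_segments(speech_input, segment_length):
    """Assumption: segments are fixed-length slices of the input samples."""
    return [speech_input[i:i + segment_length]
            for i in range(0, len(speech_input), segment_length)]


def recognize_on_client(speech_input, fetch_focused_model, segment_length=2):
    """For each segment, receive a focused model and recognize with it."""
    outputs = []
    for segment in split_into_segments(speech_input, segment_length):
        # The focused model is obtained per segment, not once per utterance.
        focused_model = fetch_focused_model(segment)
        outputs.append(focused_model(segment))
    return outputs


def stub_fetch_focused_model(segment):
    """Hypothetical stand-in for the model source; returns a trivial recognizer."""
    is_loud = max(segment) > 0.5
    return lambda seg: "loud" if is_loud else "quiet"


print(recognize_on_client([0.9, 0.8, 0.1, 0.2], stub_fetch_focused_model))
# prints ['loud', 'quiet']
```

Passing the model source in as a callable keeps the client loop independent of how and where the focused model is generated.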
13. A non-transitory computer-readable medium having instructions stored thereon, the instructions, when executed by one or more processors, cause the processors to perform operations comprising:
establishing a global speech recognition model based on an initial set of training data;
receiving a plurality of input speech segments to be recognized in an output domain; and
for each of the plurality of input speech segments:
identifying in the global speech recognition model a respective set of focused training data relevant to the input speech segment;
generating a respective focused speech recognition model based on the respective set of focused training data; and
providing the respective focused speech recognition model to a recognition device for recognizing the input speech segment in the output domain.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
21. A non-transitory computer-readable medium having instructions stored thereon, the instructions, when executed by one or more processors, cause the processors to perform operations comprising:
at a client device:
receiving a speech input from a user;
for each of a plurality of input speech segments in the speech input:
receiving a respective focused speech recognition model, wherein the respective focused speech recognition model is generated based on a respective set of focused training data relevant to the input speech segment, wherein the respective set of focused training data is selected for the input speech segment in a global speech recognition model, and wherein the global speech recognition model is generated based on a set of global training data; and
recognizing the input speech segment using the respective focused speech recognition model.
22. A system, comprising:
one or more processors; and
memory having instructions stored thereon, the instructions, when executed by the one or more processors, cause the processors to perform operations comprising:
establishing a global speech recognition model based on an initial set of training data;
receiving a plurality of input speech segments to be recognized in an output domain; and
for each of the plurality of input speech segments:
identifying in the global speech recognition model a respective set of focused training data relevant to the input speech segment;
generating a respective focused speech recognition model based on the respective set of focused training data; and
providing the respective focused speech recognition model to a recognition device for recognizing the input speech segment in the output domain.
Dependent claims: 23, 24, 25, 26, 27, 28, 29
30. A system, comprising:
one or more processors; and
memory having instructions stored thereon, the instructions, when executed by one or more processors, cause the processors to perform operations comprising:
at a client device:
receiving a speech input from a user;
for each of a plurality of input speech segments in the speech input:
receiving a respective focused speech recognition model from a server, wherein the respective focused speech recognition model is generated based on a respective set of focused training data relevant to the input speech segment, wherein the respective set of focused training data is selected for the input speech segment in a global speech recognition model, and wherein the global speech recognition model is generated based on a set of global training data; and
recognizing the input speech segment using the respective focused speech recognition model.
Specification