Methods and apparatuses for automatic speech recognition
Abstract
Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated. The first representation of the input signal is a discrete parameter representation. Second model parameters associated with a second representation of the input signal are generated. The second representation of the input signal includes a continuous parameter representation of residuals of the input signal. The first representation of the input signal includes discrete parameters representing first portions of the input signal. The second representation includes discrete parameters representing second portions of the input signal that are smaller than the first portions. Third model parameters are generated to couple the first representation of the input signal with the second representation of the input signal. The first representation and the second representation of the input signal are mapped into a vector space.
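As a rough illustration only, and not the claimed method, the split between a discrete representation and a continuous representation of residuals can be sketched with plain vector quantization. Everything named here is an assumption made for the example: the codebook, the toy two-dimensional frames, the Euclidean distance, and the helper names `nearest_code`, `split_representations`, and `candidates_within` do not come from the patent text. The sketch does not model the third (coupling) parameters or the distortion and perception models.

```python
import math


def nearest_code(frame, codebook):
    """Return (index, distance) of the codebook entry closest to frame."""
    best_index, best_dist = -1, math.inf
    for index, code in enumerate(codebook):
        dist = math.dist(frame, code)  # Euclidean distance (an assumption)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


def split_representations(frames, codebook):
    """Split each frame into a discrete index and a continuous residual."""
    indices, residuals = [], []
    for frame in frames:
        index, _ = nearest_code(frame, codebook)
        code = codebook[index]
        indices.append(index)  # discrete parameter
        residuals.append([x - c for x, c in zip(frame, code)])  # what is left over
    return indices, residuals


def candidates_within(frame, codebook, threshold):
    """Indices of codebook entries whose distance to frame is at most threshold."""
    return [i for i, code in enumerate(codebook)
            if math.dist(frame, code) <= threshold]


# Hypothetical toy data: three codebook entries, three feature frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
frames = [[0.1, -0.2], [0.9, 1.3], [3.5, 0.5]]

indices, residuals = split_representations(frames, codebook)
close = candidates_within([0.5, 0.5], codebook, threshold=1.0)
```

In this toy setup, each frame is recovered by adding its residual back to the selected codebook entry, and `candidates_within` stands in, loosely, for selecting the discrete parameters that satisfy a predetermined threshold in a shared vector space.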
4150 Citations
36 Claims
1. A method, comprising:
at an electronic device including at least a processor and a memory:
receiving, by the processor, an input speech signal;
generating first model parameters associated with a first representation of the input speech signal, the first representation being a discrete parameter representation;
generating second model parameters associated with a second representation of the input speech signal, the second representation including a continuous parameter representation of residuals of the input speech signal;
generating third model parameters to couple, in a vector space, the first representation with the second representation, wherein the first, second, and third model parameters are generated based on training at least one of a distortion model and a perception model;
determining, using the vector space, one or more first model parameters that satisfy a predetermined threshold; and
outputting, based on the one or more first model parameters that satisfy the predetermined threshold, a recognized sequence of words.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A non-transitory machine-readable storage medium storing instructions, which when executed by one or more processors of an electronic device, cause the electronic device to:
receive, by the one or more processors, an input speech signal;
generate first model parameters associated with a first representation of the input speech signal, the first representation being a discrete parameter representation;
generate second model parameters associated with a second representation of the input speech signal, the second representation including a continuous parameter representation of residuals of the input speech signal;
generate third model parameters to couple, in a vector space, the first representation with the second representation, wherein the first, second, and third model parameters are generated based on training at least one of a distortion model and a perception model;
determine, using the vector space, one or more first model parameters that satisfy a predetermined threshold; and
output, based on the one or more first model parameters that satisfy the predetermined threshold, a recognized sequence of words.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
25. A device comprising:
one or more processors;
memory; and
one or more programs stored in memory, the one or more programs including instructions for:
receiving, by the one or more processors, an input speech signal;
generating first model parameters associated with a first representation of the input speech signal, the first representation being a discrete parameter representation;
generating second model parameters associated with a second representation of the input speech signal, the second representation including a continuous parameter representation of residuals of the input speech signal;
generating third model parameters to couple, in a vector space, the first representation with the second representation;
determining, using the vector space, one or more first model parameters that satisfy a predetermined threshold, wherein the first, second, and third model parameters are generated based on training at least one of a distortion model and a perception model; and
outputting, based on the one or more first model parameters that satisfy the predetermined threshold, a recognized sequence of words.
Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36
Specification