Methods and apparatuses for automatic speech recognition
First Claim
1. A machine implemented method to perform speech recognition, comprising:
receiving first portions of an acoustic signal;
determining a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal;
determining a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal having a coarser granularity than the first portions;
determining a likelihood of a recovered word sequence based on the recovered second parameter sequence; and
outputting the recovered word sequence.
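The chain of likelihoods recited in the claim can be pictured with a small sketch. This is only an illustration under loud assumptions: the probability tables, the labels (`lo`, `f1`, `A`, `ab`), the fixed two-frames-per-unit grouping, and the function names are all hypothetical and do not come from the patent. It scores a fine-grained sequence against the signal, then a coarser sequence against the fine-grained one, then a word against the coarser sequence.

```python
import math

# Hypothetical toy tables; none of these values come from the patent.
# Fine-grained level: P(frame label | observed frame).
FRAME_PROBS = {
    ("lo", "f1"): 0.9, ("lo", "f2"): 0.1,
    ("hi", "f1"): 0.2, ("hi", "f2"): 0.8,
}
# Coarser level: P(unit | pair of frame labels the unit spans).
UNIT_PROBS = {
    (("f1", "f1"), "A"): 0.7, (("f1", "f1"), "B"): 0.3,
    (("f2", "f2"), "A"): 0.2, (("f2", "f2"), "B"): 0.8,
}
# Word level: P(word | sequence of coarse units).
WORD_PROBS = {
    (("A", "B"), "ab"): 0.9,
    (("A", "A"), "aa"): 0.9,
}


def chunks(seq, size):
    """Group a fine-grained sequence into coarser, fixed-size spans."""
    return [tuple(seq[i:i + size]) for i in range(0, len(seq), size)]


def log_likelihood(pairs, table):
    """Sum of log probabilities looked up in a table."""
    return sum(math.log(table[pair]) for pair in pairs)


def score(frames, cand):
    """Combine the three likelihoods for one candidate hypothesis."""
    first = log_likelihood(zip(frames, cand["frame_labels"]), FRAME_PROBS)
    second = log_likelihood(
        zip(chunks(cand["frame_labels"], 2), cand["units"]), UNIT_PROBS
    )
    word = math.log(WORD_PROBS[(tuple(cand["units"]), cand["word"])])
    return first + second + word


def recognize(frames, candidates):
    """Return the word of the highest-scoring candidate."""
    return max(candidates, key=lambda c: score(frames, c))["word"]


FRAMES = ["lo", "lo", "hi", "hi"]
CANDIDATES = [
    {"frame_labels": ["f1", "f1", "f2", "f2"], "units": ["A", "B"], "word": "ab"},
    {"frame_labels": ["f1", "f1", "f2", "f2"], "units": ["A", "A"], "word": "aa"},
]
print(recognize(FRAMES, CANDIDATES))
```

A real recognizer would search over hypotheses rather than score a fixed candidate list; the enumeration here only keeps the example short.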
Abstract
Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated. The first representation of the input signal is a discrete parameter representation. Second model parameters associated with a second representation of the input signal are generated. The second representation of the input signal includes a continuous parameter representation of residuals of the input signal. The first representation of the input signal includes discrete parameters representing first portions of the input signal. The second representation includes discrete parameters representing second portions of the input signal that are smaller than the first portions. Third model parameters are generated to couple the first representation of the input signal with the second representation of the input signal. The first representation and the second representation of the input signal are mapped into a vector space.
883 Citations
33 Claims
1. A machine implemented method to perform speech recognition, comprising:
receiving first portions of an acoustic signal;
determining a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal;
determining a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal having a coarser granularity than the first portions;
determining a likelihood of a recovered word sequence based on the recovered second parameter sequence; and
outputting the recovered word sequence.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A non-transitory machine-readable medium storing executable program instructions which when executed by a data processing system causes the system to perform operations to recognize speech, comprising:
receiving first portions of an acoustic signal;
determining a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal;
determining a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal that have a coarser granularity than the first portions;
determining a likelihood of a recovered word sequence based on the recovered second parameter sequence; and
outputting the recovered word sequence.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A data processing system to perform speech recognition, comprising:
a memory; and
a processor coupled to the memory, the processor is configured to;
receive first portions of an acoustic signal;
determine a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal;
determine a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal that have a coarser granularity than the first portions;
determine a likelihood of a recovered word sequence based on the recovered second parameter sequence; and
output the recovered word sequence.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30
31. A data processing system to perform speech recognition, comprising:
means for receiving first portions of an acoustic signal;
means for determining a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal;
means for determining a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal that have a coarser granularity than the first portions;
means for determining a likelihood of a recovered word sequence based on the recovered second parameter sequence; and
means for outputting the recovered word sequence.
32. A non-transitory machine readable storage medium containing executable instructions which when executed cause a data processing system to perform a speech recognition method, the method comprising:
receiving an acoustic signal;
extracting features from a digitized representation of the acoustic signal;
comparing at least some of the features to a first component of an acoustic model, the first component having a discrete parameter representation;
comparing at least some of the features to a second component of the acoustic model, the second component having a continuous parameter representation which models residuals of speech signals;
determining a recognized word from the comparing of at least some of the features to the first and the second components, wherein the discrete parameter representation and the continuous parameter representation are both used to map the features to at least one cluster label which is used to determine at least one phoneme.
Dependent claims: 33
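One way to picture how a discrete component and a continuous residual component might jointly pick a cluster label, as claim 32 recites, is sketched here. Everything in it is an assumption made for illustration: the codewords, prior weights, variances, the diagonal Gaussian residual model, the label-to-phoneme table, and the function names are hypothetical and are not taken from the patent.

```python
import math

# Hypothetical model, for illustration only. Each cluster label has:
#   - a codeword and a prior weight (the discrete component), and
#   - per-dimension variances of the residual around the codeword
#     (the continuous component, assumed here to be a diagonal Gaussian).
MODEL = {
    "c1": {"codeword": (0.0, 0.0), "prior": 0.5, "var": (1.0, 1.0)},
    "c2": {"codeword": (3.0, 3.0), "prior": 0.5, "var": (0.5, 0.5)},
}

# Hypothetical mapping from cluster label to phoneme.
LABEL_TO_PHONEME = {"c1": "AH", "c2": "IY"}


def gaussian_log_density(residual, variances):
    """Log density of a zero-mean diagonal Gaussian evaluated at the residual."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + r * r / v)
        for r, v in zip(residual, variances)
    )


def cluster_label(feature):
    """Pick the label whose discrete prior plus residual density scores highest."""
    best_label, best_score = None, -math.inf
    for label, comp in MODEL.items():
        # Residual: what the discrete codeword fails to explain.
        residual = [x - c for x, c in zip(feature, comp["codeword"])]
        score = math.log(comp["prior"]) + gaussian_log_density(residual, comp["var"])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def phoneme_for(feature):
    """Map a feature vector to a phoneme through its cluster label."""
    return LABEL_TO_PHONEME[cluster_label(feature)]


print(phoneme_for((0.2, -0.1)), phoneme_for((2.8, 3.1)))
```

The point of the sketch is only that both scores contribute to the label decision; the patent's own parameterization may differ.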
Specification