Speech recognition using channel verification
Abstract
A method for automatic speech recognition includes determining, for an input signal, a plurality of scores representative of certainties that the input signal is associated with corresponding states of a speech recognition model; using the speech recognition model and the determined scores to compute an average signal; computing a difference value representative of a difference between the input signal and the average signal; and processing the input signal in accordance with the difference value.
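As an illustration only, the flow summarized in the abstract can be sketched in a few lines. The sketch assumes that each model state has a mean feature vector, that the "average signal" is a score-weighted average of those means, and that the difference value is a Euclidean distance. None of these choices is stated in the abstract, and the names `average_signal` and `difference_value` are hypothetical.

```python
import math

def average_signal(state_means, scores):
    """Score-weighted average of per-state mean vectors.

    Assumption: the weighting scheme is illustrative, not taken from the patent.
    """
    total = sum(scores)
    dim = len(state_means[0])
    return [
        sum(s * m[d] for s, m in zip(scores, state_means)) / total
        for d in range(dim)
    ]

def difference_value(frame, avg):
    """Euclidean distance between the input frame and the average signal.

    Assumption: the abstract does not name a distance measure.
    """
    return math.sqrt(sum((x - a) ** 2 for x, a in zip(frame, avg)))

# Two states with equal scores; the input frame is compared to their average.
avg = average_signal([[0.0, 0.0], [2.0, 4.0]], [0.5, 0.5])
diff = difference_value([4.0, 6.0], avg)
```

A downstream step would then branch on `diff`, for example treating the frame differently when the value is large.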
18 Claims
1. A method comprising:
receiving an input signal;
computing a plurality of scores for the input signal, the plurality of scores indicative of a degree to which the input signal corresponds to at least one state of a speech recognition model;
computing an average signal based on the speech recognition model and the plurality of scores;
computing, via at least one processor device, a difference value representative of a difference between the input signal and the average signal;
processing, via the at least one processor device, the input signal in accordance with the difference value;
wherein the scores are probability scores associated with states of the input signal and wherein the average signal is a moving average generated based on the input signal, the method further comprising:
biasing scores of noise states associated with the speech recognition model to increase a probability that a segment of the input signal is deemed to be sound received on a noise channel rather than sound received on a speech channel, the average signal representative of sound present in the input signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
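Two elements of claim 1, the moving average and the biasing of noise-state scores, can be pictured with the following minimal sketch. It is not drawn from the specification: the exponential form of the moving average, the multiplicative bias, the renormalization step, and the names `update_moving_average`, `bias_noise_scores`, `alpha`, and `bias` are all assumptions made for illustration.

```python
def update_moving_average(avg, frame, alpha=0.1):
    """Exponential moving average over feature frames.

    Assumption: the claim only says "moving average"; the exponential
    form and the value of alpha are illustrative.
    """
    return [(1.0 - alpha) * a + alpha * x for a, x in zip(avg, frame)]

def bias_noise_scores(scores, noise_states, bias=2.0):
    """Boost the probability scores of noise states, then renormalize.

    scores: dict mapping state name -> probability score.
    noise_states: set of state names treated as noise.
    Assumption: a multiplicative bias is one possible way to make a
    segment more likely to be deemed noise-channel sound.
    """
    boosted = {
        state: (p * bias if state in noise_states else p)
        for state, p in scores.items()
    }
    total = sum(boosted.values())
    return {state: p / total for state, p in boosted.items()}

# The silence state gains probability mass at the expense of the speech state.
biased = bias_noise_scores({"sil": 0.2, "ah": 0.8}, {"sil"})
```

With `bias` greater than 1, every noise state ends up with a larger share of the total probability than it had before, which is the direction of adjustment the claim describes.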
13. A non-transitory machine readable storage medium storing computer instructions that, when executed, cause a processor-based machine to:
receive an input signal;
apply to the input signal a cepstrum transformation;
generate a mel-frequency cepstral representation comprising mel-frequency coefficients from the cepstrum transform representation of the input signal;
determine for the input signal a plurality of scores representative of certainties that the input signal corresponds to states of a speech recognition model;
use the speech recognition model and the determined scores to compute an average signal, the average signal representing an estimate of sound that has passed through a speech channel;
compute a difference value representative of a difference between the input signal and the average signal, a magnitude of the difference value indicative of whether the input signal is speech versus noise; and
process the input signal as speech versus noise depending on a magnitude of the difference value;
adjust the plurality of scores based on the difference value;
use the adjusted scores to update the speech recognition model;
wherein the average signal is represented using the mel-frequency cepstral representation; and
wherein the instructions that cause the processor-based machine to compute the difference value comprise instructions that when executed cause the processor-based machine to:
compute a channel deviation value based on a difference between a 0th dimension of a mel-frequency cepstral coefficient of the mel-cepstral frequency representation of the input signal and a 0th dimension mel-frequency cepstral representation of the average signal.
Dependent claims: 14, 15
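The channel deviation in claim 13 compares only the 0th cepstral dimension of the input frame with that of the average signal. A minimal sketch follows, assuming the MFCC vectors are plain Python lists with the 0th coefficient at index 0. The rule that a large magnitude means noise is borrowed from claim 16 and is an assumption here; `channel_deviation`, `classify_frame`, and `threshold` are hypothetical names.

```python
def channel_deviation(frame_mfcc, average_mfcc):
    """Difference between the 0th MFCC dimension of the frame and of the average."""
    return frame_mfcc[0] - average_mfcc[0]

def classify_frame(frame_mfcc, average_mfcc, threshold):
    """Label a frame by the magnitude of its channel deviation.

    Assumption: magnitudes above the threshold are treated as noise;
    claim 13 itself only says the magnitude is indicative.
    """
    deviation = channel_deviation(frame_mfcc, average_mfcc)
    return "noise" if abs(deviation) > threshold else "speech"

# The frame's 0th coefficient sits 2.0 above the average signal's.
label = classify_frame([12.0, 1.0], [10.0, 0.5], threshold=1.5)
```

The 0th cepstral coefficient broadly tracks overall log energy, so this comparison is roughly a level check between the incoming frame and the estimated speech-channel signal.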
16. A method comprising:
receiving an input signal;
segmenting the input signal into multiple frames;
computing a set of coefficients for a given frame of the multiple frames, the set of coefficients representing a cepstral vector;
normalizing the set of coefficients associated with the given frame;
based on using the normalized set of coefficients as inputs, computing an average signal for the given frame, the average signal representative of sound in the input signal;
computing, via at least one processor device, a difference value representative of a difference between the input signal and the average signal; and
processing, via the at least one processor device, the input signal in accordance with the difference value;
wherein computing the average signal for the given frame further comprises:
presenting the normalized set of coefficients for the given frame of the input signal to a speech recognition decoder;
via the speech recognition decoder, computing a plurality of scores for the given frame using the normalized set of coefficients, the plurality of scores representative of probabilities that the cepstral vector associated with the given frame of the input signal corresponds to states in a speech recognition model;
identifying, for the given frame, a particular score from the plurality of scores;
selecting, from the plurality of scores associated with the given frame, a particular set of scores whose corresponding values are within a predetermined threshold from a value of the particular score;
performing an averaging operation on observation mean vectors associated with the selected plurality of scores to produce the average signal, the average signal representing an estimate of sound in the input signal that has passed through a speech channel as opposed to a noise channel;
wherein a magnitude of the difference value indicates whether sound present in the input signal is speech versus noise;
utilizing a magnitude of the difference value to determine whether the given frame of the input signal represents speech versus noise; and
processing the input signal as noise, as opposed to speech, in response to detecting that the magnitude of the difference value is above a threshold value.
Dependent claims: 17, 18
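The per-frame steps of claim 16 (pick a reference score, keep the scores within a threshold of it, average the corresponding observation mean vectors, then threshold the difference) might look like the sketch below. It assumes the "particular score" is the highest score, that scores are log-likelihoods so "within a threshold" is a simple subtraction, and that the difference is a Euclidean distance. These are illustrative assumptions, and `select_states`, `average_of_means`, `frame_is_noise`, `beam`, and `noise_threshold` are hypothetical names.

```python
import math

def select_states(scores, beam):
    """Indices of states whose score is within `beam` of the best score.

    Assumption: the reference ("particular") score is the maximum.
    """
    best = max(scores)
    return [i for i, s in enumerate(scores) if best - s <= beam]

def average_of_means(mean_vectors, selected):
    """Unweighted average of the observation mean vectors of the selected states."""
    dim = len(mean_vectors[0])
    return [
        sum(mean_vectors[i][d] for i in selected) / len(selected)
        for d in range(dim)
    ]

def frame_is_noise(frame, scores, mean_vectors, beam, noise_threshold):
    """Return (is_noise, difference) for one normalized cepstral frame."""
    selected = select_states(scores, beam)
    avg = average_of_means(mean_vectors, selected)
    # Assumption: Euclidean distance serves as the difference value.
    diff = math.sqrt(sum((x - a) ** 2 for x, a in zip(frame, avg)))
    return diff > noise_threshold, diff

# Three states; the third scores far below the best and is excluded.
scores = [-1.0, -1.5, -9.0]
means = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
is_noise, diff = frame_is_noise([5.0, 7.0], scores, means, beam=1.0, noise_threshold=4.0)
```

Restricting the average to states near the best score keeps poorly matching states, such as the third one above, from pulling the average signal away from what the decoder considers plausible for the frame.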
Specification