Speech recognition using channel verification

US 8,346,554 B2
Filed: 09/15/2010
Issued: 01/01/2013
Est. Priority Date: 03/31/2006
Status: Active Grant

Abstract

A method for automatic speech recognition includes determining for an input signal a plurality of scores representative of certainties that the input signal is associated with corresponding states of a speech recognition model, using the speech recognition model and the determined scores to compute an average signal, computing a difference value representative of a difference between the input signal and the average signal, and processing the input signal in accordance with the difference value.
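
The four steps in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the states are modelled as unit-variance Gaussians, the average signal is taken to be the posterior-weighted mean of the state means, and the difference value is a Euclidean distance. The names `log_gaussian_score`, `verify_channel`, `state_means` and `threshold` are hypothetical.

```python
import math

def log_gaussian_score(frame, mean, var=1.0):
    """Log-likelihood of a frame under a diagonal Gaussian (assumed state model)."""
    return -0.5 * sum((x - m) ** 2 / var + math.log(2 * math.pi * var)
                      for x, m in zip(frame, mean))

def verify_channel(frame, state_means, threshold):
    """Return (difference value, action) for one feature frame."""
    # 1. Score the frame against every state of the model.
    scores = [log_gaussian_score(frame, m) for m in state_means]
    # 2. Average signal: state means weighted by normalized scores (assumption).
    best = max(scores)
    weights = [math.exp(s - best) for s in scores]  # subtract best for stability
    total = sum(weights)
    average = [sum(w * m[d] for w, m in zip(weights, state_means)) / total
               for d in range(len(frame))]
    # 3. Difference value: Euclidean distance between input and average signal.
    diff = math.sqrt(sum((x - a) ** 2 for x, a in zip(frame, average)))
    # 4. Process the input according to the difference value.
    return diff, ("recognize" if diff <= threshold else "reject")

state_means = [[0.0, 0.0], [4.0, 4.0]]
print(verify_channel([0.1, -0.1], state_means, threshold=1.0))
print(verify_channel([10.0, 10.0], state_means, threshold=1.0))
```

A frame close to one of the modelled states yields a small difference value and is passed on to recognition, while a frame far from every state yields a large one and is rejected.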

18 Claims

1. A method comprising:
- receiving an input signal;
  
  computing a plurality of scores for the input signal, the plurality of scores indicative of a degree to which the input signal corresponds to at least one state of a speech recognition model;
  
  computing an average signal based on the speech recognition model and the plurality of scores;
  
  computing, via at least one processor device, a difference value representative of a difference between the input signal and the average signal;
  
  processing, via the at least one processor device, the input signal in accordance with the difference value;
  
  wherein the scores are probability scores associated with states of the input signal and wherein the average signal is a moving average generated based on the input signal, the method further comprising:
  
  biasing scores of noise states associated with the speech recognition model to increase a probability that a segment of the input signal is deemed to be sound received on a noise channel rather than sound received on a speech channel, the average signal representative of sound present in the input signal.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. The method of claim 1, wherein processing the input signal in accordance with the difference value comprises:
    - performing speech recognition using the input signal if the difference value does not exceed a first predetermined threshold.
  - 3. The method of claim 1, wherein processing the input signal in accordance with the difference value comprises:
    - adjusting the plurality of scores based on the difference value; and
      
      using the adjusted scores to update speech recognition of the input signal.
  - 4. The method of claim 3, further comprising:
    - applying, to the input signal, a cepstrum transformation; and
      
      generating a mel-frequency cepstral representation comprising mel-frequency coefficients from the cepstrum transform representation of the input signal.
  - 5. The method of claim 4, wherein the average signal is represented using a corresponding mel-frequency cepstral representation, and wherein computing the difference value comprises:
    - computing a channel deviation value based on a difference between a 0th dimension of a mel-frequency cepstral coefficient of the mel-cepstral frequency representation of the input signal and a 0th dimension mel-frequency cepstral representation of the average signal.
  - 6. The method as in claim 1 further comprising:
    - segmenting the input signal into frames;
      
      computing a respective set of coefficients for each of the frames;
      
      normalizing the coefficients associated with the frames;
      
      presenting the normalized coefficients to a speech recognition decoder; and
      
      computing the plurality of scores using the normalized coefficients.
  - 7. The method as in claim 6, wherein the plurality of scores are probability scores, the method further comprising:
    - selecting a subset of scores from the probability scores;
      
      generating the average signal based on the subset of scores; and
      
      adjusting the plurality of scores based on the average signal, the average signal representing sound present in the input signal.
  - 8. The method as in claim 7 further comprising:
    - utilizing the adjusted plurality of scores to determine whether the input signal represents sound on the noise channel or sound on the speech channel.
  - 9. The method as in claim 1, wherein processing the input signal in accordance with the difference value includes:
    - adjusting the plurality of scores based on the average signal; and
      
      utilizing the adjusted plurality of scores to determine whether or not the input signal represents sound on the noise channel or sound on the speech channel.
  - 10. The method as in claim 1 further comprising:
    - determining that the input signal represents sound on the noise channel, as opposed to sound on the speech channel, based on a magnitude of the difference value.
  - 11. The method as in claim 1 further comprising:
    - detecting that the segment is received on the noise channel in response to detecting that the difference value is above a threshold.
  - 12. The method of claim 1, wherein computing the average signal comprises:
    - identifying a given score from the plurality of scores;
      
      selecting a set of scores from the plurality of scores; and
      
      performing an averaging operation on observation mean vectors associated with the set of scores to produce the average signal.
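
Claims 1, 5 and 11 combine three small operations: biasing the scores of noise states, tracking a moving average, and comparing a 0th-dimension deviation against a threshold. The sketch below is a hedged illustration of those operations. It assumes log-domain scores held in a dictionary, an exponential moving average with an assumed smoothing factor `alpha`, and an absolute difference as the deviation. All function, class and state names are hypothetical, and the claims do not specify which quantity is fed to the moving average.

```python
def bias_noise_scores(log_scores, noise_states, bias):
    """Return a copy of log_scores with `bias` added to every noise state."""
    return {state: score + (bias if state in noise_states else 0.0)
            for state, score in log_scores.items()}

class MovingAverage:
    """Exponential moving average; `alpha` is an assumed smoothing factor."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.value = None

    def update(self, x):
        # The first sample initializes the average; later samples are blended in.
        if self.value is None:
            self.value = x
        else:
            self.value = (1 - self.alpha) * self.value + self.alpha * x
        return self.value

def channel_deviation(c0_input, c0_average):
    """Absolute difference between the 0th cepstral coefficients (assumption)."""
    return abs(c0_input - c0_average)

def is_noise_channel(deviation, threshold):
    """Claim 11 style decision: a deviation above the threshold means noise."""
    return deviation > threshold

scores = {"sil": -5.0, "ah": -4.0}
biased = bias_noise_scores(scores, noise_states={"sil"}, bias=2.0)
tracker = MovingAverage(alpha=0.5)
tracker.update(10.0)
avg_c0 = tracker.update(20.0)
print(biased, avg_c0, is_noise_channel(channel_deviation(30.0, avg_c0), 10.0))
```

With the bias applied, the noise state `sil` outranks the speech state `ah`, which is the effect claim 1 describes: a segment becomes more likely to be deemed sound from the noise channel.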

13. A non-transitory machine readable storage medium storing computer instructions that, when executed, cause a processor-based machine to:
- receive an input signal;
  
  apply to the input signal a cepstrum transformation;
  
  generate a mel-frequency cepstral representation comprising mel-frequency coefficients from the cepstrum transform representation of the input signal;
  
  determine for the input signal a plurality of scores representative of certainties that the input signal corresponds to states of a speech recognition model;
  
  use the speech recognition model and the determined scores to compute an average signal, the average signal representing an estimate of sound that has passed through a speech channel;
  
  compute a difference value representative of a difference between the input signal and the average signal, a magnitude of the difference value indicative of whether the input signal is speech versus noise; and
  
  process the input signal as speech versus noise depending on a magnitude of the difference value;
  
  adjust the plurality of scores based on the difference value;
  
  use the adjusted scores to update the speech recognition model;
  
  wherein the average signal is represented using the mel-frequency cepstral representation; and
  
  wherein the instructions that cause the processor-based machine to compute the difference value comprise instructions that when executed cause the processor-based machine to:
  
  compute a channel deviation value based on a difference between a 0th dimension of a mel-frequency cepstral coefficient of the mel-cepstral frequency representation of the input signal and a 0th dimension mel-frequency cepstral representation of the average signal.
- Dependent claims (14, 15):
  - 14. The non-transitory machine readable storage medium of claim 13, wherein the instructions that cause the processor-based machine to process the input signal in accordance with the difference value comprise instructions that, when executed, cause the processor-based machine to:
    - perform speech recognition on the input signal in response to detecting that the magnitude of the difference value indicates that sound present in the input signal represents speech.
  - 15. The non-transitory machine readable storage medium of claim 13, wherein the instructions that cause the processor-based machine to compute the average signal comprise instructions that, when executed, cause the processor-based machine to:
    - identify from the plurality of scores a best score;
      
      select from the plurality of scores those scores whose corresponding values are within a second predetermined threshold from a value of the best score; and
      
      perform an averaging operation on observation mean vectors of observation densities associated with the selected scores to obtain the average signal.
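
The averaging step of claim 15 and the 0th-dimension comparison of claim 13 can be illustrated together. The sketch assumes log-domain scores, a plain arithmetic mean over the selected observation mean vectors, and a signed difference of the 0th coefficients. The claims only require "an averaging operation" and "a difference", so these choices, along with the names `average_signal`, `c0_deviation` and `beam`, are assumptions made for illustration.

```python
def average_signal(scores, mean_vectors, beam):
    """Average the mean vectors of states scoring within `beam` of the best score."""
    best = max(scores)
    # Keep only the states whose score is close enough to the best one.
    selected = [i for i, s in enumerate(scores) if best - s <= beam]
    dim = len(mean_vectors[0])
    return [sum(mean_vectors[i][d] for i in selected) / len(selected)
            for d in range(dim)]

def c0_deviation(input_mfcc, average_mfcc):
    """Channel deviation: difference of the 0th MFCC dimensions."""
    return input_mfcc[0] - average_mfcc[0]

scores = [-10.0, -11.0, -30.0]                      # one score per state
means = [[2.0, 0.0], [4.0, 2.0], [100.0, 100.0]]    # observation mean vectors
avg = average_signal(scores, means, beam=5.0)
print(avg)                                          # [3.0, 1.0]
print(c0_deviation([9.0, 0.5], avg))                # 6.0
```

The third state scores far below the best one, so its mean vector is excluded and does not distort the average signal.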

16. A method comprising:
- receiving an input signal;
  
  segmenting the input signal into multiple frames;
  
  computing a set of coefficients for a given frame of the multiple frames, the set of coefficients representing a cepstral vector;
  
  normalizing the set of coefficients associated with the given frame;
  
  based on using the normalized set of coefficients as inputs, computing an average signal for the given frame, the average signal representative of sound in the input signal;
  
  computing, via at least one processor device, a difference value representative of a difference between the input signal and the average signal; and
  
  processing, via the at least one processor device, the input signal in accordance with the difference value;
  
  wherein computing the average signal for the given frame further comprises:
  
  presenting the normalized set of coefficients for the given frame of the input signal to a speech recognition decoder;
  
  via the speech recognition decoder, computing a plurality of scores for the given frame using the normalized set of coefficients, the plurality of scores representative of probabilities that the cepstral vector associated with the given frame of the input signal corresponds to states in a speech recognition model;
  
  identifying, for the given frame, a particular score from the plurality of scores;
  
  selecting, from the plurality of scores associated with the given frame, a particular set of scores whose corresponding values are within a predetermined threshold from a value of the particular score;
  
  performing an averaging operation on observation mean vectors associated with the selected plurality of scores to produce the average signal, the average signal representing an estimate of sound in the input signal that has passed through a speech channel as opposed to a noise channel;
  
  wherein a magnitude of the difference value indicates whether sound present in the input signal is speech versus noise;
  
  utilizing a magnitude of the difference value to determine whether the given frame of the input signal represents speech versus noise; and
  
  processing the input signal as noise, as opposed to speech, in response to detecting that the magnitude of the difference value is above a threshold value.
- Dependent claims (17, 18):
  - 17. The method as in claim 16 further comprising:
    - performing a Viterbi search based on the given frame in response to detecting that the given frame represents sound on the speech channel.
  - 18. The method as in claim 16, wherein computing the difference value comprises producing a vector value indicative of a difference between the set of coefficients associated with the given frame and coefficients of the average signal.
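
The frame-level flow of claims 16 and 18 can be sketched as below. The cepstral analysis itself is omitted, so coefficient vectors are taken as given. Normalization is assumed to be mean subtraction across frames, and the magnitude of the difference vector is assumed to be its Euclidean norm. The claims fix neither choice, and all function names are hypothetical.

```python
import math

def split_frames(samples, frame_len, hop):
    """Segment a signal into overlapping frames of `frame_len` samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def mean_normalize(vectors):
    """Subtract the per-dimension mean across frames (assumed normalization)."""
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return [[v[d] - mean[d] for d in range(dim)] for v in vectors]

def difference_vector(coeffs, average):
    """Claim 18: vector difference between frame and average-signal coefficients."""
    return [c - a for c, a in zip(coeffs, average)]

def classify_frame(coeffs, average, threshold):
    """Treat the frame as noise when the difference magnitude exceeds the threshold."""
    diff = difference_vector(coeffs, average)
    magnitude = math.sqrt(sum(x * x for x in diff))
    return "noise" if magnitude > threshold else "speech"

print(split_frames(list(range(10)), frame_len=4, hop=2))
print(mean_normalize([[1.0, 2.0], [3.0, 6.0]]))
print(classify_frame([3.0, 4.0], [0.0, 0.0], threshold=4.0))
```

In a fuller pipeline, the frames classified as speech would be the ones handed on to the decoder's search, as claim 17 describes.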

Current Assignee
Cerence Operating Company (Cerence Inc.)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Zlokarnik, Igor
Primary Examiner(s)
Borsetti, Greg

Application Number
US12/882,292
Publication Number
US 20110004472A1
Time in Patent Office
839 Days
Field of Search
704/252, 704/256, 704/E15.05, 381/94.1, 381/94.7
US Class Current
704/252
CPC Class Codes
G10L 15/02   Feature extraction for spee...
G10L 15/065   Adaptation
G10L 15/20   Speech recognition techniqu...
