
SPEECH PROCESSING SYSTEM AND METHOD

  • US 20120041764A1
  • Filed: 08/10/2011
  • Published: 02/16/2012
  • Est. Priority Date: 08/16/2010
  • Status: Active Grant
First Claim

1. A speech processing method, comprising:

  • receiving a speech input which comprises a sequence of feature vectors;

    determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising:

    providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and

    adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and

    combining the likelihoods determined by the acoustic model and the language model and outputting a sequence of words identified from said speech input signal, wherein adapting the acoustic model to the mismatched speaker input comprises:

    relating speech from the mismatched speaker input to the speech used to train the acoustic model using:

    a mismatch function f for primarily modelling differences between the environment of the speaker and the environment under which the acoustic model was trained; and

    a speaker transform F for primarily modelling differences between the speaker of the mismatched speaker input, such that:


    y = f(F(x, v), u)

    where y represents the speech from the mismatched speaker input, x is the speech used to train the acoustic model, u represents at least one parameter for modelling changes in the environment and v represents at least one parameter used for mapping differences between speakers; and

    jointly estimating u and v.
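
The composition y = f(F(x, v), u) and the joint estimation of u and v can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the claim: features are scalars, the speaker transform F is affine with v = (a, b), the mismatch function f is a log-domain additive-noise model with a single noise level u, paired (x, y) samples are available, and the parameters are fitted by plain gradient descent on squared error. In a recogniser x would not be observed directly, so the estimation would be carried out against the acoustic model rather than against paired data.

```python
import math


def speaker_transform(x, v):
    """F(x, v): hypothetical affine speaker transform, v = (a, b)."""
    a, b = v
    return a * x + b


def mismatch(z, u):
    """f(z, u): hypothetical log-domain additive noise, log(exp(z) + exp(u))."""
    m = max(z, u)  # subtract the maximum for numerical stability
    return m + math.log(math.exp(z - m) + math.exp(u - m))


def predict(x, v, u):
    """y = f(F(x, v), u), the composition stated in the claim."""
    return mismatch(speaker_transform(x, v), u)


def loss(xs, ys, v, u):
    """Mean squared error between predicted and observed mismatched features."""
    return sum((predict(x, v, u) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_jointly(xs, ys, v=(1.0, 0.0), u=-1.0, lr=0.05, steps=5000):
    """Update v = (a, b) and u together in every gradient step."""
    a, b = v
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = grad_u = 0.0
        for x, y in zip(xs, ys):
            z = a * x + b
            residual = mismatch(z, u) - y
            s = 1.0 / (1.0 + math.exp(u - z))  # df/dz; df/du is 1 - s
            grad_a += 2.0 * residual * s * x
            grad_b += 2.0 * residual * s
            grad_u += 2.0 * residual * (1.0 - s)
        # All three parameters move in the same step: the estimate is joint,
        # not one of v-then-u or u-then-v.
        a -= lr * grad_a / n
        b -= lr * grad_b / n
        u -= lr * grad_u / n
    return (a, b), u


if __name__ == "__main__":
    # Synthetic data generated from known (hypothetical) parameters.
    xs = [i / 4.0 for i in range(-12, 13)]
    ys = [predict(x, (1.2, 0.5), 0.0) for x in xs]
    v_hat, u_hat = fit_jointly(xs, ys)
    print("estimated v:", v_hat, "estimated u:", u_hat)
    print("final loss:", loss(xs, ys, v_hat, u_hat))
```

Updating u and v in the same step, rather than fixing one while fitting the other, is what makes the estimate joint in this sketch; the claim itself does not prescribe an optimisation procedure.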
