
Multi-sensory speech enhancement using a speech-state model

  • US 7,680,656 B2
  • Filed: 06/28/2005
  • Issued: 03/16/2010
  • Est. Priority Date: 06/28/2005
  • Status: Expired due to Fees
First Claim

1. A method of determining an estimate for a noise-reduced value representing a portion of a noise-reduced speech signal, the method comprising:

  • generating an alternative sensor signal using an alternative sensor;

    generating an air conduction microphone signal;

    using the alternative sensor signal and the air conduction microphone signal to estimate a likelihood, $L(S_t)$, of a speech state, $S_t$, by estimating a separate likelihood of the speech state for each of a set of frequency components and combining the separate likelihoods to form the likelihood of the speech state; and

    using the likelihood of the speech state to estimate the noise-reduced value, $\hat{X}_t$, as:

    $$\hat{X}_t = \sum_{s \in \{S\}} \pi_s \, E(X_t \mid Y_t, B_t, S_t = s)$$

    where $\pi_s$ is a posterior on the state and is given by:

    $$\pi_s = \frac{L(S_t = s)}{\sum_{s \in \{S\}} L(S_t = s)}$$

    and where:

    $$E(X_t \mid Y_t, B_t, S_t = s) = \frac{\sigma_s^2 \left( \sigma_p^2 Y_t + M^* \left( (\sigma_u^2 + g^2 \sigma_v^2) B_t - g^2 \sigma_v^2 G Y_t \right) \right)}{\sigma_p^2 (\sigma_u^2 + g^2 \sigma_v^2 + \sigma_s^2) + |M|^2 \sigma_s^2 (\sigma_u^2 + g^2 \sigma_v^2)}$$

    where:

    $$\sigma_p^2 = \sigma_w^2 + \frac{g^2 \sigma_v^2 \, \sigma_u^2}{\sigma_u^2 + g^2 \sigma_v^2} |G|^2$$

    and

    $$M = H - \frac{g^2 \sigma_v^2}{\sigma_u^2 + g^2 \sigma_v^2} G$$

    where $M^*$ is the complex conjugate of $M$, $X_t$ is a noise reduced value, $Y_t$ is a value for a frame $t$ of the air conduction microphone signal, $B_t$ is a value for a frame $t$ of the alternative sensor signal, $\sigma_u^2$ is a variance of sensor noise in the air conduction microphone, $\sigma_w^2$ is a variance of sensor noise in the alternative sensor, $g^2 \sigma_v^2$ is the variance of ambient noise, $G$ is the channel response of the alternative sensor to ambient noise, $H$ is the channel response of the alternative sensor to a clean speech signal, $S$ is the set of all speech states, $\sigma_s^2$ is a variance for a distribution that models a probability of a noise-reduced value given a speech state and $E(X_t \mid Y_t, B_t, S_t = s)$ is the expectation of $X_t$ given $Y_t$, $B_t$, and a speech state of $s$.
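
The arithmetic recited in the claim can be illustrated with a minimal Python sketch for a single frequency component. This is only an illustration under stated assumptions, not the patented implementation: the names `ChannelParams`, `state_posteriors`, `conditional_mean` and `noise_reduced_estimate` are hypothetical, the state labels are arbitrary dictionary keys, and the state likelihoods $L(S_t = s)$ are taken as given inputs because the claim does not fix how the per-frequency likelihoods are combined.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ChannelParams:
    """Hypothetical container; fields are named after the claim's symbols."""
    H: complex          # alternative-sensor response to clean speech
    G: complex          # alternative-sensor response to ambient noise
    var_u: float        # sigma_u^2: air conduction microphone sensor noise
    var_w: float        # sigma_w^2: alternative sensor noise
    var_ambient: float  # g^2 * sigma_v^2: ambient noise variance


def state_posteriors(likelihoods: Dict[str, float]) -> Dict[str, float]:
    """pi_s = L(S_t = s) divided by the sum of L over all states."""
    total = sum(likelihoods.values())
    return {s: lik / total for s, lik in likelihoods.items()}


def conditional_mean(Y: complex, B: complex, var_s: float,
                     p: ChannelParams) -> complex:
    """E(X_t | Y_t, B_t, S_t = s) for one frequency component."""
    var_uv = p.var_u + p.var_ambient  # sigma_u^2 + g^2 sigma_v^2
    var_p = p.var_w + (p.var_ambient * p.var_u / var_uv) * abs(p.G) ** 2
    M = complex(p.H - (p.var_ambient / var_uv) * p.G)
    numerator = var_s * (
        var_p * Y + M.conjugate() * (var_uv * B - p.var_ambient * p.G * Y)
    )
    denominator = var_p * (var_uv + var_s) + abs(M) ** 2 * var_s * var_uv
    return numerator / denominator


def noise_reduced_estimate(Y: complex, B: complex,
                           likelihoods: Dict[str, float],
                           state_vars: Dict[str, float],
                           p: ChannelParams) -> complex:
    """X_hat_t: posterior-weighted sum of the per-state conditional means."""
    post = state_posteriors(likelihoods)
    return sum(post[s] * conditional_mean(Y, B, state_vars[s], p) for s in post)
```

As a sanity check on the formula, setting the ambient noise to zero with $H = 1$, $\sigma_u^2 = \sigma_w^2 = 1$, $\sigma_s^2 = 2$ and $Y_t = B_t = 1$ gives a conditional mean of 0.8, which matches the usual posterior mean for two unit-variance noisy observations of a zero-mean Gaussian with variance 2.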
