System for detecting speech interval and recognizing continuous speech in a noisy environment through real-time recognition of call commands

  • US 8,275,616 B2
  • Filed: 04/22/2009
  • Issued: 09/25/2012
  • Est. Priority Date: 05/28/2008
  • Status: Active Grant
First Claim

1. A method for detecting a speech interval and recognizing continuous speech using real-time recognition of call commands, wherein, when a speaker speaks a call command, the call command is recognized, confidence rate of the call command is measured, and a speech interval spoken subsequent to the call command is applied to a continuous speech recognition engine at a moment at which the call command is recognized, thus recognizing speech of the speaker;

  • wherein the recognition of the call command is performed by a call command recognition network which is implemented using a Left-to-Right (LTR) model, and a speech frame input to the recognition network is configured to include predetermined tokens and is compared based on probability with the recognition network in real time;

    wherein each of the predetermined tokens includes the speech frame and a silence interval accompanied by noise;

    wherein the call command recognition network is configured such that, when an accumulated probability of the predetermined token which is computed in real time after passing through the call command recognition network falls within a range of a predetermined upper percentage, the call command is estimated to have been spoken, and the speech frame is transferred to a confidence measurement stage; and

    wherein, when the call command recognition network is configured such that, a beam width is limited to 20 or 30 tokens and when the accumulated probability of the predetermined token obtained at a moment at which a transition to a silence model is made while real-time computation is performed on the frames which are continuously input corresponds to a top 10% of the 20 or 30 tokens in the call command recognition network, the call is estimated to have been spoken, and the speech frame is transferred to a confidence measurement stage;

    wherein the confidence measurement stage is determined by the following equation;


    $$\mathrm{LLR}_k(O, \lambda_k) = \log p(O \mid \lambda_k) - \log p(O \mid \bar{\lambda}_k)$$

    where LLR is log likelihood ratio, $\lambda_k$ is a phoneme model, and $\bar{\lambda}_k$ is an anti-phoneme model.
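The two numeric tests in the claim, the top-10% check within a beam of 20 or 30 tokens and the log likelihood ratio used for confidence measurement, can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names, the score values, and the reading of "top 10%" as a rank among the tokens that survive beam pruning are assumptions made for illustration only. The claim does not state an acceptance threshold for the LLR, so none is applied here.

```python
def in_top_percent(token_log_probs, candidate, beam_width=20, top_percent=10):
    """Check whether token `candidate` ranks in the top `top_percent` of the beam.

    token_log_probs: accumulated log probabilities, one per live token.
    candidate: index of the token that just transitioned to the silence model.
    (Both names are hypothetical; the claim does not define a data layout.)
    """
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(token_log_probs)),
                    key=lambda i: token_log_probs[i], reverse=True)
    # Beam pruning: keep only the best `beam_width` tokens.
    beam = ranked[:beam_width]
    # Integer arithmetic: 10% of 20 tokens is 2, 10% of 30 tokens is 3.
    top_n = max(1, len(beam) * top_percent // 100)
    return candidate in beam[:top_n]


def log_likelihood_ratio(log_p_phoneme, log_p_anti_phoneme):
    """LLR_k = log p(O | lambda_k) - log p(O | anti-phoneme model)."""
    return log_p_phoneme - log_p_anti_phoneme


# Made-up scores: token 0 is the most probable, token 24 the least.
scores = [-float(i) for i in range(25)]
call_estimated = in_top_percent(scores, candidate=1)  # rank 2 of 20: inside top 10%
call_rejected = in_top_percent(scores, candidate=2)   # rank 3 of 20: outside top 10%
llr = log_likelihood_ratio(-42.0, -50.0)              # phoneme model fits better
```

A positive LLR means the phoneme model explains the observation better than the anti-phoneme model. How large the ratio must be before the call command is accepted is left open in this sketch.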
