Speech recognition device and the operation method thereof

  • US 9,514,751 B2
  • Filed: 03/25/2014
  • Issued: 12/06/2016
  • Est. Priority Date: 09/17/2013
  • Status: Active Grant
First Claim

1. A speech recognition device comprising:

  • at least one hardware processor configured to:

    receive, from a speech recognition terminal, speech data corresponding to a speech input by a speaking person and multi-sensor data corresponding to an environment in which the speech is input by the speaking person, the multi-sensor data being useable as additional information to the speech input for performing speech recognition and the multi-sensor data including an image of the speaking person and estimated location and position of the speech recognition terminal to the speaking person while the speech is input;

    select a language model from a plurality of language models for the speech input, the language model being selected as representing a correspondence between a plurality of data among the multi-sensor data including the image of the speaking person of the speech input, the environment in which the speech is input by the speaking person, and the estimated location and position of the speech recognition terminal to the speaking person and previous multi-sensor data including a plurality of data among previous images of speaking persons and corresponding environments in which previous speeches are input;

    select an acoustic model from among a plurality of acoustic models for the speech input, the acoustic model being selected as representing a correspondence between a plurality of data among the multi-sensor data including the image of the speaking person of the speech input, the environment in which the speech is input by the speaking person, the estimated location and position of the speech recognition terminal to the speaking person, and an estimated signal to noise ratio (SNR) for the speech data and the previous multi-sensor data including the plurality of data among previous images of speaking persons and the corresponding environments in which previous speeches are input; and

    control the speech recognition of the speech input to be performed according to the selected language model and the selected acoustic model which varies in consideration of the plurality of data among the multi-sensor data obtained while the speech is input through application of a feature vector extracted from the speech data to the selected language model and the selected acoustic model, and transmit a result of the speech recognition of the speech data to the speech recognition terminal, wherein the estimated SNR for the speech varies according to a relationship determined between the speech input and proximity of a distance between the speech recognition terminal and the speaking person obtained through the estimated location and position of the speech recognition terminal to the speaking person while the speech is being input.
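
The selection steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the `SensorContext` fields, the match-counting score, and the inverse-square SNR falloff in `estimate_snr_db` are all assumptions introduced here for illustration. The claim only states that models are selected by correspondence with previous multi-sensor data and that the estimated SNR varies with the distance between terminal and speaker.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorContext:
    """Hypothetical summary of the multi-sensor data sent with the speech."""
    speaker_group: str   # assumed label derived from the image of the speaker
    environment: str     # assumed label for the input environment, e.g. "car"
    distance_m: float    # estimated terminal-to-speaker distance in metres


def estimate_snr_db(distance_m, ref_snr_db=30.0, ref_distance_m=0.1):
    """Assumed relationship: speech level falls 20 dB per tenfold distance
    (free-field inverse-square law) while the noise level stays constant."""
    d = max(distance_m, ref_distance_m)
    return ref_snr_db - 20.0 * math.log10(d / ref_distance_m)


def context_match(current, previous):
    """Count how many categorical fields agree with a previous context."""
    score = 0
    if current.speaker_group == previous.speaker_group:
        score += 1
    if current.environment == previous.environment:
        score += 1
    return score


def select_language_model(current, models):
    """models: list of (name, SensorContext) pairs built from previous data."""
    return max(models, key=lambda m: context_match(current, m[1]))[0]


def select_acoustic_model(current, models):
    """models: list of (name, SensorContext, typical_snr_db) triples.
    Prefer the best context match, then the closest SNR."""
    snr = estimate_snr_db(current.distance_m)
    return max(
        models,
        key=lambda m: (context_match(current, m[1]), -abs(snr - m[2])),
    )[0]


# Example: a speaker in an office, holding the terminal 1 m away.
current = SensorContext("adult", "office", 1.0)
language_models = [
    ("lm_car", SensorContext("adult", "car", 0.5)),
    ("lm_office", SensorContext("adult", "office", 0.5)),
]
acoustic_models = [
    ("am_office_clean", SensorContext("adult", "office", 0.1), 30.0),
    ("am_office_noisy", SensorContext("adult", "office", 1.0), 10.0),
    ("am_car", SensorContext("adult", "car", 1.0), 10.0),
]
chosen_lm = select_language_model(current, language_models)
chosen_am = select_acoustic_model(current, acoustic_models)
```

In this sketch the acoustic model differs from the language model only in that the distance-derived SNR acts as a tie-breaker, mirroring the claim's extra SNR term in the acoustic-model selection step.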
