Speech recognition system adaptation based on non-acoustic attributes and face selection based on mouth motion using pixel intensities

  • US 9,899,025 B2
  • Filed: 07/02/2015
  • Issued: 02/20/2018
  • Est. Priority Date: 11/13/2014
  • Status: Active Grant
First Claim

1. A method, comprising the steps of:

    determining a vicinity from which speech input to a speech recognition system originates, wherein determining the vicinity comprises estimating a sound direction of a source of the speech input based on a signal processing method;

    obtaining non-acoustic data from the vicinity of the speech input using one or more non-acoustic sensors, wherein obtaining the non-acoustic data comprises capturing visual data of the vicinity of the speech input;

    identifying a subject speaker as the source of the speech input from the obtained non-acoustic data, wherein identifying the subject speaker comprises:

        segmenting one or more faces from the captured visual data;

        detecting mouth motion on the one or more faces, wherein detecting the mouth motion comprises applying temporal differencing on each of the one or more faces by comparing a first pixel intensity associated at a first time with a second pixel intensity at a second time; and

        selecting a face corresponding to the subject speaker from the one or more faces in response to determining that a number of significantly changed pixels between the first pixel intensity and the second pixel intensity exceeds a threshold;

    extracting one or more non-acoustic attributes associated with the subject speaker from the obtained non-acoustic data;

    analyzing the one or more extracted non-acoustic attributes, and assigning at least one demographic to the subject speaker based on the analysis;

    selecting at least one model for use by the speech recognition system based on the at least one demographic assigned to the subject speaker;

    adjusting the speech recognition system using the at least one selected model; and

    processing the speech input using the adjusted speech recognition system;

    wherein the steps are performed by at least one processor device coupled to a memory.
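
The mouth-motion steps of the claim (temporal differencing on each face, then selecting a face when the number of significantly changed pixels exceeds a threshold) can be illustrated with a minimal sketch. This is not the patented implementation. The names `count_changed_pixels`, `select_speaking_face`, `pixel_delta`, and `count_threshold` are hypothetical, as are the default values; the claim does not say how "significantly changed" is measured or how to choose among several faces that exceed the threshold, so the per-pixel absolute difference and the "largest count wins" rule below are assumptions. Face segmentation is assumed to have already produced one grayscale mouth region per face at each of the two times.

```python
from typing import Optional, Sequence

# A grayscale mouth region: rows of pixel intensities in the range 0-255.
Frame = Sequence[Sequence[int]]


def count_changed_pixels(first: Frame, second: Frame, pixel_delta: int) -> int:
    """Count pixels whose intensity differs by more than pixel_delta
    between the first-time frame and the second-time frame."""
    return sum(
        1
        for row_a, row_b in zip(first, second)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > pixel_delta
    )


def select_speaking_face(
    faces_t1: Sequence[Frame],
    faces_t2: Sequence[Frame],
    pixel_delta: int = 30,      # assumed meaning of "significantly changed"
    count_threshold: int = 3,   # assumed threshold on the changed-pixel count
) -> Optional[int]:
    """Return the index of the face whose changed-pixel count exceeds
    count_threshold, or None if no face does. If several faces qualify,
    the one with the largest count is chosen (an assumption)."""
    best_index, best_count = None, count_threshold
    for index, (first, second) in enumerate(zip(faces_t1, faces_t2)):
        changed = count_changed_pixels(first, second, pixel_delta)
        if changed > best_count:
            best_index, best_count = index, changed
    return best_index


# Made-up example data: face 0 does not move, face 1 opens its mouth.
still = [[100, 100, 100, 100] for _ in range(3)]
talking_t1 = [[100, 100, 100, 100] for _ in range(3)]
talking_t2 = [
    [100, 100, 100, 100],
    [100, 180, 190, 170],
    [100, 175, 185, 100],
]

speaker = select_speaking_face([still, talking_t1], [still, talking_t2])
```

In this made-up example, five pixels of the second face change by more than `pixel_delta`, which exceeds `count_threshold`, so `speaker` is index 1. In a real system both values would need tuning for image resolution, lighting, and frame rate.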
