Speech recognition system adaptation based on non-acoustic attributes and face selection based on mouth motion using pixel intensities

  • US 9,881,610 B2
  • Filed: 11/13/2014
  • Issued: 01/30/2018
  • Est. Priority Date: 11/13/2014
  • Status: Active Grant
First Claim

1. An apparatus, comprising:

  • a memory; and

    a processor operatively coupled to the memory and configured to:

    determine a vicinity from which speech input to a speech recognition system originates, wherein the determination of the vicinity comprises an estimation of a sound direction of a source of the speech input based on a signal processing method;

    obtain non-acoustic data from the vicinity of the speech input using one or more non-acoustic sensors, wherein, in the obtaining of the non-acoustic data, the processor is configured to capture visual data of the vicinity of the speech input;

    identify a subject speaker as the source of the speech input from the obtained non-acoustic data, wherein, in the identification of the subject speaker, the processor is configured to:

    segment one or more faces from the captured visual data;

    detect mouth motion on the one or more faces, wherein the detection of the mouth motion comprises an application of temporal differencing on each of the one or more faces by comparing a first pixel intensity associated at a first time with a second pixel intensity at a second time; and

    select a face corresponding to the subject speaker from the one or more faces in response to a determination that a number of significantly changed pixels between the first pixel intensity and the second pixel intensity exceeds a threshold;

    extract one or more non-acoustic attributes associated with the subject speaker from the obtained non-acoustic data;

    analyze the one or more non-acoustic attributes, and assign at least one demographic to the subject speaker based on the analysis;

    select at least one model for use by the speech recognition system based on the demographic assigned to the subject speaker;

    adjust the speech recognition system using the at least one selected model; and

    process the speech input using the adjusted speech recognition system.
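
The face-selection step recited in the claim (temporal differencing on each segmented face, then selecting a face when the count of significantly changed pixels exceeds a threshold) can be illustrated with a minimal sketch. This is not the patented implementation. The function names, the `delta` and `min_changed` values, the representation of a face crop as a nested list of grayscale intensities, and the rule of preferring the face with the most changed pixels when several exceed the threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical representation: a grayscale face crop as rows of 0-255 intensities.
Frame = List[List[int]]


def count_changed_pixels(first: Frame, second: Frame, delta: int) -> int:
    """Count pixels whose intensity differs by more than `delta` between two frames."""
    return sum(
        1
        for row_a, row_b in zip(first, second)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > delta
    )


def select_speaking_face(
    faces_t1: Dict[str, Frame],
    faces_t2: Dict[str, Frame],
    delta: int = 25,       # assumed cutoff for a "significantly changed" pixel
    min_changed: int = 3,  # assumed threshold on the number of changed pixels
) -> Optional[str]:
    """Return the id of a face whose changed-pixel count exceeds `min_changed`.

    If several faces qualify, the one with the most changed pixels is chosen
    (an assumption; the claim only requires exceeding the threshold).
    Returns None when no face qualifies.
    """
    best_id, best_count = None, min_changed
    for face_id, first in faces_t1.items():
        second = faces_t2.get(face_id)
        if second is None:
            continue  # face not present at the second time
        changed = count_changed_pixels(first, second, delta)
        if changed > best_count:
            best_id, best_count = face_id, changed
    return best_id


# Usage: two 4x4 face crops; only the "right" face changes between frames.
still = [[100] * 4 for _ in range(4)]
moving = [
    [100, 100, 100, 100],
    [100, 180, 180, 100],
    [100, 180, 180, 100],
    [100, 100, 100, 100],
]
faces_t1 = {"left": still, "right": still}
faces_t2 = {"left": still, "right": moving}
print(select_speaking_face(faces_t1, faces_t2))  # prints "right"
```

In this sketch the crops are assumed to be already aligned and restricted to the mouth region; a practical system would need face tracking between the two capture times before differencing.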
