
User defined key phrase detection by user dependent sequence modeling

  • US 10,043,521 B2
  • Filed: 07/01/2016
  • Issued: 08/07/2018
  • Est. Priority Date: 07/01/2016
  • Status: Active Grant
First Claim

1. A computer-implemented method for user dependent key phrase enrollment comprising:

    receiving, via a microphone, an audio input representing a user defined key phrase and converting the audio input to received audio data representative of the audio input;

    determining a sequence of most probable audio units corresponding to the received audio data, wherein each audio unit of most probable audio units corresponds to a frame of a plurality of frames of the audio data;

    processing the sequence of most probable audio units to eliminate at least one audio unit from the sequence of most probable audio units to generate a final sequence of audio units by determining a first silence audio unit of the sequence and a number of silence audio units immediately temporally following the first silence audio unit, wherein the first silence audio unit and the number of silence audio units are between non-silence audio units of the sequence, and eliminating the first silence audio unit and the immediately temporally following silence audio units in response to the total number of consecutive silence audio units not exceeding a threshold;

    generating a key phrase recognition model representing the user defined key phrase based on the final sequence of audio units, the key phrase recognition model comprising a single start state based rejection model, a key phrase model, and a transition from the single start state based rejection model to the key phrase model, wherein the single start state based rejection model includes a single rejection state having a plurality of rejection model self loops, wherein the key phrase model comprises a plurality of states having transitions therebetween, the plurality of states including a final state of the key phrase model, and wherein the plurality of states of the key phrase model correspond to the final sequence of audio units;

    receiving a further audio input for evaluation by the key phrase recognition model;

    generating a time series of scores of audio units based on a time series of feature vectors representative of the further audio input;

    scoring the key phrase recognition model based on the time series of scores of audio units to generate a rejection likelihood score and a key phrase likelihood score; and

    recognizing that the further audio input corresponds to the user defined key phrase based on the rejection likelihood score and the key phrase likelihood score.
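
The enrollment steps of the claim (picking the most probable audio unit for each frame, then removing short silence runs that sit between non-silence units) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: per-frame scores are plain lists, unit index 0 stands for silence (the constant `SILENCE` and both function names are hypothetical), and the threshold value is arbitrary. Silence at the very start or end of the sequence is left untouched, since the claim only addresses silence between non-silence units.

```python
from typing import List, Sequence

SILENCE = 0  # hypothetical index of the silence audio unit


def most_probable_units(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the highest-scoring audio unit for each frame."""
    return [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]


def drop_short_inner_silence(units: Sequence[int], threshold: int,
                             silence: int = SILENCE) -> List[int]:
    """Remove silence runs bounded by non-silence units on both sides whose
    length does not exceed `threshold`; leading/trailing silence is kept."""
    result: List[int] = []
    i, n = 0, len(units)
    while i < n:
        if units[i] != silence:
            result.append(units[i])
            i += 1
            continue
        # units[i] is the first silence unit of a run; find where the run ends.
        j = i
        while j < n and units[j] == silence:
            j += 1
        run_length = j - i
        inner = i > 0 and j < n  # non-silence units before and after the run
        if not (inner and run_length <= threshold):
            result.extend(units[i:j])  # keep runs that are too long or at the edges
        i = j
    return result


print(most_probable_units([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]))  # [0, 2]
print(drop_short_inner_silence([0, 0, 3, 3, 0, 5, 5, 0, 0, 0, 0, 2, 0], threshold=2))
# [0, 0, 3, 3, 5, 5, 0, 0, 0, 0, 2, 0]
```

With `threshold=2`, the single silence frame between units 3 and 5 is removed, while the four-frame silence before unit 2 exceeds the threshold and is kept.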

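The detection steps of the claim can be sketched as a Viterbi-style pass over a single rejection state with self loops, followed by a left-to-right key phrase model whose states follow the enrolled unit sequence. The Python sketch below rests on several assumptions the claim does not fix: scores are log-domain values, competing paths are combined with max rather than summed, each key phrase state has a self loop, the rejection self loops cover a hypothetical subset of units (`rejection_units`), and recognition is declared when the final-state score exceeds the rejection score by a hypothetical `threshold`. The function name `detect` is illustrative only.

```python
import math
from typing import Optional, Sequence

NEG_INF = -math.inf


def detect(unit_scores: Sequence[Sequence[float]],
           key_units: Sequence[int],
           rejection_units: Sequence[int],
           threshold: float) -> Optional[int]:
    """Return the first frame index at which the key phrase is recognized, or
    None. unit_scores[t][u] is the (assumed log) score of unit u at frame t."""
    if not key_units:
        raise ValueError("key phrase model needs at least one state")
    rejection = 0.0                          # single rejection state score
    states = [NEG_INF] * len(key_units)      # key phrase state scores
    for t, frame in enumerate(unit_scores):
        # Rejection state: best of its self loops, one loop per rejection unit.
        best_loop = max(frame[u] for u in rejection_units)
        new_states = []
        for i, unit in enumerate(key_units):
            # State 0 is entered from the rejection state; state i from state i-1.
            enter = rejection if i == 0 else states[i - 1]
            # Better of entering now or staying in the state (assumed self loop).
            new_states.append(max(enter, states[i]) + frame[unit])
        rejection += best_loop
        states = new_states
        # Assumed decision rule: final-state score versus rejection score.
        if states[-1] - rejection > threshold:
            return t
    return None


frames = [
    [0.0, -5.0, -5.0],   # silence-like frame
    [-4.0, -0.1, -3.0],  # unit 1 dominates
    [-4.0, -3.0, -0.1],  # unit 2 dominates
]
print(detect(frames, key_units=[1, 2], rejection_units=[0], threshold=1.0))  # 2
```

Because both scores accumulate over the same frames, only their difference is used in this sketch; the threshold trades false accepts against false rejects.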