
Keyword detection with international phonetic alphabet by foreground model and background model

  • US 9,466,289 B2
  • Filed: 12/11/2013
  • Issued: 10/11/2016
  • Est. Priority Date: 01/29/2013
  • Status: Active Grant
First Claim

1. A method of detecting keywords, comprising:

  • at an electronic device with one or more processors and memory:

    training an acoustic model with an International Phonetic Alphabet (IPA) phoneme mapping collection and a plurality of audio samples in a plurality of different languages, wherein the acoustic model includes:

    a foreground model configured to match a phoneme in an input audio signal to a corresponding keyword, wherein the foreground model is trained by (i) obtaining a phoneme collection for each of the plurality of different languages, (ii) generating a plurality of triphones by linking phonemes in the phoneme collection corresponding to the language, and (iii) performing Gaussian splitting training on the triphones that are clustered with a decision tree corresponding to the language; and

    a background model configured to match a phoneme in the input audio signal to a corresponding non-keyword;

    after training the acoustic model, generating a phone decoder based on the trained acoustic model;

    obtaining a keyword phoneme sequence for a respective keyword in a respective language of the plurality of different languages, including:

    collecting a set of keyword audio samples for the respective keyword in the respective language;

    decoding the set of keyword audio samples with the phone decoder to generate a set of phoneme sequence candidates for the respective keyword, each phoneme sequence candidate corresponding to a respective keyword audio sample; and

    selecting the keyword phoneme sequence for the respective keyword from the set of phoneme sequence candidates by choosing a phoneme of a highest confidence measure from one of the set of phoneme sequence candidates at each location in the corresponding sequence and assembling the chosen phonemes into the keyword phoneme sequence according to their locations in the corresponding sequence;

    after obtaining the keyword phoneme sequence, detecting one or more keywords in the input audio signal with the trained acoustic model, including:

    matching one or more phonemic keyword portions of the input audio signal with one or more phonemes in the keyword phoneme sequence with the foreground model; and

    filtering out one or more phonemic non-keyword portions of the input audio signal with the background model.
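
The selecting step of the claim (choosing the phoneme of highest confidence at each location across the phoneme sequence candidates, then assembling the chosen phonemes in order) can be illustrated with a minimal Python sketch. This is not the patented implementation. The function name `select_keyword_sequence`, the `(phoneme, confidence)` pair representation, the sample data, and the assumption that all candidates are already aligned to the same number of locations are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

# One decoded candidate: a list of (phoneme, confidence) pairs, one per location.
# This representation is an assumption made for illustration only.
Candidate = List[Tuple[str, float]]


def select_keyword_sequence(candidates: List[Candidate]) -> List[str]:
    """Pick the highest-confidence phoneme at each location across candidates."""
    if not candidates:
        raise ValueError("need at least one phoneme sequence candidate")
    length = len(candidates[0])
    # Assumption: candidates are pre-aligned and have the same number of locations.
    if any(len(c) != length for c in candidates):
        raise ValueError("candidates must have the same number of locations")

    sequence = []
    for position in range(length):
        # Compare the phoneme each candidate proposes at this location and keep
        # the one with the highest confidence (first candidate wins on ties).
        phoneme, _ = max((c[position] for c in candidates), key=lambda pc: pc[1])
        sequence.append(phoneme)
    return sequence


# Hypothetical decoder output for three audio samples of the same keyword.
candidates = [
    [("h", 0.9), ("e", 0.4), ("l", 0.8), ("o", 0.6)],
    [("h", 0.7), ("ə", 0.6), ("l", 0.5), ("oʊ", 0.9)],
    [("x", 0.2), ("ɛ", 0.5), ("l", 0.9), ("o", 0.3)],
]
print(select_keyword_sequence(candidates))  # ['h', 'ə', 'l', 'oʊ']
```

In this sketch each location is decided independently, so the resulting sequence may combine phonemes drawn from different keyword audio samples, which is consistent with the claim's wording "from one of the set of phoneme sequence candidates at each location".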
