Blind diarization of recorded calls with arbitrary number of speakers

  • US 9,460,722 B2
  • Filed: 06/30/2014
  • Issued: 10/04/2016
  • Est. Priority Date: 07/17/2013
  • Status: Active Grant
First Claim

1. A method of diarization of audio data, the method comprising:

    receiving audio data;

    segmenting the audio data into a plurality of frames, segmenting audio data into a plurality of utterances, wherein each of the plurality of utterances comprises one or more of the plurality of frames;

    extracting at least one acoustic feature from each of the plurality of frames, wherein the acoustic features are Mel-frequency cepstral coefficients (MFCC);

    representing each utterance as an utterance model representative of the MFCC;

    approximating a distribution of the MFCC in each utterance by calculating at least one Gaussian mixture model (GMM) for each utterance;

    calculating a distance between each GMM;

    constructing an affinity matrix based upon the distances between utterances;

    computing a stochastic matrix from the affinity matrix;

    computing eigenvalues and corresponding eigenvectors for the stochastic matrix;

    embedding the utterances into multi-dimensional vectors, wherein the utterance models comprise the multi-dimensional vectors;

    clustering the utterance models;

    constructing a plurality of speaker models from the clustered utterance models;

    constructing a hidden Markov model of the plurality of speaker models;

    decoding a sequence of identified speaker models that best corresponds to the utterances of the audio data; and

    creating diarized audio data using the sequence of identified speaker models that best correspond to the utterances of the audio data.
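
The middle steps of the claim (affinity matrix, stochastic matrix, eigenvectors, embedding, clustering) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the pairwise distances between utterance GMMs have already been computed, and the function names `spectral_embed` and `kmeans`, the Gaussian kernel, the `sigma` and `dims` parameters, the eigenvalue scaling of the embedding, and the use of k-means are all illustrative choices that the claim does not specify.

```python
import numpy as np

def spectral_embed(distances, sigma=1.0, dims=2):
    """Embed utterances from a symmetric pairwise distance matrix (hypothetical helper)."""
    d = np.asarray(distances, dtype=float)
    # Affinity matrix: a Gaussian kernel on distance (the kernel is an assumption).
    affinity = np.exp(-(d ** 2) / (2.0 * sigma ** 2))
    # Row-normalize the affinity matrix to obtain a row-stochastic matrix P = D^-1 W.
    degree = affinity.sum(axis=1)
    stochastic = affinity / degree[:, None]
    # P is similar to the symmetric matrix D^-1/2 W D^-1/2, so decompose that
    # instead: it has the same (real) eigenvalues and is numerically stable.
    sym = affinity / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals = eigvals[order]
    # Right eigenvectors of P are D^-1/2 times the eigenvectors of sym.
    right = eigvecs[:, order] / np.sqrt(degree)[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1) and keep `dims` more.
    embedding = right[:, 1:dims + 1] * eigvals[1:dims + 1]
    return stochastic, eigvals, embedding

def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialization (assumption)."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Toy distances for four utterances: two from one speaker, two from another.
dist = [[0.0, 0.1, 5.0, 5.0],
        [0.1, 0.0, 5.0, 5.0],
        [5.0, 5.0, 0.0, 0.1],
        [5.0, 5.0, 0.1, 0.0]]
stochastic, eigvals, embedding = spectral_embed(dist, sigma=1.0, dims=1)
labels = kmeans(embedding, k=2)
print(labels)
```

In this sketch each row of `embedding` is the multi-dimensional vector for one utterance, and the cluster labels would feed the later speaker-model and hidden Markov model steps, which are not shown here.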
