Blind diarization of recorded calls with arbitrary number of speakers

  • US 9,881,617 B2
  • Filed: 09/01/2016
  • Issued: 01/30/2018
  • Est. Priority Date: 07/17/2013
  • Status: Active Grant
First Claim

1. A method for automatically transcribing a customer service telephone conversation between an arbitrary number of speakers, the method comprising:

    receiving data corresponding to the telephone conversation, wherein the received data comprises audio data and metadata that identifies one or more of the speakers in the audio data;

    separating the audio data into frames;

    analyzing the frames to identify utterances, wherein each utterance comprises a plurality of frames;

    performing blind diarization of the audio data to differentiate speakers, wherein the blind diarization comprises:

        representing each utterance as a utterance model based on acoustic features of each utterance,

        clustering the utterance models,

        creating speaker models from each of the clusters,

        constructing a hidden Markov model from the speaker models, and

        decoding the hidden Markov model to differentiate speakers of each utterance;

    tagging homogeneous speaker segments in the telephone conversation with a tag unique for each speaker;

    performing speaker diarization to replace one or more of the tags with a speaker's identity, wherein the speaker diarization comprises:

        comparing the homogeneous speaker segments in the telephone conversation to one or more models retrieved from a database wherein the one or more models retrieved correspond to the one or more speakers identified in the metadata, and

        based on the comparison, identifying one or more of the speakers; and

    transcribing the conversation to obtain a text representation of the conversation, wherein each spoken part of the conversation is labeled with either the speaker's identity or the tag associated with the speaker.
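
The blind diarization sub-steps recited in the claim (utterance models, clustering, speaker models, hidden Markov model, decoding) can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything concrete in it is an assumption made for illustration: each frame is reduced to a single scalar feature, the utterance model is the mean of those features, clustering is plain 1-D k-means with the speaker count `k` supplied by the caller, speaker models are single Gaussians, and the HMM uses a fixed `stay_prob` self-transition. All function names are hypothetical.

```python
import math

def utterance_model(frame_features):
    """Assumed utterance model: the mean of the per-frame feature values."""
    return sum(frame_features) / len(frame_features)

def cluster(models, k, iters=20):
    """Plain 1-D k-means over utterance models; one cluster index per utterance."""
    ordered = sorted(models)
    # Spread the initial centers across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(models)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(m - centers[c])) for m in models]
        for c in range(k):
            members = [m for m, l in zip(models, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def speaker_models(utterances, labels, k, var_floor=1e-3):
    """One Gaussian (mean, variance) per non-empty cluster, fitted on its frames."""
    models = []
    for c in range(k):
        frames = [f for u, l in zip(utterances, labels) if l == c for f in u]
        if not frames:
            continue  # skip clusters that attracted no utterances
        mean = sum(frames) / len(frames)
        var = max(sum((f - mean) ** 2 for f in frames) / len(frames), var_floor)
        models.append((mean, var))
    return models

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(utterances, models, stay_prob=0.6):
    """Decode the most likely speaker-state sequence, one state per utterance."""
    n = len(models)
    stay = math.log(stay_prob)
    switch = math.log((1 - stay_prob) / (n - 1)) if n > 1 else float("-inf")
    # Emission score: log-likelihood of all frames of an utterance under a model.
    emit = [[sum(log_gauss(f, m, v) for f in u) for (m, v) in models]
            for u in utterances]
    score = [e - math.log(n) for e in emit[0]]  # uniform initial distribution
    back = []
    for t in range(1, len(utterances)):
        prev, score, ptr = score, [], []
        for j in range(n):
            best = max(range(n),
                       key=lambda i: prev[i] + (stay if i == j else switch))
            score.append(prev[best] + (stay if best == j else switch) + emit[t][j])
            ptr.append(best)
        back.append(ptr)
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers to recover the path
        state = ptr[state]
        path.append(state)
    return path[::-1]

def blind_diarize(utterances, k):
    """Return one tag per utterance, unique per decoded speaker state."""
    labels = cluster([utterance_model(u) for u in utterances], k)
    models = speaker_models(utterances, labels, k)
    return ["S%d" % s for s in viterbi(utterances, models)]

# Toy input: five utterances, each a list of per-frame feature values.
calls = [[1.0, 1.1, 0.9], [5.0, 5.2, 4.9], [1.05, 0.95], [5.1, 4.8, 5.0], [0.9, 1.0]]
tags = blind_diarize(calls, k=2)
```

In this toy input the utterances alternate between two well-separated feature ranges, so the decoded tags alternate as well. A real system would use multidimensional acoustic features and would estimate the number of speakers rather than take it as a parameter. The tags produced here correspond to the claim's per-speaker tags before any of them is replaced with a known identity.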
