
Diarization using textual and audio speaker labeling

  • US 10,446,156 B2
  • Filed: 10/25/2018
  • Issued: 10/15/2019
  • Est. Priority Date: 11/21/2012
  • Status: Active Grant
First Claim

1. A method of diarization, the method comprising:

  • receiving a set of textual transcripts from a transcription server and a set of audio files associated with the set of textual transcripts from an audio database server;

  • performing a blind diarization on the set of textual transcripts and the set of audio files to segment and cluster the textual transcripts into a plurality of textual speaker clusters, wherein the number of textual speaker clusters is at least equal to a number of speakers in the textual transcript;

  • automatedly applying at least one heuristic to the textual speaker clusters with a processor to select textual speaker clusters likely to be associated with an identified group of speakers;

  • analyzing the selected textual speaker clusters with the processor to create at least one linguistic model;

  • applying the linguistic model to transcribed audio data with the processor to label a portion of the transcribed audio data as having been spoken by the identified group of speakers;

  • saving the at least one linguistic model to a linguistic database server and associating it with the labeled speaker;

  • with the processor, receiving a new textual transcript from the transcription server and a new audio file associated with the new textual transcript from the audio database server;

  • receiving the at least one linguistic model from the linguistic database server;

  • receiving at least one acoustic voiceprint associated with a specific speaker from a voiceprint database server;

  • applying the received at least one linguistic model from the linguistic database server to the new audio file transcript from an audio source to perform diarization of the new audio file by blind diarizing the new audio file and new textual transcript, comparing each new textual speaker cluster to the at least one linguistic model, and labeling each textual speaker cluster as belonging to a customer service agent or belonging to a customer, comparing each audio speaker segment to the at least one acoustic voiceprint, and labeling each audio speaker segment as belonging to a known speaker or belonging to an unknown speaker;

  • when one of the audio speaker segments is labeled as belonging to a known speaker, selecting and transcribing the labeled audio speaker segments with the transcription server;

  • comparing the selected transcribed labeled audio speaker segments to the textual speaker clusters labeled as belonging to a customer service agent; and

  • when the compared transcribed segments and clusters are each labeled as belonging to a known speaker and a customer service agent, keeping the current labels, otherwise relabeling the textual speaker cluster as belonging to an unknown speaker.
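The first half of the claim (blind diarization into textual speaker clusters, heuristic selection of clusters likely to belong to the identified group, and creation of a linguistic model from the selected clusters) can be illustrated with a minimal sketch. The phrase-based heuristic, the unigram word-frequency model, and all function names below are illustrative assumptions, not details taken from the patent.

```python
# Hypothetical sketch of the model-building steps of claim 1: blind-diarized
# textual speaker clusters are filtered with a simple heuristic (agents tend
# to use scripted phrases), and the surviving clusters are pooled into a
# unigram "linguistic model". Names and thresholds are assumptions.
from collections import Counter
from typing import Dict, List

AGENT_PHRASES = {"how may i help you", "thank you for calling", "is there anything else"}

def select_agent_clusters(clusters: List[List[str]]) -> List[List[str]]:
    """Heuristic: keep clusters whose text contains agent-style scripted phrases."""
    selected = []
    for cluster in clusters:
        text = " ".join(cluster).lower()
        if any(phrase in text for phrase in AGENT_PHRASES):
            selected.append(cluster)
    return selected

def build_linguistic_model(clusters: List[List[str]]) -> Dict[str, float]:
    """Pool the selected clusters into a relative word-frequency (unigram) model."""
    counts: Counter = Counter()
    for cluster in clusters:
        for utterance in cluster:
            counts.update(utterance.lower().split())
    total = sum(counts.values()) or 1
    return {word: n / total for word, n in counts.items()}

if __name__ == "__main__":
    # Toy "blind diarization" output: two textual speaker clusters from one call.
    clusters = [
        ["thank you for calling acme support how may i help you",
         "let me check that account for you"],
        ["my order never arrived", "can you refund it please"],
    ]
    agent_clusters = select_agent_clusters(clusters)
    model = build_linguistic_model(agent_clusters)
    print(sorted(model, key=model.get, reverse=True)[:5])
```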
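The second half of the claim (labeling a new call's textual clusters against the stored linguistic model, labeling audio segments against an acoustic voiceprint, and keeping an "agent" label only when both checks agree) can be sketched in the same spirit. The scoring functions, thresholds, and the pairing of textual clusters with audio segments are assumptions made for illustration.

```python
# Hypothetical sketch of the label-reconciliation steps of claim 1: textual
# clusters of a new call are scored against the stored linguistic model
# (agent vs. customer), audio segments are scored against an acoustic
# voiceprint (known vs. unknown speaker), and a textual cluster keeps its
# "agent" label only if the aligned audio segment also matched a known
# speaker. Scoring functions and thresholds are illustrative assumptions.
from typing import Dict, List, Tuple

def score_against_model(text: str, model: Dict[str, float]) -> float:
    """Average per-word probability under the unigram linguistic model."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(model.get(w, 0.0) for w in words) / len(words)

def label_textual_cluster(text: str, model: Dict[str, float], threshold: float = 0.01) -> str:
    return "agent" if score_against_model(text, model) >= threshold else "customer"

def label_audio_segment(similarity_to_voiceprint: float, threshold: float = 0.7) -> str:
    return "known" if similarity_to_voiceprint >= threshold else "unknown"

def reconcile(cluster_label: str, segment_label: str) -> str:
    """Keep the 'agent' label only when the acoustic check also says 'known'."""
    if cluster_label == "agent" and segment_label == "known":
        return "agent"
    return "unknown" if cluster_label == "agent" else cluster_label

if __name__ == "__main__":
    model = {"thank": 0.05, "you": 0.05, "calling": 0.04, "refund": 0.001}
    # (cluster text, similarity of the aligned audio segment to the voiceprint)
    new_call: List[Tuple[str, float]] = [
        ("thank you for calling how may i help you", 0.85),
        ("my package is missing", 0.20),
    ]
    for text, sim in new_call:
        final = reconcile(label_textual_cluster(text, model), label_audio_segment(sim))
        print(f"{final:>8}: {text}")
```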
