
Diarization using speech segment labeling

  • US 10,438,592 B2
  • Filed: 10/25/2018
  • Issued: 10/08/2019
  • Est. Priority Date: 11/21/2012
  • Status: Active Grant
First Claim

1. A method of diarization of audio files, the method comprising:

    receiving a plurality of audio files from a database server and speaker metadata associations with each of the plurality of audio files, wherein each audio file is a recording of a customer service interaction including a known speaker and at least one other speaker, wherein the known speaker is a specific customer service agent and the at least one other speaker is a customer;

    selecting a subset of the audio files, wherein each audio file of the subset is selected to maximize an acoustical difference in voice frequencies between the known speaker and the at least one other speaker in the same audio file;

    performing a blind diarization on the subset of audio files to segment the audio files into a plurality of segments of speech separated by non-speech, such that each segment has a high likelihood of containing speech sections from a single speaker;

    automatedly applying at least one metric to the segments of speech with a processor to label segments of speech likely to be associated with the known speaker and clustering the selected segments into an audio speaker segment;

    analyzing the selected audio speaker segment to create an acoustic voiceprint, wherein the acoustic voiceprint is built from all the selected speaker segments;

    applying the acoustic voiceprint to the audio files with the processor to label a portion of the audio file as having been spoken by the known speaker;

    adding the labeled portion of the audio file to the acoustic voiceprint;

    saving the acoustic voiceprint to a voiceprint database server and associating it with the metadata of the known speaker; and

    with the processor, applying the saved acoustic voiceprint from the voiceprint database server to a new audio file from an audio source to perform diarization of the new audio file by blind diarizing the new audio file, comparing each of the new speech segments to the acoustic voiceprint, and labeling each speech segment as belonging to the known speaker associated with the acoustic voiceprint or belonging to an other speaker.
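
The voiceprint-building and labeling steps recited above can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say how a segment is represented, which similarity measure is used, or where the decision threshold lies. The sketch assumes each speech segment has already been reduced to a fixed-length feature vector, models the acoustic voiceprint as the mean of the known speaker's vectors, and compares new segments by cosine similarity against a hypothetical threshold of 0.8. The function names and the label strings are likewise illustrative.

```python
from math import sqrt
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_voiceprint(segment_features: Sequence[Sequence[float]]) -> List[float]:
    """Assumed voiceprint model: element-wise mean of the known speaker's segment vectors."""
    if not segment_features:
        raise ValueError("at least one segment is required")
    count = len(segment_features)
    dim = len(segment_features[0])
    return [sum(vec[i] for vec in segment_features) / count for i in range(dim)]


def label_segments(
    segment_features: Sequence[Sequence[float]],
    voiceprint: Sequence[float],
    threshold: float = 0.8,  # hypothetical value, not taken from the claim
) -> List[str]:
    """Label each segment as the known speaker or an other speaker."""
    return [
        "known_speaker" if cosine_similarity(vec, voiceprint) >= threshold else "other_speaker"
        for vec in segment_features
    ]


# Toy data: two segments already attributed to the agent, then two segments
# from a new audio file that has been blind-diarized into segments.
agent_segments = [[1.0, 0.0], [0.9, 0.1]]
voiceprint = build_voiceprint(agent_segments)
new_file_segments = [[1.0, 0.05], [0.0, 1.0]]
labels = label_segments(new_file_segments, voiceprint)
print(labels)
```

In this toy run the first new segment points in nearly the same direction as the voiceprint and is labeled as the known speaker, while the second is nearly orthogonal to it and is labeled as an other speaker. The claimed step of adding a labeled portion back to the voiceprint would correspond here to calling build_voiceprint again with the newly labeled vectors included.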
