
Diarization using acoustic labeling

  • US 10,134,400 B2
  • Filed: 11/20/2013
  • Issued: 11/20/2018
  • Est. Priority Date: 11/21/2012
  • Status: Active Grant
First Claim

1. A method of diarization of audio files, the method comprising:

    receiving a plurality of audio files from a database server and speaker metadata associated with each of the plurality of audio files from the database server of the plurality of audio files;

    identifying, with a processor, a subset of audio files from the plurality belonging to a specific speaker based upon the received speaker metadata, wherein each audio file is a recording of a customer service interaction and the specific speaker is a customer service agent and there is at least one other speaker in the audio file, wherein the at least one other speaker is not the identified specific speaker;

    selecting a subset of the audio files belonging to the specific speaker of the identified set of audio files with the processor;

    wherein each audio file of the subset is selected to maximize an acoustical difference in voice frequencies between the specific speaker and the at least one other speaker in the same audio file, wherein the audio file is a recording of a customer service interaction and the specific speaker is a customer service agent and the at least one other speaker is a customer wherein the acoustical difference is a distance between clusters identified by blind diarization between the specific speaker and the at least one other speaker in the same audio file, wherein the at least one other speaker is a customer;

    computing an acoustic voiceprint for the specific speaker from the selected subset of audio files with the processor by diarizing the audio files into speaker segments, clustering similar speaker segments, classifying the clustered speaker segments as belonging to the customer service agent or the customer, and building the acoustic voiceprint using the clustered speaker segments belonging to the customer service agent, wherein the acoustic voiceprint will consist of all clustered speaker segments that are a match to the customer service agent;

    saving the acoustic voiceprint to a voiceprint database server and associating it with the metadata of the known speaker; and

    with the processor, applying the saved acoustic voiceprint from the voiceprint database server to a new audio file from an audio source to identify the specific speaker in diarization of the new audio file by diarizing the new audio file into new speaker segments, comparing each new speaker segment to the acoustic voiceprint, and determining if the new speaker segment matches the acoustic voiceprint.
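
The selection and voiceprint-building steps of claim 1 can be illustrated with a minimal Python sketch. It assumes that blind diarization has already split each recording into exactly two unlabeled clusters of per-segment feature vectors. It measures the "acoustical difference" as the Euclidean distance between the two cluster centroids, and it labels as the agent whichever cluster recurs across the other selected calls. These choices, and every name in the code (`DiarizedFile`, `select_files`, `agent_cluster`, `build_voiceprint`), are hypothetical illustrations, not the patented implementation.

```python
import math
from dataclasses import dataclass

Vector = list[float]


@dataclass
class DiarizedFile:
    """Hypothetical blind-diarization output for one call: two unlabeled
    clusters of per-segment feature vectors (agent vs. customer unknown)."""
    name: str
    cluster_a: list[Vector]
    cluster_b: list[Vector]


def centroid(vectors: list[Vector]) -> Vector:
    # Component-wise mean of a non-empty list of equal-length vectors.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def separation(f: DiarizedFile) -> float:
    # Assumed "acoustical difference": Euclidean distance between the
    # centroids of the two blind-diarization clusters in the same file.
    return math.dist(centroid(f.cluster_a), centroid(f.cluster_b))


def select_files(files: list[DiarizedFile], k: int) -> list[DiarizedFile]:
    # Keep the k files whose two speakers are easiest to tell apart.
    return sorted(files, key=separation, reverse=True)[:k]


def agent_cluster(f: DiarizedFile, others: list[DiarizedFile]) -> list[Vector]:
    # Heuristic (an assumption): the agent appears in every call while the
    # customers differ, so the agent's cluster is the one whose centroid lies
    # closest to some cluster in each of the other selected files.
    def recurrence_cost(cluster: list[Vector]) -> float:
        c = centroid(cluster)
        return sum(
            min(math.dist(c, centroid(o.cluster_a)),
                math.dist(c, centroid(o.cluster_b)))
            for o in others
        )
    return min((f.cluster_a, f.cluster_b), key=recurrence_cost)


def build_voiceprint(files: list[DiarizedFile], k: int) -> Vector:
    selected = select_files(files, k)
    if len(selected) < 2:
        raise ValueError("need at least two files to tell the agent from customers")
    agent_segments: list[Vector] = []
    for f in selected:
        others = [o for o in selected if o is not f]
        agent_segments.extend(agent_cluster(f, others))
    # Voiceprint = mean of all segments classified as the agent.
    return centroid(agent_segments)


# Toy 2-D example: the agent sits near (1, 1) in every call.
files = [
    DiarizedFile("call1", [[1.0, 1.0], [1.2, 0.8]], [[5.0, 5.0], [5.2, 4.8]]),
    DiarizedFile("call2", [[-4.0, 3.0], [-4.2, 3.2]], [[0.9, 1.1], [1.1, 0.9]]),
    DiarizedFile("call3", [[1.0, 1.1], [1.0, 0.9]], [[1.5, 1.5], [1.5, 1.5]]),
]
voiceprint = build_voiceprint(files, k=2)
print([f.name for f in select_files(files, 2)], voiceprint)
```

With this toy data the two most separated calls are `call1` and `call2`; `call3`, whose two speakers sound alike, is skipped. The resulting voiceprint is approximately `[1.05, 0.95]`, the mean of the segments classified as the agent. Real systems would use learned speaker embeddings rather than 2-D points, but the selection logic is the same shape.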
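
The final step of claim 1, comparing each segment of a new recording against the saved voiceprint, might be sketched as follows. The sketch assumes each new speaker segment has already been reduced to a feature vector. It substitutes cosine similarity as the comparison (a common choice, not one the claim specifies) and uses a hypothetical match threshold of 0.8; `cosine_similarity`, `label_segments`, and their parameters are illustrative names only.

```python
import math

Vector = list[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    # Cosine of the angle between two feature vectors; 0.0 for a zero vector.
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.hypot(*u) * math.hypot(*v)
    return dot / norms if norms else 0.0


def label_segments(segments: list[Vector], voiceprint: Vector,
                   threshold: float = 0.8) -> list[str]:
    # Compare each new speaker segment to the saved voiceprint and decide,
    # per segment, whether it matches (hypothetical threshold of 0.8).
    return [
        "agent" if cosine_similarity(seg, voiceprint) >= threshold else "other"
        for seg in segments
    ]


# Toy voiceprint and segments from a new call, already reduced to vectors.
voiceprint = [1.05, 0.95]
new_segments = [[1.0, 1.0], [-4.0, 3.0], [1.1, 0.9]]
print(label_segments(new_segments, voiceprint))  # ['agent', 'other', 'agent']
```

The threshold trades missed agent segments against customer speech wrongly attributed to the agent; in practice it would be tuned on labeled calls rather than fixed in advance.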

  • 2 Assignments