Diarization using acoustic labeling

US 10,134,400 B2
Filed: 11/20/2013
Issued: 11/20/2018
Est. Priority Date: 11/21/2012
Status: Active Grant

Abstract

Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files is analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.

153 Citations

17 Claims

1. A method of diarization of audio files, the method comprising:
- receiving a plurality of audio files from a database server and speaker metadata associated with each of the plurality of audio files from the database server of the plurality of audio files;
  
  identifying, with a processor, a subset of audio files from the plurality belonging to a specific speaker based upon the received speaker metadata, wherein each audio file is a recording of a customer service interaction and the specific speaker is a customer service agent and there is at least one other speaker in the audio file, wherein the at least one other speaker is not the identified specific speaker;
  
  selecting a subset of the audio files belonging to the specific speaker of the identified set of audio files with the processor;
  
  wherein each audio file of the subset is selected to maximize an acoustical difference in voice frequencies between the specific speaker and the at least one other speaker in the same audio file, wherein the audio file is a recording of a customer service interaction and the specific speaker is a customer service agent and the at least one other speaker is a customer wherein the acoustical difference is a distance between clusters identified by blind diarization between the specific speaker and the at least one other speaker in the same audio file, wherein the at least one other speaker is a customer;
  
  computing an acoustic voiceprint for the specific speaker from the selected subset of audio files with the processor by diarizing the audio files into speaker segments, clustering similar speaker segments, classifying the clustered speaker segments as belonging to the customer service agent or the customer, and building the acoustic voiceprint using the clustered speaker segments belonging to the customer service agent, wherein the acoustic voiceprint will consist of all clustered speaker segments that are a match to the customer service agent;
  
  saving the acoustic voiceprint to a voiceprint database server and associating it with the metadata of the known speaker; and
  
  with the processor, applying the saved acoustic voiceprint from the voiceprint database server to a new audio file from an audio source to identify the specific speaker in diarization of the new audio file by diarizing the new audio file into new speaker segments, comparing each new speaker segment to the acoustic voiceprint, and determining if the new speaker segment matches the acoustic voiceprint.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1, wherein the subset is the top 50% or less of the audio files based upon an acoustical difference between the specific speaker and the at least one other speaker.
  - 3. The method of claim 1, further comprising:
    - performing a blind diarization on the new audio file to separate the new audio file into at least a first speaker audio file and a second speaker audio file;
      
      wherein the acoustic voiceprint is applied to the first speaker audio file and the second speaker audio file to identify one of the first speaker audio file and the second speaker audio file as the specific speaker.
  - 4. The method of claim 3, further comprising:
    - receiving speaker metadata for the new audio file; and
      
      selecting the acoustic voiceprint from a plurality of acoustic voiceprints based upon the received speaker metadata.
  - 5. The method of claim 4, wherein the blind diarization is based in part upon the acoustic voiceprint.
  - 6. The method of claim 1, further comprising:
    - diarizing each of the audio files in the selected subset into speaker segments; and
      
      clustering the speaker segments into specific speaker segments and other speaker segments.
  - 7. The method of claim 2, wherein the clustered specific speaker segments and other speaker segments are compared to the speaker metadata to evaluate the speaker metadata.
  - 8. The method of claim 1, further comprising:
    - applying a linguistic model to the new audio file; and
      
      using the application of the linguistic model to identify the specific speaker in the diarization of the new audio file.
  - 9. The method of claim 8, further comprising:
    - comparing an identification of the specific speaker based upon the acoustic voiceprint to an identification of the specific speaker based upon the linguistic model to select portions of each of the identifications in the diarization of the new audio file.
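
The file-selection step recited in claims 1 and 2 can be illustrated with a minimal sketch. It assumes that blind diarization has already reduced each recording to two cluster centroids (agent and customer) and that the "distance between clusters" is Euclidean; the claims do not name a metric or a feature representation. `FileClusters`, `acoustical_difference`, and `select_most_separable` are hypothetical names introduced only for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import dist
from typing import Sequence


@dataclass
class FileClusters:
    """Hypothetical summary of one recording after blind diarization."""
    file_id: str
    agent_centroid: Sequence[float]     # mean feature vector of the agent cluster
    customer_centroid: Sequence[float]  # mean feature vector of the customer cluster


def acoustical_difference(f: FileClusters) -> float:
    # Assumption: Euclidean distance between the two cluster centroids.
    return dist(f.agent_centroid, f.customer_centroid)


def select_most_separable(files: list[FileClusters],
                          fraction: float = 0.5) -> list[FileClusters]:
    """Keep the top `fraction` of files, ranked by agent/customer distance."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(files, key=acoustical_difference, reverse=True)
    keep = int(len(ranked) * fraction)  # floor, so never more than `fraction`
    return ranked[:keep]
```

With the default `fraction=0.5` the function keeps at most half of the agent's recordings, which corresponds to the "top 50% or less" language of claim 2.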

10. A method of diarization of audio files of customer service interactions between at least one agent and at least one customer, the method comprising:
- receiving a plurality of audio files from a database server and agent metadata associated with each of the plurality of audio files from the database server of the plurality of audio files;
  
  identifying, with a processor, a set of audio files from the plurality of audio files associated to a specific agent based upon the received agent metadata;
  
  selecting a subset of the audio files belonging to the agent metadata of the identified set of audio files with the processor that maximize an acoustical difference in voice frequencies between audio data of the agent and audio data of at least one other speaker in each of the individual audio files, wherein the at least one other speaker is a customer;
  
  computing an acoustic voiceprint from the audio data of the agent in the selected subset with the processor by diarizing the audio files into speaker segments, clustering similar speaker segments, classifying the clustered speaker segments as belonging to the customer service agent or the customer, and building the acoustic voiceprint using the clustered speaker segments belonging to the customer service agent, wherein the acoustic voiceprint will include all clustered speaker segments that are a match to the customer service agent;
  
  saving the acoustic voiceprint to a voiceprint database server and associating it with the metadata of the customer service agent; and
  
  applying the saved acoustic voiceprint from the voiceprint database server to a new audio file from an audio source to identify the agent in diarization of the new audio file by diarizing the new audio file into new speaker segments with the processor, comparing each new speaker segment to the acoustic voiceprint, and determining if the new speaker segment matches the acoustic voiceprint.
- Dependent claims (11, 12, 13, 14, 15)
  - 11. The method of claim 10, wherein the agent metadata is an agent identification number.
  - 12. The method of claim 11, further comprising:
    - transcribing the new audio file to create an audio file transcription; and
      
      diarizing the new audio file based in part upon the audio file transcription to cluster audio data from the new audio file into at least first speaker audio data and second speaker audio data.
  - 13. The method of claim 12, wherein the acoustic voiceprint is applied to the first speaker audio data and the second speaker audio data to identify one of the first speaker audio data and the second speaker audio data as the specific speaker as the agent.
  - 14. The method of claim 13, further comprising:
    - receiving speaker metadata for the new audio file; and
      
      selecting the acoustic voiceprint from a plurality of acoustic voiceprints based upon the received speaker metadata.
  - 15. The method of claim 10, wherein the diarization of the new audio file separates the new audio file into agent audio data and at least one customer audio data.
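
The voiceprint-building and matching steps of claim 10 might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: each speaker segment is assumed to be a fixed-length feature vector, the voiceprint is taken to be the mean of all agent-labelled segment vectors, and a "match" is a cosine similarity at or above a threshold. The function names and the `threshold` value are hypothetical.

```python
from math import sqrt


def build_voiceprint(agent_segments):
    """Average all agent-labelled segment vectors into one voiceprint vector."""
    if not agent_segments:
        raise ValueError("need at least one agent segment")
    n = len(agent_segments)
    # zip(*...) iterates over feature dimensions across all segments.
    return [sum(column) / n for column in zip(*agent_segments)]


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_segments(new_segments, voiceprint, threshold=0.8):
    """Label each new segment 'agent' if it matches the voiceprint, else 'other'."""
    # Assumption: a fixed similarity threshold decides the match.
    return [
        "agent" if cosine_similarity(segment, voiceprint) >= threshold else "other"
        for segment in new_segments
    ]
```

A caller would build the voiceprint once from the selected recordings, then pass the segments of each new recording to `label_segments` to separate agent audio from customer audio.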

16. A system for diarization of audio files, the system comprising:
- a database server of audio files, each audio file of the database server being associated with metadata identifying at least one speaker in the audio file;
  
  a processor communicatively connected to the database wherein the processor selects a set of audio files with the same speaker based upon the metadata, filters the selected set to a subset of the audio files that maximize an acoustical difference in voice frequencies between audio data of at least two speakers in an individual audio file, wherein one of the at least two speakers is a customer service agent and the other is a customer, and creates an acoustic voiceprint for the speaker identified by the metadata by diarizing the audio files into speaker segments, clustering similar speaker segments, classifying the clustered speaker segments as belonging to the customer service agent or the customer, and building the acoustic voiceprint using all of the clustered speaker segments belonging to that match the customer service agent;
  
  a voiceprint database server of a plurality of acoustic voiceprints, each acoustic voiceprint of the plurality associated with a speaker; and
  
  an audio source that provides new audio data to the processor with metadata that identifies at least one speaker in the audio data;
  
  wherein the processor selects an acoustic voiceprint from the plurality of acoustic voiceprints based upon the metadata and applies the selected acoustic voiceprint to the new audio data to identify audio data of the speaker in the new audio data for diarization of the new audio data by diarizing the new audio file into new speaker segments, comparing each new speaker segment to the acoustic voiceprint, and determining if the new speaker segment matches the acoustic voiceprint.
- Dependent claims (17)
  - 17. The system of claim 16, wherein the processor further applies at least one acoustic model and at least one linguistic model to the diarized audio data to transcribe the audio data to produce an automated transcript.
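
The metadata-driven voiceprint lookup described in claim 16 can be pictured with a small sketch. It assumes an in-memory dictionary standing in for the voiceprint database server and a metadata record that carries an `agent_id` key (claim 11 mentions an agent identification number, but the key name is an assumption). `VoiceprintStore` and its methods are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VoiceprintStore:
    """Hypothetical in-memory stand-in for the voiceprint database server."""
    _by_agent: dict[str, list[float]] = field(default_factory=dict)

    def save(self, agent_id: str, voiceprint: list[float]) -> None:
        # Associate the voiceprint with the agent's identifying metadata.
        self._by_agent[agent_id] = list(voiceprint)

    def select(self, metadata: dict) -> list[float] | None:
        """Return the voiceprint for the agent named in the metadata, if any."""
        agent_id = metadata.get("agent_id")
        if agent_id is None:
            return None
        return self._by_agent.get(agent_id)
```

When `select` returns `None`, a system along these lines would presumably fall back to blind diarization without a voiceprint; the claims do not say what happens in that case.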

Current Assignee: Verint Systems Incorporated
Original Assignee: Verint Systems Limited (Verint Systems Incorporated)
Inventors: Ziv, Omer; Achituv, Ran; Shapira, Ido; Dreyfuss, Jeremie
Primary Examiner(s): Opsasnick, Michael N
Application Number: US14/084,974
Publication Number: US 20140142944A1
Time in Patent Office: 1,826 Days
Field of Search: 704235
CPC Class Codes:
G10L 17/00 Speaker identification or v...
G10L 17/02 Preprocessing operations, e...