Diarization using speech segment labeling
First Claim
1. A method of diarization of audio files, the method comprising:
receiving a plurality of audio files from a database server and speaker metadata associated with each of the plurality of audio files, wherein each audio file is a recording of a customer service interaction including a known speaker and at least one other speaker, wherein the known speaker is a specific customer service agent and the at least one other speaker is a customer;
selecting a subset of the audio files, wherein each audio file of the subset is selected to maximize an acoustical difference in voice frequencies between the known speaker and the at least one other speaker in the same audio file;
performing a blind diarization on the subset of audio files to segment the audio files into a plurality of segments of speech separated by non-speech, such that each segment has a high likelihood of containing speech sections from a single speaker;
automatedly applying at least one metric to the segments of speech with a processor to label segments of speech likely to be associated with the known speaker and clustering the selected segments into an audio speaker segment;
analyzing the selected audio speaker segment to create an acoustic voiceprint, wherein the acoustic voiceprint is built from all the selected speaker segments;
applying the acoustic voiceprint to the audio files with the processor to label a portion of the audio file as having been spoken by the known speaker;
adding the labeled portion of the audio file to the acoustic voiceprint;
saving the acoustic voiceprint to a voiceprint database server and associating it with the metadata of the known speaker; and
with the processor, applying the saved acoustic voiceprint from the voiceprint database server to a new audio file from an audio source to perform diarization of the new audio file by blind diarizing the new audio file, comparing each of the new speech segments to the acoustic voiceprint, and labeling each speech segment as belonging to the known speaker associated with the acoustic voiceprint or belonging to another speaker.
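The blind-diarization step recited above first splits each recording into segments of speech separated by non-speech. A minimal sketch of that segmentation, using per-frame energy as a crude stand-in for a real voice-activity detector (the frame size, energy threshold, and `min_gap` pause length are illustrative assumptions, not values from the patent):

```python
def split_speech_segments(samples, frame=160, energy_thresh=0.1, min_gap=3):
    """Split audio into speech segments separated by non-speech, using
    per-frame mean absolute energy as a crude voice-activity detector.
    Returns (start_frame, end_frame) pairs; a real blind diarization would
    further split these segments at speaker-change points."""
    frames = [samples[i:i + frame] for i in range(0, len(samples), frame)]
    active = [sum(abs(s) for s in f) / len(f) > energy_thresh for f in frames]
    segments, start, silence = [], None, 0
    for i, is_speech in enumerate(active):
        if is_speech:
            if start is None:
                start = i          # open a new speech segment
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_gap:  # pause long enough: close the segment
                segments.append((start, i - silence + 1))
                start, silence = None, 0
    if start is not None:           # speech ran to the end of the audio
        segments.append((start, len(active)))
    return segments
```

Each returned pair then has a high likelihood of containing speech from a single speaker only if the speakers do not overlap, which is why the claim applies further metrics before clustering.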
Abstract
Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
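The voiceprint model described in the abstract can be sketched as an average over the features of segments already labeled as the known speaker, which is then compared against new segments. The sketch below uses mean amplitude and zero-crossing rate as stand-ins for the spectral features (e.g. MFCCs) a real system would extract; the function names and the distance threshold are hypothetical:

```python
import math

def segment_features(samples):
    """Crude two-dimensional feature vector per speech segment: mean absolute
    amplitude and zero-crossing rate. A stand-in for real spectral features."""
    energy = sum(abs(s) for s in samples) / len(samples)
    zcr = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0) / (len(samples) - 1)
    return (energy, zcr)

def build_voiceprint(known_speaker_segments):
    """Average the features of every segment labeled as the known speaker;
    the mean vector serves as the acoustic voiceprint model."""
    feats = [segment_features(s) for s in known_speaker_segments]
    return tuple(sum(f[d] for f in feats) / len(feats) for d in range(len(feats[0])))

def label_segments(voiceprint, segments, threshold):
    """Compare each blind-diarized segment to the voiceprint and label it as
    the known speaker ('agent') or an other speaker ('other') by distance."""
    return ["agent" if math.dist(voiceprint, segment_features(s)) < threshold
            else "other" for s in segments]
```

With synthetic low-pitch "agent" and high-pitch "customer" signals, the zero-crossing rate alone separates the two speakers, mirroring the claims' reliance on an acoustical difference in voice frequencies.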
18 Claims
12. A system for diarization of audio files, the system comprising:
an audio database server comprising a plurality of audio files and speaker metadata associated with each of the plurality of audio files, wherein each audio file is a recording of a customer service interaction including a known speaker and at least one other speaker, wherein the known speaker is a specific customer service agent and the at least one other speaker is a customer;
a processor that receives a set of audio files associated with a set of speaker metadata, wherein the audio files are all from a specific known speaker,
selects a subset of the audio files, wherein each audio file of the subset is selected to maximize an acoustical difference in voice frequencies between the known speaker and the at least one other speaker in the same audio file,
performs a blind diarization on the subset of audio files to segment the audio files into a plurality of segments of speech separated by non-speech, such that each segment has a high likelihood of containing speech sections from a single speaker,
automatedly applies at least one metric to the segments of speech to label segments of speech likely to be associated with the known speaker and clusters the selected segments into an audio speaker segment,
analyzes the selected audio speaker segment to create an acoustic voiceprint, wherein the acoustic voiceprint is built from all the selected speaker segments,
applies the acoustic voiceprint to the audio files to label a portion of the audio file as having been spoken by the known speaker, and
adds the labeled portion of the audio file to the acoustic voiceprint;
a voiceprint database server that stores the at least one acoustic voiceprint; and
an audio source that provides new audio files to the processor;
wherein the processor applies the saved acoustic voiceprint from the voiceprint database server to a new audio file from the audio source to perform diarization of the new audio file by blind diarizing the new audio file, comparing each of the new speech segments to the acoustic voiceprint, and labeling each speech segment as belonging to the known speaker associated with the acoustic voiceprint or belonging to another speaker.
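Both independent claims select the subset of audio files that maximizes the acoustical difference in voice frequencies between the known speaker and the other speaker. Assuming a mean voice frequency has already been estimated per speaker per recording (the tuple layout, identifiers, and pitch values below are hypothetical), the selection can be sketched as a ranking by frequency difference:

```python
def select_subset(files, k):
    """files: (file_id, agent_pitch_hz, customer_pitch_hz) tuples with
    precomputed mean voice frequencies for the two speakers in each recording.
    Keep the k recordings where the speakers' frequencies differ most, so the
    later blind diarization can separate the speakers most reliably."""
    ranked = sorted(files, key=lambda f: abs(f[1] - f[2]), reverse=True)
    return [file_id for file_id, _, _ in ranked[:k]]
```

The choice of k trades training-set size against separability; the patent itself does not fix either the metric or the subset size in these claims.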
Specification