Speaker separation in diarization
First Claim
1. A method of producing a diarized transcript from a digital audio file, the method comprising:
obtaining a digital audio file;
splitting the digital audio file into a plurality of frames;
segmenting the digital audio file into entropy segments based upon an entropy of each frame;
performing a blind diarization to identify a first speaker audio file and a second speaker audio file by clustering the entropy segments into the first speaker audio file and the second speaker audio file, wherein the first speaker audio file only contains audio attributed to the first speaker and the second speaker audio file only contains audio attributed to the second speaker; and
identifying one of the first speaker audio file and second speaker audio file as an agent audio file and another of the first speaker audio file and the second speaker audio file as a customer audio file; and
transcribing the agent audio file and the customer audio file to produce a diarized transcript.
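The framing and entropy-segmentation steps recited above can be illustrated with a minimal sketch. The claim does not say which entropy measure is used or how segment boundaries are chosen, so everything specific here is an assumption: the function names, the Shannon entropy of an amplitude histogram, the non-overlapping frames, and the rule that a segment is a run of frames whose entropy exceeds a threshold.

```python
import math

def split_into_frames(samples, frame_len):
    """Split samples into consecutive non-overlapping frames (assumed framing)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def frame_entropy(frame, bins=8, lo=-1.0, hi=1.0):
    """Shannon entropy in bits of the frame's amplitude histogram (assumed measure)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in frame:
        idx = min(bins - 1, max(0, int((x - lo) / width)))
        counts[idx] += 1
    n = len(frame)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def entropy_segments(entropies, threshold):
    """Group consecutive frames whose entropy exceeds the threshold.

    Returns (start_frame, end_frame_exclusive) pairs. The direction of the
    threshold test is an assumption, not taken from the claim.
    """
    segments, start = [], None
    for i, h in enumerate(entropies):
        if h > threshold and start is None:
            start = i
        elif h <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(entropies)))
    return segments

# Hypothetical signal: a silent frame, a varied frame, a silent frame.
samples = [0.0] * 4 + [-0.9, -0.4, 0.1, 0.6] + [0.0] * 4
frames = split_into_frames(samples, frame_len=4)
entropies = [frame_entropy(f) for f in frames]
segments = entropy_segments(entropies, threshold=1.0)
```

In this toy input only the middle frame spreads its samples across several histogram bins, so it is the only frame that forms a segment.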
Abstract
A system and method of separating speakers in an audio file includes obtaining an audio file. The audio file is transcribed into at least one text file by a transcription server. Homogenous speech segments are identified within the at least one text file. The audio file is segmented into homogenous audio segments that correspond to the identified homogenous speech segments. The homogenous audio segments of the audio file are separated into a first speaker audio file and a second speaker audio file. The first speaker audio file and the second speaker audio file are transcribed to produce a diarized transcript.
18 Claims
1. A method of producing a diarized transcript from a digital audio file, recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory computer-readable medium having stored thereon a sequence of instructions that, when executed by a computing system, causes the computing system to perform the steps comprising:
obtaining a digital audio file;
splitting the digital audio file into a plurality of frames;
segmenting the digital audio file into entropy segments based upon an entropy of each frame;
performing a blind diarization to identify a first speaker audio file and a second speaker audio file by clustering the entropy segments into the first speaker audio file and the second speaker audio file, wherein the first speaker audio file only contains audio attributed to the first speaker and the second speaker audio file only contains audio attributed to the second speaker; and
identifying one of the first speaker audio file and second speaker audio file as an agent audio file and another of the first speaker audio file and the second speaker audio file as a customer audio file; and
transcribing the agent audio file and the customer audio file to produce a diarized transcript.
Dependent claims: 9, 10, 11, 12, 13, 14
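The blind diarization step clusters segments into two speakers without prior knowledge of either voice. The claim does not name a clustering algorithm or a feature representation, so the following is only a sketch under assumptions: each segment is summarized by a hypothetical numeric feature vector, and a plain two-centroid k-means stands in for whatever clustering the specification actually describes. The names `two_means` and `group_by_speaker` are illustrative.

```python
def two_means(features, iterations=20):
    """Assign each feature vector to one of two clusters (labels 0 and 1)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Seed with the first vector and the vector farthest from it.
    c0 = features[0]
    c1 = max(features, key=lambda f: dist2(f, c0))
    centroids = [list(c0), list(c1)]
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: nearest centroid wins.
        labels = [0 if dist2(f, centroids[0]) <= dist2(f, centroids[1]) else 1
                  for f in features]
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [f for f, lab in zip(features, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def group_by_speaker(segments, labels):
    """Collect segments per cluster, standing in for the two speaker audio files."""
    first = [s for s, lab in zip(segments, labels) if lab == 0]
    second = [s for s, lab in zip(segments, labels) if lab == 1]
    return first, second

# Hypothetical per-segment features and (start, end) frame ranges.
features = [[1.0, 1.1], [5.0, 5.2], [0.9, 1.0], [5.1, 4.9]]
segments = [(0, 10), (10, 25), (25, 40), (40, 52)]
labels = two_means(features)
first_speaker, second_speaker = group_by_speaker(segments, labels)
```

Each resulting group contains only segments attributed to one cluster, which mirrors the requirement that each speaker audio file contain audio from a single speaker.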
15. A system for audio diarization, the system comprising:
a blind diarization module operating on a computer processor, wherein the blind diarization module is configured to receive audio data, split the audio data into a plurality of frames, segment the audio data into entropy segments based upon an entropy of each frame, and cluster the entropy segments into a first plurality of segments of the audio data as a first speaker audio file and a second plurality of segments of the audio data as a second speaker audio file;
an agent diarization module operating on the computer processor, the agent diarization module receives an agent model, the agent diarization module compares the agent model to the first speaker audio file and the second speaker audio file and identifies one of the first and second speaker audio files as an agent audio file and another of the first and second speaker audio files as a customer audio file; and
a transcription server that receives the agent audio file and the customer audio file, and transcribes the audio files to produce a diarized transcript.
Dependent claims: 16, 17, 18
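The agent diarization module compares an agent model to both speaker audio files and labels the better match as the agent. The claim does not define the form of the agent model or the comparison, so this sketch assumes both the model and each speaker file are reduced to hypothetical feature vectors and compared by cosine similarity. The function names and the returned labels are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def label_agent_and_customer(agent_model, first_speaker, second_speaker):
    """Name which speaker file better matches the agent model (assumed comparison)."""
    if cosine(agent_model, first_speaker) >= cosine(agent_model, second_speaker):
        return {"agent": "first", "customer": "second"}
    return {"agent": "second", "customer": "first"}

# Hypothetical vectors: the second speaker file resembles the agent model.
roles = label_agent_and_customer(
    agent_model=[1.0, 0.0],
    first_speaker=[0.1, 0.9],
    second_speaker=[0.9, 0.1],
)
```

Whichever file is not chosen as the agent is labeled the customer, matching the two-way assignment in the claim.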
Specification