Diarization using linguistic labeling
First Claim
1. A method of diarization, the method comprising:
receiving a set of textual transcripts from a transcription server and a set of audio files associated with the set of textual transcripts from an audio database server;
performing a blind diarization on the set of textual transcripts and the set of audio files to segment and cluster the textual transcripts into a plurality of textual speaker clusters, wherein the number of textual speaker clusters is at least equal to a number of speakers in the textual transcript;
automatedly applying at least one heuristic to the textual speaker clusters with a processor to select textual speaker clusters likely to be associated with an identified group of speakers, wherein the at least one heuristic is a comparison of a plurality of scripts associated with the identified group of speakers to each set of the textual speaker clusters and a correlation score between each of the textual speaker clusters and the plurality of scripts is calculated and the speaker cluster in each set with the greatest correlation score is selected as being the transcript likely to be associated with the identified group of speakers;
analyzing the selected textual speaker clusters with the processor to create at least one linguistic model, wherein the analysis includes determining word use frequencies for words in the selected textual speaker clusters with the processor, determining word use frequencies for words in the non-selected textual speaker clusters with the processor, and comparing the word use frequencies for words in the selected transcripts to the word use frequencies for words in the non-selected transcripts with the processor to identify a plurality of discriminating words for use in the at least one linguistic model; and
applying the linguistic model to transcribed audio data with the processor to label a portion of the transcribed audio data as having been spoken by the identified group of speakers.
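As an illustration only, and not the patented implementation, the script-comparison heuristic and the word-frequency comparison recited in the claim can be sketched as follows. The claim does not say how the correlation score or the frequency comparison is computed, so the tokenizer, the word-overlap score, the add-one smoothing, the ratio ranking, and every function name below are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def script_correlation(cluster_text, scripts):
    """Assumed score: the best fraction of any one script's words found in the cluster."""
    cluster_words = set(tokenize(cluster_text))
    best = 0.0
    for script in scripts:
        script_words = tokenize(script)
        if not script_words:
            continue
        hits = sum(1 for w in script_words if w in cluster_words)
        best = max(best, hits / len(script_words))
    return best


def select_cluster(cluster_set, scripts):
    """Return the index of the cluster in one set with the greatest correlation score."""
    scores = [script_correlation(text, scripts) for text in cluster_set]
    return max(range(len(scores)), key=scores.__getitem__)


def discriminating_words(selected, non_selected, top_n=5):
    """Rank words by smoothed relative frequency in selected vs. non-selected clusters."""
    sel = Counter(w for text in selected for w in tokenize(text))
    non = Counter(w for text in non_selected for w in tokenize(text))
    vocab = set(sel) | set(non)
    sel_total = sum(sel.values()) + len(vocab)  # add-one smoothing (assumption)
    non_total = sum(non.values()) + len(vocab)

    def ratio(word):
        return ((sel[word] + 1) / sel_total) / ((non[word] + 1) / non_total)

    # Highest ratio first; ties broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda w: (-ratio(w), w))[:top_n]


# Toy data: one call split into two speaker clusters, plus one agent script.
scripts = ["thank you for calling how may i help you"]
call = [
    "thank you for calling acme how may i help you today",
    "hi my internet is not working",
]
chosen = select_cluster(call, scripts)
others = [text for i, text in enumerate(call) if i != chosen]
top_words = discriminating_words([call[chosen]], others, top_n=3)
```

In this toy call the first cluster contains every word of the script, so it is the one selected, and the words ranked highest are those used more often by the selected speaker than by the other speaker.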
Abstract
Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
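A minimal sketch of the final labeling step for the agent and customer case mentioned above is given here. It assumes the linguistic model is a table of smoothed log-ratio word weights and that the speaker side with the highest summed weight is the agent. The abstract does not specify the form of the model or the decision rule, so both are assumptions, as are the function names and the toy sentences.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_model(agent_texts, other_texts):
    """Assumed model: one add-one-smoothed log-ratio weight per word."""
    agent = Counter(w for text in agent_texts for w in tokenize(text))
    other = Counter(w for text in other_texts for w in tokenize(text))
    vocab = set(agent) | set(other)
    agent_total = sum(agent.values()) + len(vocab)
    other_total = sum(other.values()) + len(vocab)
    return {
        w: math.log((agent[w] + 1) / agent_total)
        - math.log((other[w] + 1) / other_total)
        for w in vocab
    }


def label_speakers(model, speaker_texts):
    """Label the highest-scoring speaker side 'agent' and every other side 'customer'."""
    scores = {
        speaker: sum(model.get(w, 0.0) for w in tokenize(text))
        for speaker, text in speaker_texts.items()
    }
    agent = max(scores, key=scores.get)
    return {s: ("agent" if s == agent else "customer") for s in speaker_texts}


# Transcripts already selected as agent speech vs. the remaining transcripts.
model = build_model(
    ["thank you for calling how may i help you"],
    ["hi my internet is not working"],
)

# A new, blindly diarized call with two unlabeled speaker sides.
labels = label_speakers(
    model,
    {
        "speaker_0": "my internet keeps dropping",
        "speaker_1": "thank you for calling how can i help",
    },
)
```

Words that never appeared in the training transcripts contribute a weight of zero, so the label is driven only by words the model has seen.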
139 Citations
16 Claims
1. A method of diarization, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A non-transitory computer-readable medium having instructions stored thereon for facilitating diarization of audio files from a customer service interaction, wherein the instructions, when executed by a processing system, direct the processing system to:
receive a set of textual transcripts from a transcription server;
receive a set of audio files associated with the set of textual transcripts from an audio database server;
perform a blind diarization on the set of textual transcripts and the set of audio files to segment and cluster the textual transcripts into a plurality of textual speaker clusters, wherein the number of textual speaker clusters is at least equal to a number of speakers in the textual transcript;
automatedly apply at least one heuristic to the textual speaker clusters with a processor to select textual speaker clusters likely to be associated with an identified group of speakers, wherein the at least one heuristic is a comparison of a plurality of scripts associated with the identified group of speakers to each set of the textual speaker clusters and a correlation score between each of the textual speaker clusters and the plurality of scripts is calculated and the speaker cluster in each set with the greatest correlation score is selected as being the transcript likely to be associated with the identified group of speakers;
analyze the selected textual speaker clusters with the processor to create at least one linguistic model, wherein the analysis includes determining word use frequencies for words in the selected textual speaker clusters with the processor, determining word use frequencies for words in the non-selected textual speaker clusters with the processor, and comparing the word use frequencies for words in the selected transcripts to the word use frequencies for words in the non-selected transcripts with the processor to identify a plurality of discriminating words for use in the at least one linguistic model; and
apply the linguistic model to transcribed audio data with the processor to label a portion of the transcribed audio data as having been spoken by the identified group of speakers.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
Specification