Diarization using linguistic labeling

US 10,522,153 B2
Filed: 10/25/2018
Issued: 12/31/2019
Est. Priority Date: 11/21/2012
Status: Active Grant

Abstract

Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
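
The selection heuristic recited in claim 1 compares each textual speaker cluster against a plurality of scripts and keeps the cluster with the greatest correlation score. The following is a minimal sketch of that idea, not the patented implementation: the claims do not define the correlation measure, so Jaccard word overlap is assumed here, and the names `tokens`, `correlation_score`, and `select_cluster` as well as the sample texts are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Return the set of lowercase words in a piece of text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def correlation_score(cluster_text: str, scripts: list[str]) -> float:
    """Best Jaccard word overlap between a cluster and any script.

    Jaccard overlap is an assumed stand-in; the claims only require
    that some correlation score be calculated.
    """
    cluster = tokens(cluster_text)
    best = 0.0
    for script in scripts:
        script_words = tokens(script)
        union = cluster | script_words
        if union:
            best = max(best, len(cluster & script_words) / len(union))
    return best


def select_cluster(clusters: dict[str, str], scripts: list[str]) -> str:
    """Return the id of the cluster with the greatest correlation score."""
    return max(clusters, key=lambda cid: correlation_score(clusters[cid], scripts))


# Hypothetical agent script and two blind-diarized clusters from one call.
scripts = ["thank you for calling how may i help you today"]
clusters = {
    "A": "thank you for calling acme how may i help you",
    "B": "hi my internet has been down since this morning",
}
selected = select_cluster(clusters, scripts)
```

In this toy call, cluster `A` shares most of its vocabulary with the script and is selected as the likely agent side; cluster `B` shares none.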

139 Citations

16 Claims

1. A method of diarization, the method comprising:
- receiving a set of textual transcripts from a transcription server and a set of audio files associated with the set of textual transcripts from an audio database server;
  
  performing a blind diarization on the set of textual transcripts and the set of audio files to segment and cluster the textual transcripts into a plurality of textual speaker clusters, wherein the number of textual speaker clusters is at least equal to a number of speakers in the textual transcript;
  
  automatedly applying at least one heuristic to the textual speaker clusters with a processor to select textual speaker clusters likely to be associated with an identified group of speakers, wherein the at least one heuristic is a comparison of a plurality of scripts associated with the identified group of speakers to each set of the textual speaker clusters and a correlation score between each of the textual speaker clusters and the plurality of scripts is calculated and the speaker cluster in each set with the greatest correlation score is selected as being the transcript likely to be associated with the identified group of speakers;
  
  analyzing the selected textual speaker clusters with the processor to create at least one linguistic model, wherein the analysis includes determining word use frequencies for words in the selected textual speaker clusters with the processor, determining word use frequencies for words in the non-selected textual speaker clusters with the processor, and comparing the word use frequencies for words in the selected transcripts to the word use frequencies for words in the non-selected transcripts with the processor to identify a plurality of discriminating words for use in the at least one linguistic model; and
  
  applying the linguistic model to transcribed audio data with the processor to label a portion of the transcribed audio data as having been spoken by the identified group of speakers.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, further comprising saving the at least one linguistic model to a linguistic database server and associating it with the labeled speaker.
  - 3. The method of claim 2, further comprising applying the saved at least one linguistic model from the linguistic database server to a new audio file transcript from an audio source to perform diarization of the new audio file by blind diarizing the new audio file, comparing each new textual speaker cluster to the at least one linguistic model, and labeling each textual speaker cluster as belonging to a customer service agent or belonging to a customer.
  - 4. The method of claim 1, wherein the textual speaker clusters are associated in groups of at least two, wherein the group of at least two includes a textual speaker cluster originating from the identified group of speakers and at least one textual speaker cluster originating from an other speaker, and wherein the non-selected speaker clusters are assumed to have originated from the other speaker.
  - 5. The method of claim 1, wherein the identified group of speakers are customer service agents and the audio files are audio files of a customer service interaction between at least one customer service agent and at least one customer.
  - 6. The method of claim 1, further comprising:
    - receiving a set of recorded audio files; and
      
      transcribing the set of recorded audio files to produce the set of textual transcripts.
  - 7. The method of claim 1, wherein the at least one heuristic is detection of a script associated with the identified group of speakers.
  - 8. The method of claim 1, further comprising:
    - calculating a difference between the word use frequencies for each word in the selected transcripts and the non-selected transcripts; and
      
      comparing the difference to a predetermined selection threshold, wherein if the difference is greater than the predetermined selection threshold, the word is identified as a discriminating word.
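
Claim 8 identifies discriminating words by thresholding the difference in word use frequency between the selected and non-selected transcripts. A rough sketch of that step follows, under stated assumptions: frequencies are taken as relative counts over whitespace-separated words, the absolute difference is compared to the threshold so that words favoring either group are kept, and the function names, sample texts, and threshold value are all hypothetical.

```python
from collections import Counter


def relative_frequencies(texts: list[str]) -> dict[str, float]:
    """Word use frequency as a share of all words in the given texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def discriminating_words(
    selected: list[str], non_selected: list[str], threshold: float
) -> dict[str, float]:
    """Map each discriminating word to its signed frequency difference.

    A positive value means the word is more frequent in the selected
    transcripts; a negative value means it is more frequent in the
    non-selected ones. Using the absolute difference is an assumption.
    """
    sel = relative_frequencies(selected)
    non = relative_frequencies(non_selected)
    result = {}
    for word in sel.keys() | non.keys():
        diff = sel.get(word, 0.0) - non.get(word, 0.0)
        if abs(diff) > threshold:
            result[word] = diff
    return result


# Hypothetical selected (agent) and non-selected (customer) cluster texts.
selected = ["how may i help you", "i can help you with that"]
non_selected = ["my phone is broken", "i need my bill"]
words = discriminating_words(selected, non_selected, threshold=0.15)
```

With these inputs, `help` and `you` come out as agent-leaning and `my` as customer-leaning, while `i` appears on both sides at similar rates and is not kept.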

9. A non-transitory computer-readable medium having instructions stored thereon for facilitating diarization of audio files from a customer service interaction, wherein the instructions, when executed by a processing system, direct the processing system to:
- receive a set of textual transcripts from a transcription server;
  
  receive a set of audio files associated with the set of textual transcripts from an audio database server;
  
  perform a blind diarization on the set of textual transcripts and the set of audio files to segment and cluster the textual transcripts into a plurality of textual speaker clusters, wherein the number of textual speaker clusters is at least equal to a number of speakers in the textual transcript;
  
  automatedly apply at least one heuristic to the textual speaker clusters with a processor to select textual speaker clusters likely to be associated with an identified group of speakers, wherein the at least one heuristic is a comparison of a plurality of scripts associated with the identified group of speakers to each set of the textual speaker clusters and a correlation score between each of the textual speaker clusters and the plurality of scripts is calculated and the speaker cluster in each set with the greatest correlation score is selected as being the transcript likely to be associated with the identified group of speakers;
  
  analyze the selected textual speaker clusters with the processor to create at least one linguistic model, wherein the analysis includes determining word use frequencies for words in the selected textual speaker clusters with the processor, determining word use frequencies for words in the non-selected textual speaker clusters with the processor, and comparing the word use frequencies for words in the selected transcripts to the word use frequencies for words in the non-selected transcripts with the processor to identify a plurality of discriminating words for use in the at least one linguistic model; and
  
  apply the linguistic model to transcribed audio data with the processor to label a portion of the transcribed audio data as having been spoken by the identified group of speakers.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
  - 10. The non-transitory computer-readable medium of claim 9, further directing the processing system to save the at least one linguistic model to a linguistic database server and associating it with the labeled speaker.
  - 11. The non-transitory computer-readable medium of claim 9, further directing the processing system to apply the saved at least one linguistic model from the linguistic database server to a new audio file transcript from an audio source to perform diarization of the new audio file by blind diarizing the new audio file, comparing each new textual speaker cluster to the at least one linguistic model, and labeling each textual speaker cluster as belonging to a customer service agent or belonging to a customer.
  - 12. The non-transitory computer-readable medium of claim 9, wherein the textual speaker clusters are associated in groups of at least two, wherein the group of at least two includes a textual speaker cluster originating from the identified group of speakers and at least one textual speaker cluster originating from an other speaker, and wherein the non-selected speaker clusters are assumed to have originated from the other speaker.
  - 13. The non-transitory computer-readable medium of claim 9, wherein the identified group of speakers are customer service agents and the audio files are audio files of a customer service interaction between at least one customer service agent and at least one customer.
  - 14. The non-transitory computer-readable medium of claim 9, further directing the processing system to:
    - receive a set of recorded audio files; and
      
      transcribe the set of recorded audio files to produce the set of textual transcripts.
  - 15. The non-transitory computer-readable medium of claim 9, wherein the at least one heuristic is detection of a script associated with the identified group of speakers.
  - 16. The non-transitory computer-readable medium of claim 9, further directing the processing system to:
    - calculate a difference between the word use frequencies for each word in the selected transcripts and the non-selected transcripts; and
      
      compare the difference to a predetermined selection threshold, wherein if the difference is greater than the predetermined selection threshold, the word is identified as a discriminating word.
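
Claims 3 and 11 describe comparing each new textual speaker cluster to a saved linguistic model and labeling the cluster as belonging to a customer service agent or to a customer. The sketch below illustrates one way this could look. It assumes the model is a mapping from discriminating words to signed weights (positive for agent-leaning words) and that the highest-scoring cluster in a call is the agent; the claims specify neither choice, and `cluster_score`, `label_clusters`, the weights, and the sample call are hypothetical.

```python
def cluster_score(text: str, model: dict[str, float]) -> float:
    """Sum the model weights of every word occurrence in the cluster."""
    return sum(model.get(word, 0.0) for word in text.lower().split())


def label_clusters(clusters: dict[str, str], model: dict[str, float]) -> dict[str, str]:
    """Label the highest-scoring cluster 'agent' and all others 'customer'."""
    agent_id = max(clusters, key=lambda cid: cluster_score(clusters[cid], model))
    return {cid: ("agent" if cid == agent_id else "customer") for cid in clusters}


# Hypothetical saved model: positive weights lean agent, negative lean customer.
model = {"help": 0.18, "you": 0.18, "my": -0.25}

# Hypothetical blind-diarized clusters from a new call transcript.
new_call = {
    "spk0": "my bill is wrong and my phone is off",
    "spk1": "i can help you with your bill",
}
labels = label_clusters(new_call, model)
```

Here `spk1` scores highest and is labeled as the agent, and `spk0` is labeled as the customer.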

Current Assignee: Verint Systems Incorporated
Original Assignee: Verint Systems Limited (Verint Systems Incorporated)
Inventors: Ziv, Omer; Achituv, Ran; Shapira, Ido; Dreyfuss, Jeremie
Primary Examiner(s): Opsasnick, Michael N
Application Number: US16/170,289
Publication Number: US 20190066691A1
Time in Patent Office: 432 Days
Field of Search: 704235
CPC Class Codes:
G10L 17/00 Speaker identification or v...
G10L 17/02 Preprocessing operations, e...
