Speaker recognition in the call center

US 10,325,601 B2
Filed: 09/19/2017
Issued: 06/18/2019
Est. Priority Date: 09/19/2016
Status: Active Grant

Abstract

Utterances of at least two speakers in a speech signal may be distinguished, and the associated speakers identified, by using diarization together with automatic speech recognition of identifying words and phrases that commonly occur in the speech signal. The diarization process clusters turns of the conversation, while recognized special-form phrases and entity names identify the speakers. A trained probabilistic model deduces which entity name(s) correspond to the clusters.
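
The association step described in the abstract can be illustrated with a small sketch. It is not the patented method: the patent describes a trained probabilistic model (a Bayesian Network in claim 2), whereas the code below simply sums hand-picked priors. The names `TRIGGER_PRIORS`, `score_names`, and `assign_names`, the prior values, and the sample turns are all hypothetical, and the sketch only handles self-identification phrases where the name directly follows the trigger.

```python
from collections import defaultdict

# Hypothetical trigger phrases with made-up prior probabilities that the
# word right after the phrase is the speaker's own name.
TRIGGER_PRIORS = {
    "my name is": 0.9,
    "this is": 0.6,
}


def score_names(turns, known_names):
    """Accumulate, per diarization cluster, evidence for each candidate name.

    turns: list of (cluster_id, transcript) pairs. The cluster ids are assumed
    to come from diarization and the transcripts from ASR.
    known_names: set of lowercase entity names the recognizer can detect.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for cluster_id, text in turns:
        lowered = text.lower()
        for phrase, prior in TRIGGER_PRIORS.items():
            start = lowered.find(phrase)
            if start < 0:
                continue  # trigger phrase not present in this turn
            following = lowered[start + len(phrase):].split()
            if not following:
                continue
            candidate = following[0].strip(".,!?")
            if candidate in known_names:
                scores[cluster_id][candidate] += prior
    return scores


def assign_names(scores):
    """Pick the highest-scoring name for every cluster that has evidence."""
    return {cluster: max(names, key=names.get) for cluster, names in scores.items()}


turns = [
    (0, "Thank you for calling, my name is Alice."),
    (1, "Hi, this is Bob, I have a question about my bill."),
    (0, "Sure Bob, let me look that up."),
]
labels = assign_names(score_names(turns, {"alice", "bob"}))
print(labels)
```

A trained model would also weigh names that refer to the other party (for example, the agent repeating the caller's name in the third turn), which this additive scoring ignores.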

9 Claims

1. A method for distinguishing and identifying at least one of multiple speakers in a speech signal, the method comprising:
- obtaining, by a computer, the speech signal, the speech signal including utterances respectively from at least a first speaker and a second speaker;
- extracting, by the computer, speech portions from the speech signal;
- performing, by the computer, speaker diarization on the speech portions, the speaker diarization identifying speech portions respectively associated with at least one of the first speaker and the second speaker;
- detecting, by the computer, at least one trigger phrase in the respective speech portions by using automatic speech recognition, each possible trigger phrase being associated with a respective prior probability that a current, next or previous utterance in the respective speech portion is an entity name;
- detecting, by the computer, at least one entity name in the respective speech portions by using automatic speech recognition; and
- identifying, by the computer, at least one of the first speaker and the second speaker by executing a probabilistic model trained to associate at least one of the first speaker and the second speaker with a respective one of the at least one detected entity names, the probabilistic model having been trained based on at least the diarized speech portions, the detected at least one trigger phrase, and the detected at least one entity name.
- Dependent claims (2, 3, 4, 5):
  - 2. The method of claim 1, wherein the probabilistic model is a Bayesian Network.
  - 3. The method of claim 1, further comprising:
    - performing, by the computer, the speaker diarization to identify speech portions respectively associated with each of the first speaker and the second speaker, wherein the probabilistic model is trained to associate each of the first speaker and the second speaker with a respective one of the detected entity names.
  - 4. The method of claim 3, wherein said performing speaker diarization includes:
    - periodically extracting, by the computer, i-vectors from the speech portions;
      
      partitionally clustering, by the computer, the i-vectors into respective clusters for the first speaker and the second speaker; and
      
      executing, by the computer, the probabilistic model to associate an identification of at least one of the first speaker and the second speaker with a corresponding one of the respective clusters.
  - 5. The method of claim 3, wherein the first speaker is a call center agent, and the second speaker is a caller.
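
The clustering step in claim 4 can be pictured with a toy example. Claim 4 only says the i-vectors are "partitionally" clustered. The sketch below assumes plain k-means with k = 2 and Euclidean distance as one such method, and uses made-up 2-D points in place of real i-vectors, which have far more dimensions and come from a trained extractor. The function name `kmeans_two` and the sample data are hypothetical.

```python
import math


def kmeans_two(vectors, iterations=10):
    """Partition vectors into two clusters with plain k-means (k = 2)."""
    # Deterministic initialisation: the first vector, and the one farthest from it.
    c0 = vectors[0]
    c1 = max(vectors, key=lambda v: math.dist(v, c0))
    centroids = [list(c0), list(c1)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            0 if math.dist(v, centroids[0]) <= math.dist(v, centroids[1]) else 1
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy stand-ins for i-vectors extracted at regular intervals from a two-party call.
segments = [(0.9, 0.1), (0.0, 1.0), (1.0, 0.2), (0.1, 0.8), (0.8, 0.0)]
labels = kmeans_two(segments)
print(labels)
```

The cluster indices carry no identity on their own. In the claimed method, the probabilistic model is what attaches a speaker identification to each cluster.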

6. An apparatus for distinguishing and identifying at least one of multiple speakers in a speech signal, the apparatus comprising:
- a computer configured to:
  - receive a speech signal having utterances from at least a first speaker and a second speaker;
  - extract speech portions of the speech signal;
  - segregate the extracted speech portions into speech portions associated with at least one of the first speaker and the second speaker;
  - recognize one or more predetermined trigger phrases in each speech portion, each possible trigger phrase being associated with a respective prior probability that a current, next or previous utterance in the respective speech portion is a named entity, and to recognize one or more entity names in each respective speech portion; and
  - identify at least one of the first speaker and the second speaker by executing a probabilistic model trained to associate a recognized entity name with speech portions associated with at least one of the first speaker and the second speaker.
- Dependent claims (7, 8, 9):
  - 7. The apparatus of claim 6, wherein the computer is further configured to:
    - identify speech portions respectively associated with each of the first speaker and the second speaker, wherein the probabilistic model is trained to associate each of the first speaker and the second speaker with a respective one of the detected entity names.
  - 8. The apparatus of claim 7, wherein the computer is further configured to periodically extract i-vectors from the speech portion, and to partitionally cluster the i-vectors into respective clusters for the first speaker and the second speaker.
  - 9. The apparatus of claim 7, wherein the first speaker is a call center agent, and the second speaker is a caller.

Current Assignee: Pindrop Security, Inc.
Original Assignee: Pindrop Security, Inc.
Inventors: Khoury, Elie; Garland, Matthew
Primary Examiner(s): Guerra-Erazo, Edgar X
Application Number: US15/709,290
Publication Number: US 20180082689A1
Time in Patent Office: 637 Days
Field of Search
CPC Class Codes

G06N 7/01   Probabilistic graphical mod...

G10L 15/07   to the speaker

G10L 15/19   Grammatical context, e.g. d...

G10L 15/26   Speech to text systems G10L...

G10L 17/00   Speaker identification or v...

G10L 17/04   Training, enrolment or mode...

G10L 17/08   Use of distortion metrics o...

G10L 17/24   the user being prompted to ...

H04M 1/271   controlled by voice recogni...

H04M 2203/40   related to call centers
