Speaker recognition in the call center

US 10,679,630 B2
Filed: 06/14/2019
Issued: 06/09/2020
Est. Priority Date: 09/19/2016
Status: Active Grant

First Claim

Patent Images

1. A computer-implemented method comprising:

extracting, by a computer, a first set of audio features from a first audio signal containing a first utterance according to a speaker diarization algorithm, the first audio signal received from an enrollee electronic device;

extracting, by the computer, a second set of audio features from a second audio signal containing a second utterance according to the speaker diarization algorithm, the second audio signal received from a caller electronic device;

performing, by the computer, a phonetic and acoustic comparison of the first utterance and the second utterance based upon the first set of audio features and the second set of audio features; and

determining, by the computer based upon the phonetic and acoustic comparison, at least a partial keyword sequence match and a speaker match between the first utterance and the second utterance.

View all claims

2 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Utterances of at least two speakers in a speech signal may be distinguished and the associated speaker identified by use of diarization together with automatic speech recognition of identifying words and phrases commonly in the speech signal. The diarization process clusters turns of the conversation while recognized special form phrases and entity names identify the speakers. A trained probabilistic model deduces which entity name(s) correspond to the clusters.

90 Citations

View as Search Results

20 Claims

1. A computer-implemented method comprising:
- extracting, by a computer, a first set of audio features from a first audio signal containing a first utterance according to a speaker diarization algorithm, the first audio signal received from an enrollee electronic device;
  
  extracting, by the computer, a second set of audio features from a second audio signal containing a second utterance according to the speaker diarization algorithm, the second audio signal received from a caller electronic device;
  
  performing, by the computer, a phonetic and acoustic comparison of the first utterance and the second utterance based upon the first set of audio features and the second set of audio features; and
  
  determining, by the computer based upon the phonetic and acoustic comparison, at least a partial keyword sequence match and a speaker match between the first utterance and the second utterance.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The computer-implemented method of claim 1, wherein the first set of audio features and the second set of audio features comprise at least one of mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), or perceptual linear prediction (PLP).
  - 3. The computer-implemented method of claim 1, wherein the step of performing the phonetic and acoustic comparison further comprises:
    - executing, by the computer, a modified dynamic time warping process on at least a portion of the first audio signal; and
      
      executing, by the computer, the modified dynamic time warping process on at least a portion of the second audio signal.
  - 4. The computer-implemented method of claim 3, further comprising:
    - time hopping, by the computer, on the portion of the first audio signal during the modified dynamic time warping process based upon a first hop size of the first set of audio features; and
      
      time hopping, by the computer, on the portion of the second audio signal during the modified dynamic time warping process based upon a second hop size of the second set of audio features.
  - 5. The computer-implemented method of claim 4, wherein the partial keyword sequence match includes a first set of one or more keywords in the first utterance and a second set of one or more keywords in the second utterance, the first set of one or more keywords being the same as the second set of one or more keywords.
  - 6. The computer-implemented method of claim 5, wherein the first set of one or more keywords in the first utterance begins at a different time than the second set of one or more keywords in the second utterance.
  - 7. The computer-implemented method of claim 1, wherein the first audio signal is an enrollment sample.
  - 8. The computer-implemented method of claim 1, wherein the second audio signal is a test sample.
  - 9. The computer-implemented method of claim 1, wherein the second utterance in the second audio signal is from a speaker to be identified.
  - 10. The computer-implemented method of claim 9, further comprising:
    - identifying, by the computer, the speaker based upon the partial keyword sequence match and the speaker match.

11. A system comprising:
- a non-transitory storage medium storing a plurality of computer program instructions; and
  
  a processor electrically coupled to the non-transitory storage medium and configured to execute the computer program instructions to;
  
  extract a first set of audio features from a first audio signal containing a first utterance according to a speaker diarization algorithm, the first audio signal received from an enrollee electronic device;
  
  extract a second set of audio features from a second audio signal containing a second utterance according to the speaker diarization algorithm, the second audio signal received from a caller electronic device;
  
  perform a phonetic and acoustic comparison of the first utterance and the second utterance based upon the first set of audio features and the second set of audio features; and
  
  determine based upon the phonetic and acoustic comparison, at least a partial keyword sequence match and a speaker match between the first utterance and the second utterance.
- View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
- - 12. The system of claim 11, wherein the first set of audio features and the second set of audio features comprise at least one of mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), or perceptual linear prediction (PLP).
  - 13. The system of claim 11, wherein the processor is configured to further execute the computer program instructions to:
    - execute a modified dynamic time warping process on at least a portion of the first audio signal; and
      
      execute the modified dynamic time warping process on at least a portion of the second audio signal.
  - 14. The system of claim 13, wherein the processor is configured to further execute the computer program instructions to:
    - time hop on the portion of the first audio signal during the modified dynamic time warping process based upon a first hop size of the first set of audio features; and
      
      time hop on the portion of the second audio signal during the modified dynamic time warping process based upon a second hop size of the second set of audio features.
  - 15. The system of claim 14, wherein the partial keyword sequence match includes a first set of one or more keywords in the first utterance and a second set of one or more keywords in the second utterance, the first set of one or more keywords being the same as the second set of one or more keywords.
  - 16. The system of claim 15, wherein the first set of one or more keywords in the first utterance begins at a different time than the second set of one or more keywords in the second utterance.
  - 17. The system of claim 11, wherein the first audio signal is an enrollment sample.
  - 18. The system of claim 11, wherein the second audio signal is a test sample.
  - 19. The system of claim 11, wherein the second utterance in the second audio signal is from a speaker to be identified.
  - 20. The system of claim 19, wherein the processor is configured to further execute the computer program instructions to:
    - identify the speaker based upon the partial keyword sequence match and the speaker match.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Pindrop Security, Inc.
Original Assignee
Pindrop Security, Inc.
Inventors
Khoury, Elie, Garland, Matthew
Primary Examiner(s)
Guerra-Erazo, Edgar X

Application Number

US16/442,368
Publication Number

US 20190304468A1
Time in Patent Office

361 Days
Field of Search
US Class Current
CPC Class Codes

G06N 7/01   Probabilistic graphical mod...

G10L 15/07   to the speaker

G10L 15/19   Grammatical context, e.g. d...

G10L 15/26   Speech to text systems G10L...

G10L 17/00   Speaker identification or v...

G10L 17/04   Training, enrolment or mode...

G10L 17/08   Use of distortion metrics o...

G10L 17/24   the user being prompted to ...

H04M 1/271   controlled by voice recogni...

H04M 2203/40   related to call centers

Speaker recognition in the call center

First Claim

2 Assignments

0 Petitions

Accused Products

Abstract

90 Citations

20 Claims

Specification

Solutions

Use Cases

Quick Links

Speaker recognition in the call center

First Claim

2 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

90 Citations

20 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links