METHOD AND APPARATUS FOR LARGE POPULATION SPEAKER IDENTIFICATION IN TELEPHONE INTERACTIONS

US 20080195387A1
Filed: 10/19/2006
Published: 08/14/2008
Est. Priority Date: 10/19/2006
Status: Active Grant


Assignments: 3
Petitions: 0

Abstract

A method and apparatus for determining whether a speaker uttering an utterance belongs to a predetermined set comprising known speakers, wherein a training utterance is available for each known speaker. The method and apparatus test whether features extracted from the tested utterance yield a score exceeding a threshold when matched against one or more models constructed from voice samples of each known speaker. The method and apparatus further provide optional enhancements such as determining, using, and updating model normalization parameters; a fast scoring algorithm; summed call handling; or quality evaluation of the tested utterance.
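
The core decision of claim 1 (estimate one model per known speaker, score each test frame against each model, combine the frame scores into a model score, take the maximal model score, and compare it with a threshold) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: feature extraction is assumed to have already produced one vector per frame, each speaker model is assumed to be a single diagonal Gaussian, the intermediate score is assumed to be a per-frame log-likelihood, and the model score is assumed to be their average. The names `estimate_model`, `frame_score`, `model_score`, and `identify` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: features (e.g. cepstral vectors) are already extracted, one vector per frame.
Frame = Sequence[float]
# Assumption: each speaker model is a single diagonal Gaussian (mean, variance).
Model = Tuple[List[float], List[float]]


def estimate_model(frames: List[Frame], var_floor: float = 1e-3) -> Model:
    """Estimate a speaker model from the training-utterance features."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so a near-constant dimension cannot dominate the score.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
           for d in range(dim)]
    return mean, var


def frame_score(frame: Frame, model: Model) -> float:
    """Intermediate score: log-likelihood of one test frame under a model."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))


def model_score(frames: List[Frame], model: Model) -> float:
    """Model score: average of the intermediate (per-frame) scores."""
    return sum(frame_score(f, model) for f in frames) / len(frames)


def identify(test_frames: List[Frame], models: Dict[str, Model],
             threshold: float) -> Tuple[str, float, bool]:
    """Return the best-matching known speaker, the maximal score, and the decision."""
    scores = {name: model_score(test_frames, m) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], scores[best] > threshold
```

Only the maximal model score is compared with the threshold, so enlarging the set of known speakers adds models to score but leaves the decision rule unchanged.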

295 Citations

38 Claims

1. A method for determining whether a speaker uttering a tested utterance belongs to a predetermined set comprising an at least one known speaker, wherein an at least one training utterance is available for each of the at least one known speaker, the method comprising the steps of:
- extracting an at least one first feature of each of the at least one training utterance;
  
  estimating an at least one model from the at least one first feature;
  
  extracting an at least one second feature from an at least one frame of the tested utterance;
  
  scoring the at least one second feature against an at least one of the at least one model, to obtain an at least one intermediate score;
  
  determining an at least one model score using the at least one intermediate score;
  
  selecting an at least one maximal score from the at least one model score; and
  
  the speaker is determined to belong to the predetermined set, if the at least one maximal score exceeds a threshold.
- Dependent Claims (2 to 23)
- - 2. The method of claim 1 wherein the threshold is predetermined.
  - 3. The method of claim 1 wherein the threshold is post determined.
  - 4. The method of claim 1 further comprising a step of generating an alert when the speaker is determined to belong to the predetermined set.
  - 5. The method of claim 1 wherein there are at least two voices belonging to at least two speakers in the tested utterance or the training utterance.
  - 6. The method of claim 5 further comprising a step of summed call handling for separating the tested utterance or the training utterance to at least two sides.
  - 7. The method of claim 6 further comprising a step of detecting an agent side within the at least two sides.
  - 8. The method of claim 1 further comprising a step of building an at least one background model.
  - 9. The method of claim 8 further comprising the steps of:
    - scoring the at least one second feature against the at least one background model to obtain an at least one background score; and
      
      integrating the at least one model score with the at least one background score to enhance the model score.
  - 10. The method of claim 1 further comprising a step of scaling the maximal score to be in a predetermined range.
  - 11. The method of claim 1 further comprising a step of gender detection for the speaker or for the at least one known speaker.
  - 12. The method of claim 1 further comprising the steps of:
    - extracting an at least one third feature from a part of the tested utterance;
      
      scoring the at least one third feature against each of the at least one model to obtain an at least one temporary model score;
      
      selecting an at least one temporary maximal score from the at least one temporary model score, the at least one temporary maximal score associated with an at least one model to be checked;
      
      scoring the at least one second feature against the at least one model to be checked, to obtain the at least one model score.
  - 13. The method of claim 1 further comprising the steps of:
    - determining an at least one normalization parameter associated with each of the at least one model; and
      
      normalizing the at least one model score using the at least one normalization parameter.
  - 14. The method of claim 13 further comprising the step of updating the at least one normalization parameter using the at least one model score.
  - 15. The method of claim 13 wherein the at least one normalization parameter is determined from at least two voice utterances uttered by none of the at least one known speaker.
  - 16. The method of claim 13 wherein the at least one normalization parameter comprises the mean or the standard deviation of at least two scores associated with the at least one model.
  - 17. The method of claim 13 wherein the at least one normalization parameter is determined from at least two model scores exceeding a second threshold.
  - 18. The method of claim 13 wherein the at least one normalization parameter is determined from at least two model scores having a score belonging to a predetermined top percentage of a multiplicity of model scores.
  - 19. The method of claim 1 further comprising a step of evaluating an at least one scalar measure related to the tested utterance.
  - 20. The method of claim 1 wherein the model score is determined based on an at least one intermediate score having a value higher than at least one second intermediate score.
  - 21. The method of claim 1 further comprising a preprocessing step for segmenting the tested utterance or the at least one training utterance and discarding an at least one frame of the tested utterance or the at least one training utterance.
  - 22. The method of claim 1 when used within a fraud detection method.
  - 23. The method of claim 1 further comprising:
    - a speaker verification step, for issuing a probability that the speaker is who he is claiming to be; and
      
      a total fraud risk determination step for determining a total fraud risk based on the determination that the speaker belongs to the predetermined set and on the probability that the speaker is who he is claiming to be.
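
Dependent claim 12 describes a faster variant of claim 1: a part of the tested utterance is first scored against every model, and only the models with the highest temporary scores are then scored with the full set of features. A minimal sketch follows. Treating the first frames as "a part of the tested utterance", the `prefix_frames` and `n_candidates` values, and the `score_fn` callback (any model-scoring function, such as an average per-frame log-likelihood) are hypothetical choices, not details taken from the claims.

```python
from typing import Callable, Dict, List, Sequence

Frames = List[Sequence[float]]
# Hypothetical callback: returns the model score of `frames` against the named model.
ScoreFn = Callable[[Frames, str], float]


def two_pass_scores(test_frames: Frames, model_names: List[str], score_fn: ScoreFn,
                    prefix_frames: int = 100, n_candidates: int = 10) -> Dict[str, float]:
    """Score all models on a part of the utterance, then the full utterance on the best few."""
    # Pass 1: temporary model scores from a part of the tested utterance only.
    part = test_frames[:prefix_frames]
    temporary = {name: score_fn(part, name) for name in model_names}
    # Keep the models with the highest temporary scores as the models to be checked.
    to_check = sorted(temporary, key=temporary.get, reverse=True)[:n_candidates]
    # Pass 2: full model scores, computed only for the models to be checked.
    return {name: score_fn(test_frames, name) for name in to_check}
```

The returned scores then feed the maximal-score selection and threshold test of claim 1. The cost of pass 2 grows with `n_candidates` rather than with the size of the known-speaker set, which is where the saving comes from when that set is large.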

24. An apparatus for determining whether a speaker uttering a tested utterance belongs to a predetermined set comprising an at least one known speaker, wherein an at least one training utterance is available for each of the at least one known speaker, the apparatus comprising:
- a feature extraction component for extracting an at least one first feature of the tested utterance or of each of the at least one training utterance;
  
  a frame scoring component for scoring an at least one feature against an at least one model, to obtain an at least one intermediate score;
  
  a total model scoring component for determining an at least one model score using the at least one intermediate score; and
  
  a maximal score determination component for selecting a maximal score from the at least one model score.
- Dependent Claims (25 to 37)
- - 25. The apparatus of claim 24 further comprising a model estimation component for estimating the at least one model.
  - 26. The apparatus of claim 24 further comprising a summed call handling component for separating the tested utterance or the training utterance to at least two sides.
  - 27. The apparatus of claim 26 further comprising an agent detection component for detecting an agent side within the at least two sides.
  - 28. The apparatus of claim 27 further comprising a score scaling component for scaling the maximal score to be in a predetermined range.
  - 29. The apparatus of claim 24 further comprising a gender detection component for determining whether a speaker in the tested utterance or in the at least one training utterance is a male or a female.
  - 30. The apparatus of claim 24 further comprising a model normalization parameter initialization component, for evaluating an at least one parameter associated with a model.
  - 31. The apparatus of claim 30 further comprising a model normalization parameter update component, for updating the at least one parameter associated with a model.
  - 32. The apparatus of claim 30 further comprising a score adaptation component for adapting an at least one model score using an at least one model normalization parameter.
  - 33. The apparatus of claim 24 further comprising a decision component for determining according to the maximal score whether the speaker uttering the tested utterance belongs to the predetermined set.
  - 34. The apparatus of claim 24 further comprising a quality evaluation component for determining an at least one quality measure associated with an at least one part of the tested utterance.
  - 35. The apparatus of claim 24 further comprising an alert generation component for issuing an alert when the speaker uttering the tested utterance is determined to belong to the predetermined set.
  - 36. The apparatus of claim 24 further comprising a capturing or logging component for capturing or logging the tested utterance.
  - 37. The apparatus of claim 24 further comprising a fraud detection component for determining a probability that the speaker is a fraudster.
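
Claims 13 to 18 (and the components of claims 30 to 32) attach normalization parameters, such as the mean and standard deviation of a model's scores, to each model, initialize them from utterances of speakers outside the known set, and update them with later model scores. One way to read this is a z-score style normalization, sketched below as an assumption rather than the patented formula. The class `ModelNorm` and the `top_fraction` parameter (one reading of the "predetermined top percentage" of claim 18) are hypothetical.

```python
import math
import statistics
from typing import List


class ModelNorm:
    """Normalization parameters (mean, standard deviation) for one speaker model."""

    def __init__(self, initial_scores: List[float], top_fraction: float = 1.0):
        # initial_scores: scores of this model against utterances from speakers
        # outside the known set (claim 15); at least two are required.
        if len(initial_scores) < 2:
            raise ValueError("need at least two scores")
        self.top_fraction = top_fraction  # hypothetical knob for claim 18
        self.scores = list(initial_scores)
        self._refit()

    def _refit(self) -> None:
        # Keep only the top fraction of scores (at least two), then take mean and std.
        k = max(2, math.ceil(len(self.scores) * self.top_fraction))
        kept = sorted(self.scores, reverse=True)[:k]
        self.mean = statistics.mean(kept)
        self.std = max(statistics.stdev(kept), 1e-9)  # guard against zero spread

    def normalize(self, score: float) -> float:
        """Normalize a raw model score with this model's parameters (claim 13)."""
        return (score - self.mean) / self.std

    def update(self, score: float) -> None:
        """Update the parameters with a newly observed model score (claim 14)."""
        self.scores.append(score)
        self._refit()
```

Normalizing each model's score against that model's own score distribution makes scores from different speaker models comparable before the maximal score is selected. Claim 17's variant, which keeps only scores above a second threshold, would replace the top-fraction filter in `_refit` with a comparison against that threshold.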

38. A computer readable storage medium containing a set of instructions for a general purpose computer, the set of instructions comprising:
- extracting an at least one first feature of each of an at least one training utterance, each of the at least one training utterance uttered by a person belonging to a target set;
  
  estimating an at least one model from the at least one first feature, the model associated with the person belonging to the target set;
  
  extracting an at least one second feature from an at least one frame of a tested utterance;
  
  scoring the at least one second feature against the at least one model, to obtain an at least one intermediate score;
  
  determining a model score using the at least one intermediate score;
  
  selecting a maximal score from the at least one model score; and
  
  determining whether a speaker of the tested utterance belongs to the target set.
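
Dependent claims 9 and 10 (and apparatus claim 28) add two refinements that can precede a final determination like the one closing claim 38: integrating a model score with a background-model score, and scaling the maximal score into a predetermined range. The claims do not say how either step is computed. The sketch below assumes the common log-likelihood-ratio form (model score minus background score, both taken as average per-frame log-likelihoods) and a logistic mapping into 0 to 100; the function names and the `slope` and `offset` calibration parameters are hypothetical.

```python
import math


def enhance_with_background(model_score: float, background_score: float) -> float:
    """Claim 9 (assumed form): log-likelihood ratio of speaker model vs background model."""
    return model_score - background_score


def scale_score(score: float, slope: float = 1.0, offset: float = 0.0,
                lo: float = 0.0, hi: float = 100.0) -> float:
    """Claim 10 (assumed form): squash an unbounded score into [lo, hi] with a logistic curve."""
    z = slope * (score - offset)
    # Numerically stable logistic: never call exp() on a large positive argument.
    if z >= 0:
        p = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        p = e / (1.0 + e)
    return lo + (hi - lo) * p
```

Because the logistic mapping is monotonic, scaling does not change which model has the maximal score; it only moves the decision threshold into a fixed, easier-to-interpret range.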


Current Assignee
Nice Ltd
Original Assignee
Nice Systems Limited (Nice Ltd)
Inventors
WASSERBLAT, Moshe, ZIGEL, Yaniv

Granted Patent

US 7,822,605 B2

US Class Current

704/236
CPC Class Codes

G10L 17/06 Decision making techniques;...
