Method and apparatus for segmentation of audio interactions

US 7,716,048 B2
Filed: 01/25/2006
Issued: 05/11/2010
Est. Priority Date: 01/25/2006
Status: Active Grant

Abstract

A method and apparatus for segmenting an audio interaction by locating an anchor segment from each side of the interaction, iteratively classifying additional segments into one of the two sides, and scoring the resulting segmentation. If the score is below a threshold, the process is repeated until the segmentation score is satisfactory or until a stopping criterion is met. The anchoring and the scoring steps comprise using additional data associated with the interaction, a speaker thereof, internal or external information related to the interaction or to a speaker thereof, or the like.
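
The repeat-until-satisfactory control flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `segment_until_satisfactory`, the callables `segment_once` and `score`, the per-segment label representation, and the attempt budget used as the stopping criterion are all assumptions introduced here.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: one side label (0 or 1) per segment.
Segmentation = List[int]


def segment_until_satisfactory(
    segment_once: Callable[[int], Segmentation],
    score: Callable[[Segmentation], float],
    threshold: float,
    max_attempts: int = 5,
) -> Tuple[Optional[Segmentation], float, int]:
    """Repeat segmentation until the score reaches the threshold
    or the attempt budget (the assumed stopping criterion) runs out.

    Returns the best segmentation seen, its score, and the attempts made.
    """
    best: Optional[Segmentation] = None
    best_score = float("-inf")
    attempts = 0
    for attempt in range(max_attempts):
        attempts = attempt + 1
        # A caller might, for example, pick different anchors on each attempt.
        candidate = segment_once(attempt)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        if candidate_score >= threshold:
            break  # score is satisfactory, stop repeating
    return best, best_score, attempts
```

Keeping the best candidate seen so far means the caller still receives a usable result when the stopping criterion is hit before any attempt reaches the threshold.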

20 Claims

1. A speaker segmentation method for associating an at least one segment of speech for each of at least two sides of a summed audio interaction, with one of the at least two sides of the interaction, using additional information, the method comprising:
- a receiving step for receiving the summed audio interaction from a capturing and logging unit;
- a segmentation step for associating the at least one segment with one side of the summed audio interaction, the segmentation step comprising:
  - a parameterization step for transforming a speech signal into a set of feature vectors and dividing the set into non-overlapping segments;
  - an anchoring step for locating an anchor segment for each of the at least two sides of the summed audio interaction, the anchoring step comprising:
    - selecting a homogenous segment as a first anchor segment;
    - constructing a first model of the homogenous segment; and
    - selecting a second anchor segment such that its model is different from the first model; and
  - a modeling and classification step for associating at least one second segment with each side of the summed audio interaction; and
- a scoring step for assigning a score to said segmentation.
- Dependent claims (2 to 17):
  - 2. The method of claim 1 wherein the additional information is at least one item selected from the group consisting of:
    - computer-telephony-integration information related to the summed audio interaction;
    - spotted words within the summed audio interaction;
    - data related to the summed audio interaction;
    - data related to a speaker thereof;
    - external data related to the summed audio interaction; and
    - data related to at least one other interaction performed by a speaker of the summed audio interaction.
  - 3. The method of claim 1 further comprising a model association step for scoring the at least one segment against an at least one statistical model of one side, and obtaining a model association score.
  - 4. The method of claim 1 wherein the scoring step uses discriminative information for discriminating the at least two sides of the summed audio interaction.
  - 5. The method of claim 4 wherein the scoring step comprises a model association step for scoring the at least one segment against an at least one statistical model of one side, and obtaining a model association score.
  - 6. The method of claim 5 wherein the scoring step further comprises a normalization step for normalizing the at least one model score.
  - 7. The method of claim 4 wherein the scoring step comprises evaluating the association of the at least one segment with a side of the summed audio interaction using second additional information.
  - 8. The method of claim 7 wherein the second additional information is at least one item selected from the group consisting of:
    - computer-telephony-integration information related to the summed audio interaction;
    - spotted words within the summed audio interaction;
    - data related to the summed audio interaction;
    - data related to a speaker thereof;
    - external data related to the summed audio interaction; and
    - data related to at least one other interaction performed by a speaker of the summed audio interaction.
  - 9. The method of claim 1 wherein the scoring step comprises statistical scoring.
  - 10. The method of claim 1 further comprising:
    - a step of comparing said score to a threshold; and
    - repeating the segmentation step and the scoring step if said score is below the threshold.
  - 11. The method of claim 10 wherein the threshold is predetermined, or dynamic, or depends on: information associated with said summed audio interaction, information associated with an at least one speaker thereof or external information associated with the summed audio interaction.
  - 12. The method of claim 1 wherein the homogenous segment is selected by spotting a predetermined phrase.
  - 13. The method of claim 1 wherein the anchoring step or the modeling and classification step comprises using second additional data.
  - 14. The method of claim 13 wherein the second additional data is at least one item selected from the group consisting of:
    - computer-telephony-integration information related to the summed audio interaction;
    - spotted words within the summed audio interaction;
    - data related to the summed audio interaction;
    - data related to a speaker thereof;
    - external data related to the summed audio interaction; and
    - data related to at least one other interaction performed by a speaker of the summed audio interaction.
  - 15. The method of claim 1 further comprising a preprocessing step for enhancing the quality of the summed audio interaction.
  - 16. The method of claim 1 further comprising a speech/non-speech segmentation step for eliminating non-speech segments from the summed audio interaction.
  - 17. The method of claim 1 wherein the segmentation step comprises scoring the at least one segment with a voice model of a known speaker.
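
The parameterization, anchoring, and classification steps recited in claim 1 can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the patent: features are single floats rather than multi-dimensional vectors, a segment "model" is a one-dimensional Gaussian, "homogenous" is read as lowest variance, model difference is measured by the sum of the two Kullback-Leibler divergences, and the function names are hypothetical.

```python
from statistics import fmean, pvariance
from typing import List, Tuple

# (mean, variance) of a 1-D Gaussian; a stand-in for a real speaker model.
Model = Tuple[float, float]


def split_into_segments(features: List[float], size: int) -> List[List[float]]:
    """Divide the feature sequence into non-overlapping segments of `size`
    frames; a trailing partial segment is dropped."""
    return [features[i:i + size] for i in range(0, len(features) - size + 1, size)]


def fit_model(segment: List[float]) -> Model:
    """Fit a 1-D Gaussian; the variance is floored so distances stay finite."""
    return fmean(segment), max(pvariance(segment), 1e-6)


def distance(a: Model, b: Model) -> float:
    """Sum of KL(a||b) and KL(b||a) for two 1-D Gaussians."""
    (m1, v1), (m2, v2) = a, b
    variance_term = 0.5 * (v1 / v2 + v2 / v1 - 2.0)
    mean_term = 0.5 * (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2)
    return variance_term + mean_term


def anchor_and_classify(features: List[float], size: int) -> Tuple[int, int, List[int]]:
    """Return (first anchor index, second anchor index, side label per segment)."""
    segments = split_into_segments(features, size)
    if len(segments) < 2:
        raise ValueError("need at least two segments to find two anchors")
    models = [fit_model(s) for s in segments]
    indices = range(len(models))
    # First anchor: the most homogeneous segment (assumed: lowest variance).
    first = min(indices, key=lambda i: models[i][1])
    # Second anchor: the segment whose model differs most from the first model.
    second = max(indices, key=lambda i: distance(models[i], models[first]))
    # Classification: assign each segment to the side with the closer anchor model.
    labels = [
        0 if distance(m, models[first]) <= distance(m, models[second]) else 1
        for m in models
    ]
    return first, second, labels
```

For example, a sequence that alternates between values near 0 and values near 10 in blocks of four frames would be expected to yield alternating side labels when `size` is 4. A practical system would also retrain each side's model as segments are added, which this sketch omits.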

18. A speaker segmentation apparatus for associating an at least one segment of speech for each of at least two speakers participating in an audio interaction, with a side of the interaction, using additional information, the apparatus comprising:
- a segmentation component for associating an at least one segment within the audio interaction with one side of the audio interaction, the segmentation component comprising:
  - a parameterization component for transforming a speech signal into a set of feature vectors and dividing the set into non-overlapping segments;
  - an anchoring component for locating an anchor segment for each of the at least two sides of the audio interaction, the anchoring component selecting a homogenous segment as a first anchor segment, and a second anchor segment having a statistical model different from a statistical model associated with the first anchor segment; and
  - a modeling and classification component for associating at least one second segment with each side of the audio interaction; and
- a scoring component for assigning a score to said segmentation.
- Dependent claims (19):
  - 19. The apparatus of claim 18 wherein the additional information is at least one item selected from the group consisting of:
    - computer-telephony-integration information related to the audio interaction;
    - spotted words within the audio interaction;
    - data related to the audio interaction;
    - data related to a speaker thereof;
    - external data related to the audio interaction; and
    - data related to at least one other interaction performed by a speaker of the audio interaction.

20. A quality management apparatus for interaction-rich speech environments, the apparatus comprising:
- a capturing or logging component for capturing or logging an at least one audio interaction in which at least two sides communicate;
- a segmentation component for segmenting the at least one audio interaction, the segmentation component comprising:
  - a parameterization component for transforming a speech signal into a set of feature vectors and dividing the set into non-overlapping segments;
  - an anchoring component for locating an anchor segment for each of the at least two sides of the at least one audio interaction, the anchoring component selecting a homogenous segment as a first anchor segment, and a second anchor segment having a statistical model different from a statistical model associated with the first anchor segment; and
  - a modeling and classification component for associating at least one second segment with each side of the at least one audio interaction; and
- a playback component for playing an at least one part of the at least one audio interaction.

Current Assignee: Daimler AG (Mercedes-Benz Group AG)
Original Assignee: Nice Systems Limited (Nice Ltd)
Inventors: Pereg, Oren; Waserblat, Moshe
Primary Examiner(s): ABEBE, DANIEL DEMELASH
Application Number: US10/567,810
Publication Number: US 20080181417A1
Time in Patent Office: 1,567 Days
Field of Search: 704/246, 704/247, 704/248, 379/88.01
US Class Current: 704/246
CPC Class Codes:
G10L 17/00   Speaker identification or v...
G10L 25/00   Speech or voice analysis te...
H04H 60/58   of audio determination or d...
