Automated sorting of voice messages through speaker spotting

US 5,271,088 A
Filed: 04/07/1993
Issued: 12/14/1993
Est. Priority Date: 05/13/1991
Status: Expired due to Term

Abstract

A speaker recognition apparatus employs a non-parametric baseline algorithm for speaker recognition which characterizes a given speaker's speech patterns by a set of speech feature vectors, and generates match scores which are sums of a Score_A set equal to the average of the minimum Euclidean squared distance between the unknown speech frame and all reference frames of a given speaker over all frames of the unknown input, and Score_B set equal to the average of the minimum Euclidean squared distance between each frame of the reference set to all frames of the unknown input. The performance on a queue of talkers is further improved by normalization of reference message match distances. The improved baseline algorithm addresses the co-channel problem of speaker spotting when plural speech signals are intermixed on the same channel by using a union of reference sets for pairs of speakers as the reference set for a co-channel signal, and/or by conversational state modelling.
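
The match score described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names (`sq_dist`, `directed_score`, `match_score`) and the toy two-dimensional frames are invented for the example, and real feature vectors would come from a speech front end that the abstract does not detail.

```python
from typing import Sequence

Frame = Sequence[float]  # one feature vector per speech frame (assumed layout)


def sq_dist(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def directed_score(src: Sequence[Frame], dst: Sequence[Frame]) -> float:
    """Average, over the frames of src, of the minimum squared distance
    from that frame to any frame of dst."""
    return sum(min(sq_dist(s, d) for d in dst) for s in src) / len(src)


def match_score(unknown: Sequence[Frame], reference: Sequence[Frame]) -> float:
    """Score_A + Score_B as described in the abstract (smaller = closer)."""
    score_a = directed_score(unknown, reference)  # unknown frames -> reference set
    score_b = directed_score(reference, unknown)  # reference frames -> unknown input
    return score_a + score_b


# Toy data, purely illustrative.
unknown = [(0.0, 0.0), (1.0, 1.0)]
reference = [(0.0, 1.0), (1.0, 1.0), (3.0, 1.0)]
print(match_score(unknown, reference))
```

Because the score adds both directions, it is symmetric in its two arguments: a reference frame with no close counterpart in the unknown message raises the score just as an unmatched unknown frame does.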

8 Claims

1. In a method of automatically recognizing a speaker on a communication channel, including the steps of digitizing input speech signals into a series of frames of digital data representing the input speech, analyzing the speech frames by a speaker recognition module which compares the incoming speech to a reference set of speech features of a given group of different speakers obtained during prior training sessions and generates respective match score therefrom, and determining which speaker the input speech is identified with based upon the match scores with each speaker associated with at least one stored reference frame, in combination therewith, the improvement wherein:
- said analysis of speech frames by said speaker recognition module is implemented through the use of a set of speech feature vectors to characterize a given speaker's speech patterns, said speech feature vectors being non-parametric in nature and said comparison of incoming speech to reference speech features by said speaker recognition module includes generating a match score which is a sum of a Score_A set equal to the average of the minimum Euclidean squared distance between the unknown speech frame and all reference frames of a given speaker over all frames of the unknown input, and Score_B set equal to the average of the minimum Euclidean squared distance between each frame of the reference set to all frames of the unknown input, over all frames of the reference set of speech features, wherein the "distance" from u_j to the reference message R is
  
  $$d(u_j, R) = \min_i \lVert u_j - r_i \rVert^2$$
  
  and the "distance" from r_i to the unknown message U is
  
  $$d(r_i, U) = \min_j \lVert u_j - r_i \rVert^2$$
  
  wherein u_j is the j-th frame of unknown message U and r_i is the i-th frame of reference message R, and ##EQU17## and wherein said comparison of incoming speech to reference speech features includes a step of normalizing said match score with respect to said stored reference frame for all speakers to provide a normalized score and comparing all normalized scores for all speakers to select the speaker having a highest acceptable normalized score.
  - 2. A method of speaker recognition according to claim 1, further including the step of normalizing the match scores relative to the reference group of speakers.
  - 3. A method of speaker recognition according to claim 1, further including the step of removing the effects of variations in reference message content by using the z score to remove the means and fix the variance of an unknown queue of match distances to each reference message.
  - 4. A method of speaker recognition according to claim 1, further including the step of controlling the sensitivity of the speaker recognition model by selecting speech frames for inclusion or exclusion depending upon the expected speech information contained therein.
  - 5. A method of speaker recognition according to claim 1, further including the step of normalizing the frames of input speech to remove the mean channel spectrum observed within the message, by the process known as blind deconvolution.
  - 6. A method of speaker recognition according to claim 1, further including the step of determining a speaker speaking in a conversation of a number of speakers by using a union of reference sets for pairs of talkers as the reference set for a co-channel signal.
  - 7. A method of message sorting according to claim 1, further including the step of using a non-parametric comparison of input speech messages to the reference set for the given group of speakers, wherein the reference set characterizes a given speaker's speech patterns by a non-parametric set of speech feature vectors.
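
Three of the dependent claims describe steps that are easy to illustrate: z-score normalization of match distances (claim 3), removal of the per-message mean spectrum (claim 5), and union reference sets for pairs of talkers (claim 6). The sketch below is one hedged reading of those steps. All function names are hypothetical, and the mean-removal helper assumes the features live in a log-spectral or cepstral domain where a fixed channel appears as an additive offset, which the claims do not state.

```python
from itertools import combinations
from statistics import mean, pstdev


def remove_message_mean(frames):
    """Subtract the per-message mean vector from every frame
    (a simple stand-in for the blind deconvolution step of claim 5)."""
    dim = len(frames[0])
    mu = [mean(f[k] for f in frames) for k in range(dim)]
    return [tuple(f[k] - mu[k] for k in range(dim)) for f in frames]


def z_normalize(distances):
    """Z-score the match distances of every queued message against ONE
    reference message: zero mean, unit variance (claim 3)."""
    mu = mean(distances)
    sigma = pstdev(distances)
    if sigma == 0:
        return [0.0 for _ in distances]  # no spread, nothing to rescale
    return [(d - mu) / sigma for d in distances]


def pair_reference_sets(references):
    """Build a union reference set for every pair of talkers (claim 6).
    references maps a speaker name to that speaker's list of frames."""
    return {
        (a, b): list(references[a]) + list(references[b])
        for a, b in combinations(sorted(references), 2)
    }


# Toy usage, purely illustrative.
print(remove_message_mean([(1.0, 2.0), (3.0, 6.0)]))
print(z_normalize([1.0, 2.0, 3.0]))
print(sorted(pair_reference_sets({"a": [(0.0,)], "b": [(1.0,)], "c": [(2.0,)]})))
```

The z-score is taken per reference message across the queue, so a reference whose content happens to sit close to, or far from, every message no longer biases the comparison between speakers.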

8. In a method of automatically sorting voice messages transmitted on a communication channel, including the steps of placing input speech messages in a queue, digitizing the input speech messages into a series of frames of digital data representing the input speech messages, analyzing the speech frames by a speaker recognition module which compares each incoming speech message to a reference set of speech features of a given group of different speakers obtained during prior training sessions and generates respective match score therefrom with each speaker associated with at least one stored reference frame, and determining which speaker the input speech message is identified with based upon the match scores, in combination therewith, the improvement wherein:
- said analysis of speech frames by said speaker recognition module is implemented through the use of a set of speech feature vectors to characterize a given speaker's speech patterns, said speech feature vectors being non-parametric in nature and said comparison of incoming speech to reference speech features by said speaker recognition module includes generating a match score which is a sum of a Score_A set equal to the average of the minimum Euclidean squared distance between the unknown speech frame and all reference frames of a given speaker over all frames of the unknown input, and Score_B set equal to the average of the minimum Euclidean squared distance between each frame of the reference set to all frames of the unknown input, over all frames of the reference set of speech features, wherein the "distance" from u_j to the reference message R is
  
  $$d(u_j, R) = \min_i \lVert u_j - r_i \rVert^2$$
  
  and the "distance" from r_i to the unknown message U is
  
  $$d(r_i, U) = \min_j \lVert u_j - r_i \rVert^2$$
  
  wherein u_j is the j-th frame of unknown message U and r_i is the i-th frame of reference message R, and wherein said comparison of incoming speech to reference speech features includes a step of normalizing said match score with respect to said stored reference frame for all speakers to provide a normalized score and comparing all normalized scores for all speakers to select the speaker having a highest acceptable normalized score.
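
The queue-sorting method of claim 8 can be sketched end to end as follows. This is an illustration under assumptions, not the claimed implementation: `sort_queue`, its `threshold` parameter, and the toy speakers are hypothetical. Because the raw match score is a distance (smaller is closer), the sketch negates the z-normalized distance so that "highest acceptable normalized score" has a natural reading. That sign convention is an interpretation, not something the claim spells out.

```python
from statistics import mean, pstdev


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_score(unknown, reference):
    """Score_A + Score_B: sum of the two averaged minimum distances."""
    score_a = mean(min(sq_dist(u, r) for r in reference) for u in unknown)
    score_b = mean(min(sq_dist(r, u) for u in unknown) for r in reference)
    return score_a + score_b


def sort_queue(queue, references, threshold=0.0):
    """Assign each queued message to a speaker, or to None if no
    normalized score reaches the (hypothetical) acceptance threshold."""
    message_ids = list(queue)
    scores = {m: {} for m in message_ids}
    for speaker, ref_frames in references.items():
        dists = [match_score(queue[m], ref_frames) for m in message_ids]
        mu, sigma = mean(dists), pstdev(dists)
        for m, d in zip(message_ids, dists):
            z = (d - mu) / sigma if sigma > 0 else 0.0
            scores[m][speaker] = -z  # negate: higher now means closer
    result = {}
    for m in message_ids:
        best = max(scores[m], key=scores[m].get)
        result[m] = best if scores[m][best] >= threshold else None
    return result


# Toy data, purely illustrative.
references = {
    "alice": [(0.0, 0.0), (0.0, 1.0)],
    "bob": [(5.0, 5.0), (5.0, 6.0)],
}
queue = {
    "m1": [(0.1, 0.0), (0.0, 0.9)],
    "m2": [(5.0, 5.1), (4.9, 6.0)],
    "m3": [(0.0, 0.2), (0.1, 1.0)],
}
print(sort_queue(queue, references))
```

Normalizing across the queue means a single queued message has no spread to normalize against; the sketch then assigns every speaker a score of zero, so a practical system would need some other fallback in that case.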

Current Assignee: ITT Corporation (ITT, Inc.)
Original Assignee: ITT Corporation (ITT, Inc.)
Inventors: Bahler, Lawrence G.
Primary Examiner(s): Knepper, David D.
Application Number: US08/044,546
Time in Patent Office: 251 Days
Field of Search: 395/2, 381/41-43
US Class Current: 704/200
CPC Class Codes:
G10L 17/06 Decision making techniques;...
G10L 21/028 using properties of sound s...