Automated sorting of voice messages through speaker spotting
Abstract
A speaker recognition apparatus employs a non-parametric baseline algorithm for speaker recognition which characterizes a given speaker's speech patterns by a set of speech feature vectors, and generates match scores which are sums of a ScoreA set equal to the average of the minimum Euclidean squared distance between the unknown speech frame and all reference frames of a given speaker over all frames of the unknown input, and ScoreB set equal to the average of the minimum Euclidean squared distance between each frame of the reference set to all frames of the unknown input. The performance on a queue of talkers is further improved by normalization of reference message match distances. The improved baseline algorithm addresses the co-channel problem of speaker spotting when plural speech signals are intermixed on the same channel by using a union of reference sets for pairs of speakers as the reference set for a co-channel signal, and/or by conversational state modelling.
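The match score described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: frames are assumed to be rows of a NumPy array of feature vectors, and the function names `match_score` and `cochannel_reference` are hypothetical. Lower scores mean a closer match, since both terms are distances.

```python
import numpy as np


def match_score(unknown, reference):
    """Return ScoreA + ScoreB for one unknown message and one reference set.

    unknown:   array-like, shape (n_unknown_frames, n_features)
    reference: array-like, shape (n_reference_frames, n_features)
    (The array layout is an assumption made for this sketch.)
    """
    unknown = np.asarray(unknown, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Pairwise squared Euclidean distances, shape (n_unknown, n_reference).
    diff = unknown[:, None, :] - reference[None, :, :]
    sq_dist = np.sum(diff * diff, axis=2)

    # ScoreA: for each unknown frame, distance to its nearest reference
    # frame, averaged over all unknown frames.
    score_a = sq_dist.min(axis=1).mean()

    # ScoreB: for each reference frame, distance to its nearest unknown
    # frame, averaged over all reference frames.
    score_b = sq_dist.min(axis=0).mean()

    return score_a + score_b


def cochannel_reference(ref_a, ref_b):
    """Union of two speakers' reference sets, used as the reference set
    for a co-channel signal (hypothetical helper for illustration)."""
    return np.vstack([np.asarray(ref_a, dtype=float),
                      np.asarray(ref_b, dtype=float)])


if __name__ == "__main__":
    unknown = [[0.0, 0.0], [1.0, 0.0]]
    reference = [[0.0, 0.0], [3.0, 0.0]]
    print(match_score(unknown, reference))
```

The pairwise distance matrix is computed once and reduced along each axis, so both directed averages come from the same table. For long messages this matrix can be large, and a chunked computation would be a natural refinement.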
8 Claims
1. In a method of automatically recognizing a speaker on a communication channel, including the steps of digitizing input speech signals into a series of frames of digital data representing the input speech, analyzing the speech frames by a speaker recognition module which compares the incoming speech to a reference set of speech features of a given group of different speakers obtained during prior training sessions and generates respective match score therefrom, and determining which speaker the input speech is identified with based upon the match scores with each speaker associated with at least one stored reference frame, in combination therewith, the improvement wherein:
said analysis of speech frames by said speaker recognition module is implemented through the use of a set of speech feature vectors to characterize a given speaker's speech patterns, said speech feature vectors being non-parametric in nature and said comparison of incoming speech to reference speech features by said speaker recognition module includes generating a match score which is a sum of a ScoreA set equal to the average of the minimum Euclidean squared distance between the unknown speech frame and all reference frames of a given speaker over all frames of the unknown input, and ScoreB set equal to the average of the minimum Euclidean squared distance between each frame of the reference set to all frames of the unknown input, over all frames of the reference set of speech features, wherein the "distance" from u_j to the reference message R is:
##EQU15## and the "distance" from r_i to the unknown message U is:
##EQU16## wherein u_j is the j-th frame of unknown message U and r_i is the i-th frame of reference message R, and ##EQU17## and wherein said comparison of incoming speech to reference speech features includes a step of normalizing said match score with respect to said stored reference frame for all speakers to provide a normalized score and comparing all normalized scores for all speakers to select the speaker having a highest acceptable normalized score.
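The ##EQU…## markers in this claim stand for equations that are not reproduced in this text. Going only by the claim's verbal definitions (minimum Euclidean squared distance, averaged in each direction and summed), the quantities can plausibly be written as below. The symbols d, N_U (number of frames in U), and N_R (number of frames in R) are notation assumed here, and which of these expressions corresponds to which marker cannot be confirmed from this text.

```latex
\begin{align}
d(u_j, R) &= \min_{i} \lVert u_j - r_i \rVert^{2} \\
d(r_i, U) &= \min_{j} \lVert u_j - r_i \rVert^{2} \\
\text{Score}(U, R) &= \underbrace{\frac{1}{N_U} \sum_{j=1}^{N_U} d(u_j, R)}_{\text{ScoreA}}
  + \underbrace{\frac{1}{N_R} \sum_{i=1}^{N_R} d(r_i, U)}_{\text{ScoreB}}
\end{align}
```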
8. In a method of automatically sorting voice messages transmitted on a communication channel, including the steps of placing input speech messages in a queue, digitizing the input speech messages into a series of frames of digital data representing the input speech messages, analyzing the speech frames by a speaker recognition module which compares each incoming speech message to a reference set of speech features of a given group of different speakers obtained during prior training sessions and generates respective match score therefrom with each speaker associated with at least one stored reference frame, and determining which speaker the input speech message is identified with based upon the match scores, in combination therewith, the improvement wherein:
said analysis of speech frames by said speaker recognition module is implemented through the use of a set of speech feature vectors to characterize a given speaker's speech patterns, said speech feature vectors being non-parametric in nature and said comparison of incoming speech to reference speech features by said speaker recognition module includes generating a match score which is a sum of a ScoreA set equal to the average of the minimum Euclidean squared distance between the unknown speech frame and all reference frames of a given speaker over all frames of the unknown input, and ScoreB set equal to the average of the minimum Euclidean squared distance between each frame of the reference set to all frames of the unknown input, over all frames of the reference set of speech features, wherein the "distance" from u_j to the reference message R is:
##EQU18## and the "distance" from r_i to the unknown message U is:
##EQU19## wherein u_j is the j-th frame of unknown message U and r_i is the i-th frame of reference message R, and wherein said comparison of incoming speech to reference speech features includes a step of normalizing said match score with respect to said stored reference frame for all speakers to provide a normalized score and comparing all normalized scores for all speakers to select the speaker having a highest acceptable normalized score.
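The normalization and selection step for a queue of messages might look like the following Python sketch. The claim does not spell out the normalization formula, so this example assumes a per-reference z-score over the messages in the queue, negated so that a higher normalized score means a better match. The names `normalize_scores` and `sort_queue`, the `threshold` parameter, and the speaker labels are all hypothetical.

```python
import numpy as np


def normalize_scores(distances):
    """Normalize match distances per reference set (ASSUMED scheme).

    distances: array-like, shape (n_messages, n_speakers); entry [m, s] is
    the match distance between queued message m and speaker s's reference.
    Returns negated z-scores, so larger values indicate closer matches.
    """
    distances = np.asarray(distances, dtype=float)
    mean = distances.mean(axis=0, keepdims=True)
    std = distances.std(axis=0, keepdims=True)
    std = np.where(std == 0.0, 1.0, std)  # avoid division by zero
    return -(distances - mean) / std


def sort_queue(distances, speakers, threshold=0.0):
    """Label each queued message with the speaker whose normalized score is
    highest, or None when no score reaches the acceptance threshold."""
    normalized = normalize_scores(distances)
    best = normalized.argmax(axis=1)
    labels = []
    for m, s in enumerate(best):
        accepted = normalized[m, s] >= threshold
        labels.append(speakers[s] if accepted else None)
    return labels


if __name__ == "__main__":
    # Three queued messages scored against two enrolled speakers.
    distances = [[1.0, 5.0],
                 [5.0, 1.0],
                 [5.0, 5.0]]
    print(sort_queue(distances, ["alice", "bob"], threshold=0.5))
```

In this toy queue the third message is far from both references, so its best normalized score falls below the threshold and it is left unassigned rather than forced onto a speaker.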
Specification