Computer voice recognition method verifying speaker identity using speaker and non-speaker data
Abstract
A method for recognizing a speaker in which a voice signal is spoken into a computer by a speaker and a feature vector is formed for the voice signal. The feature vector is compared to at least one stored reference feature vector and to at least one anti-feature vector. The reference feature vector is formed from a speech sample of a speaker to be verified. The anti-feature vector is formed from a speech sample spoken by another speaker who is not the speaker to be verified. A two-class classification is resolved by forming a similarity value and evaluating it against a predetermined range: the voice signal is classified as deriving from the speaker to be verified only when the similarity value deviates from a predetermined value by no more than that range.
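As an illustration only, the two-class decision summarized in the abstract can be sketched in Python. The cosine measure, the difference-based similarity value, and the names `similarity_value`, `is_verified`, `target` and `max_deviation` are assumptions made for this sketch. The claims perform the comparisons with a trained neural network, which is replaced here by a fixed measure to keep the example short.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_value(feature, reference, anti_reference):
    """One score built from exactly two comparisons (hypothetical scoring rule)."""
    return cosine(feature, reference) - cosine(feature, anti_reference)


def is_verified(score, target=1.0, max_deviation=0.5):
    """Accept only if the score deviates from `target` by at most `max_deviation`."""
    return abs(score - target) <= max_deviation


# Toy two-dimensional feature vectors, invented for illustration.
reference = [1.0, 0.0]       # speaker to be verified
anti_reference = [0.0, 1.0]  # speaker not to be verified
claimant = [0.9, 0.1]        # unknown person
accepted = is_verified(similarity_value(claimant, reference, anti_reference))
```

A feature vector close to the reference and far from the anti-reference yields a score near the assumed target and is accepted; the values of `target` and `max_deviation` would have to be chosen for a real system.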
13 Claims
1. A method for verifying a person on the basis of voice signals using a neural network, the method comprising the steps of:
in a training phase, (A) generating and storing a reference feature vector from a training phase voice signal generated by a speaker to be verified, (B) generating and storing an anti-reference feature vector from a voice anti-signal generated by a speaker not to be verified, (C) training the neural network using the reference feature vector and the anti-reference feature vector, thereby adapting weightings of the neural network to permit an optimum classification for a two-class problem;
and in an operating phase, (D) generating a feature vector from an operating phase voice signal generated by an unknown person, who may or may not be the speaker to be verified, (E) submitting the feature vector, the stored reference feature vector and the stored anti-feature vector to the neural network for only a single comparison between (a) the feature vector and the reference feature vector and only a single comparison between (b) the feature vector and the anti-reference feature vector, wherein said comparisons are capable of being made when said anti-reference feature vector is generated from only one speaker not to be verified, (F) generating only a single operating phase similarity value from only the two comparisons in step (E), and (G) classifying the unknown person as verified when the single operating phase similarity value falls within a predetermined range of values.
the step of generating and storing a reference feature vector is a step of generating and storing a single reference feature vector; and
the step of generating and storing an anti-reference feature vector is a step of generating and storing a single anti-reference feature vector.
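The training phase with a single reference feature vector and a single anti-reference feature vector might look roughly like the following Python sketch. It assumes the simplest possible network, one logistic unit trained by gradient descent; the claims do not fix a topology or learning rule, so the functions `train` and `score`, the learning rate, and the epoch count are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(reference, anti_reference, epochs=200, lr=0.5):
    """Adapt the weights so that reference -> 1 and anti-reference -> 0."""
    weights = [0.0] * len(reference)
    bias = 0.0
    samples = [(reference, 1.0), (anti_reference, 0.0)]
    for _ in range(epochs):
        for x, target in samples:
            out = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = out - target  # gradient of the cross-entropy loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def score(weights, bias, feature):
    """Network output in (0, 1); values near 1 indicate the verified speaker."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, feature)) + bias)


# Toy vectors, invented for illustration.
weights, bias = train([1.0, 0.2], [0.1, 0.9])
claimant_score = score(weights, bias, [0.9, 0.3])
```

With only two training vectors the network merely learns a separating direction between them, which is the two-class problem the claim describes in its smallest form.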
5. The method according to claim 4, wherein:
following each iteration of method steps (A) through (C) in claim 2, a similarity value is formed from respective comparisons between the feature vector and the reference feature vector, and between the feature vector and the anti-feature vector, a similarity of the feature vector to the reference feature vector and a similarity of the feature vector with the anti-feature vector being described by said similarity value;
a new iteration is undertaken when the similarity value deviates by more than a prescribed range from a prescribed value; and
the speaker is otherwise not classified as the speaker to be verified.
6. The method as in one of claims 1, 4-5, wherein:
at least two reference feature vectors or at least two anti-feature vectors are employed in the method; and
the reference feature vectors or anti-feature vectors are formed by time distortion of a voice signal spoken by the speaker to be verified or, respectively, of a voice signal spoken by the speaker not to be verified.
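One simple way to picture the time distortion named in claim 6 is a uniform stretch or compression of the sampled voice signal, from which additional feature vectors could then be derived. The Python sketch below is an assumption for illustration: the function `time_warp`, the linear interpolation, and the warp factors are not taken from the patent, which does not specify the distortion used.

```python
def time_warp(signal, factor):
    """Resample `signal` to about len(signal) * factor samples by linear interpolation."""
    if not signal:
        raise ValueError("signal must not be empty")
    n_out = max(2, round(len(signal) * factor))
    last = len(signal) - 1
    warped = []
    for i in range(n_out):
        pos = i * last / (n_out - 1)  # fractional index into the original signal
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        warped.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return warped


# Several distorted copies of one utterance (toy samples, hypothetical factors).
utterance = [0.0, 1.0, 2.0, 3.0]
variants = [time_warp(utterance, f) for f in (0.5, 1.0, 2.0)]
```

Each variant would be passed through the same feature extraction as the original utterance, giving at least two reference (or anti-reference) feature vectors from a single spoken sample.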
7. The method as in one of claims 1, 4-5, wherein individual, spoken letters or individual, spoken numbers are utilized as voice signals for the verification.
4. A method for verifying a person on the basis of voice signals using a neural network, the method comprising the steps of:
in a training phase, (A) generating and storing a reference feature vector from a training phase voice signal generated by a speaker to be verified, (B) generating and storing an anti-reference feature vector from a voice anti-signal generated by a speaker not to be verified, (C) training the neural network using the reference feature vector and the anti-reference feature vector, thereby adapting weightings of the neural network to permit an optimum classification for a two-class problem;
and in an operating phase, (D) generating a feature vector from an operating phase voice signal generated by an unknown person, who may or may not be the speaker to be verified, (E) submitting the feature vector, the stored reference feature vector and the stored anti-feature vector to the neural network for only a single comparison between (a) the feature vector and the reference feature vector and only a single comparison between (b) the feature vector and the anti-reference feature vector, wherein said comparisons are capable of being made when said anti-reference feature vector is generated from only one speaker not to be verified, (F) generating only a single operating phase similarity value from only the two comparisons in step (E), (G) repeating steps D-F for a plurality of operating phase voice signals generated by the unknown person; and
(H) classifying the unknown person as verified when the result of a function combining the single operating phase similarity values falls within a predetermined range of values.
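Steps (G) and (H) of claim 4 leave the combining function open. The following Python sketch assumes the arithmetic mean as the default and an acceptance range of 0.7 to 1.0; the function name `classify_from_scores` and both bounds are hypothetical and chosen only to make the example concrete.

```python
from statistics import mean


def classify_from_scores(scores, low=0.7, high=1.0, combine=mean):
    """Combine per-utterance similarity values and test the result against a range."""
    if not scores:
        raise ValueError("at least one similarity value is required")
    combined = combine(scores)
    return low <= combined <= high


# One similarity value per spoken item, e.g. per digit (toy numbers).
per_utterance = [0.9, 0.8, 0.85]
verified = classify_from_scores(per_utterance)
```

Passing `combine=min` instead would require every single utterance to score inside the range, a stricter policy than averaging.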
8. A telecommunications system in which the following method is undertaken for speaker verification when a voice signal is received from the telecommunications system:
in a training phase, (A) generating and storing a reference feature vector from a training phase voice signal generated by a speaker to be verified, (B) generating and storing an anti-reference feature vector from a voice anti-signal generated by a speaker not to be verified, (C) training the neural network using the reference feature vector and the anti-reference feature vector, thereby adapting weightings of the neural network to permit an optimum classification for a two-class problem;
and in an operating phase, (D) generating a feature vector from an operating phase voice signal generated by an unknown person, who may or may not be the speaker to be verified, (E) submitting the feature vector, the stored reference feature vector and the stored anti-feature vector to the neural network for only a single comparison between (a) the feature vector and the reference feature vector and only a single comparison between (b) the feature vector and the anti-reference feature vector, wherein said comparisons are capable of being made when said anti-reference feature vector is generated from only one speaker not to be verified, (F) generating only a single operating phase similarity value from only the two comparisons in step (E), and (G) classifying the unknown person as verified when the single operating phase similarity value falls within a predetermined range of values.
9. A telecommunications system in which the following method is undertaken for speaker verification when a voice signal is received from the telecommunications system:
in a training phase, (A) generating and storing a reference feature vector from a training phase voice signal generated by a speaker to be verified, (B) generating and storing an anti-reference feature vector from a voice anti-signal generated by a speaker not to be verified, (C) training the neural network using the reference feature vector and the anti-reference feature vector, thereby adapting weightings of the neural network to permit an optimum classification for a two-class problem;
and in an operating phase, (D) generating a feature vector from an operating phase voice signal generated by an unknown person, who may or may not be the speaker to be verified, (E) submitting the feature vector, the stored reference feature vector and the stored anti-feature vector to the neural network for only a single comparison between (a) the feature vector and the reference feature vector and only a single comparison between (b) the feature vector and the anti-reference feature vector, wherein said comparisons are capable of being made when said anti-reference feature vector is generated from only one speaker not to be verified, (F) generating only a single operating phase similarity value from only the two comparisons in step (E), (G) repeating steps D-F for a plurality of operating phase voice signals generated by the unknown person; and
(H) classifying the unknown person as verified when the result of a function combining the single operating phase similarity values falls within a predetermined range of values.
11. A mobile radiotelephone system in which the following method is undertaken for speaker verification when a voice signal is received from the telecommunications system:
in a training phase, (A) generating and storing a reference feature vector from a training phase voice signal generated by a speaker to be verified, (B) generating and storing an anti-reference feature vector from a voice anti-signal generated by a speaker not to be verified, (C) training the neural network using the reference feature vector and the anti-reference feature vector, thereby adapting weightings of the neural network to permit an optimum classification for a two-class problem;
and in an operating phase, (D) generating a feature vector from an operating phase voice signal generated by an unknown person, who may or may not be the speaker to be verified, (E) submitting the feature vector, the stored reference feature vector and the stored anti-feature vector to the neural network for only a single comparison between (a) the feature vector and the reference feature vector and only a single comparison between (b) the feature vector and the anti-reference feature vector, wherein said comparisons are capable of being made when said anti-reference feature vector is generated from only one speaker not to be verified, (F) generating only a single operating phase similarity value from only the two comparisons in step (E), and (G) classifying the unknown person as verified when the single operating phase similarity value falls within a predetermined range of values.
12. A mobile radiotelephone system in which the following method is undertaken for speaker verification when a voice signal is received from the telecommunications system:
in a training phase, (A) generating and storing a reference feature vector from a training phase voice signal generated by a speaker to be verified, (B) generating and storing an anti-reference feature vector from a voice anti-signal generated by a speaker not to be verified, (C) training the neural network using the reference feature vector and the anti-reference feature vector, thereby adapting weightings of the neural network to permit an optimum classification for a two-class problem;
and in an operating phase, (D) generating a feature vector from an operating phase voice signal generated by an unknown person, who may or may not be the speaker to be verified, (E) submitting the feature vector, the stored reference feature vector and the stored anti-feature vector to the neural network for only a single comparison between (a) the feature vector and the reference feature vector and only a single comparison between (b) the feature vector and the anti-reference feature vector, wherein said comparisons are capable of being made when said anti-reference feature vector is generated from only one speaker not to be verified, (F) generating only a single operating phase similarity value from only the two comparisons in step (E), (G) repeating steps D-F for a plurality of operating phase voice signals generated by the unknown person; and
(H) classifying the unknown person as verified when the result of a function combining the single operating phase similarity values falls within a predetermined range of values.
following each iteration of method steps (A) through (C) in claim 12, a similarity value is formed from respective comparisons between the feature vector and the reference feature vector, and between the feature vector and the anti-feature vector, a similarity of the feature vector to the reference feature vector and a similarity of the feature vector with the anti-feature vector being described by said similarity value;
a new iteration is undertaken when the similarity value deviates by more than a prescribed range from a prescribed value; and
the speaker is otherwise not classified as the speaker to be verified.
Specification