Speaker verifier using nearest-neighbor distance measure

US 5,339,385 A
Filed: 07/22/1992
Issued: 08/16/1994
Est. Priority Date: 07/22/1992
Status: Expired due to Fees

Abstract

A speaker verification system which accepts or rejects the claimed identity of an individual based on analysis and measurements of the speaker's utterances. The utterances are elicited by prompting the individual seeking identification to read test phrases chosen at random by the verification system composed of words from a small vocabulary. Nearest-neighbor distances between speech frames derived from such spoken test phrases and speech frames of corresponding vocabulary "words" from previously stored utterances of the speaker seeking identification are computed along with distances between such spoken test phrases and corresponding vocabulary words for a set of reference speakers. The claim for identification is accepted or rejected based on the relationship among such distances and a predetermined threshold value.
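
The random prompting step described in the abstract can be illustrated with a minimal Python sketch. The vocabulary words, the phrase length, and the function name `make_prompt` are hypothetical; the abstract only says that phrases are chosen at random from a small vocabulary.

```python
import random

# Hypothetical small vocabulary; the source does not list the actual words.
VOCABULARY = ["one", "two", "three", "four", "five", "six"]

def make_prompt(vocabulary, n_words, rng=None):
    """Pick n_words at random (with replacement) to form a test phrase."""
    rng = rng or random.Random()
    return [rng.choice(vocabulary) for _ in range(n_words)]

# A seeded generator makes the example repeatable; a deployed system
# would presumably want unpredictable prompts instead.
prompt = make_prompt(VOCABULARY, 4, random.Random(0))
print(" ".join(prompt))
```

Because the phrase changes from session to session, a recording of an earlier session is unlikely to contain the words in the prompted order.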

33 Claims

1. In a Speaker Verification System comprising a means for processing spoken text into frames of speech, a means for enrolling a speaker into the system, a means for eliciting a spoken test phrase from a speaker claiming to be a specified enrolled speaker, a means for determining one or more verification distances between said spoken test phrase and corresponding "words" entered into the system during said enrollment into the system of said specified enrolled speaker, and a means for determining a verification score from such verification distance data and for determining therefrom whether said claiming speaker is said specified enrolled speaker, the improvement wherein:
- said processing means includes a means for converting said spoken text into non-parametric speech vectors, whereby at least one of said speech vectors is included in each of said frames of speech; and
  
  said determination of said verification distance includes a determination of nearest-neighbor Euclidean distances between single frames of speech associated with said spoken test phrase and corresponding frames of speech associated with said "words" entered into the system during said enrollment into the system of said specified enrolled speaker and between single frames of speech associated with said enrollment "words" of said specified enrolled speaker and corresponding frames of speech associated with said spoken test phrase.
- View Dependent Claims (2, 3, 4, 5, 6)
  - 2. The Speaker Verification System of claim 1 wherein said means for the determination of said verification distance additionally includes means for providing weighting of distances so determined by a bias value computed to discount the effects of diversity among repeated occurrences of said "words" entered into the system during said enrollment into the system by said specified enrolled speaker, and wherein said determination of nearest-neighbor Euclidean distances is restricted to a range defined by a score limit value.
  - 3. The Speaker Verification System of claim 2 wherein said means for determining a verification score includes a comparison between said verification distance and a predetermined threshold value and wherein said determination of whether said claiming speaker is said specified enrolled speaker is derived from said comparison.
  - 4. The Speaker Verification System of claim 2 wherein said means for determining a verification score additionally includes a means for determining a reference distance between said spoken test phrase and corresponding "words" entered into the system by a set of reference speakers and computed in like manner to that of said verification distance, and said determination of whether said claiming speaker is said specified enrolled speaker is derived from a relationship between said verification distance, said reference distance, and said predetermined threshold value.
  - 5. The Speaker Verification System of claim 1 wherein said means for determining a verification score includes a comparison between said verification distance and a predetermined threshold value and wherein said determination of whether said claiming speaker is said specified enrolled speaker is derived from said comparison.
  - 6. The Speaker Verification System of claim 2 wherein said means for determining a verification score additionally includes a means for determining a reference distance between said spoken test phrase and corresponding "words" entered into the system by a set of reference speakers and computed in like manner to that of said verification distance, and said determination of whether said claiming speaker is said specified enrolled speaker is derived from a relationship between said verification distance, said reference distance, and said predetermined threshold value.
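
As an illustration of the two-directional nearest-neighbor comparison recited in claim 1, the following is a minimal Python sketch, not the patented implementation. Frames are assumed to be plain tuples of floats, the function name is hypothetical, and treating the "score limit value" of claim 2 as a simple cap on each distance is an assumption. The bias weighting of claim 2 is not modeled.

```python
import math

def nearest_neighbor_distances(frames_a, frames_b, score_limit=None):
    """For each frame in frames_a, return the Euclidean distance to its
    nearest neighbor in frames_b, optionally capped at score_limit."""
    distances = []
    for a in frames_a:
        d = min(math.dist(a, b) for b in frames_b)
        if score_limit is not None:
            d = min(d, score_limit)  # assumption: the limit acts as a cap
        distances.append(d)
    return distances

# Toy two-dimensional "speech vectors"; real frames would have more components.
test_frames = [(0.0, 0.0), (3.0, 4.0)]
enroll_frames = [(0.0, 1.0), (3.0, 0.0), (6.0, 8.0)]

# Both directions are computed, as the claim recites.
test_to_enroll = nearest_neighbor_distances(test_frames, enroll_frames)
enroll_to_test = nearest_neighbor_distances(enroll_frames, test_frames)
print(test_to_enroll, enroll_to_test)
```

Computing both directions penalizes a test phrase that matches only part of the enrollment data, and enrollment data that covers only part of the test phrase.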

7. The Speaker Verification System comprising:
- means for processing spoken text entered into the system whereby said spoken text is sampled, digitized and converted into frames of speech, each frame being comprised of multiple speech vector components, said speech vector components being non-parametric in nature;
  
  means for enrolling a speaker into the system whereby predetermined spoken text is entered into the system by said speaker and processed by said means for processing and thereafter stored by the system;
  
  means responsive to a request for identification for a speaker claiming to be a specified enrolled speaker for generating a prompt phrase comprising one or more "words" derived from said predetermined spoken text entered by said specified enrolled speaker and whereupon said prompt phrase is spoken by said claiming speaker and said spoken prompt phrase is entered into the system and processed by said means for processing;
  
  means for determining nearest-neighbor distances d_i,T, wherein said nearest-neighbor distances d_i,T are computed as the Euclidian distances between each frame of said processed spoken prompt phrase and speech frames from corresponding regions of each occurrence of the same "word" stored during said enrollment into the system of said specified enrolled speaker;
  
  means for determining nearest-neighbor distances d_j,E, wherein said nearest-neighbor distances d_j,E are computed as the Euclidian distances between each frame of each occurrence of each "word" comprising said prompt phrase and speech frames from corresponding regions of each occurrence of the same "word" in said processed spoken prompt phrase;
  
  means for determining a distance d_T,E between said processed spoken prompt phrase and corresponding "words" entered into the system during said enrollment into the system of said specified enrolled speaker, wherein said distance d_T,E is derived from an average of all said nearest-neighbor distances d_i,T and an average of all said nearest-neighbor distances d_j,E; and
  
  means for determining a verification score related to said distances d_i,T, d_j,E and d_T,E and for determining therefrom whether said claiming speaker is said specified enrolled speaker.
- View Dependent Claims (8, 9, 10, 11, 12, 13, 14)
  - 8. The Speaker Verification System of claim 7 wherein said determination of said nearest-neighbor distances d_i,T and of said nearest-neighbor distances d_j,E includes an additional step of weighting each of said distances by a bias value computed to discount the effects of diversity among repeated occurrences of said predetermined spoken text entered into the system by said specified enrolled speaker, and wherein said computation of Euclidean distances is restricted to a range defined by a score limit value.
  - 9. The Speaker Verification System of claim 8 wherein said means for determining a verification score includes a comparison between said distance d_T,E and a predetermined threshold value and wherein said determination of whether said claiming speaker is said specified enrolled speaker is derived from said comparison.
  - 10. The Speaker Verification System of claim 9 wherein said means for determining a verification score additionally includes a means for determining a distance d_T,R between said processed spoken prompt phrase and corresponding "words" entered into the system by a set of reference speakers and computed in like manner to that of said distance d_T,E, and said determination of whether said claiming speaker is said specified enrolled speaker is derived from a relationship between said distance d_T,R, said distance d_T,E, and said predetermined threshold value.
  - 11. The Speaker Verification System of claim 7 wherein said means for determining a verification score includes a comparison between said distance d_T,E and a predetermined threshold value and wherein said determination of whether said claiming speaker is said specified enrolled speaker is derived from said comparison.
  - 12. The Speaker Verification System of claim 11 wherein said means for determining a verification score additionally includes a means for determining a distance d_T,R between said processed spoken prompt phrase and corresponding "words" entered into the system by a set of reference speakers and computed in like manner to that of said distance d_T,E, and said determination of whether said claiming speaker is said specified enrolled speaker is derived from a relationship between said distance d_T,R, said distance d_T,E, and said predetermined threshold value.
  - 13. The Speaker Verification System of claim 7 wherein said means for analyzing additionally determines whether the "words" comprising said spoken prompt phrase correspond to the words in said prompt phrase and rejects said claiming speaker in the event such correspondency does not exist.
  - 14. The Speaker Verification System of claim 7 further including:
    - means for analyzing said processed spoken prompt phrase to detect the end points of "words" comprising said spoken prompt phrase, and wherein said corresponding regions of each occurrence of the same "word" as used by said means for determining nearest-neighbor distances d_i,T or by said means for determining nearest-neighbor distances d_j,E are determined in relation to said detected end points.
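
The distance d_T,E of claim 7 can be sketched as follows. This is a hedged illustration under stated assumptions: each frame is represented as a hypothetical `(word, region, vector)` tuple, frames with no counterpart in the other set are skipped, and the two averages are combined by summing them (the claim only says d_T,E is "derived from" the two averages; summing follows the wording of claim 20).

```python
import math
from statistics import mean

def directed_distances(frames_a, frames_b):
    """For each (word, region, vector) frame in frames_a, find the nearest
    frame in frames_b that carries the same word and region labels."""
    out = []
    for word, region, vec in frames_a:
        candidates = [v for w, r, v in frames_b if (w, r) == (word, region)]
        if candidates:  # assumption: frames without a counterpart are skipped
            out.append(min(math.dist(vec, v) for v in candidates))
    return out

def d_TE(test_frames, enroll_frames):
    """Combine both directed averages; raises StatisticsError if either
    direction yields no comparable frames."""
    d_iT = directed_distances(test_frames, enroll_frames)   # test -> enrollment
    d_jE = directed_distances(enroll_frames, test_frames)   # enrollment -> test
    return mean(d_iT) + mean(d_jE)  # assumption: the averages are summed

test = [("one", 0, (0.0, 0.0)), ("one", 1, (1.0, 1.0))]
enroll = [("one", 0, (0.0, 2.0)), ("one", 0, (0.0, 1.0)), ("one", 1, (1.0, 4.0))]
print(d_TE(test, enroll))
```

Restricting candidates to the same word and region keeps the comparison between acoustically comparable parts of the utterance instead of searching the whole enrollment set.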

15. A speaker verification system comprising:
- means for entering spoken text into the system;
  
  means for sampling and digitizing said spoken text;
  
  means for converting said digitized samples into frames of speech, each frame being comprised of multiple speech vector components, said speech vector components being non-parametric in nature;
  
  means for enrolling one or more speakers into the system during an enrollment session whereby predetermined spoken text is entered into the system by each such speaker and processed by said means for sampling and means for converting and thereafter stored by the system;
  
  means for identifying stored enrollment data for a particular enrolled speaker based on a claim for verification as said particular enrolled speaker;
  
  means for identifying one or more "words" derived from the spoken text entered by said particular enrolled speaker during said enrollment session and means for presentation of said "words" as a prompt to be spoken by a speaker during a verification session, said prompted spoken "words" being thereupon entered into the system via said means for entering and processed by said means for sampling and means for converting;
  
  means for storing said prompted spoken "words";
  
  means for comparing each speech frame from said verification session with speech frames from corresponding regions of each occurrence of the same "word" stored during said particular speaker's enrollment session, and computing nearest-neighbor distances d_i,T between all such pairs of verification and enrollment frames;
  
  means for comparing each speech frame from each occurrence of "words" comprising said prompt stored during said particular speaker's enrollment session with speech frames from corresponding regions of said prompted spoken "words", and computing nearest-neighbor distances d_j,E between all such pairs of enrollment and verification frames;
  
  means for computing a distance d_T,E from an average of all said nearest-neighbor distances d_i,T and an average of all said nearest-neighbor distances d_j,E;
  
  means for comparing said distance d_T,E with a predetermined value and causing an output signal to occur based on the difference between said distance d_T,E and said predetermined value, said output signal being indicative of acceptance or rejection for a speaker claiming to be said particular enrolled speaker.
- View Dependent Claims (16, 17, 18, 19)
  - 16. The Speaker Verification System of claim 15 wherein said means for computing said nearest-neighbor distances d_i,T and of said nearest-neighbor distances d_j,E includes an additional means for weighting each of said distances by a bias value, and wherein said computation of said distances is restricted to a range defined by a score limit value.
  - 17. The Speaker Verification System of claim 15 wherein said means for comparing said distance d_T,E with a predetermined value additionally includes a means for determining a distance d_T,R between said prompted spoken "words" and corresponding "words" entered into the system by a set of reference speakers and computed in like manner to that of said distance d_T,E, and said output signal is caused to occur based on a relationship between said distance d_T,R, said distance d_T,E, and said predetermined threshold value.
  - 18. The Speaker Verification System of claim 15 wherein the means for analyzing said prompted spoken "words" to detect the end points thereof additionally determines whether the "words" comprising said spoken prompt phrase correspond to the words in said prompt phrase and rejects said claiming speaker in the event such correspondency does not exist.
  - 19. The Speaker Verification System of claim 15 further including:
    - means for analyzing said prompted spoken "words" to detect the end points thereof, and wherein said corresponding regions of each occurrence of the same "word" as used by said means for comparing verification speech frames, and thereby computing nearest-neighbor distances d_i,T or by said means for comparing enrollment "words" speech frames, and thereby computing nearest-neighbor distances d_j,E are determined in relation to said detected end points.
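
The accept/reject step of claims 15 and 17 can be sketched in a few lines of Python. The claims specify only that the output depends on "the difference" between d_T,E and a predetermined value, or on "a relationship" among d_T,R, d_T,E, and the threshold. The function name and the particular choice below, subtracting the reference-speaker distance before thresholding, are assumptions made for illustration.

```python
def verify(d_te, threshold, d_tr=None):
    """Return True (accept) when the score falls below the threshold.

    d_te: distance between the test phrase and the claimed speaker's enrollment.
    d_tr: optional distance between the test phrase and the reference speakers.
    Assumption: when d_tr is given, the score is the difference d_te - d_tr.
    """
    score = d_te if d_tr is None else d_te - d_tr
    return score < threshold

print(verify(4.0, 5.0))            # claimed-speaker distance only
print(verify(4.0, 1.0, d_tr=6.0))  # normalized by reference speakers
```

Normalizing by a reference-speaker distance makes the score less sensitive to conditions that raise all distances at once, such as a noisy channel, which is one plausible motivation for claim 17.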

20. In a method of automatically verifying a speaker as matching a claimed identity, including the steps of processing spoken input speech signals into a series of frames of digital data representing said input speech, analyzing the speech frames by a speaker verification module which compares the incoming speech to a reference set of speech features and generates respective match scores therefrom, and determining whether the input speech corresponds with the identified speaker based upon the match scores, the improvement wherein:
- said step of processing spoken input speech signals includes a substep of converting said spoken input speech into non-parametric speech vectors, whereby at least one of said speech vectors is included in each of said frames of data representing said input speech; and
  
  said comparison of incoming speech to reference speech features by said speaker recognition module includes generating a match score which is a sum of a first score set equal to the average of the minimum Euclidian squared distances between an input speech frame for a given region of a particular "word" and speech frames from said reference set of speech features corresponding to the same region of the same "word" over all frames of all "words" of said input speech, and a second score set equal to the average of the minimum Euclidian squared distances between a speech frame for a given region of a particular "word" from said reference set of speech features and an input speech frame corresponding to the same region of the same "word" over all frames of all "words" comprising said reference set of speech features.
- View Dependent Claims (21, 22, 23, 24, 25, 26)
  - 21. A method of speaker verification according to claim 20, wherein said comparison of incoming speech to reference speech features by said speaker recognition module includes an additional step of weighting each of said Euclidean distances by a bias value computed to discount the effects of diversity among repeated occurrences of said predetermined spoken text entered into the system by said specified enrolled speaker, and wherein said Euclidean distances are computed over a range restricted by a score limit value.
  - 22. A method of speaker verification according to claim 21, wherein said comparison of incoming speech to reference speech features by said speaker recognition module includes a further step in the generation of a match score of establishing a nearest-neighbor distance between input speech frames and enrollment speech frames for a set of reference speakers.
  - 23. A method of speaker verification according to claim 21, further including a threshold verification wherein the substep of identifying the end points of the input speech "words" additionally recognizes whether the "words" of the verification phrase were spoken as prompted and rejects the verification request upon a failure of such threshold verification test.
  - 24. A method of speaker verification according to claim 20, wherein said comparison of incoming speech to reference speech features by said speaker recognition module includes a further step in the generation of a match score of establishing a nearest-neighbor distance between input speech frames and enrollment speech frames for a set of reference speakers.
  - 25. A method of speaker verification according to claim 20, further including a threshold verification wherein the substep of identifying the end points of the input speech "words" additionally recognizes whether the "words" of the verification phrase were spoken as prompted and rejects the verification request upon a failure of such threshold verification test.
  - 26. The method of speaker verification according to claim 20, including the additional step of identifying the end points of "words" comprising said input speech, and wherein said corresponding regions of occurrences of the same "word" as used in determining said first score set or said second score set are determined in relation to said identified end points.
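
Claim 26 ties the "corresponding regions" of a word to its detected end points without saying how regions are laid out. One simple possibility, shown here purely as an assumption, is to divide the span between the end points into a fixed number of equal-width regions so that frames from occurrences of different durations can be matched by relative position. The function name and the region count are hypothetical.

```python
def region_index(frame, start, end, n_regions):
    """Map a frame position within [start, end) to one of n_regions
    equal-width regions of the word (assumed layout, not from the source)."""
    if not start <= frame < end:
        raise ValueError("frame lies outside the detected word")
    return (frame - start) * n_regions // (end - start)

# Two occurrences of the same word with different durations:
# frames 10..29 in one utterance, frames 100..139 in another.
print(region_index(15, 10, 30, 4), region_index(110, 100, 140, 4))
```

Both frames fall a quarter of the way into their word, so they land in the same region and would be compared with each other.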

27. In a method of automatically verifying a speaker as matching a claimed identity, including the steps of establishing the claimed identity, generation of a verification phrase comprising one or more "words" to be spoken by the speaker, processing the spoken input speech signals into a series of frames of digital data representing the input speech, analyzing the speech frames by a speaker verification module which compares the input speech to a reference set of speech features of the identified speaker obtained during prior enrollment sessions and generates respective match scores therefrom, and determining whether the input speech is identified with the identified speaker based upon the match scores, the improvement wherein:
- said step of processing spoken input speech signals includes a substep of converting said spoken input speech into non-parametric speech vectors, whereby at least one of said speech vectors is included in each of said frames of data representing the input speech; and
  
  said comparison of incoming speech to reference speech features by said speaker recognition module includes generating a match score which is a sum of a first score set equal to the average of the minimum Euclidian squared distances between an input speech frame for a given region of a particular "word" and enrollment speech frames corresponding to the same region of the same "word", over all frames of all "words" of the input speech, and a second score set equal to the average of the minimum Euclidian squared distance between an enrollment speech frame for a given region of a particular "word" with an input speech frame corresponding to the same region of the same "word", over all frames of all "words" comprising the reference set of speech features, wherein the distance from t_j to the corresponding enrollment "word" E is;
  
  ##EQU10## and the distance from e_i to the corresponding test "word" T is;
  
  ##EQU11## wherein t_j is the j-th frame of the input "word" T and e_i is the i-th frame of enrollment "word" E, W_i and F_i are respectively the word and frame indexes for frame i, and W_j and F_j are respectively the word and frame indexes for frame j, and wherein said first score is equal to the average of d_j,E over all frames and said second score is equal to the average of d_i,T over all frames.
- View Dependent Claims (28, 29, 30, 31, 32, 33)
  - 28. A method of speaker verification according to claim 27, wherein said comparison of incoming speech to reference speech features by said speaker recognition module includes an additional step of weighting each of said Euclidean distances by a bias value computed to discount the effects of diversity among repeated occurrences of said predetermined spoken text entered into the system by said specified enrolled speaker, and wherein said Euclidean distances are computed over a range restricted by a score limit value.
  - 29. A method of speaker verification according to claim 28, wherein said comparison of incoming speech to reference speech features by said speaker recognition module includes a further step in the generation of a match score of establishing a nearest-neighbor distance between input speech frames and enrollment speech frames for a set of reference speakers.
  - 30. A method of speaker verification according to claim 28, further including a threshold verification wherein the substep of identifying the end points of the input speech "words" additionally recognizes whether the "words" of the verification phrase were spoken as prompted and rejects the verification request upon a failure of such threshold verification test.
  - 31. A method of speaker verification according to claim 27, wherein said comparison of incoming speech to reference speech features by said speaker recognition module includes a further step in the generation of a match score of establishing a nearest-neighbor distance between input speech frames and enrollment speech frames for a set of reference speakers.
  - 32. A method of speaker verification according to claim 27, further including a threshold verification wherein the substep of identifying the end points of the input speech "words" additionally recognizes whether the "words" of the verification phrase were spoken as prompted and rejects the verification request upon a failure of such threshold verification test.
  - 33. The method of automatically verifying a speaker according to claim 27 including the additional step of identifying the end points of the input speech "words", and wherein said corresponding regions of occurrences of the same "word" as used in determining said first score set or said second score set are determined in relation to said identified end points.
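
The equation images referenced in claim 27 (##EQU10## and ##EQU11##) are not reproduced in this copy. One reading consistent with the claim wording is sketched below. It assumes that "corresponding" frames are those whose word and frame indexes match exactly, and it introduces N_T and N_E (the numbers of test and enrollment frames) as hypothetical symbols. It is a plausible reconstruction, not the patent's own notation.

```latex
d_{j,E} = \min_{i \,:\, W_i = W_j,\; F_i = F_j} \lVert t_j - e_i \rVert^2
\qquad
d_{i,T} = \min_{j \,:\, W_j = W_i,\; F_j = F_i} \lVert e_i - t_j \rVert^2

\text{match score} = \frac{1}{N_T} \sum_{j} d_{j,E} + \frac{1}{N_E} \sum_{i} d_{i,T}
```

Under this reading the first term is the average of d_j,E over all test frames and the second is the average of d_i,T over all enrollment frames, matching the "first score" and "second score" of the claim.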

Current Assignee: ITT Corporation (ITT, Inc.)
Original Assignee: ITT Corporation (ITT, Inc.)
Inventors: Higgins, Alan L.
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Doerrler, Michelle
Application Number: US07/918,560
Time in Patent Office: 755 Days
Field of Search: 381/41-45, 395/2, 395/2.47, 395/2.48, 395/2.55, 395/2.59, 395/2.82
US Class Current: 704/246
CPC Class Codes:
G07C 9/37 using biometric data, e.g. ...
G10L 17/08 Use of distortion metrics o...
