Apparatus and methods for speaker verification/identification/classification employing non-acoustic and/or acoustic models and databases
Abstract
A method and apparatus for securing access to a service or facility employing automatic speech recognition, text-independent speaker identification, natural language understanding techniques and additional dynamic and static features. The method includes the steps of receiving and decoding speech containing indicia of the speaker, such as a name, address or customer number; accessing a database containing information on candidate speakers; questioning the speaker based on the information; receiving, decoding and verifying an answer to the question; obtaining a voice sample of the speaker and verifying the voice sample against a model; generating a score based on the answer and the voice sample; and granting access if the score is equal to or greater than a threshold. Alternatively, the method includes the steps of receiving and decoding speech containing indicia of the speaker; generating a sub-list of speaker candidates having indicia substantially matching the speaker; activating databases containing information about the speaker candidates in the sub-list; performing voice classification analysis; eliminating speaker candidates based on the voice classification analysis; questioning the speaker regarding the information; eliminating speaker candidates based on the answer; and iteratively repeating the prior steps until either one speaker candidate remains (in which case the speaker is granted access) or no speaker candidate remains (in which case the speaker is denied access).
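The sub-list step ("speaker candidates having indicia substantially matching the speaker") needs some similarity measure between the decoded indicia and enrolled records. The source does not say which one; the sketch below swaps in Python's standard `difflib.SequenceMatcher` ratio as an illustrative stand-in. The enrollment table `ENROLLED`, the helper names, and the `min_ratio` cut-off of 0.8 are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical enrollment records: customer number -> registered name.
# A deployed system would read these from its customer database.
ENROLLED = {
    "1001": "maria gonzalez",
    "1002": "mario gonzales",
    "1003": "john smith",
    "1004": "jon smyth",
}


def similarity(a: str, b: str) -> float:
    """Return difflib's 0..1 similarity ratio between two strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_sublist(decoded_name: str, min_ratio: float = 0.8) -> list[str]:
    """Return customer numbers whose registered name substantially matches the decoded name.

    `min_ratio` is an illustrative cut-off for "substantially matching".
    """
    scored = [(similarity(decoded_name, name), cust_id) for cust_id, name in ENROLLED.items()]
    # Keep candidates at or above the cut-off, best match first.
    return [cust_id for ratio, cust_id in sorted(scored, reverse=True) if ratio >= min_ratio]
```

A misrecognized name such as "maria gonzales" keeps both similar-sounding enrollees on the sub-list, which is the situation the later questioning and voice-classification steps are meant to resolve.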
505 Citations
35 Claims
1. A method of controlling access of a speaker to one of a service and a facility, the method comprising the steps of:
(a) receiving first spoken utterances of the speaker, the first spoken utterances containing indicia of the speaker;
(b) decoding the first spoken utterances;
(c) accessing a database corresponding to the decoded first spoken utterances, the database containing information attributable to a speaker candidate having indicia substantially similar to the speaker;
(d) querying the speaker with at least one question based on the information contained in the accessed database;
(e) receiving second spoken utterances of the speaker, the second spoken utterances being representative of at least one answer to the at least one question;
(f) decoding the second spoken utterances;
(g) verifying the accuracy of the decoded answer against the information contained in the accessed database serving as the basis for the question;
(h) taking a voice sample from the utterances of the speaker and processing the voice sample against an acoustic model attributable to the speaker candidate without requiring dependency on the decoded first and second spoken utterances;
(i) generating a score corresponding to the accuracy of the decoded answer and the closeness of the match between the voice sample and the model; and
(j) comparing the score to a predetermined threshold value and if the score is one of substantially equivalent to and above the threshold value, then permitting speaker access to one of the service and the facility.
Claims 2 to 17 depend from claim 1.
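Steps (g) through (j) of claim 1 combine two kinds of evidence, the accuracy of the decoded answers and the closeness of the voice match, into one score that is compared to a threshold. The claim does not say how the two are combined; the sketch below assumes a simple weighted sum, represents the voice sample and the speaker candidate's acoustic model as hypothetical fixed-length embedding vectors compared by cosine similarity, and uses illustrative values for `knowledge_weight` and `threshold`.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer_accuracy(decoded: dict[str, str], expected: dict[str, str]) -> float:
    """Fraction of asked questions whose decoded answer matches the stored answer."""
    if not expected:
        return 0.0
    correct = sum(
        1
        for question, stored in expected.items()
        if decoded.get(question, "").strip().lower() == stored.strip().lower()
    )
    return correct / len(expected)


def access_decision(
    decoded: dict[str, str],
    expected: dict[str, str],
    voice_embedding: list[float],
    model_embedding: list[float],
    knowledge_weight: float = 0.5,
    threshold: float = 0.75,
) -> tuple[bool, float]:
    """Fuse knowledge and voice evidence into one score and compare it to a threshold."""
    knowledge = answer_accuracy(decoded, expected)
    # Map cosine similarity from [-1, 1] onto [0, 1] so both terms share a scale.
    voice = (cosine_similarity(voice_embedding, model_embedding) + 1.0) / 2.0
    score = knowledge_weight * knowledge + (1.0 - knowledge_weight) * voice
    # "Substantially equivalent to or above": near-equality also meets the threshold.
    granted = score > threshold or math.isclose(score, threshold, abs_tol=1e-9)
    return granted, score
```

With these defaults, correct answers alone cannot grant access against an orthogonal voice embedding unless the fused score still reaches the threshold, so neither source of evidence is sufficient by itself when the other is clearly poor.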
18. A method of controlling access of a speaker to one of a service and a facility from among a multiplicity of speaker candidates, the method comprising the steps of:
(a) receiving first spoken utterances of the speaker, the first spoken utterances containing indicia of the speaker;
(b) decoding the first spoken utterances;
(c) generating a sub-list of speaker candidates that substantially match the speaker's decoded spoken utterances;
(d) activating databases respectively corresponding to the speaker candidates in the sub-list, the databases containing information respectively attributable to the speaker candidates;
(e) performing a voice classification analysis on voice characteristics of the speaker without requiring dependency on the decoded first spoken utterance;
(f) eliminating speaker candidates who do not substantially match these characteristics;
(g) querying the speaker with at least one question that is relevant to the information in the databases of speaker candidates remaining after the step (f);
(h) further eliminating speaker candidates based on the accuracy of the answer provided by the speaker in response to the at least one question;
(i) further performing the voice classification analysis on the voice characteristics from the answer provided by the speaker without requiring dependency on the decoded first spoken utterance;
(j) still further eliminating speaker candidates who do not substantially match these characteristics; and
(k) iteratively repeating steps (g) through (j) until one of one speaker candidate and no speaker candidates remain, if one speaker candidate remains then permitting the speaker access and if no speaker candidate remains then denying the speaker access.
Claims 19 to 33 depend from claim 18.
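The iterative loop of claim 18, steps (e) through (k), can be sketched as repeated filtering of a candidate list. In this sketch each candidate's "database" is a hypothetical topic-to-answer dictionary, voice classification is reduced to a precomputed class label, and `ask` is a hypothetical callback that returns the decoded answer together with the voice class measured on the answer audio. The `max_rounds` cap is an added assumption that guarantees termination; if it runs out with several candidates left, the sketch denies access.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    cust_id: str
    facts: dict[str, str]  # hypothetical per-candidate database: topic -> stored answer
    voice_class: str       # hypothetical coarse voice class label, e.g. "low" or "high"


def identify(
    candidates: list[Candidate],
    initial_voice_class: str,
    ask: Callable[[str], tuple[str, str]],
    max_rounds: int = 5,
) -> Optional[str]:
    """Return the single surviving candidate's ID, or None if no unique candidate survives."""
    # Steps (e)-(f): drop candidates whose voice class differs from the caller's.
    remaining = [c for c in candidates if c.voice_class == initial_voice_class]
    asked: set[str] = set()
    for _ in range(max_rounds):
        # Step (g): choose a not-yet-asked topic from a remaining candidate's database.
        topics = [t for c in remaining for t in c.facts if t not in asked]
        if not topics:
            break
        topic = topics[0]
        asked.add(topic)
        answer, answer_voice_class = ask(topic)
        # Steps (h)-(j): keep candidates whose stored answer and voice class both match.
        remaining = [
            c
            for c in remaining
            if c.facts.get(topic, "").strip().lower() == answer.strip().lower()
            and c.voice_class == answer_voice_class
        ]
        # Step (k): stop once one or zero candidates remain.
        if len(remaining) <= 1:
            break
    return remaining[0].cust_id if len(remaining) == 1 else None
```

Each round narrows the set using both the content of the answer and the voice of the person giving it, so a caller who knows a candidate's facts but has a different voice class is still eliminated.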
34. Apparatus for controlling access of a speaker to one of a service and a facility, the apparatus comprising:
means for receiving first spoken utterances of the speaker, the first spoken utterances containing indicia of the speaker;
means for decoding the first spoken utterances;
means for accessing a database corresponding to the decoded first spoken utterances, the database containing information attributable to a speaker candidate having indicia substantially similar to the speaker;
means for querying the speaker with at least one question based on the information contained in the accessed database;
means for receiving second spoken utterances of the speaker, the second spoken utterances being representative of at least one answer to the at least one question;
means for decoding the second spoken utterances;
means for verifying the accuracy of the decoded answer against the information contained in the accessed database serving as the basis for the question;
means for taking a voice sample from the utterances of the speaker and processing the voice sample against an acoustic model attributable to the speaker candidate without requiring dependency on the decoded first and second spoken utterances;
means for generating a score corresponding to the accuracy of the decoded answer and the closeness of the match between the voice sample and the model; and
means for comparing the score to a predetermined threshold value and if the score is one of substantially equivalent to and above the threshold value, then permitting speaker access to one of the service and the facility.
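The "means for querying" and "means for verifying the accuracy of the decoded answer" must cope with the fact that a recognizer's output rarely matches the stored record character for character (for example, a ZIP code spoken as "one oh oh oh one"). The sketch below is one hypothetical way to do this: it picks a stored field to ask about and normalizes both sides before comparing. The `DIGIT_WORDS` table, the prompt wording, and all function names are illustrative assumptions, not taken from the source.

```python
import random

# Hypothetical mapping of spoken digit words to digits.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def normalize(text: str) -> str:
    """Lower-case, map spoken digit words to digits, and keep only letters and digits."""
    tokens = text.lower().replace("-", " ").split()
    joined = "".join(DIGIT_WORDS.get(tok, tok) for tok in tokens)
    return "".join(ch for ch in joined if ch.isalnum())


def pick_question(record: dict[str, str], rng: random.Random) -> tuple[str, str]:
    """Pick one stored field at random and return (field, prompt text)."""
    field = rng.choice(sorted(record))
    return field, f"Please state your {field.replace('_', ' ')}."


def verify_answer(record: dict[str, str], field: str, decoded_answer: str) -> bool:
    """Compare the decoded answer with the stored value after normalization."""
    return normalize(decoded_answer) == normalize(record[field])
```

Choosing the question at random from the stored fields makes it harder for an impostor to prepare answers in advance; a real system would likely need far richer normalization (number words such as "ten thousand", spelled-out letters, synonyms).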
35. Apparatus for controlling access of a speaker to one of a service and a facility from among a multiplicity of speaker candidates, the apparatus comprising:
means for receiving first spoken utterances of the speaker, the first spoken utterances containing indicia of the speaker;
means for decoding the first spoken utterances;
means for generating a sub-list of speaker candidates that substantially match the speaker's decoded spoken utterances;
means for activating databases respectively corresponding to the speaker candidates in the sub-list, the databases containing information respectively attributable to the speaker candidates;
means for performing a voice classification analysis on voice characteristics of the speaker without requiring dependency on the decoded first spoken utterance;
means for eliminating speaker candidates who do not substantially match these characteristics;
means for querying the speaker with at least one question to the speaker that is relevant to the information in the databases of speaker candidates remaining after elimination by the eliminating means;
means for further eliminating speaker candidates based on the accuracy of the answer provided by the speaker in response to the at least one question;
means for further performing the voice classification analysis on the voice characteristics from the answer provided by the speaker without requiring dependency on the decoded first spoken utterance;
means for still further eliminating speaker candidates who do not substantially match these characteristics; and
means for iteratively repeating the querying and voice classification analysis procedures until one of one speaker candidate and no speaker candidate remains, if one speaker candidate remains then permitting the speaker access and if no speaker candidate remains then denying the speaker access.
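The claims require a voice classification analysis that does not depend on what was said, but they do not name the voice characteristics used. As one illustrative, text-independent feature, the sketch below estimates pitch (fundamental frequency) from the autocorrelation peak of a sampled signal and maps it to a coarse class. Pitch as the feature, the 60 to 400 Hz search range, and the 165 Hz class boundary are all assumptions chosen for illustration.

```python
import math


def estimate_pitch(samples: list[float], sample_rate: int,
                   fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Estimate fundamental frequency (Hz) from the strongest autocorrelation lag."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


def voice_class(samples: list[float], sample_rate: int, boundary_hz: float = 165.0) -> str:
    """Map an estimated pitch onto a hypothetical two-way class label."""
    return "low" if estimate_pitch(samples, sample_rate) < boundary_hz else "high"


def eliminate_by_voice_class(candidates: dict[str, str], observed: str) -> list[str]:
    """Keep candidate IDs whose enrolled class matches the observed class."""
    return [cid for cid, cls in candidates.items() if cls == observed]


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
    print(round(estimate_pitch(tone, rate)), voice_class(tone, rate))
```

A two-way pitch class is far too coarse to identify anyone on its own; in the claimed scheme it only prunes the sub-list, and the knowledge questions do the rest.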
Specification