METHOD AND APPARATUS FOR LARGE POPULATION SPEAKER IDENTIFICATION IN TELEPHONE INTERACTIONS
Abstract
A method and apparatus for determining whether a speaker uttering an utterance belongs to a predetermined set comprising known speakers, wherein a training utterance is available for each known speaker. The method and apparatus test whether features extracted from the tested utterance provide a score exceeding a threshold when matched against one or more models constructed from voice samples of each known speaker. The method and apparatus further provide optional enhancements such as determining, using, and updating model normalization parameters, a fast scoring algorithm, summed calls handling, or quality evaluation for the tested utterance.
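The abstract mentions determining, using, and updating model normalization parameters without saying how. One possible reading, sketched below, keeps a running mean and standard deviation of raw scores per model and rescales new scores against them. The class name `ModelNormalizer`, its fields, and the choice of a z-score style normalization with Welford's running update are all assumptions made for illustration and are not taken from the patent text.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelNormalizer:
    """Hypothetical per-model normalization parameters (running mean and spread)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, raw_score: float) -> None:
        """Fold one more raw score into the parameters (Welford's update)."""
        self.count += 1
        delta = raw_score - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (raw_score - self.mean)

    def normalize(self, raw_score: float, min_std: float = 1e-6) -> float:
        """Rescale a raw score; return it unchanged until two scores were seen."""
        if self.count < 2:
            return raw_score
        std = max(math.sqrt(self.m2 / self.count), min_std)
        return (raw_score - self.mean) / std


# Usage: determine the parameters from a few raw scores, then use them.
normalizer = ModelNormalizer()
for score in (2.0, 4.0, 6.0):
    normalizer.update(score)
centered = normalizer.normalize(4.0)
```

Normalizing per model in this way would let a single threshold be compared against scores from models whose raw score ranges differ, which is one plausible motivation for the enhancement.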
295 Citations
38 Claims
1. A method for determining whether a speaker uttering a tested utterance belongs to a predetermined set comprising an at least one known speaker, wherein an at least one training utterance is available for each of the at least one known speaker, the method comprising the steps of:
extracting an at least one first feature of each of the at least one training utterance;
estimating an at least one model from the at least one first feature;
extracting an at least one second feature from an at least one frame of the tested utterance;
scoring the at least one second feature against an at least one of the at least one model, to obtain an at least one intermediate score;
determining an at least one model score using the at least one intermediate score;
selecting an at least one maximal score from the at least one model score; and
the speaker is determined to belong to the predetermined set, if the at least one maximal score exceeds a threshold.
Dependent claims: 2-23
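The steps of claim 1 can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not name a feature type or model family, so the framing in `extract_features`, the single diagonal Gaussian per speaker, and the averaging of frame scores into a model score are hypothetical choices, as are all function and parameter names.

```python
import math
from statistics import fmean, pvariance


def extract_features(samples, frame_len=4):
    """Hypothetical front end: (mean, mean absolute value) for each full frame."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [(fmean(f), fmean([abs(x) for x in f])) for f in frames]


def estimate_model(features, var_floor=1e-3):
    """Assumed model: one diagonal Gaussian (means, variances) per speaker."""
    dims = list(zip(*features))
    means = [fmean(d) for d in dims]
    variances = [max(pvariance(d), var_floor) for d in dims]
    return means, variances


def frame_score(feature, model):
    """Intermediate score: log-likelihood of one frame's feature under a model."""
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(feature, means, variances))


def model_score(features, model):
    """Model score: average of the intermediate (per-frame) scores."""
    return fmean([frame_score(f, model) for f in features])


def belongs_to_set(test_features, models, threshold):
    """Select the maximal model score and compare it against the threshold."""
    scores = {name: model_score(test_features, m) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return scores[best] > threshold, best, scores[best]


# Usage with made-up feature vectors for two known speakers.
models = {
    "alice": estimate_model([(0.0, 0.1), (0.2, -0.1), (-0.2, 0.0)]),
    "bob": estimate_model([(5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]),
}
accepted, best, score = belongs_to_set([(0.0, 0.0), (0.1, 0.0)], models, -10.0)
```

The comparison is strict because the claim requires the maximal score to exceed the threshold. The threshold value used here is arbitrary and would need tuning on real data.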
24. An apparatus for determining whether a speaker uttering a tested utterance belongs to a predetermined set comprising an at least one known speaker, wherein an at least one training utterance is available for each of the at least one known speaker, the apparatus comprising:
a feature extraction component for extracting an at least one first feature of the tested utterance or of each of the at least one training utterance;
a frame scoring component for scoring an at least one feature against an at least one model, to obtain an at least one intermediate score;
a total model scoring component for determining an at least one model score using the at least one intermediate score; and
a maximal score determination component for selecting a maximal score from the at least one model score.
Dependent claims: 25-37
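The four components recited in claim 24 can be pictured as pluggable parts wired in sequence. The sketch below is only one way to arrange them: the class name `SpeakerSetApparatus`, the field names, and the trivial stand-in components are hypothetical. As in the claim text, the chain ends at the maximal score and includes no threshold component.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Feature = Sequence[float]


@dataclass
class SpeakerSetApparatus:
    """Hypothetical wiring of the four components named in the claim."""
    extract: Callable[[Sequence[float]], List[Feature]]  # feature extraction
    score_frame: Callable[[Feature, float], float]       # frame scoring
    total_score: Callable[[List[float]], float]          # total model scoring
    pick_max: Callable[[Dict[str, float]], float]        # maximal score determination

    def maximal_score(self, utterance: Sequence[float],
                      models: Dict[str, float]) -> float:
        features = self.extract(utterance)
        model_scores = {
            name: self.total_score([self.score_frame(f, m) for f in features])
            for name, m in models.items()
        }
        return self.pick_max(model_scores)


# Stand-in components: each "model" is a single number and a frame scores
# higher the closer its one-dimensional feature lies to that number.
apparatus = SpeakerSetApparatus(
    extract=lambda samples: [(x,) for x in samples],
    score_frame=lambda feature, model: -abs(feature[0] - model),
    total_score=lambda scores: sum(scores) / len(scores),
    pick_max=lambda model_scores: max(model_scores.values()),
)
result = apparatus.maximal_score([1.0, 3.0], {"a": 2.0, "b": 10.0})
```

Keeping each component behind a plain callable means any one of them could be swapped, for example for a faster scoring routine, without touching the others.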
38. A computer readable storage medium containing a set of instructions for a general purpose computer, the set of instructions comprising:
extracting an at least one first feature of each of an at least one training utterance, each of the at least one training utterance uttered by a person belonging to a target set;
estimating an at least one model from the at least one first feature, the model associated with the person belonging to the target set;
extracting an at least one second feature from an at least one frame of a tested utterance;
scoring the at least one second feature against the at least one model, to obtain an at least one intermediate score;
determining a model score using the at least one intermediate score;
selecting a maximal score from the at least one model score; and
determining whether a speaker of the tested utterance belongs to the target set.
Specification