SPEAKER RECOGNITION IN MULTIMEDIA SYSTEM
Abstract
A method for identifying a user among a plurality of users of a multimedia system, comprising extracting an i-vector for a speech utterance using total variability modeling, comparing the extracted i-vector with a collection of i-vector sets in order to identify a target set most similar to the extracted i-vector, and granting access to the multimedia system in accordance with an access profile associated with the identified target set. Further, source variation is minimized by, for each speech utterance acquired using a specific data source, re-centering first-order statistics of the speech utterance around the mean of an informative prior associated with the source, and using the co-variance of the informative prior associated with the source when extracting the i-vector for the speech utterance.
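The comparison and access-granting steps summarized above can be illustrated with a minimal sketch. The text does not say how similarity between an i-vector and an i-vector set is measured, so the mean cosine similarity used here is an assumption, and the names `cosine`, `identify`, `collection` and the toy profiles are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify(ivector, collection):
    """Return (user, profile) for the i-vector set most similar to `ivector`.

    `collection` maps a user name to a dict with the keys "ivectors"
    (that user's enrolled i-vectors) and "profile" (the access profile).
    Assumption: a set is scored by the mean cosine similarity to its members.
    """
    def score(entry):
        vecs = entry["ivectors"]
        return sum(cosine(ivector, v) for v in vecs) / len(vecs)

    user = max(collection, key=lambda name: score(collection[name]))
    return user, collection[user]["profile"]

# Hypothetical enrolment data: two users with toy 3-dimensional i-vectors.
collection = {
    "alice": {"ivectors": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
              "profile": {"sources": ["radio", "streaming"]}},
    "bob": {"ivectors": [[0.0, 1.0, 0.2], [0.1, 0.8, 0.3]],
            "profile": {"sources": ["radio"]}},
}

user, profile = identify([0.95, 0.15, 0.05], collection)
print(user, profile)
```

In this sketch the returned profile is what the system would consult before exposing content to the current user; a real system would probably also apply a rejection threshold for unknown speakers, which the sketch omits.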
15 Claims
1. A method for identifying a user among a plurality of users of a multimedia system including one or more devices for providing multimedia content from one or more sources of digital information, in order to provide individually adjusted access and control of multimedia content from the multimedia system, the method comprising the steps of:
- providing a collection of i-vector sets, each i-vector set including i-vectors based on one or more words spoken by a user of the multimedia system and being associated with an access profile of this user,
- acquiring a speech utterance from a current user, and extracting an i-vector for the speech utterance using total variability modeling,
- comparing the extracted i-vector with each i-vector set in the collection, in order to identify a target set most similar to the extracted i-vector,
- granting, to the current user, access to the multimedia system in accordance with the access profile associated with the identified target set,
wherein the speech utterance is acquired using one of a plurality of sources, and wherein the method further comprises minimizing source variation in the total variability modeling by:
- for each data source, estimating a source-specific informative prior, which is defined by a mean and a covariance, and
- for each speech utterance acquired using a specific data source, re-centering first-order statistics of the speech utterance around the mean of the informative prior associated with the source, and using the co-variance of the informative prior associated with the source when extracting the i-vector for the speech utterance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15
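The re-centering and prior-covariance steps of claim 1 can be sketched with NumPy. The claim gives no formulas, so this follows the usual Gaussian posterior-mean form of i-vector extraction, with the standard-normal prior replaced by a prior N(prior_mean, prior_cov). Everything here is an assumption made for illustration: the function name `extract_ivector`, the diagonal UBM covariances, the row ordering of `T`, and the toy dimensions.

```python
import numpy as np

def extract_ivector(N, F, m, T, sigma, prior_mean, prior_cov):
    """Posterior-mean i-vector under a source-specific Gaussian prior.

    N:     (C,)      zero-order statistics (occupancy per UBM component)
    F:     (C, D)    first-order statistics per component
    m:     (C, D)    UBM component means
    T:     (C*D, R)  total variability matrix, rows ordered like F.reshape(-1)
    sigma: (C, D)    diagonal UBM covariances
    """
    C, D = F.shape
    N_rep = np.repeat(N, D)                      # one count per supervector dim
    # Re-center the first-order statistics around the prior mean mapped
    # into supervector space, i.e. around m + T @ prior_mean.
    centre = m.reshape(-1) + T @ prior_mean
    F_tilde = F.reshape(-1) - N_rep * centre
    # The prior covariance takes the place of the identity matrix that the
    # standard (non-informative) prior would contribute to the precision.
    TtS = T.T / sigma.reshape(-1)                # T^T Sigma^{-1}, shape (R, C*D)
    precision = np.linalg.inv(prior_cov) + (TtS * N_rep) @ T
    return np.linalg.solve(precision, TtS @ F_tilde)

# Toy dimensions (hypothetical): 4 components, 3 features, rank-2 subspace.
rng = np.random.default_rng(0)
C, D, R = 4, 3, 2
N = rng.uniform(1.0, 5.0, size=C)
F = rng.normal(size=(C, D))
m = rng.normal(size=(C, D))
T = rng.normal(size=(C * D, R))
sigma = rng.uniform(0.5, 2.0, size=(C, D))
prior_mean = np.array([0.3, -0.2])
prior_cov = np.array([[1.5, 0.2], [0.2, 0.8]])

w = extract_ivector(N, F, m, T, sigma, prior_mean, prior_cov)
print(w.shape)
```

Because the statistics are re-centered, the returned vector is expressed relative to the prior mean. With a zero `prior_mean` and an identity `prior_cov`, the function reduces to the standard i-vector extraction.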
14. A multimedia system comprising:
- one or more sources of digital information,
- one or more devices for providing multimedia content from the sources,
- a database storing a collection of i-vector sets, each i-vector set including i-vectors based on one or more words spoken by a user of the multimedia system and being associated with an access profile of this user,
- a plurality of speech recording data sources,
- processing circuitry configured to:
  - extract an i-vector for a speech utterance acquired from one of said data sources using total variability modeling, while minimizing source variation by:
    - for each data source, estimating a source-specific informative prior, which is defined by a mean and a covariance, and
    - for each speech utterance acquired using a specific data source, re-centering first-order statistics of the speech utterance around the mean of the informative prior associated with the source, and using the co-variance of the informative prior associated with the source when extracting the i-vector for the speech utterance,
  - compare the extracted i-vector with each i-vector set in the collection, in order to identify a target set most similar to the extracted i-vector, and
  - grant, to the current user, access to the multimedia system in accordance with the access profile associated with the identified target set.
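The claim requires a source-specific informative prior defined by a mean and a covariance for each data source, but does not state how that prior is estimated. One simple possibility, shown here purely as an assumption, is to take the sample mean and sample covariance of i-vectors previously extracted from recordings of each source. The function name `estimate_source_priors` and the source labels "phone" and "remote" are hypothetical.

```python
import numpy as np

def estimate_source_priors(ivectors, sources):
    """Return {source: (mean, covariance)} from source-labelled i-vectors.

    ivectors: sequence of equal-length i-vectors
    sources:  sequence of source labels, one per i-vector
    """
    ivectors = np.asarray(ivectors, dtype=float)
    labels = np.asarray(sources)
    priors = {}
    for source in sorted(set(sources)):
        rows = ivectors[labels == source]
        mean = rows.mean(axis=0)
        # Population covariance of this source's i-vectors. Assumption: the
        # posterior uncertainty of each individual i-vector is ignored.
        cov = np.atleast_2d(np.cov(rows, rowvar=False, bias=True))
        priors[source] = (mean, cov)
    return priors

# Toy 2-dimensional i-vectors from two hypothetical recording sources.
ivectors = [[1.0, 0.0], [3.0, 2.0], [0.0, 1.0], [2.0, 1.0]]
sources = ["phone", "phone", "remote", "remote"]

priors = estimate_source_priors(ivectors, sources)
for source, (mean, cov) in priors.items():
    print(source, mean, cov)
```

With so few vectors per source the estimated covariances are singular, so a practical estimate would need many more recordings per source, or some regularization, before the covariance could be inverted during extraction.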
Specification