Speaker recognition in multimedia system
Abstract
A method for identifying a user among a plurality of users of a multimedia system comprises acquiring a speech utterance from a current user, extracting an i-vector for the speech utterance using total variability modeling, comparing the extracted i-vector with a collection of i-vector sets in order to identify the target set most similar to the extracted i-vector, and granting access to the multimedia system in accordance with an access profile associated with the identified target set. Source variation is minimized by estimating, for each data source, an informative prior defined by a mean and a covariance. For each speech utterance acquired using a specific data source, the first-order statistics of the utterance are re-centered around the mean of the informative prior associated with that source, and the covariance of that prior is used when extracting the i-vector for the utterance.
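The abstract does not say how the extracted i-vector is compared with each enrolled set. A minimal sketch of the identification and access steps, assuming cosine similarity averaged over the members of each i-vector set (a common i-vector scoring choice, not one the text specifies); the function names, user names and access-profile dictionaries are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two i-vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify_target_set(
    ivector: Sequence[float], collection: Dict[str, List[Sequence[float]]]
) -> Tuple[str, float]:
    """Return the user whose i-vector set scores highest against `ivector`.

    Assumption: the set score is the mean cosine similarity over its members.
    """
    best_user, best_score = "", -math.inf
    for user, ivector_set in collection.items():
        score = sum(cosine_similarity(ivector, v) for v in ivector_set) / len(ivector_set)
        if score > best_score:
            best_user, best_score = user, score
    return best_user, best_score


def grant_access(
    ivector: Sequence[float],
    collection: Dict[str, List[Sequence[float]]],
    access_profiles: Dict[str, Dict[str, bool]],
) -> Dict[str, bool]:
    """Hypothetical helper: return the access profile tied to the target set."""
    user, _ = identify_target_set(ivector, collection)
    return access_profiles[user]


if __name__ == "__main__":
    collection = {"alice": [[1.0, 0.0], [0.9, 0.1]], "bob": [[0.0, 1.0], [0.1, 0.9]]}
    profiles = {"alice": {"tv": True, "music": True}, "bob": {"tv": False, "music": True}}
    print(grant_access([0.95, 0.05], collection, profiles))
```

Averaging over the set, rather than keeping only the best single match, softens the effect of one poorly recorded enrollment utterance. A deployment might also add a rejection threshold for unknown speakers, which is not part of the text above.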
20 Claims
1. A method for identifying a user among a plurality of users of a multimedia system including one or more devices for providing multimedia content from one or more sources of digital information, the method comprising the steps of:
providing a collection of i-vector sets, each i-vector set including i-vectors based on one or more words spoken by a user of the multimedia system and being associated with an access profile of this user,
acquiring a speech utterance from a current user, and extracting an i-vector for the speech utterance using total variability modeling,
comparing the extracted i-vector with each i-vector set in the collection, in order to identify a target set most similar to the extracted i-vector,
granting, to the current user, access to the multimedia system in accordance with the access profile associated with the identified target set, the access profile governing access to one or more components of the multimedia system,
wherein the speech utterance is acquired using one of a plurality of sources, and wherein the method further comprises minimizing source variation in the total variability modeling by:
for each data source, estimating a source-specific informative prior, which is defined by a mean and a covariance, and
for each speech utterance acquired using a specific data source, re-centering first-order statistics of the speech utterance around the mean of the informative prior associated with the source, and using the covariance of the informative prior associated with the source when extracting the i-vector for the speech utterance,
wherein estimating a source-specific informative prior includes extracting a source-specific set of i-vectors from data acquired from the data source, and using the source-specific set of i-vectors to estimate the source-specific informative prior,
wherein extracting a source-specific set of i-vectors is done using an informative total variability matrix and a non-informative prior,
wherein the informative total variability matrix is computed by performing a plurality of training iterations, e.g. expectation maximization training iterations, each iteration including computing a preliminary source-specific informative prior and updating the informative total variability matrix using the preliminary source-specific informative prior,
wherein the method comprises, upon granting access to the current user to the multimedia system in accordance with the access profile associated with the identified target set, accessing personal settings associated with the current user in order to provide the current user with individually adjusted access and control of multimedia content from the multimedia system.
Dependent claims: 2-16.
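The source-compensation step of claim 1 can be illustrated with the standard i-vector posterior under a Gaussian prior N(mu_p, Sigma_p) instead of N(0, I). A minimal sketch, assuming diagonal UBM covariances, precomputed Baum-Welch statistics and a supervector ordered component by component; the function name and argument layout are hypothetical, and this is not claimed to be the patentee's implementation. Re-centering the first-order statistics around m + T mu_p and replacing the identity with Sigma_p in the posterior precision yields the posterior of the offset w - mu_p.

```python
import numpy as np


def extract_ivector_informative(N, F, m, T, Sigma, mu_p, Sigma_p):
    """Sketch: i-vector extraction with a source-specific prior N(mu_p, Sigma_p).

    Hypothetical argument layout (assumption, not taken from the patent):
      N       (C,)      zeroth-order Baum-Welch statistics per UBM component
      F       (C, D)    first-order Baum-Welch statistics
      m       (C, D)    UBM component means
      T       (C*D, R)  total variability matrix, rows ordered component by component
      Sigma   (C, D)    diagonal UBM covariances
      mu_p    (R,)      mean of the source-specific informative prior
      Sigma_p (R, R)    covariance of the source-specific informative prior
    Returns the posterior mean of (w - mu_p) and the posterior covariance of w.
    """
    C, D = m.shape
    # Re-center the first-order statistics around the UBM means shifted by T @ mu_p,
    # i.e. around the supervector implied by this source's prior mean.
    shift = (T @ mu_p).reshape(C, D)
    F_centered = (F - N[:, None] * (m + shift)).reshape(-1)
    inv_sigma = 1.0 / Sigma.reshape(-1)
    N_rep = np.repeat(N, D)  # zeroth-order stats expanded to supervector length
    # Posterior precision: Sigma_p^{-1} + T^T Sigma^{-1} N T (Sigma_p replaces I).
    precision = np.linalg.inv(Sigma_p) + T.T @ ((N_rep * inv_sigma)[:, None] * T)
    post_cov = np.linalg.inv(precision)
    w_offset = post_cov @ (T.T @ (inv_sigma * F_centered))
    return w_offset, post_cov
```

Adding mu_p back to the returned offset recovers the posterior mean of w under the informative prior; returning the offset itself is one way to strip the source-dependent shift, though the claim does not state which form is kept. With mu_p = 0 and Sigma_p = I the function reduces to extraction with a non-informative prior, the setting the claim uses for the source-specific i-vector sets.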
17. A multimedia system comprising:
one or more sources of digital information,
one or more devices for providing multimedia content from the sources,
a database storing a collection of i-vector sets, each i-vector set including i-vectors based on one or more words spoken by a user of the multimedia system and being associated with an access profile of this user,
a plurality of speech recording data sources,
processing circuitry configured to:
extract an i-vector for a speech utterance acquired from one of said data sources using total variability modeling, while minimizing source variation by:
for each data source, estimating a source-specific informative prior, which is defined by a mean and a covariance, and
for each speech utterance acquired using a specific data source, re-centering first-order statistics of the speech utterance around the mean of the informative prior associated with the source, and using the covariance of the informative prior associated with the source when extracting the i-vector for the speech utterance,
compare the extracted i-vector with each i-vector set in the collection, in order to identify a target set most similar to the extracted i-vector, and
grant, to the current user, access to the multimedia system in accordance with the access profile associated with the identified target set;
wherein estimating a source-specific informative prior includes extracting a source-specific set of i-vectors from data acquired from the data source, and using the source-specific set of i-vectors to estimate the source-specific informative prior,
wherein extracting a source-specific set of i-vectors is done using an informative total variability matrix and a non-informative prior,
wherein the informative total variability matrix is computed by performing a plurality of training iterations, e.g. expectation maximization training iterations, each iteration including computing a preliminary source-specific informative prior and updating the informative total variability matrix using the preliminary source-specific informative prior,
wherein the processing circuitry is configured to access personal settings associated with the current user in order to provide individually adjusted access and control of multimedia content from the multimedia system.
Dependent claims: 18-20.
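The claims estimate each source's informative prior from a source-specific set of i-vectors extracted with a non-informative prior. A minimal sketch, assuming the prior is the sample mean and maximum-likelihood covariance of those i-vectors, optionally plus the average posterior covariance (a common refinement in i-vector prior re-estimation, not stated in the claims); the function names and source labels are hypothetical.

```python
from typing import Dict, Optional, Tuple

import numpy as np


def estimate_source_prior(
    ivectors: np.ndarray, post_covs: Optional[np.ndarray] = None
) -> Tuple[np.ndarray, np.ndarray]:
    """Estimate (mean, covariance) of one source's informative prior.

    ivectors  (n, R)     i-vectors from this source, extracted with a N(0, I) prior
    post_covs (n, R, R)  optional posterior covariances of those i-vectors (assumption)
    """
    X = np.asarray(ivectors, dtype=float)
    mu = X.mean(axis=0)
    centered = X - mu
    # Maximum-likelihood scatter of the posterior means around the source mean.
    cov = centered.T @ centered / X.shape[0]
    if post_covs is not None:
        # Optional refinement: include the average posterior uncertainty.
        cov = cov + np.asarray(post_covs, dtype=float).mean(axis=0)
    return mu, cov


def estimate_all_priors(
    ivectors_by_source: Dict[str, np.ndarray]
) -> Dict[str, Tuple[np.ndarray, np.ndarray]]:
    """One informative prior per data source (hypothetical source labels)."""
    return {src: estimate_source_prior(x) for src, x in ivectors_by_source.items()}
```

In the training loop the claims recite, an estimate like this would serve as the preliminary source-specific prior recomputed in each iteration before the informative total variability matrix is updated; that matrix update is not sketched here.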
Specification