Speaker recognition in multimedia system

US 10,354,657 B2
Filed: 02/10/2016
Issued: 07/16/2019
Est. Priority Date: 02/11/2015
Status: Active Grant

Abstract

A method for identifying a user among a plurality of users of a multimedia system comprising extracting an i-vector for a speech utterance using total variability modeling, comparing the extracted i-vector with a collection of i-vector sets in order to identify a target set most similar to the extracted i-vector, and granting access to the multimedia system in accordance with an access profile associated with the identified target set. Further, source variation is minimized by, for each speech utterance acquired using a specific data source, re-centering first-order statistics of the speech utterance around the mean of an informative prior associated with the source, and using the covariance of the informative prior associated with the source when extracting the i-vector for the speech utterance.
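
The re-centering step in the abstract can be written out in the notation commonly used for total variability modeling. This is a sketch under stated assumptions, not the patent's own formulation: the symbols below ($m$ for the universal background model mean supervector, $T$ for the total variability matrix, $\Sigma$ for the background model covariance, $N$ and $F$ for the zero-order and first-order statistics of an utterance, and $\mu_s$, $\Sigma_s$ for the mean and covariance of the informative prior of source $s$) do not appear in the text above and are introduced only for illustration.

```latex
\begin{aligned}
M &= m + T w, \qquad w \sim \mathcal{N}(\mu_s, \Sigma_s)
  && \text{(model with a source-specific prior)} \\
\tilde{F} &= F - N\,(m + T \mu_s)
  && \text{(first-order statistics re-centered around the prior mean)} \\
L &= \Sigma_s^{-1} + T^{\top} \Sigma^{-1} N\, T
  && \text{(posterior precision using the prior covariance)} \\
\hat{w} &= L^{-1}\, T^{\top} \Sigma^{-1} \tilde{F}
  && \text{(extracted i-vector)}
\end{aligned}
```

With a non-informative prior ($\mu_s = 0$, $\Sigma_s = I$) these expressions reduce to the standard i-vector extraction formulas.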

20 Claims

1. A method for identifying a user among a plurality of users of a multimedia system including one or more devices for providing multimedia content from one or more sources of digital information, the method comprising the steps of:
- providing a collection of i-vector sets, each i-vector set including i-vectors based on one or more words spoken by a user of the multimedia system and being associated with an access profile of this user,
- acquiring a speech utterance from a current user, and extracting an i-vector for the speech utterance using total variability modeling,
- comparing the extracted i-vector with each i-vector set in the collection, in order to identify a target set most similar to the extracted i-vector,
- granting, to the current user, access to the multimedia system in accordance with the access profile associated with the identified target set, the access profile governing access to one or more components of the multimedia system,
- wherein the speech utterance is acquired using one of a plurality of sources, and wherein the method further comprises minimizing source variation in the total variability modeling by:
  - for each data source, estimating a source-specific informative prior, which is defined by a mean and a covariance, and
  - for each speech utterance acquired using a specific data source, re-centering first-order statistics of the speech utterance around the mean of the informative prior associated with the source, and using the covariance of the informative prior associated with the source when extracting the i-vector for the speech utterance,
- wherein estimating a source-specific informative prior includes extracting a source specific set of i-vectors from data acquired from the data source, and using the source specific set of i-vectors to estimate the source-specific informative prior,
- wherein extracting a source specific set of i-vectors is done using an informative total variability matrix and a non-informative prior,
- wherein the informative total variability matrix is computed by performing a plurality of training iterations, e.g. expectation maximization training iterations, each iteration including computing a preliminary source-specific informative prior and updating the informative total variability matrix using the preliminary source-specific informative prior,
- wherein the method comprises, upon granting access to the current user to the multimedia system in accordance with the access profile associated with the identified target set, accessing personal settings associated with the current user in order to provide the current user with individually adjusted access and control of multimedia content from the multimedia system.
  - 2. The method according to claim 1, wherein extracting a source specific set of i-vectors is done using a pre-trained total variability matrix and a non-informative prior.
  - 3. The method according to claim 1, further comprising storing the collection of i-vector sets and associated access profiles in a remote database and making them accessible to more than one multimedia system.
  - 4. The method according to claim 3, further comprising storing content consumption patterns of each user and providing the current user with recommendations based on choices of other users with similar choices as the current user.
  - 5. The method according to claim 1, further comprising:
    - providing a collection of i-vector classes, each i-vector class including a set of i-vectors based on speech from users having similar characteristics, and
      
      comparing the extracted i-vector with each i-vector class to identify an i-vector class most similar to the extracted i-vector.
  - 6. The method according to claim 5, wherein the characteristics include at least one of age, gender, and mood.
  - 7. The method according to claim 1, further including identifying and registering a new user only if an i-vector extracted from a speech utterance of the new user is sufficiently different from all previously stored i-vectors according to a predefined condition.
  - 8. The method according to claim 7, wherein the condition is based on a cosine distance between the extracted i-vector and all previously stored i-vectors.
  - 9. The method according to claim 1, wherein the collection of i-vector sets includes a first i-vector set based on one or more words spoken by a first user and associated with a first access profile, and a second i-vector set based on one or more words spoken by a second user and associated with a second access profile, and further comprising:
    - allocating a first user identification to the first user;
      
      allocating a second user identification to the second user;
      
      identifying the first user as the current user;
      
      receiving input from the first user indicating the second user identification; and
      
      granting the first user access in accordance with the second access profile.
  - 10. The method according to claim 9, wherein each access profile defines user dependent access rights.
  - 11. The method according to claim 9, wherein each user identification is allocated to a function key, such as a button on a physical device or a graphical image/icon on a virtual device.
  - 12. The method according to claim 11, wherein said database is remote to said multimedia system, and shared by several multimedia systems.
  - 13. The method of claim 1, wherein the personal settings and related functional capabilities in the multimedia system include one or more of:
    - sound preferences in a room or domain or part thereof, the sound preferences including rendering type such as multichannel, stereo, and/or omnidirectional sound space, default volume, and default filter settings for bass, treble, and balance;
      
      media source and rendering preferences, the media source and rendering preferences including channel ID and/or room/domain ID;
      
      sharing options, the sharing options including private, share per room/domain, and/or share per user; and/or
      
      use pattern recording, the use pattern recording including personal, all, per user, and/or per room/domain.
  - 14. The method of claim 1, wherein the method comprises recording and/or managing data per user including one or more of:
    - user pattern play recording, the use pattern recording including personal, all, per user, and/or per room/domain;
      
      pattern play recording, the pattern play recording including recording user behavior over time such as which media is active, where is the media active, and length of time the media was active;
      
      sound preferences in a room or domain or part thereof, the sound preferences including rendering type such as multichannel, stereo, and/or omnidirectional sound space, default volume, and default filter settings for bass, treble, and balance;
      
      media source and rendering preferences, the media source and rendering preferences including channel ID and/or room/domain ID; and
      
      sharing options/preferences, the sharing options/preferences including private, share per room/domain, and/or share per user.
  - 15. The method of claim 1, wherein the granting comprises providing multimedia information specifically to an individual user with a relevant subset of the multimedia information presented on destination devices and a remote device relevant to the user, where access to the devices is governed by the user's access profile.
  - 16. The method of claim 1, wherein the granting comprises providing multimedia information specifically to an individual user with a relevant subset of the multimedia information presented on destination devices relevant to the user, where access to the destination devices is governed by the user's access profile.
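
The comparison step of claim 1 and the enrollment condition of claims 7 and 8 can be illustrated with a short Python sketch. It is an illustration under assumptions, not the patented implementation: the function names, the averaging of distances over an i-vector set, the `min_distance` threshold of 0.5, and the toy two-dimensional vectors are all hypothetical, and real i-vectors typically have several hundred dimensions.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cos(angle between a and b); 0.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def set_distance(ivector: Vector, ivector_set: List[Vector]) -> float:
    """Average distance to a (non-empty) enrolled set; averaging is an assumption."""
    return sum(cosine_distance(ivector, e) for e in ivector_set) / len(ivector_set)


def identify_target(ivector: Vector, collection: Dict[str, List[Vector]]) -> str:
    """Return the key of the i-vector set most similar to the extracted i-vector."""
    if not collection:
        raise ValueError("the collection of i-vector sets is empty")
    return min(collection, key=lambda uid: set_distance(ivector, collection[uid]))


def is_new_speaker(ivector: Vector, collection: Dict[str, List[Vector]],
                   min_distance: float = 0.5) -> bool:
    """True only if the i-vector is farther than min_distance (a hypothetical
    threshold) from every previously stored i-vector."""
    return all(cosine_distance(ivector, e) > min_distance
               for ivector_set in collection.values() for e in ivector_set)


# Toy data: each user has an i-vector set and an access profile.
collection = {"alice": [[1.0, 0.0], [0.9, 0.1]], "bob": [[0.0, 1.0]]}
access_profiles = {"alice": {"tv", "audio"}, "bob": {"audio"}}

current = [1.0, 0.05]                      # i-vector of the current utterance
target = identify_target(current, collection)
granted = access_profiles[target]          # components the current user may access
```

Keeping identification and enrollment as separate functions means a deployment could apply a different scoring rule to each, for example a stricter threshold before registering a new user.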

17. A multimedia system comprising:
- one or more sources of digital information,
- one or more devices for providing multimedia content from the sources,
- a database storing a collection of i-vector sets, each i-vector set including i-vectors based on one or more words spoken by a user of the multimedia system and being associated with an access profile of this user,
- a plurality of speech recording data sources,
- processing circuitry configured to:
  - extract an i-vector for a speech utterance acquired from one of said data sources using total variability modeling, while minimizing source variation by:
    - for each data source, estimating a source-specific informative prior, which is defined by a mean and a covariance, and
    - for each speech utterance acquired using a specific data source, re-centering first-order statistics of the speech utterance around the mean of the informative prior associated with the source, and using the covariance of the informative prior associated with the source when extracting the i-vector for the speech utterance,
  - compare the extracted i-vector with each i-vector set in the collection, in order to identify a target set most similar to the extracted i-vector, and
  - grant, to the current user, access to the multimedia system in accordance with the access profile associated with the identified target set;
- wherein estimating a source-specific informative prior includes extracting a source specific set of i-vectors from data acquired from the data source, and using the source specific set of i-vectors to estimate the source-specific informative prior,
- wherein extracting a source specific set of i-vectors is done using an informative total variability matrix and a non-informative prior,
- wherein the informative total variability matrix is computed by performing a plurality of training iterations, e.g. expectation maximization training iterations, each iteration including computing a preliminary source-specific informative prior and updating the informative total variability matrix using the preliminary source-specific informative prior,
- wherein the processing circuitry is configured to access personal settings associated with the current user in order to provide individually adjusted access and control of multimedia content from the multimedia system.
  - 18. The system of claim 17, wherein the personal settings and related functional capabilities in the multimedia system include one or more of:
    - sound preferences in a room or domain or part thereof, the sound preferences including rendering type such as multichannel, stereo, and/or omnidirectional sound space, default volume, and default filter settings for bass, treble, and balance;
      
      media source and rendering preferences, the media source and rendering preferences including channel ID and/or room/domain ID;
      
      sharing options, the sharing options including private, share per room/domain, and/or share per user; and/or
      
      use pattern recording, the use pattern recording including personal, all, per user, and/or per room/domain.
  - 19. The system of claim 17, wherein the processing circuitry is configured to, in connection with the granting, provide multimedia information specifically to an individual user with a relevant subset of the multimedia information presented on destination devices and a remote device relevant to the user, where access to the destination devices and the remote device is governed by the user's access profile.
  - 20. The system of claim 17, wherein the processing circuitry is configured to, in connection with the granting, provide multimedia information specifically to an individual user with a relevant subset of the multimedia information presented on destination devices relevant to the user, where access to the destination devices is governed by the user's access profile.
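
Claims 1 and 17 both recite estimating a source-specific informative prior, defined by a mean and a covariance, from a source-specific set of i-vectors. A minimal Python sketch of that step follows. It assumes the prior is taken as the sample mean and the maximum-likelihood sample covariance of the i-vectors; the claims do not state the estimator, and a fuller treatment might also add each i-vector's posterior covariance. The function name, the source labels, and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def estimate_source_prior(ivectors: Sequence[Sequence[float]]) -> Tuple[Vector, Matrix]:
    """Return (mean, covariance) of the i-vectors extracted from one data source."""
    n = len(ivectors)
    if n == 0:
        raise ValueError("need at least one i-vector to estimate a prior")
    dim = len(ivectors[0])
    mean = [sum(v[d] for v in ivectors) / n for d in range(dim)]
    # Maximum-likelihood covariance (divides by n, not n - 1); an assumption.
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in ivectors) / n
            for j in range(dim)]
           for i in range(dim)]
    return mean, cov


# One prior per data source; the source names and values are made up.
source_ivectors: Dict[str, List[Vector]] = {
    "remote_microphone": [[1.0, 2.0], [3.0, 6.0]],
    "smartphone": [[0.0, 0.0], [0.0, 2.0]],
}
priors = {src: estimate_source_prior(vs) for src, vs in source_ivectors.items()}
mean_remote, cov_remote = priors["remote_microphone"]
```

In the iterative training described in the claims, a step like this would be repeated in each iteration to obtain the preliminary prior before the total variability matrix is updated; that update is outside the scope of this sketch.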

Current Assignee: Bang & Olufsen a/s
Original Assignee: Bang & Olufsen a/s
Inventors: Shepstone, Sven Ewan; Borup Jensen, Soren
Primary Examiner(s): Adesanya, Olujimi A
Application Number: US15/540,647
Publication Number: US 20170372706A1
Time in Patent Office: 1,252 Days
CPC Class Codes

G10L 17/00   Speaker identification or v...

G10L 17/02   Preprocessing operations, e...

G10L 17/04   Training, enrolment or mode...

G10L 17/06   Decision making techniques;...

G10L 17/22   Interactive procedures; Man...
