Multi-user authentication on a device

US 10,497,364 B2
Filed: 04/18/2018
Issued: 12/03/2019
Est. Priority Date: 04/20/2017
Status: Active Grant

1 Assignment

0 Petitions

Abstract

In some implementations, an utterance is determined to include a particular user speaking a hotword based at least on a first set of samples of the particular user speaking the hotword. In response to determining that an utterance includes a particular user speaking a hotword based at least on a first set of samples of the particular user speaking the hotword, at least a portion of the utterance is stored as a new sample. A second set of samples of the particular user speaking the utterance is obtained, where the second set of samples includes the new sample and less than all the samples in the first set of samples. A second utterance is determined to include the particular user speaking the hotword based at least on the second set of samples of the user speaking the hotword.
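The update cycle described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `HotwordProfile`, the `build_model` callback, and the drop-the-oldest policy are all assumptions made for the example. The only property taken from the text is that the second set contains the new sample and less than all of the samples in the first set.

```python
from typing import Callable, Generic, List, TypeVar

S = TypeVar("S")  # hypothetical sample type (e.g. audio features)


class HotwordProfile(Generic[S]):
    """Keeps a rolling set of one user's hotword samples and the model built from it."""

    def __init__(self, samples: List[S],
                 build_model: Callable[[List[S]], Callable[[S], bool]]) -> None:
        # build_model is an assumed callback: it turns a sample set into a
        # classifier that answers "is this the user speaking the hotword?"
        self.samples = list(samples)
        self._build_model = build_model
        self.model = build_model(self.samples)

    def observe(self, utterance: S) -> bool:
        """Return True if the current model attributes the utterance to the user."""
        if not self.model(utterance):
            return False  # rejected utterances are never stored as samples
        # Assumed policy: drop the oldest sample and append the new one, so the
        # second set holds the new sample and less than all of the first set.
        self.samples = self.samples[1:] + [utterance]
        # Later utterances are checked against a model generated from the second set.
        self.model = self._build_model(self.samples)
        return True
```

Any sample type and model builder can be plugged in; the class only manages which samples feed the next model.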

130 Citations

20 Claims

1. A computer-implemented method comprising:
- determining, by one or more computers, that an utterance includes a particular user speaking a hotword based at least on a first hotword detection model generated from a first set of samples of the particular user speaking the hotword;
  
  in response to determining that an utterance includes a particular user speaking a hotword based at least on the first hotword detection model generated from the first set of samples of the particular user speaking the hotword, storing at least a portion of the utterance as a new sample;
  
  obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples;
  
  determining, by the one or more computers, that a second utterance includes the particular user speaking the hotword based at least on a second hotword detection model generated from the second set of samples of the user speaking the hotword; and
  
  in response to determining, by the one or more computers, that a second utterance includes the particular user speaking the hotword, recognizing the second utterance as having been spoken by the particular user.
  - 2. The method of claim 1, wherein obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples comprises:
    - selecting a predetermined number of recently stored samples as the second set of samples.
  - 3. The method of claim 1, wherein obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples comprises:
    - selecting both a predetermined number of most recently stored samples and a set of reference samples to combine together as the second set of samples.
  - 5. The method of claim 1, comprising:
    - in response to obtaining the second set of samples, deleting a sample in the first set of samples but not in the second set of samples.
  - 6. The method of claim 1, wherein determining that an utterance includes a particular user speaking a hotword based at least on a first hotword detection model generated from a first set of samples of the particular user speaking the hotword comprises:
    - generating the first hotword detection model using the first set of samples;
      
      inputting the utterance into the first hotword detection model; and
      
      determining that the first hotword detection model has classified the utterance as including the particular user speaking the hotword.
  - 7. The method of claim 1, wherein determining that a second utterance includes the particular user speaking the hotword based at least on a second hotword detection model generated from the second set of samples of the user speaking the hotword comprises:
    - generating the second hotword detection model using the second set of samples;
      
      inputting the second utterance into the second hotword detection model; and
      
      determining that the second hotword detection model has classified the second utterance as including the particular user speaking the hotword.
  - 8. The computer-implemented method of claim 1, comprising:
    - receiving a second new sample from a server; and
      
      determining that a third utterance includes the particular user speaking the hotword based at least on a third set of samples that includes the second new sample from the server and less than all the samples in the second set of samples.
  - 9. The method of claim 1, comprising:
    - receiving, from a server, indications of samples in a third set of samples;
      
      determining samples that are in the third set of samples that are not locally stored;
      
      providing a request to server for the samples in the third set of samples that are not locally stored; and
      
      receiving the samples that are not locally stored from the server in response to the request.
  - 10. The method of claim 1, comprising:
    - providing the first set of samples to a voice-enabled device to enable the voice-enabled device to detect whether the particular user says the hotword, wherein determining that an utterance includes a particular user speaking a hotword based at least on the first hotword detection model generated from a first set of samples of the particular user speaking the hotword comprises receiving an indication that the voice-enabled device detected that the particular user said the hotword.
  - 11. The method of claim 1, comprising:
    - generating a hotword detection model using the first set of samples; and
      
      providing the hotword detection model to a voice-enabled device to enable the voice-enabled device to detect whether the particular user says the hotword, wherein determining that an utterance includes a particular user speaking a hotword based at least on a first hotword detection model generated from a first set of samples of the particular user speaking the hotword comprises receiving an indication that the voice-enabled device detected that the particular user said the hotword.
  - 12. The method of claim 1, comprising:
    - receiving, from a voice-enabled device, a request for a current set of samples for detecting whether the particular user said the hotword;
      
      determining samples in the current set of samples that are not locally stored by the voice-enabled device; and
      
      providing, to the voice-enabled device, an indication of the samples in the current set of samples and the samples in the current set of samples that are not locally stored by the voice-enabled device.
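The sample exchange in claims 5, 9 and 12 amounts to a set difference between what a device stores and what the current set contains. The sketch below is one hedged reading of that exchange: the sample IDs, the `fetch` callback, and the function names `plan_sync` and `apply_sync` are hypothetical, and the claims do not prescribe any particular data structure.

```python
from typing import Callable, Dict, Iterable, Set, Tuple


def plan_sync(local_ids: Iterable[str],
              current_ids: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Compare locally stored sample IDs with the IDs indicated by the server.

    Returns (to_fetch, to_delete):
      to_fetch  - samples in the current set that are not locally stored
      to_delete - local samples that are no longer in the current set
    """
    local, current = set(local_ids), set(current_ids)
    return current - local, local - current


def apply_sync(local_store: Dict[str, bytes],
               current_ids: Iterable[str],
               fetch: Callable[[Set[str]], Dict[str, bytes]]) -> Dict[str, bytes]:
    """Bring local_store in line with the current set, requesting only missing samples.

    `fetch` is an assumed stand-in for the request to the server.
    """
    to_fetch, to_delete = plan_sync(local_store, current_ids)
    for sample_id in to_delete:
        del local_store[sample_id]
    if to_fetch:
        local_store.update(fetch(to_fetch))
    return local_store
```

Requesting only the missing samples keeps the transfer small when consecutive sample sets overlap heavily, which is the case when only the newest sample changes.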

4. The method of claim 3, wherein the reference samples comprise samples from a registration process for the particular user and the most recent stored samples comprise samples from queries spoken by the particular user.
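The selection rule in claims 2 through 4 can be illustrated with a short sketch. The record type `StoredSample`, its fields, and the function `select_sample_set` are assumptions introduced for the example; the claims only require a predetermined number of the most recently stored samples, optionally combined with reference samples from registration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoredSample:
    sample_id: str
    stored_at: float     # hypothetical storage timestamp, larger means newer
    is_reference: bool   # True if the sample came from the registration process


def select_sample_set(stored: List[StoredSample], recent_count: int) -> List[StoredSample]:
    """Combine all reference samples with the `recent_count` newest query samples."""
    reference = [s for s in stored if s.is_reference]
    queries = sorted(
        (s for s in stored if not s.is_reference),
        key=lambda s: s.stored_at,
        reverse=True,  # newest first
    )
    return reference + queries[:recent_count]
```

Keeping the registration samples fixed while rotating the query samples gives the model a stable anchor and still lets it track gradual changes in the user's voice.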

13. A system comprising:
- one or more computers; and
  
  one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  determining, by the one or more computers, that an utterance includes a particular user speaking a hotword based at least on a first hotword detection model generated from a first set of samples of the particular user speaking the hotword;
  
  in response to determining that an utterance includes a particular user speaking a hotword based at least on the first hotword detection model generated from the first set of samples of the particular user speaking the hotword, storing at least a portion of the utterance as a new sample;
  
  obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples;
  
  determining, by the one or more computers, that a second utterance includes the particular user speaking the hotword based at least on a second hotword detection model generated from the second set of samples of the user speaking the hotword; and
  
  in response to determining, by the one or more computers, that a second utterance includes the particular user speaking the hotword, recognizing the second utterance as having been spoken by the particular user.
  - 14. The system of claim 13, wherein obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples comprises:
    - selecting a predetermined number of recently stored samples as the second set of samples.
  - 15. The system of claim 13, wherein obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples comprises:
    - selecting both a predetermined number of most recently stored samples and a set of reference samples to combine together as the second set of samples.
  - 16. The system of claim 15, wherein the reference samples comprise samples from a registration process for the particular user and the most recent stored samples comprise samples from queries spoken by the particular user.
  - 17. The system of claim 13, the operations comprising:
    - in response to obtaining the second set of samples, deleting a sample in the first set of samples but not in the second set of samples.
  - 18. The system of claim 13, wherein determining that an utterance includes a particular user speaking a hotword based at least on a first hotword detection model generated from a first set of samples of the particular user speaking the hotword comprises:
    - generating the first hotword detection model using the first set of samples;
      
      inputting the utterance into the first hotword detection model; and
      
      determining that the first hotword detection model has classified the utterance as including the particular user speaking the hotword.
  - 19. The system of claim 13, wherein determining that a second utterance includes the particular user speaking the hotword based at least on a second hotword detection model generated from the second set of samples of the user speaking the hotword comprises:
    - generating the second hotword detection model using the second set of samples;
      
      inputting the second utterance into the second hotword detection model; and
      
      determining that the second hotword detection model has classified the second utterance as including the particular user speaking the hotword.
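Claims 18 and 19 (like claims 6 and 7) describe three steps: generate a model from a sample set, input an utterance, and read off the classification. The claims do not say what kind of model is used. The sketch below substitutes a deliberately simple stand-in, a centroid of feature vectors compared by cosine similarity; the vector representation, the `threshold` value, and the function names are all assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]  # hypothetical fixed-length features for one utterance


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def generate_model(samples: List[Vector], threshold: float = 0.8) -> Callable[[Vector], bool]:
    """Step 1: build a toy model (the mean of the sample vectors)."""
    dim = len(samples[0])
    centroid = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]

    def classify(utterance: Vector) -> bool:
        # Steps 2 and 3: input the utterance and report the classification.
        return cosine(utterance, centroid) >= threshold

    return classify
```

Calling `generate_model` again on the second set of samples yields the second model, so the same function covers both claims.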

20. A non-transitory computer-readable medium storing instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- determining, by one or more computers, that an utterance includes a particular user speaking a hotword based at least on a first hotword detection model generated from a first set of samples of the particular user speaking the hotword;
  
  in response to determining that an utterance includes a particular user speaking a hotword based at least on the first hotword detection model generated from the first set of samples of the particular user speaking the hotword, storing at least a portion of the utterance as a new sample;
  
  obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples;
  
  determining, by the one or more computers, that a second utterance includes the particular user speaking the hotword based at least on a second hotword detection model generated from the second set of samples of the user speaking the hotword; and
  
  in response to determining, by the one or more computers, that a second utterance includes the particular user speaking the hotword, recognizing the second utterance as having been spoken by the particular user.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Lopez Moreno, Ignacio; Melendo Casado, Diego
Primary Examiner(s): McFadden, Susan I
Application Number: US15/956,493
Publication Number: US 20180308472A1
Time in Patent Office: 594 Days
Field of Search: 704244
US Class Current
CPC Class Codes

G06F 16/636   by using biological or phys...

G06F 21/32   using biometric data, e.g. ...

G06F 3/167   Audio in a user interface, ...

G06V 40/10   Human or animal bodies, e.g...

G10L 15/07   to the speaker

G10L 15/08   Speech classification or se...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 17/00   Speaker identification or v...

G10L 17/06   Decision making techniques;...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...
