Speaker verification using co-location information

US 9,792,914 B2
Filed: 07/05/2016
Issued: 10/17/2017
Est. Priority Date: 07/18/2014
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.
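
The decision step in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions made only for illustration: each speaker model is a fixed-length embedding vector, a score is the cosine similarity between an utterance embedding and a model, and the first device accepts the utterance only when its own user's score clears a threshold and exceeds every co-located user's score. `cosine_score`, `spoken_by_first_user`, and the 0.7 threshold are hypothetical names and values, not the patent's method.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_score(utterance: Vector, model: Vector) -> float:
    """Cosine similarity in [-1, 1] between an utterance embedding and a speaker model."""
    dot = sum(a * b for a, b in zip(utterance, model))
    norm = math.sqrt(sum(a * a for a in utterance)) * math.sqrt(sum(b * b for b in model))
    return dot / norm if norm else 0.0


def spoken_by_first_user(
    utterance: Vector,
    first_model: Vector,
    other_models: Sequence[Vector] = (),  # option (i): models received from co-located devices
    other_scores: Sequence[float] = (),   # option (ii): scores received from co-located devices
    threshold: float = 0.7,               # hypothetical acceptance threshold
) -> bool:
    """Accept only if the first user's score clears the threshold and beats
    every co-located user's score, however that score was obtained."""
    first = cosine_score(utterance, first_model)
    competitors = [cosine_score(utterance, m) for m in other_models] + list(other_scores)
    return first >= threshold and all(first > s for s in competitors)
```

Passing `other_models` corresponds to the first device scoring the co-located user's model itself; passing `other_scores` corresponds to receiving a score computed on the second device. Requiring the first user to win outright is one way to keep a device from acting when a co-located user is the more likely speaker.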

39 Claims

1. A computer-implemented method comprising:
- obtaining, by a first computing device that is configured to respond to voice commands while in a locked state upon receipt of a particular, predefined hotword, a value for a setting that indicates that the first computing device is permitted to provide speaker verification data to other computing devices;
  
  receiving, by the first computing device, audio data that corresponds to an utterance of a voice command that is preceded by the particular, predefined hotword, the audio data being received while the first computing device is in a locked state and is co-located with a second computing device that is also configured to respond to voice commands that are preceded by the particular, predefined hotword;
  
  while the first computing device is in the locked state, and based on the obtained value for the setting that indicates that the first computing device is permitted to share speaker verification data with other computing devices, transmitting, by the first computing device, a message to the second computing device that (i) is co-located with the first computing device and (ii) is configured to respond to voice commands that are preceded by the particular, predefined hotword; and
  
  determining, by the first computing device, to remain in the locked state and not respond to the voice command despite receiving the audio data that corresponds to the utterance of the voice command that is preceded by the particular, predefined hotword.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
- - 2. The method of claim 1, wherein transmitting the message to the second computing device comprises transmitting, by the first computing device to the second computing device, a message that includes a speaker verification model for a user of the first computing device.
  - 3. The method of claim 1, wherein transmitting the message to the second computing device comprises transmitting, by the first computing device to the second computing device, a message that includes a speaker verification score that represents a likelihood a user of the first computing device spoke the utterance.
  - 4. The method of claim 1, wherein transmitting the message to the second computing device is responsive to receiving the audio data that corresponds to the utterance.
  - 5. The method of claim 1, wherein transmitting the message to the second computing device comprises transmitting, by the first computing device, the message to the second computing device using a short-range communication protocol.
  - 6. The method of claim 1, comprising:
    - determining, by the first computing device, that the second computing device is co-located with the first computing device, wherein transmitting the message to the second computing device is responsive to determining that the second computing device is co-located with the first computing device.
  - 7. The method of claim 1, comprising:
    - receiving, by the first computing device and from the second computing device, data representing a user of the second computing device, wherein determining to remain in the locked state and not respond to the voice command comprises determining, by the first computing device, to remain in the locked state and not respond to the voice command despite receiving the audio data that corresponds to the utterance of the voice command that is preceded by the particular, predefined hotword using the data representing the user of the second computing device.
  - 8. The method of claim 7, wherein receiving the data representing the user of the second computing device comprises receiving, by the first computing device and from the second computing device, a speaker verification model for the user of the second computing device.
  - 9. The method of claim 7, wherein receiving the data representing the user of the second computing device comprises receiving, by the first computing device and from the second computing device, a speaker verification score that represents a likelihood the user of the second computing device spoke the utterance.
  - 10. The method of claim 1, comprising:
    - generating, by the first computing device using a speaker verification model for a user of the first computing device, a speaker verification score that represents a likelihood the user of the first computing device spoke the utterance, wherein determining to remain in the locked state and not respond to the voice command comprises determining, by the first computing device, to remain in the locked state and not respond to the voice command despite receiving the audio data that corresponds to the utterance of the voice command that is preceded by the particular, predefined hotword using the speaker verification score that represents a likelihood the user of the first computing device spoke the utterance.
  - 11. The method of claim 1, comprising:
    - determining, by the first computing device, one or more speaker models that are each stored on the first computing device and for a person other than a user of the first computing device, wherein determining to remain in the locked state and not respond to the voice command comprises determining, by the first computing device, to remain in the locked state and not respond to the voice command despite receiving the audio data that corresponds to the utterance of the voice command that is preceded by the particular, predefined hotword using the one or more speaker models that are each stored on the first computing device and for a person other than the user of the first computing device.
  - 12. The method of claim 11, comprising:
    - obtaining, by the first computing device, user input identifying data for the one or more speaker models that are each stored on the first computing device and for a person other than a user of the first computing device.
  - 13. The method of claim 11, comprising:
    - determining, by the first computing device for a third computing device, a frequency with which the third computing device is located in a physical area near a physical location of the first computing device;
      
      determining, by the first computing device, whether the frequency satisfies a threshold frequency; and
      
      associating, by the first computing device, a particular speaker model specific to a particular user of the third computing device with the first computing device in response to determining that the frequency satisfies the threshold frequency.
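
The sequence recited in claim 1, combined with the score variant of claims 3, 7, and 10, can be sketched as follows. The `Device` and `VerificationMessage` classes, the `send` callback standing in for whatever transport delivers the message (for example the short-range protocol of claim 5), the text-prefix hotword check, and the 0.7 threshold are all hypothetical; real hotword detection operates on audio rather than a transcript. The sketch only shows the ordering the claims describe: check the sharing setting, transmit while locked, then decide whether to stay locked.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class VerificationMessage:
    """Hypothetical message carrying speaker verification data (claims 2-3)."""
    sender_id: str
    score: Optional[float] = None        # likelihood the sender's user spoke (claim 3)
    model: Optional[List[float]] = None  # alternatively, the sender's speaker model (claim 2)


@dataclass
class Device:
    """Hypothetical hotword-enabled device that accepts voice commands while locked."""
    device_id: str
    hotword: str
    share_verification_data: bool  # the obtained value for the sharing setting
    locked: bool = True

    def handle_utterance(
        self,
        transcript: str,
        own_score: float,
        co_located_ids: Sequence[str],
        send: Callable[[str, VerificationMessage], None],
        peer_scores: Sequence[float] = (),
        threshold: float = 0.7,  # hypothetical acceptance threshold
    ) -> bool:
        """Return True if the device responds to the command, False if it ignores it."""
        # Assumption: a text prefix stands in for audio-based hotword spotting.
        if not transcript.startswith(self.hotword):
            return False
        # While locked, and only if the setting permits it, share verification data.
        if self.locked and self.share_verification_data:
            for peer_id in co_located_ids:
                send(peer_id, VerificationMessage(self.device_id, score=own_score))
        # Decide using the device's own score (claim 10) and any scores received
        # from co-located devices (claim 7). When this is False, the device
        # stays locked and does not respond, as in claim 1.
        return own_score >= threshold and all(own_score > s for s in peer_scores)
```

With `share_verification_data=True`, a device whose own score is lower than a co-located peer's score transmits its score and then declines to respond, which is the outcome claim 1 recites.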

14. A system comprising:
- a first computing device that is configured to respond to voice commands while in a locked state upon receipt of a particular, predefined hotword and one or more storage devices storing instructions that are operable, when executed by the first computing device, to cause the first computing device to perform operations comprising:
  
  obtaining a value for a setting that indicates that the first computing device is permitted to provide speaker verification data to other computing devices;
  
  receiving audio data that corresponds to an utterance of a voice command that is preceded by the particular, predefined hotword, the audio data being received while the first computing device is in a locked state and is co-located with a second computing device that is also configured to respond to voice commands that are preceded by the particular, predefined hotword;
  
  while the first computing device is in the locked state, and based on the obtained value for the setting that indicates that the first computing device is permitted to share speaker verification data with other computing devices, transmitting a message to the second computing device that (i) is co-located with the first computing device and (ii) is configured to respond to voice commands that are preceded by the particular, predefined hotword; and
  
  determining to remain in the locked state and not respond to the voice command despite receiving the audio data that corresponds to the utterance of the voice command that is preceded by the particular, predefined hotword.
- Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26)
- - 15. The system of claim 14, wherein transmitting the message to the second computing device comprises transmitting, by the first computing device to the second computing device, a message that includes a speaker verification model for a user of the first computing device.
  - 16. The system of claim 14, wherein transmitting the message to the second computing device comprises transmitting, by the first computing device to the second computing device, a message that includes a speaker verification score that represents a likelihood a user of the first computing device spoke the utterance.
  - 17. The system of claim 14, wherein transmitting the message to the second computing device is responsive to receiving the audio data that corresponds to the utterance.
  - 18. The system of claim 14, wherein transmitting the message to the second computing device comprises transmitting, by the first computing device, the message to the second computing device using a short-range communication protocol.
  - 19. The system of claim 14, the operations comprising:
    - determining, by the first computing device, that the second computing device is co-located with the first computing device, wherein transmitting the message to the second computing device is responsive to determining that the second computing device is co-located with the first computing device.
  - 20. The system of claim 14, the operations comprising:
    - receiving, by the first computing device and from the second computing device, data representing a user of the second computing device, wherein determining to remain in the locked state and not respond to the voice command comprises determining, by the first computing device, to remain in the locked state and not respond to the voice command despite receiving the audio data that corresponds to the utterance of the voice command that is preceded by the particular, predefined hotword using the data representing the user of the second computing device.
  - 21. The system of claim 20, wherein receiving the data representing the user of the second computing device comprises receiving, by the first computing device and from the second computing device, a speaker verification model for the user of the second computing device.
  - 22. The system of claim 20, wherein receiving the data representing the user of the second computing device comprises receiving, by the first computing device and from the second computing device, a speaker verification score that represents a likelihood the user of the second computing device spoke the utterance.
  - 23. The system of claim 14, the operations comprising:
    - generating, by the first computing device using a speaker verification model for a user of the first computing device, a speaker verification score that represents a likelihood the user of the first computing device spoke the utterance, wherein determining to remain in the locked state and not respond to the voice command comprises determining, by the first computing device, to remain in the locked state and not respond to the voice command despite receiving the audio data that corresponds to the utterance of the voice command that is preceded by the particular, predefined hotword using the speaker verification score that represents a likelihood the user of the first computing device spoke the utterance.
  - 24. The system of claim 14, the operations comprising:
    - determining, by the first computing device, one or more speaker models that are each stored on the first computing device and for a person other than a user of the first computing device, wherein determining to remain in the locked state and not respond to the voice command comprises determining, by the first computing device, to remain in the locked state and not respond to the voice command despite receiving the audio data that corresponds to the utterance of the voice command that is preceded by the particular, predefined hotword using the one or more speaker models that are each stored on the first computing device and for a person other than the user of the first computing device.
  - 25. The system of claim 24, the operations comprising:
    - obtaining, by the first computing device, user input identifying data for the one or more speaker models that are each stored on the first computing device and for a person other than a user of the first computing device.
  - 26. The system of claim 24, the operations comprising:
    - determining, by the first computing device for a third computing device, a frequency with which the third computing device is located in a physical area near a physical location of the first computing device;
      
      determining, by the first computing device, whether the frequency satisfies a threshold frequency; and
      
      associating, by the first computing device, a particular speaker model specific to a particular user of the third computing device with the first computing device in response to determining that the frequency satisfies the threshold frequency.
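
Claims 13, 26, and 39 do not say how the frequency is measured. A minimal sketch, assuming that co-location sightings are logged at most once per day and that frequency is the fraction of days in a recent window on which the third device was nearby, might look like this. The `CoLocationTracker` class, the 14-day window, and the 0.5 threshold are hypothetical, and "satisfies" is read here as "greater than or equal to".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CoLocationTracker:
    """Hypothetical tracker of how often other devices are near this one."""
    window_days: int = 14  # hypothetical observation window
    threshold: float = 0.5  # hypothetical threshold frequency
    sightings: Dict[str, Set[int]] = field(default_factory=dict)
    associated_models: Dict[str, List[float]] = field(default_factory=dict)

    def record_sighting(self, device_id: str, day: int) -> None:
        """Note that device_id was in the physical area near this device on `day`."""
        self.sightings.setdefault(device_id, set()).add(day)

    def frequency(self, device_id: str, today: int) -> float:
        """Fraction of days in [today - window_days + 1, today] with a sighting."""
        start = today - self.window_days + 1
        days = self.sightings.get(device_id, set())
        return sum(1 for d in days if start <= d <= today) / self.window_days

    def maybe_associate(self, device_id: str, speaker_model: List[float], today: int) -> bool:
        """Associate the third device's user model if the frequency satisfies the threshold."""
        if self.frequency(device_id, today) >= self.threshold:
            self.associated_models[device_id] = speaker_model
            return True
        return False
```

Counting distinct days rather than raw sightings keeps one long meeting from outweighing a device that is nearby every day; other definitions of frequency would fit the claim language equally well.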

27. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- obtaining, by a first computing device that is configured to respond to voice commands while in a locked state upon receipt of a particular, predefined hotword, a value for a setting that indicates that the first computing device is permitted to provide speaker verification data to other computing devices;
  
  receiving, by the first computing device, audio data that corresponds to an utterance of a voice command that is preceded by the particular, predefined hotword, the audio data being received while the first computing device is in a locked state and is co-located with a second computing device that is also configured to respond to voice commands that are preceded by the particular, predefined hotword;
  
  while the first computing device is in the locked state, and based on the obtained value for the setting that indicates that the first computing device is permitted to share speaker verification data with other computing devices, transmitting, by the first computing device, a message to the second computing device that (i) is co-located with the first computing device and (ii) is configured to respond to voice commands that are preceded by the particular, predefined hotword; and
  
  determining, by the first computing device, to remain in the locked state and not respond to the voice command despite receiving the audio data that corresponds to the utterance of the voice command that is preceded by the particular, predefined hotword.
- Dependent Claims (28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39)
- - 28. The computer-readable medium of claim 27, wherein transmitting the message to the second computing device comprises transmitting, by the first computing device to the second computing device, a message that includes a speaker verification model for a user of the first computing device.
  - 29. The computer-readable medium of claim 27, wherein transmitting the message to the second computing device comprises transmitting, by the first computing device to the second computing device, a message that includes a speaker verification score that represents a likelihood a user of the first computing device spoke the utterance.
  - 30. The computer-readable medium of claim 27, wherein transmitting the message to the second computing device is responsive to receiving the audio data that corresponds to the utterance.
  - 31. The computer-readable medium of claim 27, wherein transmitting the message to the second computing device comprises transmitting, by the first computing device, the message to the second computing device using a short-range communication protocol.
  - 32. The computer-readable medium of claim 27, the operations comprising:
    - determining, by the first computing device, that the second computing device is co-located with the first computing device, wherein transmitting the message to the second computing device is responsive to determining that the second computing device is co-located with the first computing device.
  - 33. The computer-readable medium of claim 27, the operations comprising:
    - receiving, by the first computing device and from the second computing device, data representing a user of the second computing device, wherein determining to remain in the locked state and not respond to the voice command comprises determining, by the first computing device, to remain in the locked state and not respond to the voice command despite receiving the audio data that corresponds to the utterance of the voice command that is preceded by the particular, predefined hotword using the data representing the user of the second computing device.
  - 34. The computer-readable medium of claim 33, wherein receiving the data representing the user of the second computing device comprises receiving, by the first computing device and from the second computing device, a speaker verification model for the user of the second computing device.
  - 35. The computer-readable medium of claim 33, wherein receiving the data representing the user of the second computing device comprises receiving, by the first computing device and from the second computing device, a speaker verification score that represents a likelihood the user of the second computing device spoke the utterance.
  - 36. The computer-readable medium of claim 27, the operations comprising:
    - generating, by the first computing device using a speaker verification model for a user of the first computing device, a speaker verification score that represents a likelihood the user of the first computing device spoke the utterance, wherein determining to remain in the locked state and not respond to the voice command comprises determining, by the first computing device, to remain in the locked state and not respond to the voice command despite receiving the audio data that corresponds to the utterance of the voice command that is preceded by the particular, predefined hotword using the speaker verification score that represents a likelihood the user of the first computing device spoke the utterance.
  - 37. The computer-readable medium of claim 27, the operations comprising:
    - determining, by the first computing device, one or more speaker models that are each stored on the first computing device and for a person other than a user of the first computing device, wherein determining to remain in the locked state and not respond to the voice command comprises determining, by the first computing device, to remain in the locked state and not respond to the voice command despite receiving the audio data that corresponds to the utterance of the voice command that is preceded by the particular, predefined hotword using the one or more speaker models that are each stored on the first computing device and for a person other than the user of the first computing device.
  - 38. The computer-readable medium of claim 37, the operations comprising:
    - obtaining, by the first computing device, user input identifying data for the one or more speaker models that are each stored on the first computing device and for a person other than a user of the first computing device.
  - 39. The computer-readable medium of claim 37, the operations comprising:
    - determining, by the first computing device for a third computing device, a frequency with which the third computing device is located in a physical area near a physical location of the first computing device;
      
      determining, by the first computing device, whether the frequency satisfies a threshold frequency; and
      
      associating, by the first computing device, a particular speaker model specific to a particular user of the third computing device with the first computing device in response to determining that the frequency satisfies the threshold frequency.
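
Claims 6, 19, and 32 likewise leave open how a device determines that another device is co-located. One possible approach, assumed here for illustration only, compares the two devices' latitude/longitude fixes using the haversine great-circle formula and treats them as co-located within a hypothetical 30-meter radius. `haversine_m` and `is_co_located` are invented names, and other signals could serve the same purpose.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def is_co_located(fix_a: LatLon, fix_b: LatLon, radius_m: float = 30.0) -> bool:
    """Hypothetical co-location test: both fixes lie within radius_m of each other."""
    return haversine_m(fix_a[0], fix_a[1], fix_b[0], fix_b[1]) <= radius_m
```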

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Alvarez Guevara, Raziel; Hansson, Othar
Primary Examiner(s)
VO, HUYEN X

Application Number

US15/201,972
Publication Number

US 20160314792A1
Time in Patent Office

469 Days
Field of Search

704/233, 704/231, 704/235, 704/251, 704/255, 704/257, 704/270, 704/270.1
US Class Current
CPC Class Codes

G06F 21/32   using biometric data, e.g. ...

G06F 2221/2111   Location-sensitive, e.g. ge...

G10L 15/08   Speech classification or se...

G10L 15/18   using natural language mode...

G10L 17/00   Speaker identification or v...

G10L 17/20   Pattern transformations or ...

G10L 17/22   Interactive procedures; Man...

G10L 17/24   the user being prompted to ...

G10L 19/00   Speech or audio signals ana...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...

H04L 63/0861   using biometrical features,...

H04W 12/06   Authentication
