Speaker verification using co-location information

US 9,412,376 B2
Filed: 07/22/2015
Issued: 08/09/2016
Est. Priority Date: 07/18/2014
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.
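
The abstract's two alternatives can be illustrated as a score comparison. This is a minimal sketch, not the patented implementation. It assumes that speaker models and utterances are fixed-length embedding vectors, uses cosine similarity as a stand-in scoring function, and treats the first user as the speaker only when the first score is strictly higher than the competing one. The names `cosine_score` and `spoken_by_first_user` are hypothetical.

```python
import math
from typing import Optional, Sequence


def cosine_score(model: Sequence[float], utterance: Sequence[float]) -> float:
    """Hypothetical scorer: cosine similarity between a speaker-model embedding
    and an utterance embedding; higher means a more likely match."""
    dot = sum(m * u for m, u in zip(model, utterance))
    norm = math.sqrt(sum(m * m for m in model)) * math.sqrt(sum(u * u for u in utterance))
    return dot / norm if norm else 0.0


def spoken_by_first_user(
    utterance: Sequence[float],
    first_model: Sequence[float],
    second_model: Optional[Sequence[float]] = None,
    second_score: Optional[float] = None,
) -> bool:
    """Alternative (i): score the utterance against the co-located second user's
    speaker model locally. Alternative (ii): use a second score computed
    elsewhere (for example on a server or on the second user device)."""
    first_score = cosine_score(first_model, utterance)
    if second_model is not None:
        competing_score = cosine_score(second_model, utterance)
    elif second_score is not None:
        competing_score = second_score
    else:
        raise ValueError("need either a second speaker model or a second score")
    # Assumed decision rule: the first user must have the strictly higher score.
    return first_score > competing_score


if __name__ == "__main__":
    utterance = [0.9, 0.1, 0.0]
    first_user_model = [1.0, 0.0, 0.0]
    second_user_model = [0.0, 1.0, 0.0]
    print(spoken_by_first_user(utterance, first_user_model, second_model=second_user_model))  # True
    print(spoken_by_first_user(utterance, first_user_model, second_score=0.999))  # False
```

In the usage example the first user's score is about 0.994, which beats the second model's score of about 0.11 but not a reported second score of 0.999.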

20 Citations

20 Claims

1. A computer-implemented method comprising:
- obtaining, by a first user device, a first score that indicates a likelihood that an utterance encoded in an audio signal that is generated by the first user device was spoken by a first user of the first user device;
  
  obtaining, by the first user device from another device that is a different device than the first user device and for a second user device that is co-located with the first user device, a second score that indicates a likelihood that the utterance was spoken by a second user that is associated with the second user device;
  
  determining, by the first user device, that the utterance was likely spoken by the first user based at least on (i) the first score that indicates a likelihood that the utterance was spoken by the first user of the first user device, and (ii) the second score that indicates a likelihood that the utterance was spoken by the second user that is associated with the second user device; and
  
  performing an action that corresponds with a spoken command encoded in the audio signal in response to determining that the utterance was likely spoken by the first user.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
  - 2. The method of claim 1 comprising determining, by the first user device, that the second user device is co-located with the first user device.
  - 3. The method of claim 2 wherein determining, by the first user device, that the second user device is co-located with the first user device comprises determining, by the first user device, that the second user device is co-located in a physical area near a physical location of the first user device.
  - 4. The method of claim 2 comprising:
    - determining, by the first user device, whether the first user device has one or more settings that allow the first user device access to the second score that indicates a likelihood that the utterance was spoken by the second user that is associated with the second user device in response to determining that the second user device is co-located with the first user device, wherein obtaining, by the first user device, the second score that indicates a likelihood that the utterance was spoken by the second user that is associated with the second user device comprises obtaining, by the first user device, the second score that indicates a likelihood that the utterance was spoken by the second user that is associated with the second user device in response to determining that the first user device has one or more settings that allow the first user device access to the second score.
  - 5. The method of claim 1 comprising:
    - generating, by the first user device, the first score that indicates a likelihood that the utterance was spoken by the first user using a portion of the audio signal and a first speaker model that is specific to the first user.
  - 6. The method of claim 1 comprising:
    - comparing the first score with the second score to determine a highest score, wherein determining that the utterance was likely spoken by the first user comprises determining that the first score is the highest score.
  - 7. The method of claim 1 wherein receiving, by the first user device from another device that is a different device than the first user device and for the second user device that is co-located with the first user device, the second score that indicates a likelihood that the utterance was spoken by the second user that is associated with the second device comprises receiving the second score from a server.
  - 8. The method of claim 1 wherein receiving, by the first user device from another device that is a different device than the first user device and for the second user device that is co-located with the first user device, the second score that indicates a likelihood that the utterance was spoken by the second user that is associated with the second device comprises receiving the second score from the second user device.
  - 9. The method of claim 1 comprising:
    - determining, by the first user device, one or more third speaker models, associated with the first user device, for other people who may be located in a physical area near a physical location of the first user device; and
      
      determining, by the first user device, that the utterance was likely spoken by the first user based at least on (i) the first score that indicates a likelihood that the utterance was spoken by the first user of the first user device, (ii) the second score that indicates a likelihood that the utterance was spoken by the second user that is associated with the second user device, and (iii) the third speaker models for other people who may be located in a physical area near a physical location of the first user device.
  - 10. The method of claim 9 comprising:
    - generating, by the first user device for each of the third speaker models, a respective third score using the respective third speaker model and a portion of the audio signal; and
      
      comparing, by the first user device, the first score, the second score, and the third scores to determine a highest score.
  - 11. The method of claim 9 comprising:
    - receiving, by the first user device for each of the third speaker models, a respective third score from a server; and
      
      comparing, by the first user device, the first score, the second score, and the third scores to determine a highest score.
  - 12. The method of claim 9 comprising:
    - determining, by the first user device for a third user device, a frequency with which the third user device is located in a physical area near a physical location of the first user device;
      
      determining, by the first user device, whether the frequency satisfies a threshold frequency; and
      
      associating, by the first user device, a third speaker model specific to a third user of the third user device with the first user device in response to determining that the frequency satisfies the threshold frequency.
  - 13. The method of claim 12 wherein associating, by the first user device, the third speaker model specific to the third user of the third user device with the first user device comprises storing the third speaker model in a memory of the first user device.
  - 14. The method of claim 12 wherein associating, by the first user device, the third speaker model specific to the third user of the third user device with the first user device comprises sending, by the first user device to a server, a message indicating that the third speaker model should be associated with the first user device.
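
Claims 12 to 14 associate a third user's speaker model with the first user device once that user's device is co-located often enough. The following is a minimal sketch under stated assumptions: "frequency" is taken to mean the fraction of observation windows (for example, days) in which the other device was seen nearby, and a frequency "satisfies" the threshold when it is greater than or equal to it. The class `CoLocationTracker`, its methods, and the default threshold of 0.5 are hypothetical.

```python
from typing import Dict, Set


class CoLocationTracker:
    """Hypothetical tracker for claims 12 to 14. Assumes "frequency" is the
    fraction of observation windows in which a device was seen nearby."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold          # assumed threshold frequency
        self.windows = 0                    # observation windows recorded so far
        self.seen: Dict[str, int] = {}      # device id -> windows it was co-located
        self.associated: Set[str] = set()   # devices whose speaker model is associated

    def observe_window(self, nearby_devices: Set[str]) -> None:
        """Record one observation window and the devices co-located during it."""
        self.windows += 1
        for device_id in nearby_devices:
            self.seen[device_id] = self.seen.get(device_id, 0) + 1

    def frequency(self, device_id: str) -> float:
        """Fraction of recorded windows in which device_id was co-located."""
        if self.windows == 0:
            return 0.0
        return self.seen.get(device_id, 0) / self.windows

    def update_associations(self) -> Set[str]:
        """Associate every device whose frequency meets the threshold.

        Claim 13 would store the third speaker model on the device and claim 14
        would instead notify a server; this sketch only records the device id.
        """
        for device_id in self.seen:
            if self.frequency(device_id) >= self.threshold:
                self.associated.add(device_id)
        return self.associated


if __name__ == "__main__":
    tracker = CoLocationTracker(threshold=0.5)
    tracker.observe_window({"phone-bob", "tablet-carol"})
    tracker.observe_window({"phone-bob"})
    tracker.observe_window(set())
    print(sorted(tracker.update_associations()))  # ['phone-bob']
```

Here "phone-bob" is nearby in 2 of 3 windows and is associated, while "tablet-carol" (1 of 3) is not. The sketch never removes an association; pruning is the subject of claim 16.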

15. A computer-implemented method comprising:
- receiving, by a first user device, an audio signal that is generated by the first user device and that encodes an utterance;
  
  obtaining, by the first user device, a first score that indicates a respective likelihood that the utterance was spoken by a first user of the first user device;
  
  determining, by the first user device for a second user of a corresponding second user device, that the second user device is co-located with the first user device;
  
  identifying, by the first user device, one or more third speaker models associated with the first user device; and
  
  determining, by the first user device, that a subset of the third speaker models comprises a second speaker model for the second user in response to (i) determining that the second user device is co-located with the first user device and (ii) receiving the audio signal that is generated by the first user device and that encodes the utterance;
  
  determining, by the first user device, that the utterance was spoken by the first user using the first score and the second speaker model; and
  
  performing an action that corresponds with a spoken command encoded in the audio signal in response to determining that the utterance was spoken by the first user.
- Dependent Claims (16, 17, 18)
  - 16. The method of claim 15 comprising:
    - removing, by the first user device, the association between the third speaker models not included in the subset of the third speaker models and the first user device.
  - 17. The method of claim 15 comprising:
    - receiving the second speaker model for the second user from a server in response to determining that the subset of the third speaker models comprises the second speaker model for the second user.
  - 18. The method of claim 15 comprising:
    - receiving a second score that indicates a respective likelihood that the utterance was spoken by the second user from a server in response to determining that the subset of the third speaker models comprises the second speaker model for the second user, wherein determining that the utterance was spoken by the first user using the first score and the second speaker model comprises determining that the utterance was spoken by the first user using the first score and the second score.
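
Claims 15 and 16 narrow the third speaker models associated with the first user device to those whose devices are currently co-located, and drop the association for the rest. A minimal sketch, assuming the associated models are kept in a dictionary keyed by the owning device's ID; `select_colocated_models`, the `SpeakerModel` alias, and that keying scheme are hypothetical, not taken from the patent.

```python
from typing import Dict, Sequence, Set, Tuple

SpeakerModel = Sequence[float]  # hypothetical embedding-style speaker model


def select_colocated_models(
    associated: Dict[str, SpeakerModel],
    colocated_devices: Set[str],
) -> Tuple[Dict[str, SpeakerModel], Dict[str, SpeakerModel]]:
    """Split the third speaker models associated with the first user device.

    Returns (subset, removed): `subset` holds models whose owner's device is
    currently co-located (claim 15); `removed` holds the rest, whose
    association claim 16 would remove.
    """
    subset = {d: m for d, m in associated.items() if d in colocated_devices}
    removed = {d: m for d, m in associated.items() if d not in colocated_devices}
    return subset, removed


if __name__ == "__main__":
    associated = {"phone-bob": [0.0, 1.0], "tablet-carol": [1.0, 1.0]}
    subset, removed = select_colocated_models(associated, {"phone-bob"})
    print(sorted(subset), sorted(removed))  # ['phone-bob'] ['tablet-carol']
    # Claim 15's check: is the co-located second user's model in the subset?
    print("phone-bob" in subset)  # True
```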

19. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  obtaining, by a first user device, a first score that indicates a likelihood that an utterance encoded in an audio signal that is generated by the first user device was spoken by a first user of the first user device;
  
  obtaining, by the first user device from another device that is a different device than the first user device and for a second user device that is co-located with the first user device, a second score that indicates a respective likelihood that the utterance was spoken by a second user that is associated with the second user device;
  
  determining, by the first user device, that the utterance was likely spoken by the first user based at least on (i) the first score that indicates a likelihood that the utterance was spoken by the first user of the first user device, and (ii) the second score that indicates a likelihood that the utterance was spoken by the second user that is associated with the second user device; and
  
  performing an action that corresponds with a spoken command encoded in the audio signal in response to determining that the utterance was likely spoken by the first user.

20. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- obtaining, by a first user device, a first score that indicates a likelihood that an utterance encoded in an audio signal that is generated by the first user device was spoken by a first user of the first user device;
  
  obtaining, by the first user device from another device that is a different device than the first user device and for a second user device that is co-located with the first user device, a second score that indicates a likelihood that the utterance was spoken by a second user that is associated with the second user device;
  
  determining, by the first user device, that the utterance was likely spoken by the first user based at least on (i) the first score that indicates a likelihood that the utterance was spoken by the first user of the first user device, and (ii) the second score that indicates a likelihood that the utterance was spoken by the second user that is associated with the second user device; and
  
  performing an action that corresponds with a spoken command encoded in the audio signal in response to determining that the utterance was likely spoken by the first user.

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Alvarez Guevara, Raziel; Hansson, Othar
Primary Examiner(s)
VO, HUYEN X

Application Number
US14/805,687
Publication Number
US 20160019889A1
Time in Patent Office
384 Days
Field of Search
704/231, 704/235, 704/230, 704/251, 704/255, 704/257, 704/270, 704/270.1, 704/272, 704/254
US Class Current
1/1
CPC Class Codes

G06F 21/32   using biometric data, e.g. ...

G06F 2221/2111   Location-sensitive, e.g. ge...

G10L 15/08   Speech classification or se...

G10L 15/18   using natural language mode...

G10L 17/00   Speaker identification or v...

G10L 17/20   Pattern transformations or ...

G10L 17/22   Interactive procedures; Man...

G10L 17/24   the user being prompted to ...

G10L 19/00   Speech or audio signals ana...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...

H04L 63/0861   using biometrical features,...

H04W 12/06   Authentication
