Speaker verification using co-location information
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.
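The score-based variant described in the abstract can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`first_user_likely_spoke`, `handle_utterance`), the `min_score` threshold, and the decision rule itself (the first user's score must reach the threshold and exceed every co-located user's score) are hypothetical. The source says only that the determination uses the first score together with the second score.

```python
from typing import Callable, Mapping, Optional


def first_user_likely_spoke(
    first_score: float,
    colocated_scores: Mapping[str, float],
    min_score: float = 0.5,  # hypothetical threshold, not from the source
) -> bool:
    """Decide whether the first user is the most likely speaker.

    Assumed rule (not taken from the source): the first user's score must
    reach min_score and be strictly greater than the score of every
    co-located user.
    """
    if first_score < min_score:
        return False
    return all(first_score > score for score in colocated_scores.values())


def handle_utterance(
    first_score: float,
    colocated_scores: Mapping[str, float],
    action: Callable[[], str],
) -> Optional[str]:
    """Run the action for the spoken command only if the first user likely spoke."""
    if first_user_likely_spoke(first_score, colocated_scores):
        return action()
    return None
```

In this sketch, `colocated_scores` stands for the second scores obtained for co-located devices, keyed by a hypothetical device identifier. With no co-located devices the decision reduces to the threshold check alone.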
20 Claims
1. A computer-implemented method comprising:
obtaining, by a first user device, a first score that indicates a likelihood that an utterance encoded in an audio signal that is generated by the first user device was spoken by a first user of the first user device;
obtaining, by the first user device from another device that is a different device than the first user device and for a second user device that is co-located with the first user device, a second score that indicates a likelihood that the utterance was spoken by a second user that is associated with the second user device;
determining, by the first user device, that the utterance was likely spoken by the first user based at least on (i) the first score that indicates a likelihood that the utterance was spoken by the first user of the first user device, and (ii) the second score that indicates a likelihood that the utterance was spoken by the second user that is associated with the second user device; and
performing an action that corresponds with a spoken command encoded in the audio signal in response to determining that the utterance was likely spoken by the first user.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A computer-implemented method comprising:
receiving, by a first user device, an audio signal that is generated by the first user device and that encodes an utterance;
obtaining, by the first user device, a first score that indicates a respective likelihood that the utterance was spoken by a first user of the first user device;
determining, by the first user device for a second user of a corresponding second user device, that the second user device is co-located with the first user device;
identifying, by the first user device, one or more third speaker models associated with the first user device; and
determining, by the first user device, that a subset of the third speaker models comprises a second speaker model for the second user in response to (i) determining that the second user device is co-located with the first user device and (ii) receiving the audio signal that is generated by the first user device and that encodes the utterance;
determining, by the first user device, that the utterance was spoken by the first user using the first score and the second speaker model; and
performing an action that corresponds with a spoken command encoded in the audio signal in response to determining that the utterance was spoken by the first user.
Dependent claims: 16, 17, 18
19. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
obtaining, by a first user device, a first score that indicates a likelihood that an utterance encoded in an audio signal that is generated by the first user device was spoken by a first user of the first user device;
obtaining, by the first user device from another device that is a different device than the first user device and for a second user device that is co-located with the first user device, a second score that indicates a respective likelihood that the utterance was spoken by a second user that is associated with the second user device;
determining, by the first user device, that the utterance was likely spoken by the first user based at least on (i) the first score that indicates a likelihood that the utterance was spoken by the first user of the first user device, and (ii) the second score that indicates a likelihood that the utterance was spoken by the second user that is associated with the second user device; and
performing an action that corresponds with a spoken command encoded in the audio signal in response to determining that the utterance was likely spoken by the first user.
20. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
obtaining, by a first user device, a first score that indicates a likelihood that an utterance encoded in an audio signal that is generated by the first user device was spoken by a first user of the first user device;
obtaining, by the first user device from another device that is a different device than the first user device and for a second user device that is co-located with the first user device, a second score that indicates a likelihood that the utterance was spoken by a second user that is associated with the second user device;
determining, by the first user device, that the utterance was likely spoken by the first user based at least on (i) the first score that indicates a likelihood that the utterance was spoken by the first user of the first user device, and (ii) the second score that indicates a likelihood that the utterance was spoken by the second user that is associated with the second user device; and
performing an action that corresponds with a spoken command encoded in the audio signal in response to determining that the utterance was likely spoken by the first user.
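The speaker-model variant recited in claim 15 can be sketched in Python as follows. This is a hedged illustration only. It assumes that a speaker model is a plain embedding vector and that an utterance is scored against a model by cosine similarity; the claims specify neither. All names (`cosine`, `select_colocated_models`, `spoken_by_first_user`) and the comparison rule are hypothetical.

```python
import math
from typing import Dict, Sequence, Set

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; stands in for an unspecified model-scoring step."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_colocated_models(
    stored_models: Dict[str, Vector], colocated_users: Set[str]
) -> Dict[str, Vector]:
    """Keep only the stored speaker models whose users are currently co-located."""
    return {user: model for user, model in stored_models.items()
            if user in colocated_users}


def spoken_by_first_user(
    first_score: float,
    utterance: Vector,
    stored_models: Dict[str, Vector],
    colocated_users: Set[str],
) -> bool:
    """Assumed rule: the first score must beat every co-located model's score."""
    subset = select_colocated_models(stored_models, colocated_users)
    return all(first_score > cosine(utterance, model) for model in subset.values())
```

Restricting the comparison to models of co-located users means that a stored model for an absent user cannot cause the first user's command to be rejected, which is one plausible reading of why the claim ties model selection to co-location.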
Specification