Securely executing voice actions with speaker identification and authentication input types
First Claim
1. A method performed by a voice action server, the method comprising:
receiving, by the voice action server, (i) audio data representing a voice command spoken by a speaker and (ii) contextual data from a client device of the speaker, the contextual data indicating a status of the client device and comprising data values representing contextual signals that can authenticate the speaker without requiring the speaker to provide explicit authentication information;
identifying, by the voice action server, the speaker based on the audio data representing the voice command;
selecting, by the voice action server, a voice action based at least on a transcription of the audio data;
selecting, by the voice action server, a third-party service provider from among a plurality of different third-party service providers, wherein the third-party service provider is selected by obtaining a mapping of voice actions to the plurality of third-party service providers, the mapping indicating that the selected third-party service provider can perform the selected voice action, wherein the selected third-party service provider is configured to perform multiple voice actions, and wherein the selected third-party service provider requires different combinations of input data to perform authentication for at least some of the multiple voice actions;
identifying, by the voice action server, one or more input authentication data types that the selected third-party service provider uses to perform authentication for the selected voice action, wherein the identified one or more input authentication data types for the selected action are different from one or more input authentication data types that the selected third-party service provider uses to perform authentication for at least one other voice action;
obtaining, by the voice action server without requiring the speaker to provide explicit authentication information, one or more authentication data values representing contextual signals from the received contextual data that correspond to the identified one or more input authentication data types; and
providing, to the third-party service provider by the voice action server over a network, (i) a request to perform the selected voice action, (ii) a speaker identification result determined based on the audio data representing the voice command, and (iii) the obtained one or more authentication data values from the received contextual data, wherein the speaker identification result and the one or more obtained authentication data values enable the selected third-party service provider to authenticate the speaker and perform the selected voice action.
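The selection steps above turn on a mapping from voice actions to providers, plus a per-action list of the authentication data types each provider requires. A minimal sketch of those lookups follows; all names and data (`VOICE_ACTION_PROVIDERS`, `AUTH_DATA_TYPES`, the example actions) are hypothetical illustrations, not taken from the patent.

```python
# Hypothetical mapping of voice actions to the third-party service
# providers that can perform them (the claim's "mapping of voice actions
# to the plurality of third-party service providers").
VOICE_ACTION_PROVIDERS = {
    "unlock_door": "home_provider",
    "set_thermostat": "home_provider",
    "order_ride": "ride_provider",
}

# A single provider may require different combinations of input
# authentication data types for different voice actions.
AUTH_DATA_TYPES = {
    ("home_provider", "unlock_door"): ["device_location", "screen_lock_state"],
    ("home_provider", "set_thermostat"): ["device_location"],
    ("ride_provider", "order_ride"): ["device_location"],
}

def select_provider(voice_action: str) -> str:
    """Select the provider the mapping indicates can perform the action."""
    return VOICE_ACTION_PROVIDERS[voice_action]

def required_auth_types(provider: str, voice_action: str) -> list:
    """Identify the input authentication data types the provider uses to
    authenticate this particular voice action."""
    return AUTH_DATA_TYPES[(provider, voice_action)]

provider = select_provider("unlock_door")
print(provider)  # home_provider
print(required_auth_types(provider, "unlock_door"))
# ['device_location', 'screen_lock_state']
```

Note how the same hypothetical provider ("home_provider") requires different authentication inputs for "unlock_door" than for "set_thermostat", matching the claim's requirement that auth data types differ across at least some actions.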
Abstract
In some implementations, (i) audio data representing a voice command spoken by a speaker and (ii) a speaker identification result indicating that the voice command was spoken by the speaker are obtained. A voice action is selected based at least on a transcription of the audio data. A service provider corresponding to the selected voice action is selected from among a plurality of different service providers. One or more input data types that the selected service provider uses to perform authentication for the selected voice action are identified. (i) A request to perform the selected voice action and (ii) one or more values that correspond to the identified one or more input data types are provided to the service provider.
23 Claims
1. A method performed by a voice action server, the method comprising:
receiving, by the voice action server, (i) audio data representing a voice command spoken by a speaker and (ii) contextual data from a client device of the speaker, the contextual data indicating a status of the client device and comprising data values representing contextual signals that can authenticate the speaker without requiring the speaker to provide explicit authentication information;
identifying, by the voice action server, the speaker based on the audio data representing the voice command;
selecting, by the voice action server, a voice action based at least on a transcription of the audio data;
selecting, by the voice action server, a third-party service provider from among a plurality of different third-party service providers, wherein the third-party service provider is selected by obtaining a mapping of voice actions to the plurality of third-party service providers, the mapping indicating that the selected third-party service provider can perform the selected voice action, wherein the selected third-party service provider is configured to perform multiple voice actions, and wherein the selected third-party service provider requires different combinations of input data to perform authentication for at least some of the multiple voice actions;
identifying, by the voice action server, one or more input authentication data types that the selected third-party service provider uses to perform authentication for the selected voice action, wherein the identified one or more input authentication data types for the selected action are different from one or more input authentication data types that the selected third-party service provider uses to perform authentication for at least one other voice action;
obtaining, by the voice action server without requiring the speaker to provide explicit authentication information, one or more authentication data values representing contextual signals from the received contextual data that correspond to the identified one or more input authentication data types; and
providing, to the third-party service provider by the voice action server over a network, (i) a request to perform the selected voice action, (ii) a speaker identification result determined based on the audio data representing the voice command, and (iii) the obtained one or more authentication data values from the received contextual data, wherein the speaker identification result and the one or more obtained authentication data values enable the selected third-party service provider to authenticate the speaker and perform the selected voice action.
Dependent claims: 2-12.
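The claim's "obtaining" and "providing" steps pull only the contextual signals the provider needs and package them with the speaker identification result into the request sent over the network. A sketch under assumed, hypothetical field names (`device_location`, `screen_lock_state`, `speaker:alice`) follows; none of these names come from the patent.

```python
def obtain_auth_values(contextual_data: dict, auth_types: list) -> dict:
    """Select only the contextual signals matching the provider's required
    input authentication data types; the speaker is never asked to supply
    explicit credentials."""
    return {t: contextual_data[t] for t in auth_types if t in contextual_data}

def build_provider_request(voice_action: str, speaker_id_result: str,
                           auth_values: dict) -> dict:
    """Assemble the request to the third-party provider: the action, the
    speaker identification result, and the authentication values drawn
    from contextual data."""
    return {
        "action": voice_action,
        "speaker_identification": speaker_id_result,
        "authentication_values": auth_values,
    }

# Example: the client device reported more context than this action needs;
# only the required signals are forwarded.
contextual_data = {
    "device_location": "home",
    "screen_lock_state": "unlocked",
    "battery_level": 83,
}
auth_values = obtain_auth_values(
    contextual_data, ["device_location", "screen_lock_state"])
request = build_provider_request("unlock_door", "speaker:alice", auth_values)
print(request["authentication_values"])
# {'device_location': 'home', 'screen_lock_state': 'unlocked'}
```

Filtering to the required types keeps extraneous device state (here, battery level) out of the request, so the provider receives exactly the combination of inputs it declared for this action.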
13. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause a voice action server to perform operations comprising:
receiving (i) audio data representing a voice command spoken by a speaker and (ii) contextual data from a client device of the speaker, the contextual data indicating a status of the client device and providing data values representing contextual signals that can authenticate the speaker without requiring the speaker to provide explicit authentication information;
identifying the speaker from the audio data representing the voice command;
selecting a voice action based at least on a transcription of the audio data;
selecting a third-party service provider from among a plurality of different third-party service providers, wherein the third-party service provider is selected by obtaining a mapping of voice actions to the plurality of third-party service providers, the mapping indicating that the selected third-party service provider can perform the selected voice action, wherein the selected third-party service provider is configured to perform multiple voice actions, and wherein the selected third-party service provider requires different combinations of input data to perform authentication for at least some of the multiple voice actions;
identifying one or more input authentication data types that the selected third-party service provider uses to perform authentication for the selected voice action, wherein the identified one or more input authentication data types for the selected action are different from one or more input data types that the selected third-party service provider uses to perform authentication for at least one other voice action;
obtaining, without requiring the speaker to provide explicit authentication information, one or more authentication data values representing contextual signals from the received contextual data that correspond to the identified one or more input authentication data types; and
providing, to the third-party service provider over a network, (i) a request to perform the selected voice action, (ii) a speaker identification result determined based on the audio data representing the voice command, and (iii) the obtained one or more authentication data values from the received contextual data, wherein the speaker identification result and the one or more obtained authentication data values enable the selected third-party service provider to authenticate the speaker and perform the selected voice action.
Dependent claims: 14-21.
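On the receiving end, the claims state that the speaker identification result and the contextual authentication values together enable the provider to authenticate the speaker. A minimal provider-side sketch of that check, with hypothetical names throughout:

```python
def provider_authenticate(request: dict, enrolled_speaker: str,
                          required_types: list) -> bool:
    """Provider-side sketch: accept the voice action only if the speaker
    identification result matches an enrolled speaker and every required
    input authentication data type was supplied with the request."""
    if request.get("speaker_identification") != enrolled_speaker:
        return False
    supplied = request.get("authentication_values", {})
    return all(t in supplied for t in required_types)

ok = provider_authenticate(
    {"action": "unlock_door",
     "speaker_identification": "speaker:alice",
     "authentication_values": {"device_location": "home",
                               "screen_lock_state": "unlocked"}},
    enrolled_speaker="speaker:alice",
    required_types=["device_location", "screen_lock_state"],
)
print(ok)  # True
```

If either signal the provider declared for this action is missing, or the speaker identification result does not match, the sketch rejects the request, reflecting the claims' requirement that both inputs are needed to authenticate and perform the action.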
22. A non-transitory computer-readable storage medium storing a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
receiving (i) audio data representing a voice command spoken by a speaker and (ii) contextual data from a client device of the speaker, the contextual data indicating a status of the client device and providing data values representing contextual signals that can authenticate the speaker without requiring the speaker to provide explicit authentication information;
identifying the speaker from the audio data representing the voice command;
selecting a voice action based at least on a transcription of the audio data;
selecting a third-party service provider from among a plurality of different third-party service providers, wherein the third-party service provider is selected by obtaining a mapping of voice actions to the plurality of third-party service providers, the mapping indicating that the selected third-party service provider can perform the selected voice action, wherein the selected third-party service provider is configured to perform multiple voice actions, and wherein the selected third-party service provider requires different combinations of input data to perform authentication for at least some of the multiple voice actions;
identifying one or more input authentication data types that the selected third-party service provider uses to perform authentication for the selected voice action, wherein the identified one or more input authentication data types for the selected action are different from one or more input authentication data types that the selected third-party service provider uses to perform authentication for at least one other voice action;
obtaining, without requiring the speaker to provide explicit authentication information, one or more data values from the received contextual data that correspond to the identified one or more input authentication data types; and
providing, to the third-party service provider over a network, (i) a request to perform the selected voice action, (ii) a speaker identification result determined based on the audio data representing the voice command, and (iii) the obtained one or more authentication data values from the received contextual data, wherein the speaker identification result and the one or more obtained authentication data values enable the selected third-party service provider to authenticate the speaker and perform the selected voice action.
Dependent claims: 23.
Specification