System and method for processing multi-modal device interactions in a natural language voice services environment
First Claim
1. A method for processing one or more multi-modal device interactions in a natural language voice services environment that includes a plurality of electronic devices each separate from one another, the plurality of electronic devices including a first electronic device having at least a non-voice input device and a second electronic device having at least a voice input device, the method comprising:
receiving, by a voice-click component of at least one of the plurality of electronic devices, a non-voice interaction detected at the first electronic device and a natural language utterance detected at the second electronic device;
determining, by the voice-click component, first context information relating to the non-voice interaction, wherein the first context information includes context relating to the non-voice interaction;
determining, by the voice-click component, second context information relating to the natural language utterance, wherein the second context information includes context relating to the natural language utterance;
determining, by the voice-click component, an intent based on the first context relating to the non-voice interaction and the second context relating to the natural language utterance;
generating, by the voice-click component, a request based on the determined intent; and
transmitting, by the voice-click component, the request to a target electronic device from among the plurality of electronic devices.
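The steps recited in claim 1 can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary shapes, the keyword lookup that stands in for natural language understanding, the rule that picks the first device advertising the needed action, and the in-memory `outbox` that stands in for transmission are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    action: str
    subject: str
    target_device: str


def non_voice_context(interaction: dict) -> dict:
    # First context: what the user selected on the first device (assumed shape).
    return {"selected_item": interaction["item"], "source": interaction["device"]}


def utterance_context(utterance: str) -> dict:
    # Second context: a toy keyword lookup, standing in for real language understanding.
    words = utterance.lower().split()
    action = "play" if "play" in words else "show"
    return {"action": action, "source_text": utterance}


def determine_intent(first: dict, second: dict) -> dict:
    # The intent pairs the spoken action with the non-voice selection.
    return {"action": second["action"], "subject": first["selected_item"]}


def generate_request(intent: dict, devices: dict) -> Request:
    # Hypothetical rule: choose the first device that advertises the action.
    for name, actions in devices.items():
        if intent["action"] in actions:
            return Request(intent["action"], intent["subject"], name)
    raise LookupError(f"no device supports {intent['action']!r}")


def transmit(request: Request, outbox: dict) -> None:
    # Stand-in for transmission: queue the request under the target device's name.
    outbox.setdefault(request.target_device, []).append(request)


def process(interaction: dict, utterance: str, devices: dict, outbox: dict) -> Request:
    first = non_voice_context(interaction)
    second = utterance_context(utterance)
    intent = determine_intent(first, second)
    request = generate_request(intent, devices)
    transmit(request, outbox)
    return request
```

For example, a selection of "Song A" on a phone combined with the utterance "Play this on the stereo" would, under these assumptions, queue a `play` request for whichever device advertises `play`.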
Abstract
A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
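One way to picture how context from the two modalities might be combined is reference resolution: a demonstrative word in the utterance is replaced by the label of the item selected through the non-voice interaction. The sketch below is illustrative only. The word list, the function names, and the capability table used for routing are assumptions and are not taken from the patent.

```python
DEMONSTRATIVES = {"this", "that", "it", "here", "there"}


def combine_context(utterance: str, selection: str) -> str:
    """Replace the first demonstrative word with the selected item's label."""
    words = utterance.split()
    for i, word in enumerate(words):
        stripped = word.rstrip(".,!?")
        if stripped.lower() in DEMONSTRATIVES:
            # Keep any trailing punctuation that followed the demonstrative.
            words[i] = selection + word[len(stripped):]
            break
    return " ".join(words)


def route(capability: str, devices: dict) -> list:
    """Return every device name whose capability set includes the given one."""
    return [name for name, caps in devices.items() if capability in caps]
```

An utterance with no demonstrative is returned unchanged, so the caller can fall back to the spoken words alone.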
21 Claims
1. A method for processing one or more multi-modal device interactions in a natural language voice services environment that includes a plurality of electronic devices each separate from one another, the plurality of electronic devices including a first electronic device having at least a non-voice input device and a second electronic device having at least a voice input device, the method comprising:
receiving, by a voice-click component of at least one of the plurality of electronic devices, a non-voice interaction detected at the first electronic device and a natural language utterance detected at the second electronic device;
determining, by the voice-click component, first context information relating to the non-voice interaction, wherein the first context information includes context relating to the non-voice interaction;
determining, by the voice-click component, second context information relating to the natural language utterance, wherein the second context information includes context relating to the natural language utterance;
determining, by the voice-click component, an intent based on the first context relating to the non-voice interaction and the second context relating to the natural language utterance;
generating, by the voice-click component, a request based on the determined intent; and
transmitting, by the voice-click component, the request to a target electronic device from among the plurality of electronic devices.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system for processing one or more multi-modal device interactions in a natural language voice services environment that includes a plurality of electronic devices each separate from one another, the plurality of electronic devices including a first electronic device having at least a non-voice input device and a second electronic device having at least a voice input device, the system comprising:
a computer system comprising one or more physical processors implementing a voice-click component to:
receive a non-voice interaction detected at the first electronic device and a natural language utterance detected at the second electronic device;
determine first context information relating to the non-voice interaction, wherein the first context information includes context relating to the non-voice interaction;
determine second context information relating to the natural language utterance, wherein the second context information includes context relating to the natural language utterance;
determine an intent based on the first context relating to the non-voice interaction and the second context relating to the natural language utterance;
generate a request based on the determined intent; and
transmit the request to a target electronic device from among the plurality of electronic devices.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21
Specification