System and method for an integrated, multi-modal, multi-device natural language voice services environment
First Claim
1. A method of providing an integrated multi-modal, natural language voice services environment comprising one or more of an input device that receives a multi-modal natural language input comprising at least a natural language utterance and a non-voice input related to the natural language utterance, a first device, or one or more secondary devices, the method being implemented in the first device having one or more physical processors programmed with computer program instructions that, when executed by the one or more physical processors, program the first device to perform the method, the method comprising:
obtaining, by the first device from the input device, the multi-modal natural language input;
transcribing, by the first device, the natural language utterance;
determining, by the first device, a preliminary intent prediction of the multi-modal natural language input based on the transcribed utterance and the non-voice input; and
invoking, by the first device, at least one action at one or more of the input device, the first device, or the one or more secondary devices based on the preliminary intent prediction.
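The four claimed steps (obtain, transcribe, determine a preliminary intent prediction, invoke an action) can be read as a small pipeline. The following is a minimal illustrative sketch only, not the patented implementation: the names `MultiModalInput`, `IntentPrediction`, `handle_input`, `fake_transcribe`, and `keyword_intent` are hypothetical, the "transcription" simply decodes text bytes, and the intent rule is a toy keyword match.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class MultiModalInput:
    audio: bytes     # stands in for the natural language utterance
    non_voice: str   # hypothetical non-voice input, e.g. the id of a selected item


@dataclass
class IntentPrediction:
    action: str
    target_device: str


def handle_input(
    inp: MultiModalInput,
    transcribe: Callable[[bytes], str],
    predict_intent: Callable[[str, str], IntentPrediction],
    devices: Dict[str, Callable[[str], str]],
) -> str:
    # Step 1 (obtain) is the call itself: the first device receives `inp`.
    text = transcribe(inp.audio)                       # step 2: transcribe
    prediction = predict_intent(text, inp.non_voice)   # step 3: preliminary intent
    return devices[prediction.target_device](prediction.action)  # step 4: invoke


def fake_transcribe(audio: bytes) -> str:
    # Assumption: the "audio" is already UTF-8 text; a real system would run speech recognition.
    return audio.decode("utf-8").lower()


def keyword_intent(text: str, non_voice: str) -> IntentPrediction:
    # Toy rule: the utterance supplies the verb, the non-voice input supplies the object.
    if "play" in text:
        return IntentPrediction(action=f"play:{non_voice}", target_device="media_player")
    return IntentPrediction(action=f"search:{text}", target_device="first_device")


devices = {
    "media_player": lambda action: f"media_player did {action}",
    "first_device": lambda action: f"first_device did {action}",
}

result = handle_input(
    MultiModalInput(b"Play this", "song_42"), fake_transcribe, keyword_intent, devices
)
print(result)
```

In this sketch the utterance "Play this" is ambiguous on its own; the non-voice input resolves what "this" refers to, which is the role the claim assigns to the non-voice input.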
Abstract
A system and method for an integrated, multi-modal, multi-device natural language voice services environment may be provided. In particular, the environment may include a plurality of voice-enabled devices each having intent determination capabilities for processing multi-modal natural language inputs in addition to knowledge of the intent determination capabilities of other devices in the environment. Further, the environment may be arranged in a centralized manner, a distributed peer-to-peer manner, or various combinations thereof. As such, the various devices may cooperate to determine intent of multi-modal natural language inputs, and commands, queries, or other requests may be routed to one or more of the devices best suited to take action in response thereto.
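The routing idea in the abstract, where each device knows the capabilities of the others and a request goes to the device best suited to act on it, might be sketched as follows. This is an assumption-laden illustration: the `Device` class, the per-domain capability scores, and `best_device` are hypothetical and do not come from the patent, which does not specify how suitability is measured.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Device:
    name: str
    # Hypothetical capability table: domain name -> suitability score in [0, 1].
    capabilities: Dict[str, float] = field(default_factory=dict)


def best_device(devices: List[Device], domain: str) -> Device:
    # Pick the device with the highest score for the domain.
    # Devices that do not list the domain are treated as scoring 0.0;
    # on a tie, max() keeps the first device in the list.
    return max(devices, key=lambda d: d.capabilities.get(domain, 0.0))


environment = [
    Device("phone", {"navigation": 0.4, "music": 0.6}),
    Device("car_head_unit", {"navigation": 0.9}),
    Device("home_speaker", {"music": 0.8}),
]

print(best_device(environment, "navigation").name)
print(best_device(environment, "music").name)
```

In a centralized arrangement one hub could hold this table for all devices; in a peer-to-peer arrangement each device would hold its own copy. The sketch does not distinguish between the two.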
20 Claims
1. A method of providing an integrated multi-modal, natural language voice services environment comprising one or more of an input device that receives a multi-modal natural language input comprising at least a natural language utterance and a non-voice input related to the natural language utterance, a first device, or one or more secondary devices, the method being implemented in the first device having one or more physical processors programmed with computer program instructions that, when executed by the one or more physical processors, program the first device to perform the method, the method comprising:
obtaining, by the first device from the input device, the multi-modal natural language input;
transcribing, by the first device, the natural language utterance;
determining, by the first device, a preliminary intent prediction of the multi-modal natural language input based on the transcribed utterance and the non-voice input; and
invoking, by the first device, at least one action at one or more of the input device, the first device, or the one or more secondary devices based on the preliminary intent prediction.
11. A system for processing a multi-modal natural language input, the system comprising:
an input device that receives a multi-modal natural language input comprising at least a natural language utterance and a non-voice input related to the natural language utterance;
one or more secondary devices; and
a first device having one or more physical processors programmed with computer program instructions that, when executed by the one or more physical processors, program the first device to:
obtain, from the input device, the multi-modal natural language input;
transcribe the natural language utterance;
determine a preliminary intent prediction of the multi-modal natural language input based on the transcribed utterance and the non-voice input; and
invoke at least one action at one or more of the input device, the first device, or the one or more secondary devices based on the preliminary intent prediction.
Specification