System and method for an integrated, multi-modal, multi-device natural language voice services environment

US 9,711,143 B2
Filed: 04/04/2016
Issued: 07/18/2017
Est. Priority Date: 05/27/2008
Status: Active Grant

10 Assignments

0 Petitions

Abstract

A system and method for an integrated, multi-modal, multi-device natural language voice services environment may be provided. In particular, the environment may include a plurality of voice-enabled devices each having intent determination capabilities for processing multi-modal natural language inputs in addition to knowledge of the intent determination capabilities of other devices in the environment. Further, the environment may be arranged in a centralized manner, a distributed peer-to-peer manner, or various combinations thereof. As such, the various devices may cooperate to determine intent of multi-modal natural language inputs, and commands, queries, or other requests may be routed to one or more of the devices best suited to take action in response thereto.
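The abstract describes devices that know one another's intent determination capabilities and route each request to the device best suited to act on it. Claims 4 and 5 call the record of those capabilities a constellation model. The routing step can be sketched in Python as follows. This is a minimal illustration only: the `DeviceProfile` record, the `ConstellationModel` class, and the 0.7/0.3 weighting of processing power and storage are hypothetical and do not come from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DeviceProfile:
    """Hypothetical constellation-model entry describing one voice-enabled device."""
    name: str
    processing_power: float  # assumed relative scale, 0.0 to 1.0
    storage: float           # assumed relative scale, 0.0 to 1.0
    domains: set[str] = field(default_factory=set)  # domains the device can interpret
    online: bool = True      # dynamic state: can the device take requests right now?


class ConstellationModel:
    """Tracks peer devices so a request can be routed to the best-suited one."""

    def __init__(self) -> None:
        self.devices: dict[str, DeviceProfile] = {}

    def update(self, profile: DeviceProfile) -> None:
        # Capabilities and dynamic state change over time, so each report replaces the old entry.
        self.devices[profile.name] = profile

    def best_device(self, domain: str) -> DeviceProfile | None:
        # Only online devices that cover the requested domain are candidates.
        candidates = [d for d in self.devices.values() if d.online and domain in d.domains]
        if not candidates:
            return None
        # Assumed scoring rule: processing power counts more than storage.
        return max(candidates, key=lambda d: 0.7 * d.processing_power + 0.3 * d.storage)


if __name__ == "__main__":
    model = ConstellationModel()
    model.update(DeviceProfile("phone", 0.4, 0.3, {"navigation", "messaging"}))
    model.update(DeviceProfile("server", 0.9, 0.9, {"navigation", "music"}))
    model.update(DeviceProfile("car", 1.0, 1.0, {"navigation"}, online=False))
    print(model.best_device("navigation").name)  # -> server (the car is offline)
```

A weighted sum keeps the choice cheap enough to run on every utterance. A fuller model could also weigh natural language processing capabilities and local knowledge, which claim 5 lists alongside processing power and storage.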

885 Citations

18 Claims

1. A method of providing an integrated multi-modal, natural language voice services environment comprising one or more of an input device that receives a multi-modal natural language input comprising at least a natural language utterance and a non-voice input related to the natural language utterance, a first device, or one or more secondary devices, the method being implemented in the first device having one or more physical processors programmed with computer program instructions that, when executed by the one or more physical processors, program the first device to perform the method, wherein the one or more secondary devices include at least a second device, the method comprising:
- obtaining, by the first device from the input device, the multi-modal natural language input;
  
  transcribing, by the first device, the natural language utterance;
  
  determining, by the first device, a preliminary intent prediction of the multi-modal natural language input based on the transcribed utterance and the non-voice input;
  
  transmitting, by the first device, the multi-modal natural language input to the second device;
  
  receiving, by the first device from the second device, a second intent prediction of the multi-modal natural language input;
  
  determining, by the first device, an intent of the multi-modal natural language input based on the preliminary intent prediction and the second intent prediction; and
  
  invoking, by the first device, at least one action at one or more of the input device, the first device, or the one or more secondary devices based on the determined intent.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The method of claim 1, wherein invoking the at least one action at one or more of the input device, the first device, or the one or more secondary devices comprises transmitting a request related to the multi-modal natural language input based on the preliminary intent prediction.
  - 3. The method of claim 1, the method further comprising:
    - determining, by the first device, processing capabilities associated with the one or more secondary devices; and
      
      selecting, by the first device, based on the processing capabilities associated with the one or more secondary devices, the second device to make the second intent prediction of the multi-modal natural language input.
  - 4. The method of claim 3, the method further comprising:
    - maintaining, by the first device, a constellation model that describes natural language resources, dynamic states, and intent determination capabilities associated with the input device and the one or more secondary devices, wherein the processing capabilities associated with the one or more secondary devices are determined based on the constellation model.
  - 5. The method of claim 4, wherein the intent determination capabilities for a given one of the input device, the first device, or the one or more secondary devices are based on at least one of processing power, storage resources, natural language processing capabilities, or local knowledge.
  - 6. The method of claim 1, the method further comprising:
    - determining, by the first device, a domain relating to the multi-modal natural language input; and
      
      selecting, by the first device, based on the domain, the second device to make the second intent prediction of the multi-modal natural language input.
  - 7. The method of claim 6, wherein the one or more secondary devices are associated with different domains, the second device is associated with the domain, and the different domains comprise the domain.
  - 8. The method of claim 1, wherein the input device initially received the multi-modal natural language input.
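Claim 1 does not say how the preliminary intent prediction and the second intent prediction are combined into the final intent. The Python sketch below assumes one possibility: each prediction maps candidate intents to confidence scores, and the first device takes a weighted sum of the two. The function name `combine_predictions` and the `local_weight` parameter are hypothetical.

```python
from __future__ import annotations


def combine_predictions(
    preliminary: dict[str, float],
    second: dict[str, float],
    local_weight: float = 0.4,
) -> str:
    """Merge the first device's preliminary prediction with a second device's prediction.

    Each prediction maps a candidate intent to a confidence score. The combined
    score for an intent is a weighted sum; an intent missing from one prediction
    contributes 0.0 from that side.
    """
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must be between 0 and 1")
    intents = preliminary.keys() | second.keys()
    if not intents:
        raise ValueError("at least one prediction must contain an intent")
    scores = {
        intent: local_weight * preliminary.get(intent, 0.0)
        + (1.0 - local_weight) * second.get(intent, 0.0)
        for intent in intents
    }
    # Iterating in sorted order makes ties resolve to the alphabetically first intent.
    return max(sorted(scores), key=scores.__getitem__)


if __name__ == "__main__":
    local = {"play_music": 0.6, "navigate": 0.4}   # e.g. a lightweight on-device recognizer
    remote = {"navigate": 0.9, "play_music": 0.1}  # e.g. a device with richer resources
    print(combine_predictions(local, remote))      # -> navigate
```

The default `local_weight` of 0.4 gives the second device slightly more say. This assumes the second device was chosen for its stronger intent determination capabilities, as in claims 3 to 5. A deployment could instead derive the weight from the constellation model.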

9. A method of providing an integrated multi-modal, natural language voice services environment comprising one or more of an input device that receives a multi-modal natural language input comprising at least a natural language utterance and a non-voice input related to the natural language utterance, a first device, or one or more secondary devices, the method being implemented in the first device having one or more physical processors programmed with computer program instructions that, when executed by the one or more physical processors, program the first device to perform the method, the method comprising:
- obtaining, by the first device from the input device, the multi-modal natural language input;
  
  transcribing, by the first device, the natural language utterance;
  
  determining, by the first device, a preliminary intent prediction of the multi-modal natural language input based on the transcribed utterance and the non-voice input;
  
  communicating, by the first device, the multi-modal natural language input to each of the one or more secondary devices, wherein each of the one or more secondary devices determines an intent of the multi-modal natural language input received at the input device using local intent determination capabilities;
  
  receiving, by the first device, an intent determination from each of the secondary devices; and
  
  arbitrating, by the first device, among the intent determinations received from each of the secondary devices to determine an intent of the multi-modal natural input; and
  
  invoking, by the first device, at least one action at one or more of the input device, the first device, or the one or more secondary devices based on the determined intent.
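Claim 9 replaces the single second device with a fan-out to every secondary device, followed by arbitration among their intent determinations. The claim leaves the arbitration rule open. The sketch below assumes confidence-weighted voting in which the first device's preliminary prediction breaks ties. `IntentDetermination` and `arbitrate` are hypothetical names.

```python
from __future__ import annotations

import math
from collections import defaultdict
from typing import NamedTuple


class IntentDetermination(NamedTuple):
    """Hypothetical reply from one secondary device."""
    device: str
    intent: str
    confidence: float


def arbitrate(determinations: list[IntentDetermination], preliminary: str) -> str:
    """Choose one intent from the secondary devices' determinations.

    Confidence is summed per intent across devices. If several intents tie for
    the highest total, the preliminary prediction wins when it is among them;
    otherwise the alphabetically first tied intent is returned.
    """
    if not determinations:
        # No secondary device replied, so fall back to the local prediction.
        return preliminary
    totals: defaultdict[str, float] = defaultdict(float)
    for reply in determinations:
        totals[reply.intent] += reply.confidence
    best = max(totals.values())
    # math.isclose treats totals that differ only by floating-point rounding as a tie.
    tied = sorted(intent for intent, score in totals.items() if math.isclose(score, best))
    return preliminary if preliminary in tied else tied[0]


if __name__ == "__main__":
    replies = [
        IntentDetermination("car", "navigate", 0.5),
        IntentDetermination("phone", "call_contact", 0.7),
        IntentDetermination("server", "navigate", 0.6),
    ]
    print(arbitrate(replies, preliminary="call_contact"))  # -> navigate
```

Summing confidences lets two moderately confident devices outvote one highly confident device. Different sums, such as a maximum or a majority count, would also satisfy the claim language.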

10. A system for processing a multi-modal natural language input, the system comprising:
- an input device that receives a multi-modal natural language input comprising at least a natural language utterance and a non-voice input related to the natural language utterance;
  
  one or more secondary devices, wherein the one or more secondary devices include at least a second device, and a first device having one or more physical processors programmed with computer program instructions that, when executed by the one or more physical processors, program the first device to:
  
  obtain, from the input device, the multi-modal natural language input;
  
  transcribe the natural language utterance;
  
  determine a preliminary intent prediction of the multi-modal natural language input based on the transcribed utterance and the non-voice input; and
  
  transmit the multi-modal natural language input to the second device;
  
  receive, from the second device, a second intent prediction of the multi-modal natural language input;
  
  determine an intent of the multi-modal natural language input based on the preliminary intent prediction and the second intent prediction; and
  
  invoke at least one action at one or more of the input device, the first device, or the one or more secondary devices based on the determined intent.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17)
  - 11. The system of claim 10, wherein to invoke the at least one action at one or more of the input device, the first device, or the one or more secondary devices, the first device is further programmed to:
    - transmit a request related to the multi-modal natural language input based on the preliminary intent prediction.
  - 12. The system of claim 10, wherein the first device is further programmed to:
    - determine processing capabilities associated with the one or more secondary devices; and
      
      select based on the processing capabilities associated with the one or more secondary devices, the second device to make the second intent prediction of the multi-modal natural language input.
  - 13. The system of claim 12, wherein the first device is further programmed to:
    - maintain a constellation model that describes natural language resources, dynamic states, and intent determination capabilities associated with the input device and the one or more secondary devices, wherein the processing capabilities associated with the one or more secondary devices are determined based on the constellation model.
  - 14. The system of claim 13, wherein the intent determination capabilities for a given one of the input device, the first device, or the one or more secondary devices are based on at least one of processing power, storage resources, natural language processing capabilities, or local knowledge.
  - 15. The system of claim 10, wherein the first device is further programmed to:
    - determine a domain relating to the multi-modal natural language input; and
      
      select, based on the domain, the second device to make the second intent prediction of the multi-modal natural language input.
  - 16. The system of claim 15, wherein the one or more secondary devices are associated with different domains, the second device is associated with the domain, and the different domains comprise the domain.
  - 17. The system of claim 10, wherein the input device initially received the multi-modal natural language input.
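Claims 15 and 16, like method claims 6 and 7, select the second device according to the domain of the input. The sketch below assumes a keyword lookup for the domain decision and lets the non-voice input, such as the app the user tapped, override the keywords. `DOMAIN_KEYWORDS`, `determine_domain`, and `select_second_device` are hypothetical. A production system would more likely use a trained classifier.

```python
from __future__ import annotations

# Hypothetical keyword lists per domain; stands in for a real domain classifier.
DOMAIN_KEYWORDS: dict[str, set[str]] = {
    "navigation": {"directions", "route", "navigate", "traffic"},
    "music": {"play", "song", "album", "playlist"},
    "messaging": {"text", "message", "send", "reply"},
}


def determine_domain(transcript: str, non_voice_hint: str | None = None) -> str | None:
    """Guess the domain of a multi-modal input.

    A non-voice hint naming a known domain takes priority; otherwise the domain
    with the most keyword hits in the transcribed utterance wins.
    """
    if non_voice_hint in DOMAIN_KEYWORDS:
        return non_voice_hint
    words = set(transcript.lower().split())
    # Count keyword hits per domain and keep the best-scoring one.
    hits = {domain: len(words & keywords) for domain, keywords in DOMAIN_KEYWORDS.items()}
    domain, count = max(hits.items(), key=lambda item: item[1])
    return domain if count > 0 else None


def select_second_device(domain: str | None, device_domains: dict[str, str]) -> str | None:
    """Return the first secondary device associated with the domain, if any."""
    for device, device_domain in device_domains.items():
        if device_domain == domain:
            return device
    return None


if __name__ == "__main__":
    devices = {"car_head_unit": "navigation", "media_server": "music"}
    domain = determine_domain("play my road trip playlist")
    print(domain, select_second_device(domain, devices))  # -> music media_server
```

Mapping each device to a single domain is a simplification of claim 16, which only requires that the secondary devices be associated with different domains.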

18. A system for processing a multi-modal natural language input, the system comprising:
- an input device that receives a multi-modal natural language input comprising at least a natural language utterance and a non-voice input related to the natural language utterance;
  
  one or more secondary devices; and
  
  a first device having one or more physical processors programmed with computer program instructions that, when executed by the one or more physical processors, program the first device to:
  
  obtain, from the input device, the multi-modal natural language input;
  
  transcribe the natural language utterance;
  
  determine a preliminary intent prediction of the multi-modal natural language input based on the transcribed utterance and the non-voice input;
  
  communicate the multi-modal natural language input to each of the one or more secondary devices, wherein each of the one or more secondary devices determines an intent of the multi-modal natural language input received at the input device using local intent determination capabilities;
  
  receive an intent determination from each of the secondary devices; and
  
  arbitrate among the intent determinations received from each of the secondary devices to determine an intent of the multi-modal natural input; and
  
  invoke at least one action at one or more of the input device, the first device, or the one or more secondary devices based on the determined intent.

Current Assignee
Oracle International Corporation (Oracle Corporation)
Original Assignee
VoiceBox Technologies Corporation (Microsoft Corporation)
Inventors
Kennewick, Robert A., Weider, Chris
Primary Examiner(s)
PULLIAS, JESSE SCOTT

Application Number
US15/090,215
Publication Number
US 20160217785A1
Time in Patent Office
470 Days
Field of Search
704275
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G06F 40/40   Processing or translation o...

G10L 15/1822   Parsing for meaning underst...

G10L 15/22   Procedures used during a sp...

G10L 15/24   Speech recognition using no...

G10L 15/26   Speech to text systems G10L...

G10L 15/285   Memory allocation or algori...

G10L 15/30   Distributed recognition, e....

G10L 15/32   Multiple recognisers used i...

G10L 2015/223   Execution procedure of a sp...
