System and method for processing multi-modal device interactions in a natural language voice services environment

US 8,326,637 B2
Filed: 02/20/2009
Issued: 12/04/2012
Est. Priority Date: 02/20/2009
Status: Active Grant


10 Assignments


0 Petitions


Abstract

A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
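The abstract describes three steps: extract context from a non-voice interaction and a related utterance, combine that context into an intent, and route the request to a device. The sketch below shows that flow under several assumptions. The keyword-to-domain table is a hypothetical stand-in for a real natural language understanding component. The `NonVoiceInteraction` and `Device` types and the rule that puts the originating device first are also illustrative assumptions. None of them come from the patent.

```python
from dataclasses import dataclass, field

# Hypothetical keyword-to-domain table standing in for a real natural
# language understanding component (an assumption, not the patent's method).
DOMAIN_KEYWORDS = {
    "navigation": {"directions", "route", "navigate", "drive"},
    "dining": {"restaurant", "eat", "menu", "reservation"},
    "weather": {"weather", "forecast", "rain"},
}


@dataclass
class NonVoiceInteraction:
    device_id: str      # device on which the tap, click, or selection occurred
    selected_item: str  # e.g. a point of interest tapped on a map


@dataclass
class Device:
    device_id: str
    domains: set = field(default_factory=set)  # domains this device can serve


def infer_intent(utterance: str, interaction: NonVoiceInteraction) -> dict:
    """Combine the domain named in the utterance with the subject selected by touch."""
    words = set(utterance.lower().replace("?", "").split())
    domain = next(
        (name for name, keywords in DOMAIN_KEYWORDS.items() if words & keywords),
        "unknown",
    )
    return {"domain": domain, "subject": interaction.selected_item,
            "origin": interaction.device_id}


def route(intent: dict, devices: list) -> list:
    """Return IDs of devices that serve the intent's domain, originating device first."""
    capable = [d.device_id for d in devices if intent["domain"] in d.domains]
    # False sorts before True, so the originating device moves to the front;
    # sorted() is stable, so the other devices keep their original order.
    return sorted(capable, key=lambda dev_id: dev_id != intent["origin"])
```

Take the case of a user who taps "Cafe Luna" on a phone and asks "How do I drive there?". The phrase "drive there" supplies the navigation domain, and the tap supplies the subject. The request is then routed first to the phone, if the phone can navigate, and after that to any other capable device.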

561 Citations

26 Claims

1. A method for processing one or more multi-modal inputs in a natural language voice services environment that includes one or more electronic devices, comprising:
- detecting at one or more electronic device, a multi-modal input that includes a first input having a first modality type and a second input having a second modality type, wherein the second input is related to the first input, and wherein the first modality type is different than the second modality type;
  
  extracting, at a processor, context information relating to the multi-modal input, from the first input and from the second input;
  
  determining, at the processor, a request from the first input or the second input;
  
  processing, at the processor, the request based on the extracted context information relating to the multi-modal input;
  
  generating at least one transaction lead based on the extracted context information of the multi-modal input;
  
  receiving at least one further input relating to the generated at least one transaction lead; and
  
  processing a transaction click-through in response to receiving the at least one further input.
- Dependent claims (2-18):
  - 2. The method of claim 1, wherein the first input is a non-voice input and the second input is a natural language utterance, and wherein at least one of the one or more electronic device includes an input device configured to receive the natural language utterance.
  - 3. The method of claim 2, wherein detecting the at least one multi-modal input comprises causing, in response to the non-voice input being detected, the input device to capture the natural language utterance.
  - 4. The method of claim 3, further comprising:
    - synchronizing information relating to the non-voice input and the natural language utterance captured by the input device.
  - 5. The method of claim 3, wherein the non-voice input comprises a non-voice input portion having a pre-established association with detection of multi-modal inputs in a natural language voice services environment.
  - 6. The method of claim 5, wherein the non-voice input portion having the pre-established association with detection of multi-modal inputs in a natural language voice services environment comprises a touch gesture or a button press.
  - 7. The method of claim 2, wherein the non-voice input comprises a selection of a segment, item, data, application, point of focus, or attention focus associated with one or more of the electronic devices.
  - 8. The method of claim 7, wherein the context information relating to the request is based on the segment, item, data, application, point of focus, or attention focus selected by the non-voice input.
  - 9. The method of claim 2, wherein the non-voice input comprises an identification of a point of focus or an attention focus associated with one or more of the electronic devices.
  - 10. The method of claim 2, wherein determining the request comprises determining which one of an action, query, command, or task is being requested, and wherein extracting the context information comprises extracting, based on the non-voice input, a parameter of the action, query, command, or task.
  - 11. The method of claim 10, wherein the parameter comprises a location or topic related to the action, query, command, or task.
  - 12. The method of claim 10, wherein extracting the context information comprises extracting, based on the natural language utterance, a domain of the action, query, command, or task.
  - 13. The method of claim 12, wherein the domain is a navigation, entertainment, weather, shopping, news, language, or dining domain.
  - 14. The method of claim 2, wherein the natural language utterance is a first natural language utterance, wherein extracting the context information is further based on a second natural language utterance, wherein the second natural language utterance is detected prior to or subsequent to the first natural language utterance.
  - 15. The method of claim 2, wherein detecting the at least one multi-modal input comprises causing, in response to a pre-established voice-based word or phrase being recognized, the input device to capture the natural language utterance.
  - 16. The method of claim 2, wherein detecting the at least one multi-modal input comprises capturing the non-voice input after capturing the natural language utterance.
  - 17. The method of claim 1, wherein the generated transaction lead includes at least one of an advertisement or a recommendation relating to the extracted context information relating to the multi-modal input.
  - 18. The method of claim 1, wherein processing the request comprises routing the request to the one or more electronic devices based on the extracted context information relating to the multi-modal input.
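Claim 1 and its dependents describe a pipeline. The request type and domain are taken from the utterance (claims 10, 12, and 13), and a parameter such as a location is taken from the non-voice input (claims 10 and 11). The request is processed, a transaction lead such as a recommendation is generated (claim 17), and a click-through is recorded when a further input accepts the lead. The sketch below walks through that sequence. The verb and keyword vocabularies, the lead wording, and the `Session` bookkeeping are all hypothetical. The claims do not say how any of these are implemented.

```python
from dataclasses import dataclass, field

# Hypothetical vocabularies: the claims name request types (claim 10) and
# example domains (claim 13) but not how they are recognized.
REQUEST_TYPES = {"find": "query", "show": "query", "play": "command",
                 "book": "task", "call": "action"}
DOMAINS = {
    "dining": {"restaurant", "food", "eat"},
    "weather": {"weather", "forecast"},
    "shopping": {"buy", "store", "shop"},
}


@dataclass
class MultiModalInput:
    non_voice: str  # first input, e.g. a location selected on a touch screen
    utterance: str  # second input, a natural language utterance


@dataclass
class Session:
    leads: list = field(default_factory=list)           # generated transaction leads
    click_throughs: list = field(default_factory=list)  # leads the user acted on


def extract_context(mmi: MultiModalInput) -> dict:
    """Request type and domain come from the utterance; the parameter comes
    from the non-voice input."""
    words = mmi.utterance.lower().split()
    request_type = next((REQUEST_TYPES[w] for w in words if w in REQUEST_TYPES), "query")
    domain = next((d for d, kws in DOMAINS.items() if kws & set(words)), "general")
    return {"type": request_type, "domain": domain, "parameter": mmi.non_voice}


def process(mmi: MultiModalInput, session: Session) -> tuple:
    """Process the request and generate one recommendation-style lead."""
    ctx = extract_context(mmi)
    result = f"{ctx['type']} in {ctx['domain']} domain for {ctx['parameter']}"
    lead = f"Recommended {ctx['domain']} offer near {ctx['parameter']}"
    session.leads.append(lead)
    return result, lead


def on_further_input(session: Session, lead_index: int, accepted: bool) -> bool:
    """Record a click-through when the further input accepts a generated lead."""
    if accepted and 0 <= lead_index < len(session.leads):
        session.click_throughs.append(session.leads[lead_index])
        return True
    return False
```

Suppose the utterance is "find a restaurant near here" and the user has selected "Pike Place". The sketch classifies the request as a query in the dining domain, with "Pike Place" as its parameter. Accepting the resulting lead adds it to `session.click_throughs`.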

19. A system for processing one or more multi-modal inputs in a natural language voice services environment that includes one or more electronic devices, wherein the system comprises a processing device configured to:
- receive from one or more electronic devices a multi-modal input that includes a first input having a first modality type and a second input having a second modality type, wherein the second input is related to the first input, and wherein the first modality type is different than the second modality type;
  
  extract context information relating to the multi-modal input, from the first input and from the second input;
  
  determine a request based on the non-voice input or the natural language utterance;
  
  process the request based on the extracted context information relating to the multi-modal input; and
  
  generate at least one transaction lead based on the extracted context information of the multi-modal input;
  
  receive at least one further input relating to the generated at least one transaction lead; and
  
  process a transaction click-through in response to receiving the at least one further input.
- Dependent claims (20-26):
  - 20. The system of claim 19, wherein the first input is a non-voice input and the second input is a natural language utterance, and wherein at least one of the one or more electronic devices includes an input device configured to receive the natural language utterance.
  - 21. The system of claim 20, the processing devices further configured to detect the at least one multi-modal input by causing, in response to the non-voice input being detected, the input device to capture the natural language utterance.
  - 22. The system of claim 21, the processing devices further configured to:
    - synchronize information relating to the non-voice input and the natural language utterance captured by the input device.
  - 23. The system of claim 21, wherein the non-voice input comprises a non-voice input portion having a pre-established association with detection of multi-modal inputs in a natural language voice services environment.
  - 24. The system of claim 20, wherein the non-voice input comprises a selection of a segment, item, data, or application associated with one or more electronic devices.
  - 25. The system of claim 20, wherein the non-voice input comprises an identification of a point of focus or an attention focus associated with one or more of the electronic devices.
  - 26. The system of claim 19, wherein the generated transaction lead includes at least one of an advertisement or a recommendation relating to the extracted context information relating to the multi-modal input.
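Claims 4 and 22 require synchronizing the non-voice input with the captured utterance. Claims 14 and 16 allow the utterance to arrive either before or after the non-voice input. One simple way to do this is to pair events by timestamp, and the sketch below takes that approach. The two-second window, the `Event` record, and the nearest-match rule are all hypothetical choices. The claims do not specify how synchronization is performed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SYNC_WINDOW_S = 2.0  # hypothetical pairing window, in seconds


@dataclass
class Event:
    kind: str         # "touch", "button", or "utterance"
    payload: str      # selected item, button name, or transcribed text
    timestamp: float  # capture time in seconds


def synchronize(events: List[Event], window: float = SYNC_WINDOW_S) -> List[Tuple[Event, Event]]:
    """Pair each non-voice event with the closest unused utterance captured
    within `window` seconds before or after it."""
    utterances = [e for e in events if e.kind == "utterance"]
    non_voice = sorted((e for e in events if e.kind != "utterance"),
                       key=lambda e: e.timestamp)
    used = set()  # indices of utterances already paired
    pairs = []
    for nv in non_voice:
        best: Optional[int] = None
        best_gap = window
        for i, u in enumerate(utterances):
            gap = abs(u.timestamp - nv.timestamp)
            if i not in used and gap <= best_gap:
                best, best_gap = i, gap
        if best is not None:
            used.add(best)
            pairs.append((nv, utterances[best]))
    return pairs
```

In this sketch an utterance that falls outside every window stays unpaired. A real system would presumably handle such an utterance as a voice-only request.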

Current Assignee
Oracle International Corporation (Oracle Corporation)
Original Assignee
VoiceBox Technologies, Inc. (Microsoft Corporation)
Inventors
Baldwin, Larry; Weider, Chris
Primary Examiner(s)
Vo, Huyen X.

Application Number

US12/389,678
Publication Number

US 20100217604A1
Time in Patent Office

1,383 Days
Field of Search

704/231, 704/235, 704/251, 704/255, 704/257, 704/275, 704/1-10, 704/270, 382/187
US Class Current

704/275
CPC Class Codes

G06Q 30/02   Marketing; Price estimation...

G06Q 30/0241   Advertisements

G06Q 30/0261   based on user location

G06Q 30/0273   Determination of fees for a...

G10L 15/18   using natural language mode...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/22   Procedures used during a sp...

G10L 15/24   Speech recognition using no...

G10L 17/22   Interactive procedures; Man...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/227   of the speaker; Human-fact...
