System and method for processing multi-modal device interactions in a natural language voice services environment

US 10,553,213 B2
Filed: 04/19/2018
Issued: 02/04/2020
Est. Priority Date: 02/20/2009
Status: Active Grant

First Claim

1. A method for processing one or more multi-modal user interactions in a natural language voice services environment that includes one or more electronic devices, the method being implemented by a computer system that includes one or more physical processors executing one or more computer program instructions which, when executed, perform the method, the method comprising:

detecting a multi-modal user interaction received via one or more electronic devices, the multi-modal user interaction comprising at least a non-voice input and a natural language utterance, wherein the non-voice input is received from a non-voice input component of the one or more electronic devices, and wherein the natural language utterance is received from a voice input component of the one or more electronic devices and is related to the non-voice input;

obtaining an indication of a first time at which the non-voice input was received by the non-voice input component;

obtaining an indication of a second time at which the natural language utterance was received by the voice input component;

determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time; and

responsive to determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time, performing the following steps:

determining first context information relating to the non-voice input;

determining second context information relating to the natural language utterance;

determining an intent of the multi-modal user interaction based on the first context information and the second context information;

identifying a transaction lead based on the determined intent; and

transmitting the identified transaction lead to a user via the one or more electronic devices.
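The claim requires only that relatedness be determined "based on the first time and the second time"; it does not state the rule. A minimal Python sketch, assuming the simplest such rule, a fixed correlation window (the `TimedInput` record and the `window_s` value are hypothetical, not taken from the patent):

```python
from dataclasses import dataclass


@dataclass
class TimedInput:
    """An input event plus the time (in seconds) its input component received it."""
    modality: str      # "non_voice" or "utterance"
    payload: str
    received_at: float


def are_related(non_voice: TimedInput, utterance: TimedInput,
                window_s: float = 2.0) -> bool:
    """Hypothetical rule: interpret the two inputs together when their receipt
    times (the "first time" and "second time") differ by at most window_s seconds."""
    return abs(utterance.received_at - non_voice.received_at) <= window_s


tap = TimedInput("non_voice", "focus:map_pin_42", received_at=10.0)
speech = TimedInput("utterance", "what time does this place close", received_at=11.2)
unrelated = TimedInput("utterance", "play some music", received_at=30.0)

print(are_related(tap, speech))     # True
print(are_related(tap, unrelated))  # False
```

A symmetric window lets the utterance either precede or follow the non-voice input; an actual system might use asymmetric bounds or combine timing with other signals.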

7 Assignments

0 Petitions

Abstract

A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
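The pipeline in the abstract (extract context from each modality, combine it to determine an intent, route a request to a device) can be illustrated with a toy Python sketch. The dictionary-based contexts, the keyword rules in `infer_intent`, and the capability-based `route` function are hypothetical stand-ins; the abstract does not say how intent is inferred or how a target device is chosen.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    capabilities: set[str] = field(default_factory=set)


def merge_context(non_voice_ctx: dict[str, str],
                  utterance_ctx: dict[str, str]) -> dict[str, str]:
    """Combine both contexts; on a key conflict the utterance wins (an assumption)."""
    merged = dict(non_voice_ctx)
    merged.update(utterance_ctx)
    return merged


def infer_intent(ctx: dict[str, str]) -> str:
    """Toy keyword rules standing in for a real natural language intent model."""
    if ctx.get("verb") == "navigate" and "place" in ctx:
        return "navigation"
    if ctx.get("verb") == "buy" and "item" in ctx:
        return "purchase"
    return "unknown"


def route(intent: str, devices: list[Device]) -> Device | None:
    """Return the first device whose capabilities include the intent."""
    for device in devices:
        if intent in device.capabilities:
            return device
    return None


devices = [Device("phone", {"purchase"}), Device("car_nav", {"navigation"})]
# The non-voice input (a tap on a map) supplies the place; the utterance supplies the verb.
ctx = merge_context({"place": "Union Station"}, {"verb": "navigate"})
intent = infer_intent(ctx)
target = route(intent, devices)
print(intent, target.name if target else None)  # navigation car_nav
```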

935 Citations

30 Claims

1. A method for processing one or more multi-modal user interactions in a natural language voice services environment that includes one or more electronic devices, the method being implemented by a computer system that includes one or more physical processors executing one or more computer program instructions which, when executed, perform the method, the method comprising:
   - detecting a multi-modal user interaction received via one or more electronic devices, the multi-modal user interaction comprising at least a non-voice input and a natural language utterance, wherein the non-voice input is received from a non-voice input component of the one or more electronic devices, and wherein the natural language utterance is received from a voice input component of the one or more electronic devices and is related to the non-voice input;
   - obtaining an indication of a first time at which the non-voice input was received by the non-voice input component;
   - obtaining an indication of a second time at which the natural language utterance was received by the voice input component;
   - determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time; and
   - responsive to determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time, performing the following steps:
     - determining first context information relating to the non-voice input;
     - determining second context information relating to the natural language utterance;
     - determining an intent of the multi-modal user interaction based on the first context information and the second context information;
     - identifying a transaction lead based on the determined intent; and
     - transmitting the identified transaction lead to a user via the one or more electronic devices.

2. The method of claim 1, wherein the one or more processors, the non-voice input component, and the voice input component are housed within a single electronic device.
3. The method of claim 1, wherein the one or more processors are housed in a first electronic device, the non-voice input component is housed in a second electronic device, and the voice input component is housed in a third electronic device.
4. The method of claim 1, wherein the one or more processors are housed in a first electronic device, and wherein the non-voice input component and the voice input component are housed in a second electronic device.
5. The method of claim 1, wherein the one or more processors and the non-voice input component are housed in a first electronic device, and wherein the voice input component is housed in a second electronic device.
6. The method of claim 1, wherein the one or more processors and the voice input component are housed in a first electronic device, and wherein the non-voice input component is housed in a second electronic device.
7. The method of claim 1, wherein the non-voice input comprises a point of focus input on a display of the non-voice input component.
8. The method of claim 1, wherein the non-voice input comprises a highlighting of text on a display of the non-voice input component.
9. The method of claim 1, the method further comprising:
   - obtaining preference information of a user, wherein the transaction lead is identified based further on the preference information.
10. The method of claim 1, wherein the transaction lead comprises at least one of an advertisement or a recommendation related to the determined intent of the multi-modal user interaction.
11. The method of claim 1, the method further comprising:
   - receiving a further input after the transaction lead was transmitted;
   - determining a second intent of the further input; and
   - providing further information relating to the transaction lead based on the second intent.
12. The method of claim 1, the method further comprising:
   - receiving a further input after the transaction lead was transmitted;
   - determining a second intent of the further input; and
   - completing a purchase transaction in response to receiving the further input based on the determined second intent.
13. The method of claim 12, wherein the further input comprises a second natural language utterance.
14. The method of claim 12, wherein the further input comprises a second non-voice input.
15. The method of claim 1, wherein the non-voice input component comprises a map display, and wherein the transaction lead is presented as a point on the map display.
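Claims 9, 10, and 15 describe the transaction lead as an advertisement or recommendation identified from the determined intent and, optionally, the user's preference information. A hypothetical Python sketch of that selection step, assuming leads are tagged records, preference matching is a simple tag-overlap count, and the optional `location` field stands in for claim 15's map point (none of these details come from the claims; the sample leads and coordinates are placeholders):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Lead:
    """A candidate transaction lead (advertisement or recommendation)."""
    title: str
    category: str                                 # intent category the lead serves
    tags: frozenset[str]
    location: tuple[float, float] | None = None   # (lat, lon) map point, cf. claim 15


def identify_lead(intent: str, candidates: list[Lead],
                  preferences: set[str]) -> Lead | None:
    """Keep leads whose category matches the intent, then pick the one sharing
    the most tags with the user's preferences (ties go to the earliest candidate)."""
    matching = [lead for lead in candidates if lead.category == intent]
    if not matching:
        return None
    return max(matching, key=lambda lead: len(lead.tags & preferences))


candidates = [
    Lead("Burger Barn coupon", "dining", frozenset({"fast_food"})),
    Lead("Green Leaf Cafe", "dining", frozenset({"vegetarian", "coffee"}), (40.0, -75.0)),
    Lead("Fuel discount", "fuel", frozenset({"discount"})),
]
lead = identify_lead("dining", candidates, preferences={"vegetarian"})
print(lead.title if lead else None)  # Green Leaf Cafe
```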

16. A system of processing one or more multi-modal user interactions in a natural language voice services environment that includes one or more electronic devices, the system comprising:
   - one or more physical processors programmed with one or more computer program instructions which, when executed, cause the one or more physical processors to:
     - detect a multi-modal user interaction received via one or more electronic devices, the multi-modal user interaction comprising at least a non-voice input and a natural language utterance, wherein the non-voice input is received from a non-voice input component of the one or more electronic devices, and wherein the natural language utterance is received from a voice input component of the one or more electronic devices and is related to the non-voice input;
     - obtain an indication of a first time at which the non-voice input was received by the non-voice input component;
     - obtain an indication of a second time at which the natural language utterance was received by the voice input component;
     - determine that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time; and
     - responsive to determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time, perform the following steps:
       - determine first context information relating to the non-voice input;
       - determine second context information relating to the natural language utterance;
       - determine an intent of the multi-modal user interaction based on the first context information and the second context information;
       - identify a transaction lead based on the determined intent; and
       - transmit the identified transaction lead to a user via the one or more electronic devices.

17. The system of claim 16, wherein the one or more processors, the non-voice input component, and the voice input component are housed within a single electronic device.
18. The system of claim 16, wherein the one or more processors are housed in a first electronic device, the non-voice input component is housed in a second electronic device, and the voice input component is housed in a third electronic device.
19. The system of claim 16, wherein the one or more processors are housed in a first electronic device, and wherein the non-voice input component and the voice input component are housed in a second electronic device.
20. The system of claim 16, wherein the one or more processors and the non-voice input component are housed in a first electronic device, and wherein the voice input component is housed in a second electronic device.
21. The system of claim 16, wherein the one or more processors and the voice input component are housed in a first electronic device, and wherein the non-voice input component is housed in a second electronic device.
22. The system of claim 16, wherein the non-voice input comprises a point of focus input on a display of the non-voice input component.
23. The system of claim 16, wherein the non-voice input comprises a highlighting of text on a display of the non-voice input component.
24. The system of claim 16, wherein the one or more physical processors are further programmed to:
   - obtain preference information of a user, wherein the transaction lead is identified based further on the preference information.
25. The system of claim 16, wherein the transaction lead comprises at least one of an advertisement or a recommendation related to the determined intent of the multi-modal user interaction.
26. The system of claim 16, wherein the one or more physical processors are further programmed to:
   - receive a further input after the transaction lead was transmitted;
   - determine a second intent of the further input; and
   - provide further information relating to the transaction lead based on the second intent.
27. The system of claim 16, wherein the one or more physical processors are further programmed to:
   - receive a further input after the transaction lead was transmitted;
   - determine a second intent of the further input; and
   - complete a purchase transaction in response to receiving the further input based on the determined second intent.
28. The system of claim 27, wherein the further input comprises a second natural language utterance.
29. The system of claim 27, wherein the further input comprises a second non-voice input.
30. The system of claim 16, wherein the non-voice input component comprises a map display, and wherein the transaction lead is presented as a point on the map display.
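Claims 26 through 29 (like method claims 11 through 14) cover a follow-up exchange: a further input, spoken or non-voice, arrives after the lead is transmitted, and its second intent decides whether the system supplies more information or completes a purchase. A hypothetical Python sketch; the keyword rules, the `tap:buy_button` event string, and the simulated purchase are illustrative assumptions, not the patent's method:

```python
from dataclasses import dataclass


@dataclass
class FurtherInput:
    """Input received after the transaction lead was transmitted."""
    modality: str   # "utterance" (claim 28) or "non_voice" (claim 29)
    content: str


def second_intent(inp: FurtherInput) -> str:
    """Toy classifier; a real system would reuse its multi-modal intent pipeline."""
    text = inp.content.lower()
    if inp.modality == "non_voice":
        # Hypothetical UI events: tapping a buy button versus tapping the lead itself.
        return "purchase" if text == "tap:buy_button" else "more_info"
    if any(word in text for word in ("buy", "order", "book")):
        return "purchase"
    if any(phrase in text for phrase in ("tell me more", "details", "reviews")):
        return "more_info"
    return "unknown"


def handle_follow_up(lead_title: str, inp: FurtherInput) -> str:
    """Dispatch on the second intent of the further input."""
    intent = second_intent(inp)
    if intent == "more_info":   # claim 26: provide further information
        return f"Here are more details about {lead_title}."
    if intent == "purchase":    # claim 27: complete a purchase (simulated here)
        return f"Purchase of {lead_title} completed."
    return "Sorry, I did not understand."


print(handle_follow_up("Green Leaf Cafe voucher", FurtherInput("utterance", "Order it for me")))
# Purchase of Green Leaf Cafe voucher completed.
```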

Current Assignee
Oracle International Corporation (Oracle Corporation)
Original Assignee
Oracle International Corporation (Oracle Corporation)
Inventors
Baldwin, Larry, Weider, Chris
Primary Examiner(s)
Vo, Huyen X

Application Number

US15/957,158
Publication Number

US 20180308479A1
Time in Patent Office

656 Days
Field of Search

704/1-10, 704/230-257, 704/270-277
US Class Current
CPC Class Codes

G06Q 30/02   Marketing; Price estimation...

G06Q 30/0241   Advertisements

G06Q 30/0261   based on user location

G06Q 30/0273   Determination of fees for a...

G10L 15/18   using natural language mode...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/22   Procedures used during a sp...

G10L 15/24   Speech recognition using no...

G10L 17/22   Interactive procedures; Man...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/227   of the speaker; Human-fact...