System and method for processing multi-modal device interactions in a natural language voice services environment

US 9,105,266 B2
Filed: 05/15/2014
Issued: 08/11/2015
Est. Priority Date: 02/20/2009
Status: Active Grant

10 Assignments

0 Petitions

Abstract

A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
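
The flow summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names, the `ROUTES` table, and the "first word is the verb" rule are all assumptions introduced here for illustration only. The sketch shows how a non-voice selection can supply the target that a deictic utterance such as "play this" leaves open, and how the combined intent can pick a destination device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NonVoiceInteraction:
    device: str          # device where the interaction occurred (hypothetical field)
    selected_item: str   # what the user touched or clicked
    item_type: str       # e.g. "song" or "restaurant"


@dataclass
class Intent:
    action: str
    target: str
    route_to: str


# Hypothetical table mapping (verb, item type) to the device that handles it.
ROUTES = {
    ("play", "song"): "media_player",
    ("navigate", "restaurant"): "navigation_unit",
}


def determine_intent(utterance: str,
                     interaction: NonVoiceInteraction) -> Optional[Intent]:
    """Combine utterance context with non-voice context into one intent."""
    words = utterance.lower().split()
    if not words:
        return None
    verb = words[0]  # naive assumption: the first word is the command
    # Words such as "this" carry no target on their own; the non-voice
    # selection supplies it, and the item type helps choose the route.
    route = ROUTES.get((verb, interaction.item_type))
    if route is None:
        return None
    return Intent(action=verb, target=interaction.selected_item, route_to=route)
```

For example, `determine_intent("Play this", NonVoiceInteraction("touchscreen", "Hey Jude", "song"))` yields an intent routed to the hypothetical `media_player`, while the same utterance paired with a selected restaurant yields no intent, because neither input alone (nor this pairing) resolves to a known request.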

709 Citations

20 Claims

1. A method for facilitating natural language processing of user inputs via multiple input modes where each user input alone may be insufficient to completely and/or accurately determine a user request intended by a user, the method being implemented by a computer system that includes one or more physical processors executing computer program instructions which, when executed, perform the method, the method comprising:
- receiving, at the computer system, a first user input of a user from a first input device via a first input mode, wherein the first user input is generated responsive to the user interacting with the first input device in a manner corresponding to the first input mode to provide the first user input;
  
  receiving, at the computer system, a second user input of the user from a second input device via a second input mode, wherein the second user input is generated responsive to the user interacting with the second input device in a manner corresponding to the second input mode to provide the second user input, wherein the first user input and the second user input are related to one another, and wherein one of the first user input or the second user input comprises a voice input received from at least one of the first input device or the second input device via a voice input mode, and the other one of the first user input or the second user input comprises a non-voice input received from at least one of the first input device or the second input device via a non-voice input mode;
  
  determining, by the computer system, based on the second user input, context information for interpreting the first user input, wherein the context information identifies a first item of a first item type;
  
  determining, by the computer system, further context information based on the first user input, wherein the further context information identifies a second item of a second item type that is related to the first item of the first item type;
  
  generating, by the computer system, a query based on the context information and the further context information to obtain one or more intermediary results, wherein the generated query comprises a query related to the second item of the second item type;
  
  determining, by the computer system, a user request based on the one or more intermediary results;
  
  providing, by the computer system, a response to the user request; and
  
  providing, by the computer system, based on at least one of the context information for interpreting the first user input or the further context information, an advertisement for presentation to the user.
- - 2. The method of claim 1, further comprising:
    - providing, by the computer system, the context information for interpreting the first user input to an advertiser system; and
      
      obtaining, by the computer system, the advertisement from the advertiser system responsive to providing the context information to the advertiser system, wherein providing the advertisement comprises providing the advertisement obtained from the advertiser system.
  - 3. The method of claim 1, wherein the context information for interpreting the first user input is used as input for selecting the advertisement, and wherein providing the advertisement comprises providing the selected advertisement.
  - 4. The method of claim 1, wherein information about the user request is used as input for selecting the advertisement, and wherein providing the advertisement comprises providing the selected advertisement.
  - 5. The method of claim 1, wherein the first user input comprises the voice input received via the voice input mode, and the second user input comprises the non-voice input received via the non-voice input mode, wherein determining the context information comprises determining, based on the non-voice input, the context information for interpreting the voice input, wherein determining the user request comprises determining the user request based on the voice input and the context information for interpreting the voice input, and wherein providing the advertisement comprises providing the advertisement based on the context information for interpreting the voice input.
  - 6. The method of claim 5, wherein the context information for interpreting the voice input is used as input for selecting the advertisement, and wherein providing the advertisement comprises providing the selected advertisement.
  - 7. The method of claim 5, further comprising:
    - processing, by the computer system, the voice input to recognize one or more words of the voice input;
      
      interpreting, by the computer system, the one or more recognized words based on the context information determined from the non-voice input for interpreting the voice input, wherein determining the user request comprises determining the user request based on the interpretation of the one or more recognized words.
  - 8. The method of claim 7, wherein at least one of the one or more recognized words is associated with at least two meanings, wherein interpreting the one or more recognized words comprises selecting, based on the context information determined from the non-voice input for interpreting the voice input, one of the at least two meanings associated with the at least one recognized word to determine the user request.
  - 9. The method of claim 1, wherein the first user input comprises the non-voice input received via the non-voice input mode, and the second user input comprises the voice input received via the voice input mode, wherein determining the context information comprises determining, based on the voice input, the context information for interpreting the non-voice input, wherein determining the user request comprises determining the user request based on the non-voice input and the context information for interpreting the non-voice input, and wherein providing the advertisement comprises providing the advertisement based on the context information for interpreting the non-voice input.
  - 10. The method of claim 9, wherein the context information for interpreting the non-voice input is used as input for selecting the advertisement, and wherein providing the advertisement comprises providing the selected advertisement.
  - 11. The method of claim 1, wherein the first item of the first item type comprises one of a command or a music-related product, and the second item of the second item type comprises the other one of the command or the music-related product.
  - 12. The method of claim 1, further comprising:
    - determining, by the computer system, prior context information associated with one or more prior voice inputs, wherein the one or more prior voice inputs are received by the computer system before the voice input is received, and wherein determining the user request comprises determining the user request further based on the prior context information.
  - 13. The method of claim 1, wherein the context information for interpreting the first user input comprises information identifying at least one of a product, a service, a place, a location, an entity, or a content item.
  - 14. The method of claim 1, wherein the receipt of first user input is prior to, contemporaneously with, or subsequent to the receipt of the second user input.
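
Claims 7 and 8 describe selecting one of several meanings of a recognized word using context derived from the non-voice input. A minimal sketch of that idea follows; the `MEANINGS` lexicon, the domain labels, and the function name are hypothetical and chosen only to make the selection step concrete, not taken from the patent.

```python
from typing import Dict, Optional

# Hypothetical lexicon: each ambiguous word maps a context domain to a meaning.
MEANINGS: Dict[str, Dict[str, str]] = {
    "traffic": {"music": "artist:Traffic", "navigation": "road_conditions"},
    "jam": {"music": "artist:The Jam", "navigation": "road_congestion"},
}


def interpret_word(word: str, context_domain: str) -> Optional[str]:
    """Pick a meaning for a recognized word using non-voice context.

    context_domain stands in for context determined from the non-voice
    input, e.g. which application the user was interacting with.
    """
    senses = MEANINGS.get(word.lower())
    if senses is None:
        # The word is not ambiguous in this toy lexicon; keep it as-is.
        return word.lower()
    # Returns None when no listed meaning fits the given context.
    return senses.get(context_domain)
```

With this lexicon, the recognized word "traffic" resolves to an artist when the user was interacting with a music application and to road conditions when the user was interacting with a map.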

15. A system for facilitating natural language processing of user inputs via multiple input modes where each user input alone may be insufficient to completely and/or accurately determine a user request intended by a user, the system comprising:
- one or more physical processors programmed with computer program instructions which, when executed, cause the one or more physical processors to:
  
  receive a first user input of a user from a first input device via a first input mode, wherein the first user input is generated responsive to the user interacting with the first input device in a manner corresponding to the first input mode to provide the first user input;
  
  receive a second user input of the user from a second input device via a second input mode, wherein the second user input is generated responsive to the user interacting with the second input device in a manner corresponding to the second input mode to provide the second user input, wherein the first user input and the second user input are related to one another, and wherein one of the first user input or the second user input comprises a voice input received from at least one of the first input device or the second input device via a voice input mode, and the other one of the first user input or the second user input comprises a non-voice input received from at least one of the first input device or the second input device via a non-voice input mode;
  
  determine, based on the second user input, context information for interpreting the first user input, wherein the context information identifies a first item of a first item type;
  
  determine further context information based on the first user input, wherein the further context information identifies a second item of a second item type that is related to the first item of the first item type;
  
  generate a query based on the context information and the further context information to obtain one or more intermediary results, wherein the generated query comprises a query related to the second item of the second type;
  
  determine a user request based on the one or more intermediary results;
  
  provide a response to the user request; and
  
  provide, based on at least one of the context information for interpreting the first user input or the further context information, an advertisement for presentation to the user.
- - 16. The system of claim 15, wherein the first item of the first item type comprises one of a command or a music-related product, and the second item of the second item type comprises the other one of the command or the music-related product.
  - 17. The system of claim 15, further comprising:
    - provide the context information for interpreting the first user input to an advertiser system; and
      
      obtain the advertisement from the advertiser system responsive to providing the context information to the advertiser system, wherein providing the advertisement comprises providing the advertisement obtained from the advertiser system.
  - 18. The system of claim 15, wherein the context information for interpreting the first user input is used as input for selecting the advertisement, and wherein providing the advertisement comprises providing the selected advertisement.
  - 19. The system of claim 15, wherein information about the user request is used as input for selecting the advertisement, and wherein providing the advertisement comprises providing the selected advertisement.
  - 20. The system of claim 15, wherein the first user input comprises the voice input received via the voice input mode, and the second user input comprises the non-voice input received via the non-voice input mode, wherein determining the context information comprises determining, based on the non-voice input, the context information for interpreting the voice input, wherein determining the user request comprises determining the user request based on the voice input and the context information for interpreting the voice input, and wherein providing the advertisement comprises providing the advertisement based on the context information for interpreting the voice input.

Current Assignee: Oracle International Corporation (Oracle Corporation)
Original Assignee: VoiceBox Technologies Corporation (Microsoft Corporation)
Inventors: Weider, Chris; Baldwin, Larry
Primary Examiner(s): VO, HUYEN X
Application Number: US14/278,645
Publication Number: US 20140249822A1
Time in Patent Office: 453 Days
Field of Search: 704/1-10, 704/231, 704/251, 704/255, 704/257, 704/270, 704/277, 704/270.1, 704/235, 705/14.54, 707/5
US Class Current: 1/1
CPC Class Codes

G06Q 30/02   Marketing; Price estimation...

G06Q 30/0241   Advertisements

G06Q 30/0261   based on user location

G06Q 30/0273   Determination of fees for a...

G10L 15/18   using natural language mode...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/22   Procedures used during a sp...

G10L 15/24   Speech recognition using no...

G10L 17/22   Interactive procedures; Man...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/227   of the speaker; Human-fact...
