System and method for processing multi-modal device interactions in a natural language voice services environment

US 8,719,009 B2
Filed: 09/14/2012
Issued: 05/06/2014
Est. Priority Date: 02/20/2009
Status: Active Grant

Abstract

A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
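
The flow described in the abstract (extract context from both modes, combine it into an intent, then route a request to a device) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the keyword table, the device names, and the `selected` key are all hypothetical, and the utterance is assumed to be already transcribed to text.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    """One multi-modal interaction: an utterance plus a non-voice event."""
    utterance: str
    non_voice: dict = field(default_factory=dict)  # e.g. {"selected": "track_42"}


# Hypothetical keyword-to-intent table; a real system would use a full
# natural language understanding stack instead of keyword lookup.
INTENT_KEYWORDS = {
    "play": "media.play",
    "navigate": "navigation.route",
    "call": "phone.dial",
}

# Hypothetical mapping from an intent's domain to the device that handles it.
DEVICE_FOR_DOMAIN = {
    "media": "media_player",
    "navigation": "navigation_unit",
    "phone": "mobile_phone",
}


def determine_intent(interaction: Interaction) -> dict:
    """Combine voice-derived and non-voice context into one intent record."""
    words = interaction.utterance.lower().split()
    action = next(
        (INTENT_KEYWORDS[w] for w in words if w in INTENT_KEYWORDS), "unknown"
    )
    return {"action": action, "target": interaction.non_voice.get("selected")}


def route(intent: dict) -> str:
    """Pick the device for the intent's domain, with a fallback handler."""
    domain = intent["action"].split(".")[0]
    return DEVICE_FOR_DOMAIN.get(domain, "default_handler")


intent = determine_intent(Interaction("Play this", {"selected": "track_42"}))
device = route(intent)
```

Here the spoken word "play" supplies the action while the non-voice selection supplies the target, so neither input alone is enough to form the request.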

16 Claims

1. A computer-implemented method for processing a natural language utterance via multiple input modes, the method being implemented by a computer system that includes one or more processors executing one or more computer program instructions which, when executed, perform the method, the method comprising:
- receiving, by the one or more physical processors, a first user input via a voice input mode and a second user input via a non-voice input mode, wherein the first user input includes a natural language utterance, and the second user input includes a non-voice input relating to the natural language utterance;
  
  processing, by the one or more physical processors, the natural language utterance to recognize one or more words of the natural language utterance, wherein the one or more recognized words include a reference word;
  
  identifying, by the one or more physical processors based on the one or more recognized words, a query;
  
  identifying, by the one or more physical processors based on the non-voice input, context information for the one or more recognized words, wherein the context information indicates context for the reference word;
  
  determining, by the one or more physical processors based on the context information, one or more interpretations of the one or more recognized words, wherein determining the one or more interpretations comprises:
  
  identifying the reference word;
  
  identifying, based on the reference word context, a product or service to which the reference word refers; and
  
  determining a meaning of the reference word based on the identification of the product or service; and
  
  generating, by the one or more physical processors based on the one or more interpretations, a response to the query.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
  - 2. The method of claim 1, wherein the context information is identified further based on the natural language utterance.
  - 3. The method of claim 1, further comprising:
    - determining, by the one or more physical processors, prior context information associated with one or more prior natural language utterances, wherein the one or more prior natural language utterances are received by the one or more physical processors before the natural language utterance is received, and wherein generating the response comprises generating the response based on the one or more interpretations and the prior context information.
  - 4. The method of claim 1, wherein the natural language utterance identifies a domain of the query, and wherein the response is generated based on the domain.
  - 5. The method of claim 1, wherein the non-voice input identifies data, a segment, item, or a point of focus on another input device, and wherein the data, segment, item, or point of focus provides the context information.
  - 6. The method of claim 1, wherein the non-voice input identifies a point of focus on a touch screen, and wherein the identifying the context information comprises identifying a location based on the point of focus on the touch screen.
  - 7. The method of claim 6, wherein the query relates to the product or service, and wherein the generating the response comprises retrieving information on the product or service with respect to the location.
  - 8. The method of claim 1, wherein the generating the response comprises routing the query to a device or application configured to process the query.
  - 9. The method of claim 1, wherein the identifying the query comprises:
    - identifying one or more interrogative pronouns in the natural language utterance; and
      
      identifying the query based on the one or more interrogative pronouns.
  - 10. The method of claim 1, further comprising:
    - receiving, by the one or more physical processors, a third user input;
      
      identifying, by the one or more physical processors, second context information from the third user input, wherein the context information is different than the second context information; and
      
      generating, by the one or more physical processors, a response to the third user input based on the identified query and based on the second context information.
  - 11. The method of claim 10, wherein the first user input identifies a domain of the query, wherein the second user input identifies a location associated with the query, and wherein the third user input identifies a different location associated with the query.
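
The steps of claim 1, together with the interrogative-pronoun test of claim 9 and the point-of-focus context of claim 6, can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the word lists, the `CATALOG` contents, the `focused_item` key, and the attribute names are hypothetical, and speech recognition is assumed to have already produced text.

```python
import re

REFERENCE_WORDS = {"this", "that", "these", "those", "it"}
INTERROGATIVE_PRONOUNS = {"what", "who", "whom", "whose", "which"}
KNOWN_ATTRIBUTES = {"price", "hours"}  # hypothetical attributes a query can ask about

# Hypothetical catalog keyed by the id of the on-screen item under the touch point.
CATALOG = {
    "item-17": {"name": "Cafe Aurora", "kind": "service", "hours": "7am-6pm"},
    "item-23": {"name": "Trail Runner X", "kind": "product", "price": "$89"},
}


def recognize_words(utterance):
    """Stand-in for speech recognition: lowercase word tokens."""
    return re.findall(r"[a-z']+", utterance.lower())


def identify_query(words):
    """Identify a query from an interrogative pronoun plus a known attribute."""
    pronoun = next((w for w in words if w in INTERROGATIVE_PRONOUNS), None)
    attribute = next((w for w in words if w in KNOWN_ATTRIBUTES), None)
    if pronoun is None or attribute is None:
        return None
    return {"pronoun": pronoun, "attribute": attribute}


def identify_context(non_voice_input):
    """Derive context for the reference word from the non-voice input."""
    return {"item_id": non_voice_input.get("focused_item")}


def interpret(words, context):
    """Resolve the reference word to the product or service in focus."""
    reference = next((w for w in words if w in REFERENCE_WORDS), None)
    item = CATALOG.get(context.get("item_id"))
    if reference is None or item is None:
        return None
    return {"reference_word": reference, "item": item}


def respond(utterance, non_voice_input):
    """Run the whole pipeline and generate a response to the query."""
    words = recognize_words(utterance)
    query = identify_query(words)
    interpretation = interpret(words, identify_context(non_voice_input))
    if query is None or interpretation is None:
        return "Sorry, I could not resolve that request."
    item = interpretation["item"]
    value = item.get(query["attribute"], "unknown")
    return f"{item['name']} {query['attribute']}: {value}"


first = respond("What is the price of this?", {"focused_item": "item-23"})
second = respond("What are the hours of that?", {"focused_item": "item-17"})
```

The second call loosely mirrors claims 10 and 11: a later input with different context information yields a different response, because the reference word is resolved against whatever item is currently in focus.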

12. A device for processing a natural language utterance via multiple input modes, comprising:
- one or more physical processors programmed to execute one or more computer program instructions which, when executed, cause the one or more physical processors to:
  
  receive a first user input via a voice input mode and a second user input via a non-voice input mode, wherein the first user input includes a natural language utterance, and the second user input includes a non-voice input relating to the natural language utterance;
  
  process the natural language utterance to recognize one or more words of the natural language utterance, wherein the one or more recognized words include a reference word;
  
  identify, based on the one or more recognized words, a query;
  
  identify, based on the non-voice input, context information for the one or more recognized words, wherein the context information indicates context for the reference word;
  
  determine, based on the context information, one or more interpretations of the one or more words, wherein determining the one or more interpretations comprises:
  
  identifying the reference word;
  
  identifying, based on the reference word context, a product or service to which the reference word refers; and
  
  determining a meaning of the reference word based on the identification of the product or service; and
  
  generate, based on the one or more interpretations, a response to the query.

13. A non-transitory computer-readable medium for processing a natural language utterance via multiple input modes, the non-transitory computer-readable medium comprising one or more instructions that, when executed by one or more processors, cause the one or more processors to:
- receive a first user input via a voice input mode and a second user input via a non-voice input mode, wherein the first user input includes a natural language utterance, and the second user input includes a non-voice input relating to the natural language utterance;
  
  process the natural language utterance to recognize one or more words of the natural language utterance, wherein the one or more recognized words include a reference word;
  
  identify, based on the one or more recognized words, a query;
  
  identify, based on the non-voice input, context information for the one or more recognized words, wherein the context information indicates context for the reference word;
  
  determine, based on the context information, one or more interpretations of the one or more recognized words, wherein determining the one or more interpretations comprises:
  
  identifying the reference word;
  
  identifying, based on the reference word context, a product or service to which the reference word refers; and
  
  determining a meaning of the reference word based on the identification of the product or service; and
  
  generate, based on the one or more interpretations, a response to the query.

14. A computer-implemented method of processing a natural language utterance via multiple input modes, the method being implemented by a computer system that includes one or more physical processors executing one or more computer program instructions which, when executed, perform the method, the method comprising:
- receiving, by the one or more physical processors, a first user input via a voice input mode and a second user input via a non-voice input mode, wherein the first user input includes a natural language utterance, and the second user input includes a non-voice input relating to the natural language utterance;
  
  processing, by the one or more physical processors, the natural language utterance to recognize one or more words of the natural language utterance, wherein the one or more recognized words include a reference word;
  
  identifying, by the one or more physical processors based on the one or more recognized words, a command;
  
  identifying, by the one or more physical processors based on the non-voice input, context information for the one or more recognized words, wherein the context information indicates context for the reference word;
  
  determining, by the one or more physical processors based on the context information, one or more interpretations of the one or more recognized words, wherein determining the one or more interpretations comprises:
  
  identifying the reference word;
  
  identifying, based on the reference word context, a product or service to which the reference word refers; and
  
  determining a meaning of the reference word based on the identification of the product or service; and
  
  generating, by the one or more physical processors based on the one or more interpretations, a command signal associated with the command.
- Dependent claims (15, 16)
  - 15. The method of claim 14, wherein the command includes a multimedia processing command.
  - 16. The method of claim 15, wherein the context information identifies multimedia associated with the multimedia processing command.
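
Claims 14 through 16 follow the same steps as claim 1 but end in a command signal rather than a response to a query. A minimal sketch of that variant, assuming a hypothetical verb list, a hypothetical `selected_media` key for the non-voice selection, and an already transcribed utterance:

```python
from dataclasses import dataclass
from typing import Optional

COMMAND_VERBS = {"play", "pause", "stop", "record"}  # hypothetical command words
REFERENCE_WORDS = {"this", "that", "it"}


@dataclass(frozen=True)
class CommandSignal:
    """Illustrative command signal: what to do, and to which media item."""
    command: str
    media_id: str


def build_command_signal(utterance: str, non_voice_input: dict) -> Optional[CommandSignal]:
    """Pair a spoken multimedia command with the media the user selected."""
    words = utterance.lower().replace("?", " ").replace(".", " ").split()
    command = next((w for w in words if w in COMMAND_VERBS), None)
    has_reference = any(w in REFERENCE_WORDS for w in words)
    # The context information identifies the multimedia (compare claim 16).
    media_id = non_voice_input.get("selected_media")
    if command is None or not has_reference or media_id is None:
        return None
    return CommandSignal(command=command, media_id=media_id)


signal = build_command_signal("Play this.", {"selected_media": "song-7"})
```

Returning `None` when any piece is missing is a design choice of this sketch; the claims do not say how an unresolved command is handled.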

Current Assignee: Oracle International Corporation (Oracle Corporation)
Original Assignee: VoiceBox Technologies Corporation (Microsoft Corporation)
Inventors: Baldwin, Larry; Weider, Chris
Primary Examiner(s): Vo, Huyen X.
Application Number: US13/619,421
Publication Number: US 20130054228A1
Time in Patent Office: 599 Days
Field of Search: 704/1-10, 704/251, 704/255, 704/257, 704/270, 704/275, 704/270.1
US Class Current: 704/9
CPC Class Codes:
- G06Q 30/02: Marketing; Price estimation...
- G06Q 30/0241: Advertisements
- G06Q 30/0261: based on user location
- G06Q 30/0273: Determination of fees for a...
- G10L 15/18: using natural language mode...
- G10L 15/1815: Semantic context, e.g. disa...
- G10L 15/22: Procedures used during a sp...
- G10L 15/24: Speech recognition using no...
- G10L 17/22: Interactive procedures; Man...
- G10L 2015/223: Execution procedure of a sp...
- G10L 2015/227: of the speaker; Human-fact...
