Using context information to facilitate processing of commands in a virtual assistant

US 9,858,925 B2
Filed: 09/30/2011
Issued: 01/02/2018
Est. Priority Date: 06/05/2009
Status: Active Grant

Abstract

A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and/or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation.

3560 Citations

95 Claims

1. A computer-implemented method for disambiguating user input to perform a task on a computing device having at least one processor, comprising:
- at an output device, prompting a user for input;
  
  at an input device, receiving spoken user input;
  
  at a processor communicatively coupled to the output device and to the input device, receiving context information from a context source;
  
  at the processor, generating a first plurality of candidate interpretations of the received spoken user input;
  
  at the processor, disambiguating the intent of a word in the first plurality of candidate interpretations based on the context information to generate a second plurality of candidate interpretations, wherein the second plurality of candidate interpretations is a subset of the first plurality of candidate interpretations;
  
  at the processor, sorting the second plurality of candidate interpretations by relevance based on the context information;
  
  at the processor, deriving a representation of user intent based on the sorted second plurality of candidate interpretations;
  
  at the processor, identifying at least one task and at least one parameter for the task, based at least in part on the derived representation of user intent;
  
  at the processor, executing the at least one task using the at least one parameter, to derive a result;
  
  at the processor, generating a dialog response based on the derived result; and
  
  at the output device, outputting the generated dialog response.
- - 2. The method of claim 1, wherein:
    - prompting the user comprises prompting the user via a conversational interface; and
      
      receiving the spoken user input comprises:
      
      receiving the spoken user input via the conversational interface; and
      
      converting the spoken user input to a text representation.
  - 3. The method of claim 2, wherein converting the spoken user input to a text representation comprises:
    - generating a plurality of candidate text interpretations of the spoken user input; and
      
      ranking at least a subset of the generated candidate text interpretations;
      
      wherein at least one of the generating and ranking steps is performed using the received context information.
  - 4. The method of claim 3, wherein the received context information used in at least one of the generating and ranking steps comprises at least one selected from the group consisting of:
    - data describing an acoustic environment in which the spoken user input is received;
      
      data received from at least one sensor;
      
      vocabulary obtained from a database associated with the user;
      
      vocabulary associated with application preferences;
      
      vocabulary obtained from usage history; and
      
      current dialog state.
  - 5. The method of claim 1, wherein prompting the user comprises generating at least one prompt based at least in part on the received context information.
  - 6. The method of claim 1, wherein disambiguating the received spoken user input based on the context information to derive a representation of user intent comprises performing natural language processing on the received spoken user input based at least in part on the received context information.
  - 7. The method of claim 6, wherein the received context information used in disambiguating the received spoken user input comprises at least one selected from the group consisting of:
    - data describing an event;
      
      application context;
      
      input previously provided by the user;
      
      known information about the user;
      
      location;
      
      date;
      
      environmental conditions; and
      
      history.
  - 8. The method of claim 1, wherein performing natural language processing comprises selecting among a plurality of candidate interpretations of the received spoken user input using the received context information.
  - 9. The method of claim 1, wherein performing natural language processing comprises determining a referent for at least one pronoun in the received spoken user input.
  - 10. The method of claim 1, wherein identifying at least one task and at least one parameter for the task comprises identifying at least one task and at least one parameter for the task based at least in part on the received context information.
  - 11. The method of claim 10, wherein identifying at least one task and at least one parameter for the task based at least in part on the received context information comprises:
    - receiving a plurality of candidate representations of user intent;
      
      determining a preferred interpretation of user intent based on at least one selected from the group consisting of:
      
      at least one domain model;
      
      at least one task flow model; and
      
      at least one dialog flow model.
  - 12. The method of claim 10, wherein the received context information used in identifying at least one task and at least one parameter for the task comprises at least one selected from the group consisting of:
    - data describing an event;
      
      data from a database associated with the user;
      
      data received from at least one sensor;
      
      application context;
      
      input previously provided by the user;
      
      known information about the user;
      
      location;
      
      date;
      
      environmental conditions; and
      
      history.
  - 13. The method of claim 1, wherein generating a dialog response comprises generating a dialog response based at least in part on the received context information.
  - 14. The method of claim 13, wherein generating a dialog response based at least in part on the received context information comprises at least one selected from the group consisting of:
    - generating a dialog response including a named referent;
      
      generating a dialog response including a symbolic name associated with a telephone number;
      
      determining which of a plurality of names to use for a referent;
      
      determining a level of detail for the generated response; and
      
      filtering a response based on previous output.
  - 15. The method of claim 13, wherein the received context information used in generating a dialog response comprises at least one selected from the group consisting of:
    - data from a database associated with the user;
      
      application context;
      
      input previously provided by the user;
      
      known information about the user;
      
      location;
      
      date;
      
      environmental conditions; and
      
      history.
  - 16. The method of claim 1, wherein the received context information comprises at least one selected from the group consisting of:
    - context information stored at a server; and
      
      context information stored at a client.
  - 17. The method of claim 1, wherein receiving context information from a context source comprises:
    - requesting the context information from a context source; and
      
      receiving the context information in response to the request.
  - 18. The method of claim 1, wherein receiving context information from a context source comprises:
    - receiving at least a portion of the context information prior to receiving the spoken user input.
  - 19. The method of claim 1, wherein receiving context information from a context source comprises:
    - receiving at least a portion of the context information after receiving the spoken user input.
  - 20. The method of claim 1, wherein receiving context information from a context source comprises:
    - receiving static context information as part of an initialization step; and
      
      receiving additional context information after receiving the spoken user input.
  - 21. The method of claim 1, wherein receiving context information from a context source comprises:
    - receiving push notification of a change in context information; and
      
      responsive to the push notification, updating locally stored context information.
  - 22. The method of claim 1, wherein the computing device comprises at least one selected from the group consisting of:
    - a telephone;
      
      a smartphone;
      
      a tablet computer;
      
      a laptop computer;
      
      a personal digital assistant;
      
      a desktop computer;
      
      a kiosk;
      
      a consumer electronic device;
      
      a consumer entertainment device;
      
      a music player;
      
      a camera;
      
      a television;
      
      an electronic gaming unit; and
      
      a set-top box.
  - 23. The method of claim 1, wherein the received context information further comprises application context.
  - 24. The method of claim 1, wherein the received context information further comprises personal data associated with the user.
  - 25. The method of claim 1, wherein the received context information further comprises data from a database associated with the user.
  - 26. The method of claim 1, wherein the received context information further comprises data obtained from dialog history.
  - 27. The method of claim 1, wherein the received context information further comprises data received from at least one sensor.
  - 28. The method of claim 1, wherein the received context information further comprises application preferences.
  - 29. The method of claim 1, wherein the received context information further comprises application usage history.
  - 30. The method of claim 1, wherein the received context information further comprises data describing an event.
  - 31. The method of claim 1, wherein the received context information further comprises current dialog state.
  - 32. The method of claim 1, wherein the received context information further comprises input previously provided by the user.
  - 33. The method of claim 1, wherein the received context information further comprises location.
  - 34. The method of claim 1, wherein the received context information further comprises local time.
  - 35. The method of claim 1, wherein the received context information further comprises environmental conditions.
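
The processing steps recited in claim 1 (generate candidates, narrow them to a subset using context, sort by relevance, derive intent, identify and execute a task, generate a response) can be read as a pipeline. The Python sketch below is only an illustration under stated assumptions: the names `Interpretation`, `disambiguate`, `sort_by_relevance`, and `handle_utterance`, the context keys `plausible_domains` and `active_app_domain`, and the relevance rule are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    """One candidate reading of the spoken input (hypothetical structure)."""
    task: str           # task identifier, e.g. "call_contact"
    params: tuple = ()  # (name, value) pairs for the task
    domain: str = ""    # domain the reading belongs to


def disambiguate(candidates, context):
    """Return the subset of candidates that is plausible given the context."""
    allowed = context.get("plausible_domains")
    if not allowed:
        return list(candidates)
    return [c for c in candidates if c.domain in allowed]


def sort_by_relevance(candidates, context):
    """Put candidates matching the active application's domain first.

    sorted() is stable, so ties keep their original order.
    """
    active = context.get("active_app_domain")
    return sorted(candidates, key=lambda c: c.domain != active)


def handle_utterance(candidates, context, task_handlers):
    """Run the narrowed, sorted candidates through task execution."""
    ranked = sort_by_relevance(disambiguate(candidates, context), context)
    if not ranked:
        return "Sorry, I did not understand that."
    intent = ranked[0]  # assumed: the top candidate represents user intent
    result = task_handlers[intent.task](**dict(intent.params))
    return f"OK: {result}"  # dialog response based on the derived result


if __name__ == "__main__":
    candidates = [
        Interpretation("play_song", (("title", "Herb"),), "music"),
        Interpretation("search_web", (("query", "call Herb"),), "web"),
        Interpretation("call_contact", (("name", "Herb"),), "phone"),
    ]
    context = {"plausible_domains": {"phone", "web"},
               "active_app_domain": "phone"}
    handlers = {
        "play_song": lambda title: f"playing {title}",
        "search_web": lambda query: f"searching for {query}",
        "call_contact": lambda name: f"calling {name}",
    }
    print(handle_utterance(candidates, context, handlers))  # prints "OK: calling Herb"
```

Filtering and sorting are kept as separate functions here so that the second plurality of candidates is visibly a subset of the first before any ordering is applied.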

36. A computer program product for disambiguating user input to perform a task on a computing device having at least one processor, comprising:
- a non-transitory computer-readable storage medium; and
  
  computer program code, encoded on the medium, configured to cause at least one processor communicatively coupled to an output device and to an input device to perform the steps of:
  
  causing the output device to prompt a user for input;
  
  receiving spoken user input via the input device;
  
  receiving context information from a context source;
  
  generating a first plurality of candidate interpretations of the received spoken user input;
  
  disambiguating the intent of a word in the first plurality of candidate interpretations based on the context information to generate a second plurality of candidate interpretations, wherein the second plurality of candidate interpretations is a subset of the first plurality of candidate interpretations;
  
  at the processor, sorting the second plurality of candidate interpretations by relevance based on the context information;
  
  at the processor, deriving a representation of user intent based on the sorted second plurality of candidate interpretations;
  
  identifying at least one task and at least one parameter for the task, based at least in part on the derived representation of user intent;
  
  executing the at least one task using the at least one parameter, to derive a result;
  
  generating a dialog response based on the derived result; and
  
  causing the output device to output the generated dialog response.
- - 37. The computer program product of claim 36, wherein:
    - the computer program code configured to cause an output device to prompt the user comprises computer program code configured to cause an output device to prompt the user via a conversational interface; and
      
      the computer program code configured to cause at least one processor to receive the spoken user input comprises computer program code configured to cause at least one processor to receive the spoken user input via the conversational interface.
  - 38. The computer program product of claim 36, wherein the computer program code configured to cause at least one processor to receive the spoken user input further comprises:
    - computer program code configured to cause at least one processor to convert the spoken user input to a text representation by:
      
      generating a plurality of candidate text interpretations of the spoken user input; and
      
      ranking at least a subset of the generated candidate text interpretations;
      
      wherein at least one of the generating and ranking steps is performed using the received context information.
  - 39. The computer program product of claim 38, wherein the received context information used in at least one of the generating and ranking steps comprises at least one selected from the group consisting of:
    - data describing an acoustic environment in which the spoken user input is received;
      
      data received from at least one sensor;
      
      vocabulary obtained from a database associated with the user;
      
      vocabulary associated with application preferences;
      
      vocabulary obtained from usage history; and
      
      current dialog state.
  - 40. The computer program product of claim 36, wherein the computer program code configured to cause at least one processor to prompt the user comprises computer program code configured to cause at least one processor to generate at least one prompt based at least in part on the received context information.
  - 41. The computer program product of claim 36, wherein the computer program code configured to cause at least one processor to disambiguate the received spoken user input based on the context information to derive a representation of user intent comprises computer program code configured to cause at least one processor to perform natural language processing on the received spoken user input based at least in part on the received context information.
  - 42. The computer program product of claim 41, wherein the received context information used in disambiguating the received spoken user input comprises at least one selected from the group consisting of:
    - data describing an event;
      
      application context;
      
      input previously provided by the user;
      
      known information about the user;
      
      location;
      
      date;
      
      environmental conditions; and
      
      history.
  - 43. The computer program product of claim 36, wherein the computer program code configured to cause at least one processor to identify at least one task and at least one parameter for the task comprises computer program code configured to cause at least one processor to identify at least one task and at least one parameter for the task based at least in part on the received context information.
  - 44. The computer program product of claim 43, wherein the received context information used in identifying at least one task and at least one parameter for the task comprises at least one selected from the group consisting of:
    - data describing an event;
      
      data from a database associated with the user;
      
      data received from at least one sensor;
      
      application context;
      
      input previously provided by the user;
      
      known information about the user;
      
      location;
      
      date;
      
      environmental conditions; and
      
      history.
  - 45. The computer program product of claim 36, wherein the computer program code configured to cause at least one processor to generate a dialog response comprises computer program code configured to cause at least one processor to generate a dialog response based at least in part on the received context information.
  - 46. The computer program product of claim 45, wherein the received context information used in generating a dialog response comprises at least one selected from the group consisting of:
    - data from a database associated with the user;
      
      application context;
      
      input previously provided by the user;
      
      known information about the user;
      
      location;
      
      date;
      
      environmental conditions; and
      
      history.
  - 47. The computer program product of claim 36, wherein the computer program code configured to cause at least one processor to receive context information from a context source comprises:
    - computer program code configured to cause at least one processor to request the context information from a context source; and
      
      computer program code configured to cause at least one processor to receive the context information in response to the request.
  - 48. The computer program product of claim 36, wherein the computer program code configured to cause at least one processor to receive context information from a context source comprises:
    - computer program code configured to cause at least one processor to receive at least a portion of the context information prior to receiving the spoken user input.
  - 49. The computer program product of claim 36, wherein the computer program code configured to cause at least one processor to receive context information from a context source comprises:
    - computer program code configured to cause at least one processor to receive at least a portion of the context information after receiving the spoken user input.
  - 50. The computer program product of claim 36, wherein the computer program code configured to cause at least one processor to receive context information from a context source comprises:
    - computer program code configured to cause at least one processor to receive static context information as part of an initialization step; and
      
      the computer program code configured to cause at least one processor to receive additional context information after receiving the spoken user input.
  - 51. The computer program product of claim 36, wherein the computer program code configured to cause at least one processor to receive context information from a context source comprises:
    - computer program code configured to cause at least one processor to receive push notification of a change in context information; and
      
      computer program code configured to cause at least one processor to, responsive to the push notification, update locally stored context information.
  - 52. The computer program product of claim 36, wherein the received context information further comprises application context.
  - 53. The computer program product of claim 36, wherein the received context information further comprises personal data associated with the user.
  - 54. The computer program product of claim 36, wherein the received context information further comprises data from a database associated with the user.
  - 55. The computer program product of claim 36, wherein the received context information further comprises data obtained from dialog history.
  - 56. The computer program product of claim 36, wherein the received context information further comprises data received from at least one sensor.
  - 57. The computer program product of claim 36, wherein the received context information further comprises application preferences.
  - 58. The computer program product of claim 36, wherein the received context information further comprises application usage history.
  - 59. The computer program product of claim 36, wherein the received context information further comprises data describing an event.
  - 60. The computer program product of claim 36, wherein the received context information further comprises current dialog state.
  - 61. The computer program product of claim 36, wherein the received context information further comprises input previously provided by the user.
  - 62. The computer program product of claim 36, wherein the received context information further comprises location.
  - 63. The computer program product of claim 36, wherein the received context information further comprises local time.
  - 64. The computer program product of claim 36, wherein the received context information further comprises environmental conditions.
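
Claims 38 and 39 describe converting speech to text by generating candidate text interpretations and ranking them, with context such as vocabulary from a user-associated database informing at least one of those steps. A minimal sketch of the ranking step follows. The function name `rank_text_candidates`, the additive `boost` value, and the sample scores are assumptions made for illustration, not details taken from the patent.

```python
def rank_text_candidates(candidates, user_vocabulary, boost=0.2):
    """Rank (text, acoustic_score) pairs, favoring words the user is known to use.

    candidates: list of (text, acoustic_score) tuples from a recognizer.
    user_vocabulary: words drawn from context, e.g. contact names (assumed).
    boost: hypothetical additive bonus per vocabulary word found in the text.
    """
    vocab = {word.lower() for word in user_vocabulary}

    def score(item):
        text, acoustic_score = item
        hits = sum(1 for word in text.lower().split() if word in vocab)
        return acoustic_score + boost * hits

    ranked = sorted(candidates, key=score, reverse=True)
    return [text for text, _ in ranked]


if __name__ == "__main__":
    # Hypothetical recognizer output: the acoustically likelier reading is
    # outranked once the user's contact list is taken into account.
    candidates = [("call her", 0.55), ("call Herb", 0.50)]
    print(rank_text_candidates(candidates, ["Herb", "Rebecca"]))
    # prints ['call Herb', 'call her']
```

Without any vocabulary context the same call returns the candidates in purely acoustic order, which shows how the context input changes the ranking rather than the candidate set.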

65. A system for disambiguating user input to perform a task, comprising:
- an output device, configured to prompt a user for input;
  
  an input device, configured to receive spoken user input;
  
  at least one processor, communicatively coupled to the output device and to the input device, configured to perform the steps of:
  
  receiving context information from a context source;
  
  generating a first plurality of candidate interpretations of the received spoken user input;
  
  disambiguating the intent of a word in the first plurality of candidate interpretations based on the context information to generate a second plurality of candidate interpretations, wherein the second plurality of candidate interpretations is a subset of the first plurality of candidate interpretations;
  
  sorting the second plurality of candidate interpretations by relevance based on the context information;
  
  deriving a representation of user intent based on the sorted second plurality of candidate interpretations;
  
  identifying at least one task and at least one parameter for the task, based at least in part on the derived representation of user intent;
  
  executing the at least one task using the at least one parameter, to derive a result; and
  
  generating a dialog response based on the derived result.
- - 66. The system of claim 65, wherein:
    - the output device is configured to prompt the user via a conversational interface; and
      
      the input device is configured to receive the spoken user input via the conversational interface;
      
      and wherein the at least one processor is configured to convert the spoken user input to a text representation.
  - 67. The system of claim 66, wherein the at least one processor is configured to convert the spoken user input to a text representation by:
    - generating a plurality of candidate text interpretations of the spoken user input; and
      
      ranking at least a subset of the generated candidate text interpretations;
      
      wherein at least one of the generating and ranking steps is performed using the received context information.
  - 68. The system of claim 67, wherein the received context information used in at least one of the generating and ranking comprises at least one selected from the group consisting of:
    - data describing an acoustic environment in which the spoken user input is received;
      
      data received from at least one sensor;
      
      vocabulary obtained from a database associated with the user;
      
      vocabulary associated with application preferences;
      
      vocabulary obtained from usage history; and
      
      current dialog state.
  - 69. The system of claim 65, wherein the output device is configured to prompt the user by generating at least one prompt based at least in part on the received context information.
  - 70. The system of claim 65, wherein the at least one processor is configured to disambiguate the received spoken user input based on the context information to derive a representation of user intent by performing natural language processing on the received spoken user input based at least in part on the received context information.
  - 71. The system of claim 70, wherein the received context information used in disambiguating the received spoken user input comprises at least one selected from the group consisting of:
    - data describing an event;
      
      application context;
      
      input previously provided by the user;
      
      known information about the user;
      
      location;
      
      date;
      
      environmental conditions; and
      
      history.
  - 72. The system of claim 65, wherein the at least one processor is configured to identify at least one task and at least one parameter for the task by identifying at least one task and at least one parameter for the task based at least in part on the received context information.
  - 73. The system of claim 72, wherein the received context information used in identifying at least one task and at least one parameter for the task comprises at least one selected from the group consisting of:
    - data describing an event;
      
      data from a database associated with the user;
      
      data received from at least one sensor;
      
      application context;
      
      input previously provided by the user;
      
      known information about the user;
      
      location;
      
      date;
      
      environmental conditions; and
      
      history.
  - 74. The system of claim 65, wherein the at least one processor is configured to generate a dialog response by generating a dialog response based at least in part on the received context information.
  - 75. The system of claim 74, wherein the received context information used in generating a dialog response comprises at least one selected from the group consisting of:
    - data from a database associated with the user;
      
      application context;
      
      input previously provided by the user;
      
      known information about the user;
      
      location;
      
      date;
      
      environmental conditions; and
      
      history.
  - 76. The system of claim 65, wherein the received context information comprises at least one selected from the group consisting of:
    - context information stored at a server; and
      
      context information stored at a client.
  - 77. The system of claim 65, wherein the at least one processor is configured to receive context information from a context source by:
    - requesting the context information from a context source; and
      
      receiving the context information in response to the request.
  - 78. The system of claim 65, wherein the at least one processor is configured to receive context information from a context source by:
    - receiving at least a portion of the context information prior to receiving the spoken user input.
  - 79. The system of claim 65, wherein the at least one processor is configured to receive context information from a context source by:
    - receiving at least a portion of the context information after receiving the spoken user input.
  - 80. The system of claim 65, wherein the at least one processor is configured to receive context information from a context source by:
    - receiving static context information as part of an initialization step; and
      
      receiving additional context information after receiving the spoken user input.
  - 81. The system of claim 65, wherein the at least one processor is configured to receive context information from a context source by:
    - receiving push notification of a change in context information; and
      
      responsive to the push notification, updating locally stored context information.
  - 82. The system of claim 65, wherein the output device, input device, and at least one processor are implemented as components of at least one selected from the group consisting of:
    - a telephone;
      
      a smartphone;
      
      a tablet computer;
      
      a laptop computer;
      
      a personal digital assistant;
      
      a desktop computer;
      
      a kiosk;
      
      a consumer electronic device;
      
      a consumer entertainment device;
      
      a music player;
      
      a camera;
      
      a television;
      
      an electronic gaming unit; and
      
      a set-top box.
  - 83. The system of claim 65, wherein the received context information further comprises application context.
  - 84. The system of claim 65, wherein the received context information further comprises personal data associated with the user.
  - 85. The system of claim 65, wherein the received context information further comprises data from a database associated with the user.
  - 86. The system of claim 65, wherein the received context information further comprises data obtained from dialog history.
  - 87. The system of claim 65, wherein the received context information further comprises data received from at least one sensor.
  - 88. The system of claim 65, wherein the received context information further comprises application preferences.
  - 89. The system of claim 65, wherein the received context information further comprises application usage history.
  - 90. The system of claim 65, wherein the received context information further comprises data describing an event.
  - 91. The system of claim 65, wherein the received context information further comprises current dialog state.
  - 92. The system of claim 65, wherein the received context information further comprises input previously provided by the user.
  - 93. The system of claim 65, wherein the received context information further comprises location.
  - 94. The system of claim 65, wherein the received context information further comprises local time.
  - 95. The system of claim 65, wherein the received context information further comprises environmental conditions.
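
Claims 77 through 81 cover several ways of obtaining context: on request, as static context during initialization, as additional context after the spoken input, and via push notifications that update locally stored context. One way these could fit together is sketched below. The class `LocalContextStore`, its method names, and the dictionary-based representation are hypothetical and serve only to illustrate the idea.

```python
class LocalContextStore:
    """Locally stored context that is seeded once and updated over time."""

    def __init__(self, static_context):
        # Static context received as part of an initialization step.
        self._data = dict(static_context)

    def on_push_notification(self, changes):
        """Apply a pushed change in context to the local copy."""
        self._data.update(changes)

    def request(self, key, default=None):
        """Answer a request for one item of context."""
        return self._data.get(key, default)

    def snapshot(self):
        """Return a copy so callers cannot mutate the stored context."""
        return dict(self._data)


if __name__ == "__main__":
    store = LocalContextStore({"locale": "en_US"})
    store.on_push_notification({"location": "home"})
    print(store.snapshot())  # prints {'locale': 'en_US', 'location': 'home'}
```

Returning copies from `snapshot` is a design choice in this sketch: it keeps a processing step that reads context from accidentally changing what later steps see.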

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Gruber, Thomas Robert; Brigham, Christopher Dean; Keen, Daniel S.; Novick, Gregory; Phipps, Benjamin S.
Primary Examiner(s): Shah, Paras D
Assistant Examiner(s): Le, Thuykhanh

Application Number: US13/250,854
Publication Number: US 20120265528A1
Time in Patent Office: 2,286 Days
Field of Search: None
US Class Current
CPC Class Codes

G06F 2203/0381   Multimodal input, i.e. inte...

G10L 15/18   using natural language mode...

G10L 15/183   using context dependencies,...
