System and method for a cooperative conversational voice user interface

US 10,297,249 B2
Filed: 04/20/2015
Issued: 05/21/2019
Est. Priority Date: 10/16/2006
Status: Active Grant

9 Assignments

0 Petitions

Abstract

A cooperative conversational voice user interface is provided. The cooperative conversational voice user interface may build upon short-term and long-term shared knowledge to generate one or more explicit and/or implicit hypotheses about the intent of a user utterance. The hypotheses may be ranked based on varying degrees of certainty, and an adaptive response may be generated for the user. Responses may be worded based on the degrees of certainty and so as to frame an appropriate domain for a subsequent utterance. In one implementation, misrecognitions may be tolerated, and the conversational course may be corrected based on subsequent utterances and/or responses.
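
The abstract describes ranking intent hypotheses by certainty and wording the response according to that certainty. The Python sketch below shows one way this idea could look. The `Hypothesis` type, the two certainty thresholds, and the response templates are hypothetical illustrations and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    intent: str        # hypothetical intent label, e.g. "play music"
    certainty: float   # assumed score in [0.0, 1.0]


# Hypothetical thresholds; the abstract does not specify any values.
HIGH_CERTAINTY = 0.8
MEDIUM_CERTAINTY = 0.5


def rank_hypotheses(hypotheses):
    """Order hypotheses from most to least certain."""
    return sorted(hypotheses, key=lambda h: h.certainty, reverse=True)


def word_response(hypotheses):
    """Choose wording for the top-ranked hypothesis based on its certainty."""
    ranked = rank_hypotheses(hypotheses)
    if not ranked:
        return "Sorry, I didn't catch that. Could you repeat it?"
    best = ranked[0]
    if best.certainty >= HIGH_CERTAINTY:
        return f"OK, {best.intent}."
    if best.certainty >= MEDIUM_CERTAINTY:
        return f"Did you want to {best.intent}?"
    # Low certainty: offer the two best alternatives, framing the domain
    # of the user's next utterance.
    options = " or ".join(h.intent for h in ranked[:2])
    return f"I'm not sure. Did you mean {options}?"


print(word_response([Hypothesis("call mom", 0.3), Hypothesis("play music", 0.55)]))
# Did you want to play music?
```

Varying the wording with certainty lets a confident hypothesis proceed directly, while an uncertain one becomes a question whose answer narrows the next turn.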

28 Claims

1. A computer-implemented method of facilitating natural language system responses using short-term knowledge generated based on one or more prior multi-modal device interactions, the method being implemented by a computer system that includes one or more physical processors executing one or more computer program instructions which, when executed, perform the method, the method comprising:
- receiving, by the computer system during a first conversation, a first voice input via a first input device, the first voice input comprising a first natural language utterance;
  
  receiving, by the computer system, a second voice input comprising the first natural language utterance via a second input device;
  
  comparing, by the computer system, the first voice input with the second voice input;
  
  filtering, by the computer system, sound from the first voice input and the second voice input based on the comparison;
  
  obtaining, by the computer system during the first conversation, a user interface state related to one or more non-voice inputs associated with the first voice input, the one or more non-voice inputs comprising at least a first non-voice input;
  
  generating, by the computer system, the short-term knowledge based on at least the first voice input and the first non-voice input;
  
  determining, by the computer system, based on the short-term knowledge, a first context for the first natural language utterance;
  
  determining, by the computer system, based on the first context, an interpretation of the first natural language utterance; and
  
  generating, by the computer system, based on the interpretation of the first natural language utterance, a first response to the first natural language utterance.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
- - 2. The method of claim 1, the method further comprising:
    - receiving, at the computer system, a second natural language utterance via the first input device;
      
      determining, by the computer system, based on the short-term knowledge, whether the second natural language utterance corresponds to the first context;
      
      determining, by the computer system, responsive to the second natural language utterance not corresponding to the first context, a second context for the second natural language utterance based on the short-term knowledge;
      
      determining, by the computer system, based on the second context, an interpretation of the second natural language utterance; and
      
      generating, by the computer system, based on the interpretation of the second natural language utterance, a response to the second natural language utterance.
  - 3. The method of claim 1, the method further comprising:
    - tracking, by the computer system, contexts identified for multiple consecutive natural language user utterances to generate a context stack, the context stack including the contexts identified for the multiple consecutive natural language user utterances;
      
      receiving, at the computer system, a second natural language utterance via the first input device; and
      
      determining, by the computer system, based on the short-term knowledge, whether the second natural language utterance corresponds to one or more individual ones of the contexts included in the context stack.
  - 4. The method of claim 3, wherein determining whether the second natural language utterance corresponds to one or more individual ones of the contexts included in the context stack includes:
    - determining whether the second natural language utterance corresponds to a most recent context included in the context stack; and
      
      responsive to the second natural language utterance not corresponding to the most recent context, determining whether the second natural language utterance corresponds to a second most recent context included in the context stack.
  - 5. The method of claim 3, further comprising:
    - determining, by the computer system, responsive to the second natural language utterance not corresponding to one or more individual ones of the contexts included in the context stack, a second context for the second natural language utterance based on the short-term knowledge; and
      
      determining, by the computer system, based on the second context, an interpretation of the second natural language utterance.
  - 6. The method of claim 1, wherein determining the interpretation of the first natural language utterance comprises determining, based on the first context, an interpretation of one or more recognized words of the first natural language utterance.
  - 7. The method of claim 1, the method further comprising:
    - expiring, by the computer system, one or more items of short-term knowledge that are based on one or more natural language utterances received during conversations occurring prior to the first conversation.
  - 8. The method of claim 7, further comprising:
    - accumulating, by the computer system, long-term knowledge based on the one or more items of short-term knowledge that are expired.
  - 9. The method of claim 1, the method further comprising:
    - accumulating, by the computer system, long-term knowledge, wherein the long-term knowledge is accumulated based on one or more natural language utterances received during conversations occurring prior to the first conversation, wherein the first context is determined based on the short-term knowledge and the long-term knowledge.
  - 10. The method of claim 1, further comprising:
    - determining, by the computer system, based on the short-term knowledge, a manner in which the first natural language utterance is spoken, wherein the first response is generated based on the determined manner and the interpretation of the first natural language utterance.
  - 11. The method of claim 10, wherein the manner in which the first natural language utterance is spoken includes at least one of a tone, a pace, an inflection, or a timing.
  - 12. The method of claim 1, wherein the second input device is included with the first input device in a single device.
  - 13. The method of claim 1, wherein generating the first response comprises generating, by the computer system, a non-voice response in a graphical user interface and/or a voice response via a speaker.
  - 14. The method of claim 1, wherein the first natural language utterance relates to a request and the first response is intended to be responsive to the request, the method further comprising:
    - receiving, by the computer system, a second natural language utterance in response to the first response, wherein the second natural language utterance comprises a clarification related to the interpretation of the first natural language utterance; and
      
      responsive to receipt of the second natural language utterance, generating, by the computer system, a response to the second natural language utterance based on the interpretation of the first natural language utterance and the clarification related to the interpretation of the first natural language utterance.
  - 15. The method of claim 1, wherein the first non-voice input is received via a non-voice input device comprising a touch screen device, the method further comprising:
    - receiving, by the computer system, information indicating a location on the touch screen device that received the first non-voice input.
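
Claim 1 recites comparing the same utterance as captured by two input devices and filtering sound based on that comparison, but it does not say how. One common technique that fits this description, and the "Microphone arrays; Beamforming" classification listed for this patent, is delay-and-sum beamforming: estimate the inter-microphone delay by cross-correlation, align the channels, and average them so the shared speech adds coherently while uncorrelated noise is attenuated. The sketch below is an illustration under stated assumptions, not the patented method. It assumes both inputs are equal-rate lists of samples, and the function names and `max_lag` search window are hypothetical.

```python
def estimate_lag(a, b, max_lag):
    """Compare the two inputs: return the shift of b relative to a (in
    samples, searched over -max_lag..max_lag) that maximizes cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += sample * b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(a, b, max_lag=32):
    """Filter using the comparison: align b to a and average overlapping samples.
    Speech common to both channels adds coherently; uncorrelated noise is attenuated."""
    lag = estimate_lag(a, b, max_lag)
    filtered = []
    for i, sample in enumerate(a):
        j = i + lag
        if 0 <= j < len(b):
            filtered.append((sample + b[j]) / 2.0)
        else:
            filtered.append(float(sample))  # no aligned sample in b; keep a's sample
    return lag, filtered


# Usage: the second microphone hears the same utterance two samples later.
mic_a = [0, 0, 1, 2, 3, 2, 1, 0, 0, 0]
mic_b = [0, 0, 0, 0, 1, 2, 3, 2, 1, 0]
lag, cleaned = delay_and_sum(mic_a, mic_b, max_lag=3)
print(lag)  # 2
```

The nested loop keeps the sketch dependency-free; a practical implementation would more likely compute the correlation with an FFT.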

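Claims 3 through 5 describe a context stack: contexts for consecutive utterances are tracked, a new utterance is tested against the most recent context and then the second most recent, and a new context is determined only if none of the stacked contexts fits. A minimal Python sketch follows, assuming a bounded stack depth and a caller-supplied matching predicate. `ContextStack`, `resolve_context`, and the keyword-based matcher are hypothetical stand-ins, not the patent's implementation; the claims base the matching decision on short-term knowledge, which the toy keyword table only imitates.

```python
from collections import deque


class ContextStack:
    """Hypothetical context stack: contexts of consecutive utterances, newest last."""

    def __init__(self, max_depth=5):
        # Oldest contexts are dropped once max_depth is exceeded (an assumption).
        self._contexts = deque(maxlen=max_depth)

    def push(self, context):
        self._contexts.append(context)

    def match(self, utterance, matches):
        """Test the most recent context first, then the second most recent, and so on."""
        for context in reversed(self._contexts):
            if matches(utterance, context):
                return context
        return None


def resolve_context(stack, utterance, matches, infer_new_context):
    context = stack.match(utterance, matches)
    if context is None:
        # No stacked context fits, so determine a new one for this utterance.
        context = infer_new_context(utterance)
    stack.push(context)
    return context


# Illustrative keyword table standing in for short-term knowledge.
KEYWORDS = {"music": {"play", "song", "louder"}, "navigation": {"route", "traffic", "exit"}}


def keyword_match(utterance, context):
    return bool(set(utterance.lower().split()) & KEYWORDS.get(context, set()))


def keyword_infer(utterance):
    for context in KEYWORDS:
        if keyword_match(utterance, context):
            return context
    return "unknown"


stack = ContextStack()
stack.push("music")
stack.push("navigation")
print(resolve_context(stack, "play that song again", keyword_match, keyword_infer))  # music
```

Here "navigation" is checked first because it is the most recent context; it fails, so the second most recent context, "music", is selected.
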
16. A system for facilitating natural language system responses via short-term knowledge generated based on one or more prior multi-modal device interactions, the system comprising:
- one or more physical processors programmed with one or more computer program instructions which, when executed, cause the one or more physical processors to:
  
  receive, during a first conversation, a first voice input via a first input device, the first voice input comprising a first natural language utterance;
  
  receive a second voice input comprising the first natural language utterance via a second input device;
  
  compare the first voice input with the second voice input;
  
  filter sound from the first voice input and the second voice input based on the comparison;
  
  obtain, during the first conversation, a user interface state related to one or more non-voice inputs associated with the first voice input, the one or more non-voice inputs comprising at least a first non-voice input;
  
  generate the short-term knowledge based on at least the first voice input and the first non-voice input;
  
  determine, based on the short-term knowledge, a first context for the first natural language utterance;
  
  determine, based on the first context, an interpretation of the first natural language utterance; and
  
  generate, based on the interpretation of the first natural language utterance, a first response to the first natural language utterance.
- Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
- - 17. The system of claim 16, wherein the one or more physical processors are further caused to:
    - receive a second natural language utterance via the first input device;
      
      determine, based on the short-term knowledge, whether the second natural language utterance corresponds to the first context;
      
      determine, responsive to the second natural language utterance not corresponding to the first context, a second context for the second natural language utterance based on the short-term knowledge;
      
      determine, based on the second context, an interpretation of the second natural language utterance; and
      
      generate, based on the interpretation of the second natural language utterance, a response to the second natural language utterance.
  - 18. The system of claim 16, wherein the one or more physical processors are further caused to:
    - track contexts identified for multiple consecutive natural language user utterances to generate a context stack, the context stack including the contexts identified for the multiple consecutive natural language user utterances;
      
      receive a second natural language utterance via the first input device; and
      
      determine, based on the short-term knowledge, whether the second natural language utterance corresponds to one or more individual ones of the contexts included in the context stack.
  - 19. The system of claim 18, wherein determining whether the second natural language utterance corresponds to one or more individual ones of the contexts included in the context stack includes:
    - determining whether the second natural language utterance corresponds to a most recent context included in the context stack; and
      
      responsive to the second natural language utterance not corresponding to the most recent context, determining whether the second natural language utterance corresponds to a second most recent context included in the context stack.
  - 20. The system of claim 18, wherein the one or more physical processors are caused to:
    - determine, responsive to the second natural language utterance not corresponding to one or more individual ones of the contexts included in the context stack, a second context for the second natural language utterance based on the short-term knowledge; and
      
      determine, based on the second context, an interpretation of the second natural language utterance.
  - 21. The system of claim 16, wherein determining the interpretation of the first natural language utterance comprises determining, based on the first context, an interpretation of one or more recognized words of the first natural language utterance.
  - 22. The system of claim 16, wherein the one or more physical processors are caused to:
    - expire one or more items of accumulated short-term knowledge that are based on one or more natural language utterances received during conversations occurring prior to the first conversation; and
      
      accumulate long-term knowledge based on the one or more items of accumulated short-term knowledge that are expired, wherein the first context is determined based on short-term knowledge and the long-term knowledge.
  - 23. The system of claim 16, wherein the one or more physical processors are caused to:
    - determine, based on the short-term knowledge, a manner in which the first natural language utterance is spoken, wherein the first response is generated based on the determined manner and the interpretation of the first natural language utterance.
  - 24. The system of claim 23, wherein the manner in which the first natural language utterance is spoken includes at least one of a tone, a pace, an inflection, or a timing.
  - 25. The system of claim 16, wherein the second input device is included with the first input device in a single device.
  - 26. The system of claim 16, wherein to generate the first response, the one or more physical processors are caused to:
    - generate a non-voice response in a graphical user interface and/or a voice response via a speaker.
  - 27. The system of claim 16, wherein the first natural language utterance relates to a request and the first response is intended to be responsive to the request, wherein the one or more physical processors are further caused to:
    - receive a second natural language utterance in response to the first response, wherein the second natural language utterance comprises a clarification related to the interpretation of the first natural language utterance; and
      
      responsive to receipt of the second natural language utterance, generate a response to the second natural language utterance based on the interpretation of the first natural language utterance and the clarification related to the interpretation of the first natural language utterance.
  - 28. The system of claim 16, wherein the first non-voice input is received via a non-voice input device comprising a touch screen device, and wherein the one or more physical processors are further caused to:
    - receive information indicating a location on the touch screen device that received the first non-voice input.
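
Claims 7 through 9 (and system claim 22) describe expiring short-term knowledge from earlier conversations, accumulating the expired items into long-term knowledge, and determining context from both stores. The sketch below assumes integer conversation IDs that increase over time, models long-term knowledge as simple occurrence counts, and uses an arbitrary 2:1 weighting in favor of short-term items. All names and weights are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class KnowledgeItem:
    conversation_id: int  # assumed to increase monotonically across conversations
    key: str              # hypothetical fact label, e.g. "artist:Miles Davis"


@dataclass
class SharedKnowledge:
    short_term: list = field(default_factory=list)
    long_term: Counter = field(default_factory=Counter)

    def add(self, item):
        self.short_term.append(item)

    def expire_prior(self, current_conversation_id):
        """Expire items from earlier conversations, folding each into long-term counts."""
        kept = []
        for item in self.short_term:
            if item.conversation_id < current_conversation_id:
                self.long_term[item.key] += 1
            else:
                kept.append(item)
        self.short_term = kept

    def context_score(self, key):
        """Score a candidate context from both stores; the 2:1 weighting is arbitrary."""
        recent = sum(1 for item in self.short_term if item.key == key)
        return 2.0 * recent + self.long_term[key]


knowledge = SharedKnowledge()
knowledge.add(KnowledgeItem(1, "artist:Miles Davis"))
knowledge.add(KnowledgeItem(1, "city:Seattle"))
knowledge.add(KnowledgeItem(2, "artist:Miles Davis"))
knowledge.expire_prior(2)
print(knowledge.context_score("artist:Miles Davis"))  # 3.0
```

Counting expired items is the simplest form of accumulation; a fuller system might store richer structures such as user preferences or frequently used domains.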

Current Assignee
VB Assets, LLC
Original Assignee
VB Assets, LLC
Inventors
Baldwin, Larry; Freeman, Tom; Tjalve, Michael; Ebersold, Blane; Weider, Chris
Primary Examiner(s)
Yen, Eric

Application Number
US14/691,445
Publication Number
US 20150228276A1
Time in Patent Office
1,492 Days
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G06F 40/30   Semantic analysis

G10L 15/18   using natural language mode...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/1822   Parsing for meaning underst...

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

G10L 17/22   Interactive procedures; Man...

G10L 2015/0631   Creating reference template...

G10L 2015/225   Feedback of the input speech

G10L 2015/228   of application context

G10L 2021/02166   Microphone arrays; Beamforming

G10L 25/51   for comparison or discrimin...

G10L 25/63   for estimating an emotional...
