System and method for an integrated, multi-modal, multi-device natural language voice services environment

US 8,589,161 B2
Filed: 05/27/2008
Issued: 11/19/2013
Est. Priority Date: 05/27/2008
Status: Active Grant

10 Assignments

0 Petitions

Abstract

A system and method for an integrated, multi-modal, multi-device natural language voice services environment may be provided. In particular, the environment may include a plurality of voice-enabled devices each having intent determination capabilities for processing multi-modal natural language inputs in addition to knowledge of the intent determination capabilities of other devices in the environment. Further, the environment may be arranged in a centralized manner, a distributed peer-to-peer manner, or various combinations thereof. As such, the various devices may cooperate to determine intent of multi-modal natural language inputs, and commands, queries, or other requests may be routed to one or more of the devices best suited to take action in response thereto.
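
The routing idea in the abstract (sending a request to the device best suited to act on it) can be illustrated with a minimal sketch. The `Device` class, the per-domain suitability scores, and the `route_request` function are hypothetical names introduced here for illustration; the patent does not prescribe any particular data structure or scoring scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Device:
    """Hypothetical voice-enabled device in the environment."""
    name: str
    # Assumed per-domain suitability scores in [0, 1]; not from the patent.
    domain_scores: Dict[str, float] = field(default_factory=dict)


def route_request(domain: str, devices: List[Device]) -> Optional[Device]:
    """Return the device with the highest score for the domain, or None."""
    candidates = [d for d in devices if domain in d.domain_scores]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.domain_scores[domain])


devices = [
    Device("phone", {"navigation": 0.6, "music": 0.4}),
    Device("car_head_unit", {"navigation": 0.9}),
    Device("media_player", {"music": 0.8}),
]
best = route_request("navigation", devices)
print(best.name if best else "no suitable device")
```

A single score per domain is a simplification; an environment of this kind could equally weigh device state or available resources when choosing a target.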

682 Citations

24 Claims

1. A method to provide an integrated, multi-modal, natural language voice services environment having an input device, a central device, and one or more secondary devices, wherein the method comprises:
- receiving, at the central device, a multi-modal natural language input from the input device, wherein the input device initially received the multi-modal natural language input;
  
  maintaining, on the input device, the central device, and the one or more secondary devices, a constellation model that describes natural language resources, dynamic states, and intent determination capabilities associated with the input device, the central device, and the one or more secondary devices;
  
  aggregating the natural language resources, the dynamic states, and the intent determination capabilities associated with the input device and the one or more secondary devices on the central device to converge the natural language resources, the dynamic states, and the intent determination capabilities held across the natural language voice services environment on the central device;
  
  determining, on the central device, a preliminary intent associated with the multi-modal natural language input using the converged natural language resources, dynamic states, and intent determination capabilities held across the natural language voice services environment;
  
  sending the multi-modal natural language input from the central device to the one or more secondary devices to invoke the intent determination capabilities associated with the one or more secondary devices;
  
  collating, at the central device, intent determination responses received from the one or more secondary devices with the preliminary intent determined on the central device to generate an intent hypothesis associated with the multi-modal natural language input on the central device; and
  
  returning the intent hypothesis associated with the multi-modal natural language input and information relating to one or more requests associated with the multi-modal natural language input to the input device, wherein the input device invokes one or more actions based on the returned intent hypothesis and the information relating to one or more requests associated with the multi-modal natural language input.
  - 2. The method of claim 1, wherein the intent determination capabilities associated with the input device, the central device, and the one or more secondary devices include local processing power, local storage resources, and local natural language processing capabilities.
  - 3. The method of claim 1, wherein collating the intent determination responses includes:
    - receiving the intent determination responses from the one or more secondary devices in an interleaved manner; and
      
      arbitrating among the interleaved intent determination responses received from the one or more secondary devices and the preliminary intent determined on the central device to generate the intent hypothesis associated with the multi-modal natural language input.
  - 4. The method of claim 3, wherein the generated intent hypothesis comprises one of the interleaved intent determination responses received from the one or more secondary devices or the preliminary intent determined on the central device having a highest confidence level.
  - 5. The method of claim 3, wherein arbitrating among the interleaved intent determination responses and the preliminary intent includes:
    - evaluating, at the central device, the constellation model to determine whether the intent determination capabilities associated with any of the one or more secondary devices include multi-pass speech recognition; and
      
      assigning a higher weight to confidence levels associated with any of the interleaved intent determination responses that were generated using multi-pass speech recognition.
  - 6. The method of claim 3, wherein collating the intent determination responses further includes terminating the collating in response to determining that a predetermined amount of time has lapsed, a predetermined amount of resources have been consumed, or one or more of the interleaved intent determination responses received from the one or more secondary devices meets or exceeds an acceptable confidence level.
  - 7. The method of claim 6, wherein the input device that initially received the multi-modal natural language input communicates the multi-modal natural language input to the central device in response to an initial intent determination generated on the input device failing to meet or exceed the acceptable confidence level.
  - 8. The method of claim 1, wherein the natural language resources and the dynamic states associated with the input device, the central device, and the one or more secondary devices include local vocabularies, local vocabulary translation mechanisms, local misrecognitions, local context information, local short-term shared knowledge, local long-term shared knowledge.
  - 9. The method of claim 1, further comprising operating the natural language voice services environment in a continuous listening mode that causes the input device to initially accept the multi-modal natural language input in response to determining that one or more predetermined events have occurred.
  - 10. The method of claim 1, further comprising identifying, at the central device, one or more domains relevant to the multi-modal natural language input, wherein the central device sends the multi-modal language input to the one or more secondary devices in response to determining that the intent determination capabilities associated therewith have relevance to the one or more identified domains.
  - 11. The method of claim 1, wherein the information returned to the input device includes results associated with the central device resolving the one or more requests and the one or more actions that the input device invokes include presenting the results in response to the multi-modal natural language input.
  - 12. The method of claim 1, wherein the information returned to the input device includes one or more queries or commands formulated on the central device and the one or more actions that the input device invokes include routing the queries or commands to generate results to present in response to the multi-modal natural language input.
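
The collating and arbitration steps of claims 3 through 6 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: `IntentResponse`, `collate`, the multiplicative `multipass_weight`, and the default thresholds are all hypothetical. The sketch covers the time-based and confidence-based stopping conditions of claim 6 and omits the resource-consumption condition.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class IntentResponse:
    device: str
    intent: str
    confidence: float  # assumed to lie in [0, 1]


def collate(
    preliminary: IntentResponse,
    responses: Iterable[IntentResponse],
    multipass_devices: Set[str],
    multipass_weight: float = 1.2,       # assumed weighting factor
    acceptable_confidence: float = 0.9,  # assumed threshold
    timeout_s: float = 1.0,              # assumed time budget
) -> IntentResponse:
    """Pick the highest-weighted candidate as the intent hypothesis."""
    deadline = time.monotonic() + timeout_s
    best, best_score = preliminary, preliminary.confidence
    for response in responses:  # responses arrive in interleaved order
        if time.monotonic() > deadline:
            break  # predetermined amount of time has lapsed
        weight = multipass_weight if response.device in multipass_devices else 1.0
        score = response.confidence * weight
        if score > best_score:
            best, best_score = response, score
        if response.confidence >= acceptable_confidence:
            break  # a response met the acceptable confidence level
    return best


preliminary = IntentResponse("central", "play_music", 0.60)
incoming = [
    IntentResponse("phone", "call_contact", 0.65),
    IntentResponse("car", "navigate", 0.62),  # generated with multi-pass recognition
]
print(collate(preliminary, incoming, multipass_devices={"car"}).intent)
```

Here the set `multipass_devices` stands in for the result of evaluating the constellation model; in this sketch the central device's own preliminary intent is left unweighted.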

13. A system to provide an integrated, multi-modal, natural language voice services environment having an input device, one or more secondary devices, and a central device configured to:
- receive a multi-modal natural language input from the input device, wherein the input device initially received the multi-modal natural language input;
  
  maintain a constellation model and distribute the constellation model to the input device and the one or more secondary devices, wherein the constellation model describes natural language resources, dynamic states, and intent determination capabilities associated with the input device, the central device, and the one or more secondary devices;
  
  aggregate the natural language resources, the dynamic states, and the intent determination capabilities associated with the input device and the one or more secondary devices to converge the natural language resources, the dynamic states, and the intent determination capabilities held across the natural language voice services environment;
  
  use the converged natural language resources, dynamic states, and intent determination capabilities held across the natural language voice services environment to determine a preliminary intent associated with the multi-modal natural language input;
  
  send the multi-modal natural language input to the one or more secondary devices to invoke the intent determination capabilities associated with the one or more secondary devices;
  
  collate intent determination responses received from the one or more secondary devices with the determined preliminary intent to generate an intent hypothesis associated with the multi-modal natural language input on the central device; and
  
  return the intent hypothesis associated with the multi-modal natural language input and information relating to one or more requests associated with the multi-modal natural language input to the input device, wherein the input device is configured to invoke one or more actions based on the returned intent hypothesis and the information relating to one or more requests associated with the multi-modal natural language input.
  - 14. The system of claim 13, wherein the intent determination capabilities associated with the input device, the central device, and the one or more secondary devices include local processing power, local storage resources, and local natural language processing capabilities.
  - 15. The system of claim 13, wherein to collate the intent determination responses, the central device is further configured to:
    - receive the intent determination responses from the one or more secondary devices in an interleaved manner; and
      
      arbitrate among the interleaved intent determination responses received from the one or more secondary devices and the determined preliminary intent to generate the intent hypothesis associated with the multi-modal natural language input.
  - 16. The system of claim 15, wherein the generated intent hypothesis comprises one of interleaved intent determination responses received from the one or more secondary devices or the preliminary intent determined on the central device having a highest confidence level.
  - 17. The system of claim 15, wherein to arbitrate among the interleaved intent determination responses and the preliminary intent, the central device is further configured to:
    - evaluate the constellation model to determine whether the intent determination capabilities associated with any of the one or more secondary devices include multi-pass speech recognition; and
      
      assign a higher weight to confidence levels associated with any of the interleaved intent determination responses that were generated using multi-pass speech recognition.
  - 18. The system of claim 15, wherein to collate the intent determination responses, the central device is further configured to terminate receiving the interleaved intent determination responses in response to a predetermined amount of time having lapsed, a predetermined amount of resources having been consumed, or one or more of the received interleaved intent determination responses meeting or exceeding an acceptable confidence level.
  - 19. The system of claim 18, wherein the input device that initially received the multi-modal natural language input is configured to communicate the multi-modal natural language input to the central device in response to an initial intent determination generated on the input device failing to meet or exceed the acceptable confidence level.
  - 20. The system of claim 13, wherein the natural language resources and the dynamic states associated with the input device, the central device, and the one or more secondary devices include local vocabularies, local vocabulary translation mechanisms, local misrecognitions, local context information, local short-term shared knowledge, local long-term shared knowledge.
  - 21. The system of claim 13, wherein the central device is further configured to operate the natural language voice services environment in a continuous listening mode that causes the input device to initially accept the multi-modal natural language input in response to determining that one or more predetermined events have occurred.
  - 22. The system of claim 13, wherein the central device is further configured to identify one or more domains relevant to the multi-modal natural language input and send the multi-modal language input to the one or more secondary devices in response to the intent determination capabilities associated therewith having relevance to the one or more identified domains.
  - 23. The system of claim 13, wherein the information returned to the input device includes results associated with the central device resolving the one or more requests and the one or more actions invoked on the input device include presenting the results in response to the multi-modal natural language input.
  - 24. The system of claim 13, wherein the information returned to the input device includes one or more queries or commands that the central device [formulates] and the one or more actions invoked on the input device include routing the queries or commands to generate results to present in response to the multi-modal natural language input.
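
One way to picture the constellation model of claim 13, its convergence on the central device, and the domain-based dispatch of claim 22 is the sketch below. All names (`DeviceEntry`, `ConstellationModel`, `converge`, `targets_for`) and the choice of sets and dictionaries are assumptions made for illustration; the claims do not specify a representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DeviceEntry:
    """Hypothetical per-device record in the constellation model."""
    vocabulary: Set[str] = field(default_factory=set)      # natural language resources
    context: Dict[str, str] = field(default_factory=dict)  # dynamic state
    domains: Set[str] = field(default_factory=set)         # intent determination capabilities


@dataclass
class ConstellationModel:
    devices: Dict[str, DeviceEntry] = field(default_factory=dict)

    def converge(self) -> DeviceEntry:
        """Aggregate every device's entry into one merged view."""
        merged = DeviceEntry()
        for entry in self.devices.values():
            merged.vocabulary |= entry.vocabulary
            # Simplification: on a key conflict the later device wins.
            merged.context.update(entry.context)
            merged.domains |= entry.domains
        return merged

    def targets_for(self, domains: Set[str], exclude: str) -> List[str]:
        """Names of devices (other than `exclude`) relevant to any given domain."""
        return sorted(
            name
            for name, entry in self.devices.items()
            if name != exclude and entry.domains & domains
        )


model = ConstellationModel({
    "central": DeviceEntry({"play", "pause"}, {"user": "alice"}, {"music"}),
    "phone": DeviceEntry({"call", "dial"}, {"location": "home"}, {"telephony"}),
    "nav_unit": DeviceEntry({"route", "detour"}, {}, {"navigation"}),
})
print(sorted(model.converge().domains))
print(model.targets_for({"navigation"}, exclude="central"))
```

The merged entry plays the role of the converged resources used to determine a preliminary intent, while `targets_for` selects the secondary devices that should receive the input.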

Current Assignee: Oracle International Corporation (Oracle Corporation)
Original Assignee: VoiceBox Technologies, Inc. (Microsoft Corporation)
Inventors: Kennewick, Robert A.; Weider, Chris
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US12/127,343
Publication Number: US 20090299745A1
Time in Patent Office: 2,002 Days
Field of Search: 704/251, 704/252
US Class Current: 704/252
CPC Class Codes:
G06F 3/16   Sound input; Sound output s...
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
