SYSTEM AND METHOD FOR AN INTEGRATED, MULTI-MODAL, MULTI-DEVICE NATURAL LANGUAGE VOICE SERVICES ENVIRONMENT

US 20150142447A1
Filed: 11/18/2013
Published: 05/21/2015
Est. Priority Date: 05/27/2008
Status: Active Grant

Abstract

A system and method for an integrated, multi-modal, multi-device natural language voice services environment may be provided. In particular, the environment may include a plurality of voice-enabled devices each having intent determination capabilities for processing multi-modal natural language inputs in addition to knowledge of the intent determination capabilities of other devices in the environment. Further, the environment may be arranged in a centralized manner, a distributed peer-to-peer manner, or various combinations thereof. As such, the various devices may cooperate to determine intent of multi-modal natural language inputs, and commands, queries, or other requests may be routed to one or more of the devices best suited to take action in response thereto.

286 Citations

54 Claims

1-27. (canceled)

28. A method of processing natural language utterances, the method being implemented by a first device that comprises one or more physical processors executing one or more computer program instructions which, when executed, perform the method, the method comprising:
- receiving, by the first device, a natural language utterance spoken by a user;
  
  performing, by the first device, speech recognition to determine one or more words of the natural language utterance;
  
  determining, by the first device, based on the one or more words, a first prediction of an intent of the user;
  
  transmitting, by the first device, the natural language utterance to a second device;
  
  receiving, from the second device by the first device, a second prediction of the intent of the user; and
  
  determining, by the first device, the intent of the user based on the first prediction and the second prediction.
- Dependent claims (29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41):
  - 29. The method of claim 28, further comprising:
    - determining, by the first device, processing capabilities associated with a plurality of devices, wherein the plurality of devices comprises the second device; and
      
      selecting, by the first device, based on the processing capabilities associated with the second device, the second device to determine the second prediction, wherein transmitting the natural language utterance comprises transmitting the natural language utterance to the second device based on the selection.
  - 30. The method of claim 29, wherein the processing capabilities associated with the plurality of devices comprise processing power, storage resources, information, or services available at individual ones of the plurality of devices.
  - 31. The method of claim 29, further comprising:
    - determining, by the first device, based on the one or more words, a domain relating to the natural language utterance, wherein selecting the second device comprises selecting the second device based on the domain.
  - 32. The method of claim 31, wherein the plurality of devices are associated with different domains, the second device is associated with the domain, and the different domains comprise the domain.
  - 33. The method of claim 28, further comprising:
    - determining, by the first device, whether a prediction of the intent of the user is to be obtained from at least one other device, wherein transmitting the natural language utterance comprises transmitting the natural language utterance to the second device based on a determination that a prediction of the intent of the user is to be obtained from at least one other device.
  - 34. The method of claim 33, further comprising:
    - determining, by the first device, a confidence level associated with the first prediction; and
      
      determining, by the first device, whether the confidence level satisfies a threshold level of confidence relating to intent prediction accuracy, wherein the determination that a prediction of the intent of the user is to be obtained from at least one other device is based on a determination that the confidence level does not satisfy the threshold level of confidence.
  - 35. The method of claim 28, further comprising:
    - transmitting, by the first device, the natural language utterance to a third device;
      
      receiving, from the third device by the first device, a third prediction of the intent of the user;
      
      receiving, from the second device by the first device, information regarding a second confidence level associated with the second prediction;
      
      receiving, from the third device by the first device, information regarding a third confidence level associated with the third prediction;
      
      comparing, by the first device, the second and third confidence levels with one another, wherein determining the intent of the user comprises determining the intent based on the comparison.
  - 36. The method of claim 35, further comprising:
    - determining, by the first device, processing capabilities associated with the second device and processing capabilities associated with the third device, wherein determining the intent comprises determining the intent based on the comparison and the processing capabilities associated with the second and third devices.
  - 37. The method of claim 28, further comprising:
    - determining, by the first device, a first confidence level associated with the first prediction;
      
      receiving, from the second device by the first device, information regarding a second confidence level associated with the second prediction;
      
      determining, by the first device, a highest confidence level among the first confidence level, the second confidence level, and one or more other confidence levels associated with one or more other predictions of the intent of the user, wherein the one or more other predictions are received from one or more other devices;
      
      wherein determining the intent comprises selecting, from the first prediction, the second prediction, and the one or more other predictions, a prediction of the intent of the user that is associated with the highest confidence level.
  - 38. The method of claim 28, wherein the first device is designated as a central device.
  - 39. The method of claim 28, wherein the first device comprises an input device at which the natural language utterance is initially received from the user.
  - 40. The method of claim 39, further comprising:
    - determining, by the first device, whether a third device is available to manage determination of the intent of the user, wherein transmitting the natural language utterance comprises transmitting the natural language utterance to the second device based on a determination that the third device is not available to manage determination of the intent of the user.
  - 41. The method of claim 40, further comprising:
    - transmitting, by the first device, a request to a plurality of devices for information regarding processing capabilities associated with the plurality of devices based on a determination that the third device is not available to manage determination of the intent, wherein the plurality of devices comprise the second device;
      
      receiving, by the first device, information regarding the processing capabilities; and
      
      selecting, by the first device based on the processing capabilities, the second device to determine the second prediction, wherein transmitting the natural language utterance comprises transmitting the natural language utterance to the second device based on the selection.
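
The flow recited in claims 28, 33, 34, and 37 can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `Prediction`, `determine_intent`, and `threshold`, the 0.0 to 1.0 confidence scale, and the use of plain callables to stand in for the first device's own predictor and for the other devices are all assumptions made for illustration. Speech recognition is omitted, so the utterance is passed as already-recognized text.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Prediction:
    """One device's guess at the user's intent (hypothetical structure)."""
    intent: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)
    source: str        # name of the device that produced the prediction


# Hypothetical stand-in for "transmit the utterance, receive a prediction".
Predictor = Callable[[str], Prediction]


def determine_intent(
    utterance: str,
    local_predict: Predictor,
    remote_predictors: Iterable[Predictor],
    threshold: float = 0.8,  # assumed value; the claims name no number
) -> Prediction:
    # Claim 28: the first device forms its own (first) prediction.
    first = local_predict(utterance)

    # Claims 33-34: other devices are consulted only when the local
    # confidence level does not satisfy the threshold.
    if first.confidence >= threshold:
        return first

    # Claim 28: send the utterance to other devices and collect predictions.
    candidates = [first]
    for predict in remote_predictors:
        candidates.append(predict(utterance))

    # Claim 37: select the prediction with the highest confidence level.
    return max(candidates, key=lambda p: p.confidence)
```

In this sketch a phone that is unsure about "take me home" would defer to a car navigation unit that reports higher confidence, while a confident local prediction is returned without contacting any other device.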

42. A system for processing natural language utterances, the system comprising:
- a first device having one or more physical processors programmed to execute one or more computer program instructions which, when executed, cause the one or more physical processors to:
  
  receive a natural language utterance spoken by a user;
  
  perform speech recognition to determine one or more words of the natural language utterance;
  
  determine, based on the one or more words, a first prediction of an intent of the user;
  
  transmit the natural language utterance to a second device;
  
  receive, from the second device, a second prediction of the intent of the user; and
  
  determine the intent of the user based on the first prediction and the second prediction.
- Dependent claims (43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54):
  - 43. The system of claim 42, wherein the one or more physical processors are further caused to:
    - determine processing capabilities associated with a plurality of devices, wherein the plurality of devices comprises the second device; and
      
      select, based on the processing capabilities associated with the second device, the second device to determine the second prediction, wherein transmitting the natural language utterance comprises transmitting the natural language utterance to the second device based on the selection.
  - 44. The system of claim 43, wherein the processing capabilities associated with the plurality of devices comprise processing power, storage resources, information, or services available at individual ones of the plurality of devices.
  - 45. The system of claim 43, wherein the one or more physical processors are further caused to:
    - determine, based on the one or more words, a domain relating to the natural language utterance, wherein selecting the second device comprises selecting the second device based on the domain.
  - 46. The system of claim 45, wherein the plurality of devices are associated with different domains, the second device is associated with the domain, and the different domains comprise the domain.
  - 47. The system of claim 42, wherein the one or more physical processors are further caused to:
    - determine whether a prediction of the intent of the user is to be obtained from at least one other device, wherein transmitting the natural language utterance comprises transmitting the natural language utterance to the second device based on a determination that a prediction of the intent of the user is to be obtained from at least one other device.
  - 48. The system of claim 47, wherein the one or more physical processors are further caused to:
    - determine a confidence level associated with the first prediction; and
      
      determine whether the confidence level satisfies a threshold level of confidence relating to intent prediction accuracy, wherein the determination that a prediction of the intent of the user is to be obtained from at least one other device is based on a determination that the confidence level does not satisfy the threshold level of confidence.
  - 49. The system of claim 42, wherein the one or more physical processors are further caused to:
    - transmit the natural language utterance to a third device;
      
      receive, from the third device, a third prediction of the intent of the user;
      
      receive, from the second device, information regarding a second confidence level associated with the second prediction;
      
      receive, from the third device, information regarding a third confidence level associated with the third prediction;
      
      compare, by the first device, the second and third confidence levels with one another, wherein determining the intent of the user comprises determining the intent based on the comparison.
  - 50. The system of claim 49, wherein the one or more physical processors are further caused to:
    - determine processing capabilities associated with the second device and processing capabilities associated with the third device, wherein determining the intent comprises determining the intent based on the comparison and the processing capabilities associated with the second and third devices.
  - 51. The system of claim 42, wherein the one or more physical processors are further caused to:
    - determine a first confidence level associated with the first prediction;
      
      receive, from the second device, information regarding a second confidence level associated with the second prediction;
      
      determine a highest confidence level among the first confidence level, the second confidence level, and one or more other confidence levels associated with one or more other predictions of the intent of the user, wherein the one or more other predictions are received from one or more other devices;
      
      wherein determining the intent comprises selecting, from the first prediction, the second prediction, and the one or more other predictions, a prediction of the intent of the user that is associated with the highest confidence level.
  - 52. The system of claim 42, wherein the first device is designated as a central device.
  - 53. The system of claim 42, wherein the first device comprises an input device at which the natural language utterance is initially received from the user.
  - 54. The system of claim 53, wherein the one or more physical processors are further caused to:
    - determine whether a third device is available to manage determination of the intent of the user;
      
      transmit a request to a plurality of devices for information regarding processing capabilities associated with the plurality of devices based on a determination that the third device is not available to manage determination of the intent, wherein the plurality of devices comprise the second device;
      
      receive information regarding the processing capabilities; and
      
      select, based on the processing capabilities, the second device to determine the second prediction, wherein transmitting the natural language utterance comprises transmitting the natural language utterance to the second device based on the selection.
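
The device selection recited in claims 43 to 46 and 54 can be sketched as follows. This is an illustrative assumption, not the method disclosed in the specification. `Capabilities`, `select_device`, the integer `processing_power` score, and the tie-breaking rule are hypothetical. The claims do not state what happens when a managing device is available, so delegating to it is also an assumption here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Capabilities:
    """Capabilities a device reports about itself (hypothetical structure)."""
    processing_power: int               # assumed relative score, higher is better
    domains: FrozenSet[str] = frozenset()  # domains the device can handle


def select_device(
    capabilities_by_device: Dict[str, Capabilities],
    domain: str,
    central_device: Optional[str] = None,
) -> Optional[str]:
    # Claim 54: check whether a device is available to manage intent
    # determination. Delegating to it when present is an assumption.
    if central_device is not None:
        return central_device

    # Claims 45-46: prefer devices associated with the utterance's domain.
    matching = {
        name: caps
        for name, caps in capabilities_by_device.items()
        if domain in caps.domains
    }
    # Fall back to every known device when none matches the domain (assumption).
    pool = matching or capabilities_by_device
    if not pool:
        return None

    # Claims 43-44: choose based on reported capabilities, here reduced to
    # the single processing-power score.
    return max(pool, key=lambda name: pool[name].processing_power)
```

A fuller model would weigh the storage resources, information, and services named in claim 44 alongside processing power. A single score keeps the sketch short.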

Current Assignee: Oracle International Corporation (Oracle Corporation)
Original Assignee: VoiceBox Technologies Corporation (Microsoft Corporation)
Inventors: Kennewick, Robert A.; Weider, Chris
Granted Patent: US 9,305,548 B2
Field of Search
US Class Current: 704/275
CPC Class Codes:
- G06F 3/167: Audio in a user interface, ...
- G06F 40/40: Processing or translation o...
- G10L 15/1822: Parsing for meaning underst...
- G10L 15/22: Procedures used during a sp...
- G10L 15/24: Speech recognition using no...
- G10L 15/26: Speech to text systems G10L...
- G10L 15/285: Memory allocation or algori...
- G10L 15/30: Distributed recognition, e....
- G10L 15/32: Multiple recognisers used i...
- G10L 2015/223: Execution procedure of a sp...
