MOBILE SYSTEMS AND METHODS OF SUPPORTING NATURAL LANGUAGE HUMAN-MACHINE INTERACTIONS
5 Assignments
0 Petitions
Abstract
A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user-specific profile data to create a natural environment for users who submit requests and/or commands in multiple domains. The invention creates, stores, and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain-specific behavior and information into agents, which are distributable or updateable over a wide area network.
673 Citations
32 Claims
1. A mobile device for processing multi-modal natural language inputs, comprising:
- a conversational voice user interface that receives a multi-modal natural language input from a user, the multi-modal natural language input including a natural language utterance and a non-speech input, the conversational voice user interface coupled to a transcription module that transcribes the non-speech input to create a non-speech-based transcription;
- a conversational speech analysis engine that identifies the user that provided the multi-modal natural language input, the conversational speech analysis engine using a speech recognition engine and a semantic knowledge-based model to create a speech-based transcription of the natural language utterance, wherein the semantic knowledge-based model includes a personalized cognitive model derived from one or more prior interactions between the identified user and the mobile device, a general cognitive model derived from one or more prior interactions between a plurality of users and the mobile device, and an environmental model derived from an environment of the identified user and the mobile device;
- a merging module that merges the speech-based transcription and the non-speech-based transcription to create a merged transcription;
- a knowledge-enhanced speech recognition engine that identifies one or more entries in a context stack matching information contained in the merged transcription and determines a most likely context for the multi-modal natural language input based on the identified entries; and
- a response generating module that identifies a domain agent associated with the most likely context for the multi-modal input, communicates a request to the identified domain agent, and generates a response to the user from content provided by the identified domain agent as a result of processing the request.

Dependent claims: 2-14.
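The merging module of claim 1 combines a speech-based transcription with a non-speech-based transcription (for example, a transcribed touch or map input) into a single merged transcription. The claim does not specify a merge policy; the sketch below assumes a minimal ordering policy in which the spoken utterance leads and the non-speech input qualifies it, and all names (`Transcription`, `merge_transcriptions`) are illustrative, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class Transcription:
    source: str        # "speech" or "non-speech" (hypothetical labels)
    text: str
    confidence: float  # assumed per-source confidence score

def merge_transcriptions(speech: Transcription, non_speech: Transcription) -> str:
    """Merge the speech-based and non-speech-based transcriptions.

    Assumed policy: the utterance comes first and the non-speech input
    (e.g. a tapped map location rendered as text) qualifies it.
    """
    parts = sorted([speech, non_speech], key=lambda t: t.source != "speech")
    return " ".join(t.text for t in parts)

merged = merge_transcriptions(
    Transcription("speech", "find a gas station near", 0.92),
    Transcription("non-speech", "<map point 37.77,-122.41>", 1.0),
)
# merged -> "find a gas station near <map point 37.77,-122.41>"
```

The merged transcription is what the knowledge-enhanced speech recognition engine later matches against the context stack.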
15. A system for processing multi-modal natural language inputs, comprising:
- a plurality of mobile devices that support multi-modal natural language interactions with a user;
- a context manager communicatively coupled to the plurality of mobile devices, wherein the context manager synchronizes a semantic knowledge-based model and a context stack among the plurality of mobile devices, wherein the semantic knowledge-based model includes a personalized cognitive model derived from one or more prior interactions between the user and one or more of the mobile devices, a general cognitive model derived from one or more prior interactions between a plurality of users and one or more of the mobile devices, and an environmental model derived from an environment of the user and one or more of the mobile devices;
- a conversational voice user interface communicatively coupled to one or more of the plurality of mobile devices, wherein the conversational voice user interface receives a multi-modal natural language input from the user that includes at least a natural language utterance;
- a conversational speech analysis engine that identifies the user that provided the multi-modal input, the conversational speech analysis engine using a speech recognition engine and the semantic knowledge-based model to create a speech-based transcription of the natural language utterance;
- a knowledge-enhanced speech recognition engine that identifies one or more entries in a context stack matching information contained in the speech-based transcription and determines a most likely context for the multi-modal natural language input based on the identified entries; and
- a response generating module that identifies a domain agent associated with the most likely context for the multi-modal input, communicates a request to the identified domain agent, and generates a response to the user from content provided by the identified domain agent as a result of processing the request.

Dependent claims: 16-18.
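Claim 15 adds a context manager that keeps the semantic knowledge-based model and context stack synchronized across multiple devices. The claim does not specify a synchronization protocol; the sketch below assumes a simple last-writer-wins broadcast, and the names (`Device`, `ContextManager`, `report_interaction`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Device:
    device_id: str
    model_version: int = 0
    context_stack: List[str] = field(default_factory=list)

class ContextManager:
    """Keeps a shared model version and context stack consistent across
    registered devices via last-writer-wins broadcast (an assumption)."""

    def __init__(self) -> None:
        self.devices: Dict[str, Device] = {}
        self.model_version = 0
        self.context_stack: List[str] = []

    def register(self, device: Device) -> None:
        self.devices[device.device_id] = device
        self._push(device)

    def report_interaction(self, device_id: str, context_stack: List[str]) -> None:
        # One device finished an interaction: fold its stack into the
        # shared state and rebroadcast it to every registered device.
        self.model_version += 1
        self.context_stack = list(context_stack)
        for device in self.devices.values():
            self._push(device)

    def _push(self, device: Device) -> None:
        device.model_version = self.model_version
        device.context_stack = list(self.context_stack)

phone, car = Device("phone"), Device("car")
manager = ContextManager()
manager.register(phone)
manager.register(car)
manager.report_interaction("phone", ["navigation"])
# car.context_stack is now ["navigation"] as well
```

A broadcast model is the simplest reading of "synchronizes ... among the plurality of mobile devices"; a real system would likely add conflict resolution and incremental updates.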
19. A method for processing multi-modal natural language inputs, comprising:
- receiving a multi-modal natural language input at a conversational voice user interface, the multi-modal input including a natural language utterance and a non-speech input provided by a user, wherein a transcription module coupled to the conversational voice user interface transcribes the non-speech input to create a non-speech-based transcription;
- identifying the user that provided the multi-modal input;
- creating a speech-based transcription of the natural language utterance using a speech recognition engine and a semantic knowledge-based model, wherein the semantic knowledge-based model includes a personalized cognitive model derived from one or more prior interactions between the identified user and the conversational voice user interface, a general cognitive model derived from one or more prior interactions between a plurality of users and the conversational voice user interface, and an environmental model derived from an environment of the identified user and the conversational voice user interface;
- merging the speech-based transcription and the non-speech-based transcription to create a merged transcription;
- identifying one or more entries in a context stack matching information contained in the merged transcription;
- determining a most likely context for the multi-modal input based on the identified entries;
- identifying a domain agent associated with the most likely context for the multi-modal input;
- communicating a request to the identified domain agent; and
- generating a response to the user from content provided by the identified domain agent as a result of processing the request.

Dependent claims: 20-32.
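The final steps of the claimed method, matching context-stack entries against the merged transcription, determining a most likely context, and dispatching a request to the associated domain agent, can be sketched as follows. The scoring heuristic (keyword overlap, with entries nearer the top of the stack winning ties) and all names (`most_likely_context`, `AGENTS`, `respond`) are assumptions; the claims only require matching entries and selecting a most likely context.

```python
from typing import Callable, Dict, List

def most_likely_context(context_stack: List[dict], merged: str) -> dict:
    """Score each context-stack entry by keyword overlap with the merged
    transcription; more recent entries (top of stack) win ties."""
    tokens = set(merged.lower().split())
    best, best_score = context_stack[-1], -1
    for entry in reversed(context_stack):          # top of stack first
        score = len(tokens & set(entry["keywords"]))
        if score > best_score:
            best, best_score = entry, score
    return best

# Hypothetical domain agents keyed by context domain.
AGENTS: Dict[str, Callable[[str], str]] = {
    "navigation": lambda req: "route computed for: " + req,
    "music":      lambda req: "now playing: " + req,
}

def respond(context_stack: List[dict], merged: str) -> str:
    context = most_likely_context(context_stack, merged)
    agent = AGENTS[context["domain"]]   # identify the associated domain agent
    return agent(merged)                # generate response from agent content

stack = [
    {"domain": "music",      "keywords": {"play", "song"}},
    {"domain": "navigation", "keywords": {"route", "gas", "station"}},
]
answer = respond(stack, "find a gas station near here")
# answer -> "route computed for: find a gas station near here"
```

In this toy run the navigation entry overlaps on "gas" and "station", so the navigation agent handles the request; a production system would score with the semantic knowledge-based model rather than raw keyword overlap.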
Specification