Mobile systems and methods of supporting natural language human-machine interactions

US 7,949,529 B2
Filed: 08/29/2005
Issued: 05/24/2011
Est. Priority Date: 08/29/2005
Status: Active Grant

6 Assignments

0 Petitions

Abstract

A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for users who submit requests and/or commands in multiple domains. The invention creates, stores, and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain-specific behavior and information into agents that are distributable or updateable over a wide area network.

1099 Citations

31 Claims

1. A mobile device for annotating objects using multi-modal natural language inputs, comprising:
- an interface configured to communicate with a location service to determine location information associated with an object accessible to the mobile device;
  
  a message service configured to communicate the location information to a storage device configured to store the location information with the object;
  
  one or more input devices configured to receive a multi-modal natural language input, wherein the multi-modal natural language input includes a natural language utterance that annotates the object;
  
  a speech recognition engine configured to transcribe the natural language utterance into a textual annotation using a dynamic grammar;
  
  an agent architecture configured to search the storage device with one or more semantic attributes extracted from the natural language utterance and retrieve the object from the storage device in response to the extracted semantic attributes matching metadata associated with the location information stored with the object in the storage device; and
  
  a processing unit configured to label the object retrieved from the storage device with the textual annotation to post-process the object with the textual annotation, wherein the storage device is further configured to store the textual annotation with the annotated object to post-process the annotated object.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
  - 2. The mobile device of claim 1 wherein the object includes one or more of a digital photograph, a calendar entry, an email message, an instant message, a phonebook entry, a voicemail entry, a digital movie, or a digital media file.
  - 3. The mobile device of claim 1 wherein the storage device includes one or more of a local memory associated with the mobile device or a server, a shared workspace, an object storage and retrieval facility, or a data center located remotely from the mobile device.
  - 4. The mobile device of claim 1 wherein the textual annotation provides searchable metadata that the storage device is further configured to store with the annotated object to post-process the annotated object.
  - 5. The mobile device of claim 1 wherein the message service is further configured to communicate a verbal annotation created from the natural language utterance to the storage device, and wherein the storage device is further configured to store the verbal annotation with the annotated object to post-process the annotated object.
  - 6. The mobile device of claim 1 wherein the agent architecture is further configured to search the storage device with non-speech input information extracted from the multi-modal natural language input and retrieve the object from the storage device in response to the extracted non-speech input information matching the metadata associated with the location information stored with the object in the storage device.
  - 7. The mobile device of claim 1 wherein the dynamic grammar includes a plurality of entries in one or more dictionary and phrase tables that are dynamically updated based on a history of a current dialog and one or more prior dialogs.
  - 8. The mobile device of claim 7, further comprising a misrecognition engine configured to dynamically update the plurality of entries in the dynamic grammar in response to one or more misrecognized or unrecognized events in one or more of the current dialog or the prior dialogs.
  - 9. The mobile device of claim 8, wherein the misrecognition engine is configured to dynamically update the plurality of entries in the dynamic grammar to include one or more decoy words for out-of-vocabulary words.
  - 10. The mobile device of claim 1, further comprising a parser configured to interpret one or more words or phrases recognized in the natural language utterance to extract the one or more semantic attributes from the natural language utterance.
  - 11. The mobile device of claim 10, wherein the extracted semantic attributes comprise global positioning system coordinates determined from the interpreted words or phrases.
  - 12. The mobile device of claim 10, wherein the speech recognition engine is further configured to transcribe the natural language utterance into the one or more recognized words or phrases using the dynamic grammar.
  - 13. The mobile device of claim 10, wherein the speech recognition engine is further configured to use one or more expected contexts stored in a context stack to recognize the one or more words or phrases and transcribe the natural language utterance into the textual annotation, and wherein the parser is further configured to use the one or more expected contexts stored in the context stack to interpret the recognized words or phrases and extract the one or more semantic attributes from the natural language utterance.
  - 14. The mobile device of claim 13, wherein the speech recognition engine is further configured to determine a most likely context for the natural language utterance from the one or more expected contexts stored in the context stack, and wherein the speech recognition engine is further configured to use the determined most likely context to recognize the one or more words or phrases and transcribe the natural language utterance into the textual annotation.
  - 15. The mobile device of claim 13, wherein the parser is further configured to determine a most likely context for the natural language utterance from the one or more expected contexts stored in the context stack, and wherein the parser is further configured to use the determined most likely context to interpret the recognized words or phrases and extract the one or more semantic attributes from the natural language utterance.
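
Claims 7 through 9 describe a dynamic grammar whose dictionary and phrase tables are updated from dialog history, with a misrecognition engine that adds decoy words for out-of-vocabulary words. The claims do not say how the tables are structured, so the following is only a minimal sketch under stated assumptions: the class names `DynamicGrammar` and `MisrecognitionEngine`, the text-token representation (a real recognizer would work on acoustic or phonetic units), and the `<oov>` and `<unrecognized>` placeholders are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicGrammar:
    """Toy dictionary and phrase tables (hypothetical; tokens are plain text)."""
    dictionary: set[str] = field(default_factory=set)
    phrases: set[str] = field(default_factory=set)
    decoys: set[str] = field(default_factory=set)  # stand-ins for out-of-vocabulary words

    def update_from_dialogs(self, dialogs: list[str]) -> None:
        """Add the words and whole phrases seen in current or prior dialogs."""
        for utterance in dialogs:
            text = utterance.lower().strip()
            if not text:
                continue
            self.phrases.add(text)
            self.dictionary.update(text.split())

    def transcribe(self, tokens: list[str]) -> list[str]:
        """Keep known words; map decoy hits and unknown tokens to placeholders."""
        result = []
        for token in (t.lower() for t in tokens):
            if token in self.dictionary:
                result.append(token)
            elif token in self.decoys:
                result.append("<oov>")           # absorbed by a decoy entry
            else:
                result.append("<unrecognized>")  # not covered by the grammar at all
        return result


class MisrecognitionEngine:
    """Hypothetical engine that turns unrecognized tokens into decoy entries."""

    def __init__(self, grammar: DynamicGrammar) -> None:
        self.grammar = grammar

    def report(self, tokens: list[str]) -> None:
        """Register each token the grammar does not know as a decoy word."""
        for token in (t.lower() for t in tokens):
            if token not in self.grammar.dictionary:
                self.grammar.decoys.add(token)
```

In this sketch a token that was previously unrecognized is later absorbed by a decoy entry instead of being forced onto an in-vocabulary word, which is one plausible reading of the decoy-word limitation in claim 9.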

16. A non-transitory computer-readable storage medium that stores computer-executable instructions for annotating objects using multi-modal natural language inputs, wherein executing the computer-executable instructions on one or more processors causes the one or more processors to:
- receive a multi-modal natural language input at one or more input devices coupled to the one or more processors, wherein the multi-modal natural language input includes a natural language utterance that annotates an object accessible to the one or more processors;
  
  transcribe the natural language utterance into a textual annotation with a speech recognition engine coupled to the one or more processors, wherein the speech recognition engine uses a dynamic grammar to transcribe the natural language utterance into the textual annotation;
  
  communicate the textual annotation to a storage device with a message service coupled to the one or more processors, wherein the storage device stores the textual annotation with the annotated object;
  
  communicatively couple, in an agent architecture associated with the one or more processors, services associated with an agent manager, a system agent, a plurality of domain agents, and an agent library that includes one or more utilities that the system agent and the plurality of domain agents can use;
  
  use, by the agent architecture, the communicatively coupled services to search the storage device with one or more semantic attributes extracted from a subsequent natural language utterance; and
  
  use, by the agent architecture, the communicatively coupled services to retrieve the annotated object from the storage device in response to the extracted semantic attributes matching metadata stored with the textual annotation in the storage device.
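
The agent architecture recited in claim 16 couples an agent manager, a system agent, several domain agents, and a shared agent library, then uses them to retrieve an annotated object from a later utterance. The sketch below is one possible arrangement, not the patented implementation. It assumes that semantic attributes are simply the non-stopword tokens of an utterance, that each domain agent handles one object kind, and that the system agent is the fallback; every class and method name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


class AgentLibrary:
    """Utilities shared by the system agent and the domain agents."""
    STOPWORDS = frozenset({"a", "an", "the", "of", "from", "my", "me", "show", "find"})

    @staticmethod
    def extract_attributes(utterance: str) -> set[str]:
        # Assumption: "semantic attributes" are just the non-stopword tokens.
        return {w for w in utterance.lower().split() if w not in AgentLibrary.STOPWORDS}


@dataclass
class StoredObject:
    kind: str        # e.g. "photo" or "email"
    name: str
    annotation: str  # textual annotation transcribed from the first utterance
    metadata: set[str] = field(init=False)

    def __post_init__(self) -> None:
        # Assumption: searchable metadata is derived from the annotation text.
        self.metadata = AgentLibrary.extract_attributes(self.annotation)


class Agent:
    """A domain agent when `kind` is set; the system agent when `kind` is None."""

    def __init__(self, kind: Optional[str], library: type = AgentLibrary) -> None:
        self.kind = kind
        self.library = library

    def search(self, storage: list[StoredObject], utterance: str) -> list[StoredObject]:
        # Drop the kind word itself, then require every remaining attribute to match.
        wanted = self.library.extract_attributes(utterance) - {self.kind}
        return [o for o in storage
                if self.kind in (None, o.kind) and wanted and wanted <= o.metadata]


class AgentManager:
    """Routes an utterance to a domain agent, falling back to the system agent."""

    def __init__(self, system_agent: Agent, domain_agents: list[Agent]) -> None:
        self.system_agent = system_agent
        self.domain_agents = {agent.kind: agent for agent in domain_agents}

    def retrieve(self, storage: list[StoredObject], utterance: str) -> list[StoredObject]:
        attributes = AgentLibrary.extract_attributes(utterance)
        for kind in sorted(attributes):
            if kind in self.domain_agents:
                return self.domain_agents[kind].search(storage, utterance)
        return self.system_agent.search(storage, utterance)
```

With a photo annotated "sunset at the beach" and an email annotated "beach house invoice" in storage, a later request that names "photo" would be routed to the photo agent, while a request that names no object kind would fall through to the system agent.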

17. A mobile device for annotating objects using multi-modal natural language inputs, comprising:
- one or more input devices configured to receive a multi-modal natural language input, wherein the multi-modal natural language input includes a natural language utterance that annotates an object accessible to the mobile device;
  
  a speech recognition engine configured to transcribe the natural language utterance into a textual annotation using a dynamic grammar;
  
  a message service configured to communicate the textual annotation to a storage device configured to store the textual annotation with the annotated object; and
  
  an agent architecture configured to:
  
  communicatively couple services associated with an agent manager, a system agent, a plurality of domain agents, and an agent library that includes one or more utilities that the system agent and the plurality of domain agents can use;
  
  use the communicatively coupled services to search the storage device with one or more semantic attributes extracted from a subsequent natural language utterance; and
  
  use the communicatively coupled services to retrieve the annotated object from the storage device in response to the extracted semantic attributes matching metadata stored with the textual annotation in the storage device.

18. A method for annotating objects using multi-modal natural language inputs, comprising:
- receiving a multi-modal natural language input at one or more input devices coupled to a mobile device, wherein the multi-modal natural language input includes a natural language utterance that annotates an object accessible to the mobile device;
  
  transcribing the natural language utterance into a textual annotation with a speech recognition engine coupled to the mobile device, wherein the speech recognition engine uses a dynamic grammar to transcribe the natural language utterance into the textual annotation;
  
  communicating the textual annotation to a storage device with a message service coupled to the mobile device, wherein the storage device stores the textual annotation with the annotated object;
  
  communicatively coupling, in an agent architecture associated with the mobile device, services associated with an agent manager, a system agent, a plurality of domain agents, and an agent library that includes one or more utilities that the system agent and the plurality of domain agents can use;
  
  using, by the agent architecture, the communicatively coupled services to search the storage device with one or more semantic attributes extracted from a subsequent natural language utterance; and
  
  using, by the agent architecture, the communicatively coupled services to retrieve the annotated object from the storage device in response to the extracted semantic attributes matching metadata stored with the textual annotation in the storage device.

19. A method for annotating objects using multi-modal natural language inputs, comprising:
- communicating with a location service to determine location information associated with an object accessible to a mobile device;
  
  communicating the location information to a storage device configured to store the location information with the object, wherein a message service coupled to the mobile device communicates the location information to the storage device;
  
  receiving a multi-modal natural language input at one or more input devices coupled to a mobile device, wherein the multi-modal natural language input includes a natural language utterance that annotates the object;
  
  transcribing the natural language utterance into a textual annotation with a speech recognition engine coupled to the mobile device, wherein the speech recognition engine uses a dynamic grammar to transcribe the natural language utterance into the textual annotation;
  
  searching, by an agent architecture coupled to the mobile device, the storage device with one or more semantic attributes extracted from the natural language utterance, wherein the agent architecture retrieves the object from the storage device in response to the extracted semantic attributes matching metadata associated with the location information stored with the object in the storage device; and
  
  labeling the object retrieved from the storage device with the textual annotation to post-process the object with the textual annotation, wherein the storage device further stores the textual annotation with the annotated object to post-process the annotated object.
- Dependent claims (20, 21, 22, 23, 24, 25, 26, 27, 28)
  - 20. The method of claim 19 wherein the object includes one or more of a digital photograph, a calendar entry, an email message, an instant message, a phonebook entry, a voicemail entry, a digital movie, or a digital media file.
  - 21. The method of claim 19 wherein the storage device includes one or more of a local memory associated with the mobile device or a server, a shared workspace, an object storage and retrieval facility, or a data center located remotely from the mobile device.
  - 22. The method of claim 19 wherein the textual annotation provides searchable metadata that the storage device further stores with the annotated object to post-process the annotated object.
  - 23. The method of claim 19, further comprising interpreting, with a parser coupled to the mobile device, one or more words or phrases recognized in the natural language utterance to extract the one or more semantic attributes from the natural language utterance.
  - 24. The method of claim 23, further comprising determining, with the parser, a most likely context for the subsequent natural language utterance from one or more expected contexts stored in a context stack, wherein the parser uses the one or more expected contexts stored in the context stack and the determined most likely context to interpret the recognized words or phrases and extract the one or more semantic attributes from the natural language utterance.
  - 25. The method of claim 19, further comprising communicating a verbal annotation created from the natural language utterance to the storage device, wherein the storage device further stores the verbal annotation with the annotated object to post-process the annotated object.
  - 26. The method of claim 19, further comprising:
    - searching, by the agent architecture, the storage device with non-speech input information extracted from the multi-modal natural language input; and
      
      retrieving, by the agent architecture, the object from the storage device in response to the extracted non-speech input information matching the metadata associated with the location information stored with the object in the storage device.
  - 27. The method of claim 23 wherein the extracted semantic attributes comprise global positioning system coordinates determined from the interpreted words or phrases.
  - 28. The method of claim 23, further comprising determining, with the speech recognition engine, a most likely context for the natural language utterance from one or more expected contexts stored in a context stack, wherein the speech recognition engine uses the one or more expected contexts stored in the context stack and the determined most likely context to recognize the one or more words or phrases and transcribe the natural language utterance into the textual annotation.
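
Claims 24 and 28 rely on a context stack of expected contexts from which a most likely context is chosen for an utterance. The claims do not specify how that choice is made. The sketch below assumes a simple keyword-overlap score with ties and zero-overlap cases resolved in favor of the most recently pushed context; the `Context` and `ContextStack` names and the scoring rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Context:
    name: str
    keywords: frozenset


class ContextStack:
    """Expected contexts, most recent on top (hypothetical structure)."""

    def __init__(self) -> None:
        self._stack: list[Context] = []

    def push(self, context: Context) -> None:
        # Re-pushing an existing context moves it back to the top.
        self._stack = [c for c in self._stack if c.name != context.name]
        self._stack.append(context)

    def most_likely(self, words: list[str]) -> Optional[Context]:
        """Pick the context with the highest keyword overlap.

        Ties go to the more recent context. With no overlap at all,
        fall back to the top of the stack.
        """
        tokens = {w.lower() for w in words}
        best, best_score = None, 0
        for context in reversed(self._stack):  # newest first
            score = len(tokens & context.keywords)
            if score > best_score:
                best, best_score = context, score
        if best is not None:
            return best
        return self._stack[-1] if self._stack else None
```

Both the speech recognition engine and the parser could consult the same stack in this arrangement, which matches the claims' use of one set of expected contexts for recognition and for interpretation.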

29. A system for annotating objects using multi-modal natural language inputs, comprising:
- a storage device configured to store an object accessible to an electronic device;
  
  one or more input devices configured to receive a multi-modal natural language input, wherein the multi-modal natural language input includes a natural language utterance that annotates the object;
  
  a speech recognition engine configured to transcribe the natural language utterance into a textual annotation using a dynamic grammar;
  
  a message service configured to communicate the textual annotation to the storage device, wherein the storage device is further configured to store the textual annotation with the annotated object; and
  
  an agent architecture configured to:
  
  communicatively couple services associated with an agent manager, a system agent, a plurality of domain agents, and an agent library that includes one or more utilities that the system agent and the plurality of domain agents can use;
  
  use the communicatively coupled services to search the storage device with one or more semantic attributes extracted from a subsequent natural language utterance; and
  
  use the communicatively coupled services to retrieve the annotated object from the storage device in response to the extracted semantic attributes matching metadata stored with the textual annotation in the storage device.

30. A system for annotating objects using multi-modal natural language inputs, comprising:
- a storage device configured to store an object accessible to an electronic device;
  
  an interface configured to communicate with a location service to determine location information associated with the object;
  
  a message service configured to communicate the location information to the storage device, wherein the storage device is further configured to store the location information with the object;
  
  one or more input devices configured to receive a multi-modal natural language input, wherein the multi-modal natural language input includes a natural language utterance that annotates the object;
  
  a speech recognition engine configured to transcribe the natural language utterance into a textual annotation using a dynamic grammar;
  
  an agent architecture configured to search the storage device with one or more semantic attributes extracted from the natural language utterance and retrieve the object from the storage device in response to the extracted semantic attributes matching metadata associated with the location information stored with the object in the storage device; and
  
  a processing unit configured to label the object retrieved from the storage device with the textual annotation to post-process the object with the textual annotation, wherein the storage device is further configured to store the textual annotation with the annotated object to post-process the annotated object.

31. A non-transitory computer-readable storage medium that stores computer-executable instructions for annotating objects using multi-modal natural language inputs, wherein executing the computer-executable instructions on one or more processors causes the one or more processors to:
- communicate with a location service to determine location information associated with an object accessible to the one or more processors;
  
  communicate the location information to a storage device configured to store the location information with the object, wherein a message service coupled to the one or more processors communicates the location information to the storage device;
  
  receive a multi-modal natural language input at one or more input devices coupled to the one or more processors, wherein the multi-modal natural language input includes a natural language utterance that annotates the object;
  
  transcribe the natural language utterance into a textual annotation with a speech recognition engine coupled to the one or more processors, wherein the speech recognition engine uses a dynamic grammar to transcribe the natural language utterance into the textual annotation;
  
  search, by an agent architecture coupled to the one or more processors, the storage device with one or more semantic attributes extracted from the natural language utterance, wherein the agent architecture retrieves the object from the storage device in response to the extracted semantic attributes matching metadata associated with the location information stored with the object in the storage device; and
  
  label the object retrieved from the storage device with a processing unit coupled to the one or more processors, wherein the processing unit labels the object with the textual annotation to post-process the object with the textual annotation, and wherein the storage device is further configured to store the textual annotation with the annotated object to post-process the annotated object.
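
The sequence recited in claim 31 (and mirrored in claims 1, 19, and 30) can be sketched end to end: store location information with an object, transcribe an annotating utterance, search storage with attributes from that utterance against the location metadata, and label the retrieved object. This is a rough illustration only. The `locate` and `transcribe` functions are stand-ins for a real location service and speech recognition engine, the place keywords are made-up sample data, and the word-overlap matching rule is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    name: str
    location_metadata: set[str] = field(default_factory=set)
    annotations: list[str] = field(default_factory=list)


def locate(name: str) -> set[str]:
    """Stand-in for a location service: returns place keywords for an object."""
    sample_fixes = {"IMG_1": {"seattle", "washington"}, "IMG_2": {"paris", "france"}}
    return sample_fixes.get(name, set())


def store_with_location(storage: dict[str, StoredObject], name: str) -> None:
    """Store the object together with its location information."""
    storage[name] = StoredObject(name, locate(name))


def transcribe(utterance_audio: str) -> str:
    """Stand-in for speech recognition: the 'audio' is already text here."""
    return utterance_audio.strip().lower()


def extract_attributes(text: str) -> set[str]:
    # Assumption: semantic attributes are the individual words of the transcript.
    return set(text.replace(",", " ").split())


def annotate(storage: dict[str, StoredObject], utterance_audio: str) -> list[StoredObject]:
    """Retrieve objects whose location metadata matches, then label them."""
    annotation = transcribe(utterance_audio)
    attributes = extract_attributes(annotation)
    matches = [o for o in storage.values() if attributes & o.location_metadata]
    for obj in matches:
        obj.annotations.append(annotation)  # stored with the annotated object
    return matches
```

Here the same utterance both selects the object (through its place name) and supplies the label, which is how the independent claims tie retrieval to the location metadata.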

Current Assignee
Dialect, LLC
Original Assignee
VoiceBox Technologies, Inc. (Microsoft Corporation)
Inventors
Menaker, Samuel; Kennewick, Richard; Armstrong, Lynn Elise; Di Cristo, Philippe; Kennewick, Robert A.; Weider, Chris; Kennewick, Mike
Primary Examiner(s)
YEN, ERIC L

Application Number

US11/212,693
Publication Number

US 20070050191A1
Time in Patent Office

2,094 Days
Field of Search

704/235, 704/251, 704/270, 715/727-728
US Class Current

704/270
CPC Class Codes

G06F 16/951   Indexing; Web crawling tech...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/22   Procedures used during a sp...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/227   of the speaker; Human-fact...

G10L 2015/228   of application context

G10L 21/06   Transformation of speech in...

H04M 2250/74   with voice recognition mean...
