System and method for providing a natural language voice user interface in an integrated voice navigation services environment

US 8,140,335 B2
Filed: 12/11/2007
Issued: 03/20/2012
Est. Priority Date: 12/11/2007
Status: Active Grant

8 Assignments

0 Petitions

Abstract

A conversational, natural language voice user interface may provide an integrated voice navigation services environment. The voice user interface may enable a user to make natural language requests relating to various navigation services, and further, may interact with the user in a cooperative, conversational dialogue to resolve the requests. Through dynamic awareness of context, available sources of information, domain knowledge, user behavior and preferences, and external systems and devices, among other things, the voice user interface may provide an integrated environment in which the user can speak conversationally, using natural language, to issue queries, commands, or other requests relating to the navigation services provided in the environment.

908 Citations

50 Claims

1. A computer-implemented method for providing a natural language voice user interface in an integrated voice navigation services environment, comprising:
- receiving a natural language utterance at an input device coupled to a navigation device, wherein the natural language utterance relates to a navigation context;
  
  generating one or more preliminary interpretations of the natural language utterance using a speech recognition engine associated with the navigation device, wherein generating the one or more preliminary interpretations of the natural language utterance includes:
  
  recognizing one or more words in the natural language utterance that define a command in the navigation context, wherein the speech recognition engine includes a multi-pass speech recognition module that recognizes the one or more words that define the command in the navigation context;
  
  recognizing, at the multi-pass speech recognition module, one or more additional words in the natural language utterance that define a location associated with the command in the navigation context; and
  
  generating, at the multi-pass speech recognition module, a dynamic recognition grammar based on the location defined in the one or more additional words recognized in the natural language utterance, wherein the speech recognition engine uses the dynamic recognition grammar to generate the one or more preliminary interpretations of the natural language utterance;
  
  analyzing, with a conversational language processor on the navigation device, the one or more preliminary interpretations using shared knowledge and information associated with the navigation context to determine a probable interpretation of the natural language utterance in the navigation context; and
  
  executing, on the navigation device, a navigation agent associated with the navigation context to process the probable interpretation of the natural language utterance, wherein executing the navigation agent to process the probable interpretation of the natural language utterance includes:
  
  identifying, by the navigation agent executing on the navigation device, one or more requests in the natural language utterance that relate to the navigation context from the probable interpretation of the natural language utterance; and
  
  resolving, by the navigation agent executing on the navigation device, the one or more requests using information associated with a plurality of information sources, which include at least a navigation-specific information source.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The method of claim 1, wherein resolving the one or more requests using the information associated with the plurality of information sources includes:
    - determining, by the navigation agent executing on the navigation device, that the one or more requests in the natural language utterance approximate requested information; and
      
      causing the conversational language processor to manage a dialogue that includes one or more subsequent interactions to successively refine and resolve the requested information approximated in the one or more requests.
  - 3. The method of claim 2, wherein the one or more subsequent interactions in the dialogue include one or more output prompts and one or more subsequent multi-modal inputs to successively refine and resolve the requested information approximated in the one or more requests.
  - 4. The method of claim 1, wherein resolving the one or more requests using the information associated with the plurality of information sources includes:
    - determining, by the navigation agent executing on the navigation device, that the one or more requests identified in the natural language utterance include a navigation request to calculate a route to a full or partial address;
      
      calculating the route from a current location associated with the navigation device to a destination having an address that best corresponds to the full or partial address; and
      
      generating dynamic directions to provide a guide on the route from the current location associated with the navigation device to the destination associated with the route, wherein the navigation agent drives the dynamic directions using the information associated with the plurality of information sources and the current location associated with the navigation device.
  - 5. The method of claim 4, wherein the dynamic directions include information about one or more destinations, one or more points of interest, traffic, parking, weather, or one or more events relevant to the route and the current location associated with the navigation device.
  - 6. The method of claim 4, further comprising:
    - receiving a multi-modal input at the input device coupled to the navigation device subsequent to calculating the route, wherein the multi-modal input includes a subsequent request; and
      
      invoking one or more domain agents to resolve the subsequent request in the multi-modal input, wherein the one or more domain agents filter results associated with the subsequent request according to the calculated route.
  - 7. The method of claim 1, wherein the dynamic recognition grammar includes information associated with one or more topological domains.
  - 8. The method of claim 7, wherein the one or more topological domains include physical, temporal, directional, and civil organizational proximities with respect to a current location associated with the navigation device.
  - 9. The method of claim 1, wherein the shared knowledge that the conversational language processor uses to analyze the one or more preliminary interpretations and determine the probable interpretation of the natural language utterance includes one or more inferences generated by an inferencing engine associated with the navigation device.
  - 10. The method of claim 9, further comprising generating a response to the natural language utterance to suggest one or more services available in the navigation context, wherein the conversational language processor uses the one or more inferences generated by the inferencing engine to determine the one or more suggested services.
  - 11. The method of claim 1, wherein the plurality of information sources further include the shared knowledge, the information associated with the navigation context, and one or more information sources relating to maps, destinations, directories, points of interest, traffic, parking, weather, events, user address books, user devices, user systems, a search engine, and a plurality of domain agents.
  - 12. The method of claim 1, wherein the shared knowledge includes dialogue history information, request history information, user interface state information, short-term user profile information, long-term user profile information, peer user profile information, and user location information.
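
The two-pass recognition recited in claim 1 (a command, then a location, then a grammar built from that location) can be illustrated with a minimal sketch. It is not the patented implementation: it works on already-transcribed text rather than audio, and the `COMMANDS`, `FILLER`, and `GAZETTEER` vocabularies, the `Interpretation` type, and the fuzzy-matching step are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical vocabularies, invented for this sketch only.
COMMANDS = {"navigate", "find", "show"}
FILLER = {"to", "in", "on", "me", "the"}
GAZETTEER = {
    "seattle": ["pine street", "pike street", "union street"],
    "bellevue": ["main street", "bellevue way"],
}


@dataclass(frozen=True)
class Interpretation:
    command: str
    location: str
    phrase: str
    score: float


def first_match(tokens, vocabulary):
    """Return the first token found in the vocabulary, or None."""
    return next((t for t in tokens if t in vocabulary), None)


def build_dynamic_grammar(location):
    """Build a grammar restricted to phrases known for the recognized location."""
    return list(GAZETTEER.get(location, []))


def preliminary_interpretations(utterance, top_n=2):
    tokens = utterance.lower().split()
    command = first_match(tokens, COMMANDS)     # pass 1: command words
    location = first_match(tokens, GAZETTEER)   # pass 2: location words
    if command is None or location is None:
        return []
    grammar = build_dynamic_grammar(location)
    skip = FILLER | {command, location}
    remainder = " ".join(t for t in tokens if t not in skip)
    # Score the remaining words against the dynamic grammar only.
    scored = [
        Interpretation(command, location, phrase,
                       SequenceMatcher(None, remainder, phrase).ratio())
        for phrase in grammar
    ]
    scored.sort(key=lambda item: item.score, reverse=True)
    return scored[:top_n]
```

With the misrecognized input `"navigate to pyne street in seattle"`, the sketch ranks `pine street` first because only Seattle phrases are in the grammar. Restricting the search space this way is the point of generating the grammar from the recognized location.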

13. A computer-implemented method for providing a natural language voice user interface in an integrated voice navigation services environment, comprising:
- receiving a multi-modal input that includes a natural language utterance at one or more input devices coupled to a navigation device, wherein the natural language utterance in the multi-modal input relates to a navigation context;
  
  generating one or more preliminary interpretations of the natural language utterance using a speech recognition engine associated with the navigation device, wherein the speech recognition engine uses a dynamic recognition grammar to generate the one or more preliminary interpretations of the natural language utterance;
  
  analyzing, with a conversational language processor on the navigation device, the one or more preliminary interpretations using shared knowledge and information associated with the navigation context to determine a probable interpretation of the natural language utterance in the navigation context; and
  
  executing, on the navigation device, a navigation agent associated with the navigation context to process the probable interpretation of the natural language utterance, wherein executing the navigation agent to process the probable interpretation of the natural language utterance includes:
  
  identifying, by the navigation agent executing on the navigation device, one or more requests in the natural language utterance that relate to the navigation context from the probable interpretation of the natural language utterance;
  
  determining that the one or more requests identified in the natural language utterance include a multi-modal request to control a map display, wherein the navigation agent executing on the navigation device determines that the one or more requests include the multi-modal request to control the map display using information associated with a plurality of information sources, which include at least a navigation-specific information source;
  
  associating a non-voice component in the multi-modal input with the probable interpretation of the natural language utterance, wherein the non-voice component in the multi-modal input identifies a portion of the map display; and
  
  issuing a command to control the identified portion of the map display in accordance with the probable interpretation of the natural language utterance to resolve the one or more requests identified in the natural language utterance.
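
The multi-modal step in claim 13, where a non-voice component identifies a portion of the map and the spoken request supplies the action, might be sketched as follows. The `Region` and `MapCommand` types, the `MAP_ACTIONS` phrase table, and the function name are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """Portion of the map display identified by a non-voice input (e.g. a touch)."""
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class MapCommand:
    action: str
    region: Region


# Hypothetical phrase-to-action table for map control requests.
MAP_ACTIONS = {"zoom in": "zoom_in", "zoom out": "zoom_out", "center": "center"}


def resolve_map_request(interpretation: str,
                        touch: Optional[Region]) -> Optional[MapCommand]:
    """Associate the touched region with the spoken map-control request."""
    text = interpretation.lower()
    for phrase, action in MAP_ACTIONS.items():
        if phrase in text:
            # Without a non-voice component there is no region to control.
            if touch is None:
                return None
            return MapCommand(action, touch)
    return None  # Not a map-control request.
```

For example, `resolve_map_request("zoom in here", Region(10, 20, 100, 80))` yields a `zoom_in` command bound to the touched region, while the same utterance without a touch yields `None`, which a dialogue layer could treat as a cue to ask which area the user means.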

14. A system for providing a natural language voice user interface in an integrated voice navigation services environment, comprising:
- one or more input devices configured to receive a multi-modal input that includes a natural language utterance, wherein the natural language utterance in the multi-modal input relates to a navigation context;
  
  a speech recognition engine configured to generate one or more preliminary interpretations of the natural language utterance using a dynamic recognition grammar;
  
  a conversational language processor configured to analyze the one or more preliminary interpretations using shared knowledge and information associated with the navigation context to determine a probable interpretation of the natural language utterance in the navigation context; and
  
  a navigation agent associated with the navigation context and configured to:
  
  identify one or more requests in the natural language utterance that relate to the navigation context from the probable interpretation of the natural language utterance;
  
  determine that the one or more requests identified in the natural language utterance include a multi-modal request to control a map display using information associated with a plurality of information sources, which include at least a navigation-specific information source;
  
  associate a non-voice component in the multi-modal input with the probable interpretation of the natural language utterance, wherein the non-voice component in the multi-modal input identifies a portion of the map display; and
  
  issue a command to control the identified portion of the map display in accordance with the probable interpretation of the natural language utterance to resolve the one or more requests identified in the natural language utterance.

15. A system for providing a natural language voice user interface in an integrated voice navigation services environment, comprising:
- an input device configured to receive a natural language utterance, wherein the natural language utterance relates to a navigation context;
  
  a multi-pass speech recognition module configured to:
  
  recognize one or more words in the natural language utterance that define a command in the navigation context;
  
  recognize one or more additional words in the natural language utterance that define a location associated with the command in the navigation context; and
  
  generate a dynamic recognition grammar based on the location defined in the one or more additional words recognized in the natural language utterance;
  
  a speech recognition engine configured to use the dynamic recognition grammar to generate one or more preliminary interpretations of the natural language utterance;
  
  a conversational language processor configured to analyze the one or more preliminary interpretations using shared knowledge and information associated with the navigation context to determine a probable interpretation of the natural language utterance in the navigation context; and
  
  a navigation agent associated with the navigation context and configured to:
  
  identify one or more requests in the natural language utterance that relate to the navigation context from the probable interpretation of the natural language utterance; and
  
  resolve the one or more requests using information associated with a plurality of information sources, which include at least a navigation-specific information source.
- Dependent claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26)
  - 16. The system of claim 15, wherein to resolve the one or more requests using the information associated with the plurality of information sources, the navigation agent is further configured to:
    - determine that the one or more requests in the natural language utterance approximate requested information; and
      
      cause the conversational language processor to manage a dialogue that includes one or more subsequent interactions to successively refine and resolve the requested information approximated in the one or more requests.
  - 17. The system of claim 16, wherein the one or more subsequent interactions in the dialogue include one or more output prompts and one or more subsequent multi-modal inputs to successively refine and resolve the requested information approximated in the one or more requests.
  - 18. The system of claim 15, wherein to resolve the one or more requests using the information associated with the plurality of information sources, the navigation agent is further configured to:
    - determine that the one or more requests identified in the natural language utterance include a navigation request to calculate a route to a full or partial address;
      
      calculate the route from a current location associated with a navigation device to a destination having an address that best corresponds to the full or partial address; and
      
      generate dynamic directions to provide a guide on the route from the current location associated with the navigation device to the destination associated with the route, wherein the dynamic directions are driven using the information associated with the plurality of information sources and the current location associated with the navigation device.
  - 19. The system of claim 18, wherein the dynamic directions include information about one or more destinations, one or more points of interest, traffic, parking, weather, or one or more events relevant to the route and the current location associated with the navigation device.
  - 20. The system of claim 18, further comprising one or more domain agents configured to:
    - resolve a subsequent request in a multi-modal input received at the input device subsequent to the navigation agent having calculated the route; and
      
      filter results associated with the subsequent request according to the calculated route to resolve the subsequent request.
  - 21. The system of claim 15, wherein the dynamic recognition grammar includes information associated with one or more topological domains.
  - 22. The system of claim 21, wherein the one or more topological domains include physical, temporal, directional, and civil organizational proximities with respect to a current location associated with a navigation device.
  - 23. The system of claim 15, further comprising an inferencing engine configured to generate one or more inferences, wherein the shared knowledge used in the conversational language processor to analyze the one or more preliminary interpretations and determine the probable interpretation of the natural language utterance includes the one or more inferences.
  - 24. The system of claim 23, wherein the conversational language processor is further configured to generate a response to the natural language utterance to suggest one or more services available in the navigation context using the one or more inferences.
  - 25. The system of claim 15, wherein the plurality of information sources further include the shared knowledge, the information associated with the navigation context, and one or more information sources relating to maps, destinations, directories, points of interest, traffic, parking, weather, events, user address books, user devices, user systems, a search engine, and a plurality of domain agents.
  - 26. The system of claim 15, wherein the shared knowledge includes dialogue history information, request history information, user interface state information, short-term user profile information, long-term user profile information, peer user profile information, and user location information.
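
Claims 6 and 20 describe domain agents that filter results for a later request according to the calculated route. One simple way to picture this, offered only as an assumption-laden sketch, is to keep results that lie within a fixed distance of any route waypoint. The waypoint representation, the `max_km` threshold, and the equirectangular distance approximation are choices made for this example, not details from the patent.

```python
import math

EARTH_RADIUS_KM = 6371.0


def approx_km(a, b):
    """Equirectangular distance between two (lat, lon) points; adequate for short spans."""
    mean_lat = math.radians((a[0] + b[0]) / 2)
    dx = math.radians(b[1] - a[1]) * math.cos(mean_lat)
    dy = math.radians(b[0] - a[0])
    return EARTH_RADIUS_KM * math.hypot(dx, dy)


def filter_by_route(results, route, max_km=2.0):
    """Keep (name, position) results within max_km of at least one route waypoint."""
    return [
        (name, pos)
        for name, pos in results
        if any(approx_km(pos, waypoint) <= max_km for waypoint in route)
    ]
```

A request such as "find coffee" made after a route exists would then return only the coffee shops near that route. A production system would more likely measure distance to route segments or added travel time rather than to discrete waypoints.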

27. A method for providing a natural language voice user interface in an integrated voice navigation services environment, comprising:
- receiving a natural language utterance at an input device coupled to a navigation device, wherein the natural language utterance relates to a navigation context;
  
  generating one or more preliminary interpretations of the natural language utterance using a speech recognition engine associated with the navigation device, wherein generating the one or more preliminary interpretations of the natural language utterance includes:
  
  recognizing one or more words in the natural language utterance that define a navigation command in the navigation context, wherein the speech recognition engine includes a multi-pass speech recognition module that recognizes the one or more words that define the navigation command;
  
  recognizing, at the multi-pass speech recognition module, one or more additional words in the natural language utterance that approximate a destination associated with the navigation command; and
  
  generating, at the multi-pass speech recognition module, a dynamic recognition grammar based on the approximated destination associated with the navigation command or a current location associated with the navigation device, wherein the speech recognition engine uses the dynamic recognition grammar to generate the one or more preliminary interpretations of the natural language utterance;
  
  analyzing, with a conversational language processor on the navigation device, the one or more preliminary interpretations using shared knowledge and information associated with the navigation context to determine a preliminary destination having an address that best corresponds to a full or partial address associated with the approximated destination;
  
  executing, on the navigation device, a navigation agent associated with the navigation context to calculate a route from the current location associated with the navigation device to the preliminary destination; and
  
  managing, via the conversational language processor, a dialogue that includes one or more subsequent interactions to successively refine the approximated destination until a final destination associated with the navigation command has been resolved.
- Dependent claims (28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38)
  - 28. The method of claim 27, wherein the one or more subsequent interactions in the managed dialogue include one or more output prompts and one or more subsequent multi-modal inputs to successively refine the approximated destination into the final destination.
  - 29. The method of claim 27, further comprising dynamically recalculating the route in response to the one or more subsequent interactions refining the approximated destination.
  - 30. The method of claim 29, wherein dynamically recalculating the route includes generating directions from the current location associated with the navigation device to the refined approximated destination.
  - 31. The method of claim 27, wherein the one or more subsequent interactions in the managed dialogue resolve the final destination substantially later in time relative to the navigation agent initially calculating the route to the preliminary destination.
  - 32. The method of claim 27, wherein analyzing the one or more preliminary interpretations to determine the preliminary destination includes:
    - identifying multiple destinations having addresses that possibly correspond to the full or partial address associated with the approximated destination; and
      
      selecting one of the multiple identified destinations having a highest ranking to be the preliminary destination.
  - 33. The method of claim 27, wherein analyzing the one or more preliminary interpretations to determine the preliminary destination further includes ranking the multiple identified destinations according to proximities to the current location associated with the navigation device or the full or partial address associated with the approximated destination.
  - 34. The method of claim 33, wherein the proximities depend on one or more topological domains that include physical, temporal, directional, and civil organizational proximities relative to the current location associated with the navigation device or the full or partial address associated with the approximated destination.
  - 35. The method of claim 27, wherein executing the navigation agent to calculate the route includes generating dynamic directions to provide a guide on the route from the current location associated with the navigation device to the preliminary destination, wherein the navigation agent drives the dynamic directions using the information associated with the navigation context and the current location associated with the navigation device.
  - 36. The method of claim 27, wherein the information associated with the navigation context includes information about one or more destinations, one or more points of interest, traffic, parking, weather, or one or more events relevant to the route and the current location associated with the navigation device.
  - 37. The method of claim 27, wherein the shared knowledge includes dialogue history information, request history information, user interface state information, short-term user profile information, long-term user profile information, peer user profile information, and user location information.
  - 38. The method of claim 27, further comprising dynamically recalculating the route in response to the managed dialogue resolving the final destination, wherein dynamically recalculating the route includes generating directions from the current location associated with the navigation device to the resolved final destination.
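
The destination handling in claims 27, 32, and 33 (identify candidate addresses, rank them by proximity, route to the top candidate, then refine through dialogue) can be sketched as below. Only physical proximity is modeled. The `Destination` type, the sample `DIRECTORY`, substring matching, and the fallback behavior in `refine` are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Destination:
    address: str
    lat: float
    lon: float


# Hypothetical directory entries with approximate coordinates.
DIRECTORY = [
    Destination("100 Main Street, Tacoma", 47.25, -122.44),
    Destination("100 Main Street, Bellevue", 47.61, -122.20),
    Destination("100 Main Street, Seattle", 47.6062, -122.3321),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def candidates_for(partial, directory):
    """Destinations whose address possibly corresponds to the partial address."""
    needle = partial.lower()
    return [d for d in directory if needle in d.address.lower()]


def rank_by_proximity(candidates, here):
    """Rank candidates by distance from the current location (nearest first)."""
    return sorted(candidates,
                  key=lambda d: haversine_km(here[0], here[1], d.lat, d.lon))


def refine(candidates, clarification):
    """Narrow candidates with a follow-up input; keep the list if nothing matches."""
    narrowed = candidates_for(clarification, candidates)
    return narrowed or candidates
```

Starting near Seattle, `rank_by_proximity(candidates_for("main street", DIRECTORY), (47.60, -122.33))` puts the Seattle address first, so a route can begin immediately. A later clarification such as "bellevue" passed to `refine` narrows the list, after which the route would be recalculated as in claims 29 and 38.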

39. A system for providing a natural language voice user interface in an integrated voice navigation services environment, comprising:
- an input device coupled to a navigation device, wherein the input device is configured to receive a natural language utterance that relates to a navigation context;
  
  a multi-pass speech recognition module configured to:
  
  recognize one or more words in the natural language utterance that define a navigation command in the navigation context;
  
  recognize one or more additional words in the natural language utterance that approximate a destination associated with the navigation command; and
  
  generate a dynamic recognition grammar based on the approximated destination associated with the navigation command or a current location associated with the navigation device;
  
  a speech recognition engine configured to use the dynamic recognition grammar to generate the one or more preliminary interpretations of the natural language utterance;
  
  a conversational language processor configured to analyze the one or more preliminary interpretations using shared knowledge and information associated with the navigation context to determine a preliminary destination having an address that best corresponds to a full or partial address associated with the approximated destination; and
  
  a navigation agent associated with the navigation context and configured to:
  
  calculate a route from the current location associated with the navigation device to the preliminary destination; and
  
  manage, via the conversational language processor, a dialogue that includes one or more subsequent interactions to successively refine the approximated destination until a final destination associated with the navigation command has been resolved.
- Dependent claims (40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50)
  - 40. The system of claim 39, wherein the one or more subsequent interactions in the managed dialogue include one or more output prompts and one or more subsequent multi-modal inputs to successively refine the approximated destination into the final destination.
  - 41. The system of claim 39, wherein the navigation agent is further configured to dynamically recalculate the route in response to the one or more subsequent interactions having refined the approximated destination.
  - 42. The system of claim 41, wherein the navigation agent is further configured to generate directions from the current location associated with the navigation device to the refined approximated destination to dynamically recalculate the route.
  - 43. The system of claim 39, wherein the one or more subsequent interactions in the managed dialogue resolve the final destination substantially later in time relative to when the navigation agent initially calculated the route to the preliminary destination.
  - 44. The system of claim 39, wherein to analyze the one or more preliminary interpretations to determine the preliminary destination, the conversational language processor is further configured to:
    - identify multiple destinations having addresses that possibly correspond to the full or partial address associated with the approximated destination; and
      
      select one of the multiple identified destinations having a highest ranking to be the preliminary destination.
  - 45. The system of claim 44, wherein to analyze the one or more preliminary interpretations to determine the preliminary destination, the conversational language processor is further configured to rank the multiple identified destinations according to proximities to the current location associated with the navigation device or the full or partial address associated with the approximated destination.
  - 46. The system of claim 45, wherein the proximities depend on one or more topological domains that include physical, temporal, directional, and civil organizational proximities relative to the current location associated with the navigation device or the full or partial address associated with the approximated destination.
  - 47. The system of claim 39, wherein to calculate the route from the current location associated with the navigation device to the preliminary destination, the navigation agent is further configured to generate dynamic directions to provide a guide on the route from the current location associated with the navigation device to the preliminary destination, wherein the dynamic directions are driven using the information associated with the navigation context and the current location associated with the navigation device.
  - 48. The system of claim 39, wherein the information associated with the navigation context includes information about one or more destinations, one or more points of interest, traffic, parking, weather, or one or more events relevant to the route and the current location associated with the navigation device.
  - 49. The system of claim 39, wherein the shared knowledge includes dialogue history information, request history information, user interface state information, short-term user profile information, long-term user profile information, peer user profile information, and user location information.
  - 50. The system of claim 39, wherein the navigation agent is further configured to dynamically recalculate the route in response to the managed dialogue having resolved the final destination and generate directions from the current location associated with the navigation device to the resolved final destination.

Current Assignee
Cerence Inc.
Original Assignee
VoiceBox Technologies, Inc. (Microsoft Corporation)
Inventors
Kennewick, Michael R.; Menaker, Sam; Zimmerman, Bernie; Di Cristo, Philippe; Armstrong, Lynn; Guttigoli, Sheetal; Tjalve, Michael; Salomon, Ari; Baldwin, Larry; Cheung, Catherine
Primary Examiner(s)
Vo, Huyen X.

Application Number

US11/954,064
Publication Number

US 20090150156A1
Time in Patent Office

1,561 Days
Field of Search

704/235, 704/231, 704/257, 704/275, 704/270, 704/251, 704/1-10, 704/252, 704/254, 704/255, 704/276, 701/209, 701/211
US Class Current

704/257
CPC Class Codes

G01C 21/3608   using speech input, e.g. us...

G06Q 30/0261   based on user location

G10L 15/00   Speech recognition G10L17/0...

G10L 15/04   Segmentation; Word boundary...

G10L 15/08   Speech classification or se...

G10L 15/19   Grammatical context, e.g. d...

G10L 15/193   Formal grammars, e.g. finit...

G10L 15/22   Procedures used during a sp...
