APPARATUS AND ALGORITHMIC PROCESS FOR AN ADAPTIVE NAVIGATION POLICY IN PARTIALLY OBSERVABLE ENVIRONMENTS
Abstract
An apparatus and method for automatically learning high-level navigation in partially observable environments with landmarks use the full state information available at landmark positions to determine a navigation policy. Landmark Markov Decision Processes (MDPs) can be generated for only the parts of an environment encountered while navigating from a starting state to a goal state, which reduces the computational resources needed compared with a navigation solution that relies on a fully modeled environment. An MDP policy is calculated using the SarsaLandmark algorithm, and the policy is then transformed into a navigation solution based on the current position and connectivity information.
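The abstract names the SarsaLandmark algorithm but does not give its update rule in this section. As background only, the following is a minimal sketch of the standard tabular SARSA update from which such methods are usually derived; it is not the patented algorithm. The names `sarsa_update` and `epsilon_greedy`, the step size `alpha`, the discount `gamma`, and the example states and actions are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.95):
    """One on-policy SARSA step: move Q(s, a) toward r + gamma * Q(s', a')."""
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical usage: a single transition from location "A" to location "B".
q_table = defaultdict(float)  # unseen (state, action) pairs default to 0.0
sarsa_update(q_table, "A", "east", -1.0, "B", "east")
choice = epsilon_greedy(q_table, "A", ["east", "north"], 0.0, random.Random(0))
```

With `epsilon` set to zero the selection is purely greedy, so after the single negative-reward update the untried action is preferred over the penalized one.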
36 Citations
16 Claims
1. A method for navigating from a starting state to a goal state in a partially-observable environment, the method comprising:
identifying locations within the environment, such that connections between the locations form a plurality of different paths between the starting state and the goal state;
determining a reward value for each connection from one location to another location;
identifying landmarks among the locations;
associating a value function for each connection from one landmark to another location or landmark, the value function summarizing reward values from the one landmark to the goal state; and
navigating from the starting state to the goal state by applying a policy to information gathered by at least one sensor to select connections at each location to form a path to the goal state.
Dependent claims: 2-13
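The steps recited in claim 1 can be illustrated with a small sketch, under stated assumptions. The toy environment, the location names, the reward numbers, and the helpers `connection_values`, `navigate`, and `sense` are hypothetical and are not taken from the specification. The sketch assumes a deterministic, acyclic graph with no dead ends, and it treats "summarizing reward values" as an undiscounted sum of rewards along the best route to the goal, which is only one possible reading of the claim.

```python
# Hypothetical toy environment: directed connections with a reward per connection.
REWARDS = {
    ("start", "hall"): -1.0,
    ("hall", "junction"): -1.0,
    ("junction", "stairs"): -4.0,
    ("junction", "ramp"): -1.0,
    ("stairs", "goal"): -1.0,
    ("ramp", "lobby"): -1.0,
    ("lobby", "goal"): -1.0,
}
LANDMARKS = {"start", "junction", "lobby"}  # locations where full state is known
GOAL = "goal"


def successors(location):
    """Locations reachable from `location` by one connection."""
    return [dst for (src, dst) in REWARDS if src == location]


def connection_values(rewards, goal, sweeps=50):
    """Value of a connection: its reward plus the best summed reward onward to the goal."""
    value = {edge: 0.0 for edge in rewards}
    for _ in range(sweeps):  # repeated backups; converges on an acyclic graph
        for (src, dst), reward in rewards.items():
            onward = [value[(dst, nxt)] for (s, nxt) in rewards if s == dst]
            best_onward = max(onward) if onward and dst != goal else 0.0
            value[(src, dst)] = reward + best_onward
    return value


def navigate(start, goal, landmark_values, sense, max_steps=100):
    """Follow stored values at landmarks and a sensor reading everywhere else."""
    path = [start]
    while path[-1] != goal:
        if len(path) > max_steps:
            raise RuntimeError("goal not reached within max_steps")
        here = path[-1]
        options = {dst: v for (src, dst), v in landmark_values.items() if src == here}
        if options:
            nxt = max(options, key=options.get)  # landmark: pick best-valued connection
        else:
            nxt = sense(here)  # non-landmark: rely on what the sensor reports
        path.append(nxt)
    return path


ALL_VALUES = connection_values(REWARDS, GOAL)
# Keep value functions only for connections that leave a landmark.
LANDMARK_VALUES = {e: v for e, v in ALL_VALUES.items() if e[0] in LANDMARKS}


def sense(location):
    """Stand-in for a sensor: reports the single visible connection in a corridor."""
    return successors(location)[0]


print(navigate("start", GOAL, LANDMARK_VALUES, sense))
```

Storing values only for connections that leave a landmark mirrors the claim language, in which the value function is associated with connections from landmarks while the remaining choices depend on sensor information.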
14. A navigation apparatus for navigating from a starting state to a goal state, the apparatus comprising:
means for identifying locations within the environment, such that connections between the locations form a plurality of different paths between the starting state and the goal state;
means for determining a reward value for each connection from one location to another location;
means for identifying landmarks among the locations;
means for associating a value function for each connection from one landmark to another location or landmark, the value function summarizing reward values from the one landmark to the goal state; and
means for navigating from the starting state to the goal state by applying a policy to information gathered by at least one sensor to select connections at each location to form a path to the goal state.
15. A navigation control unit for navigating from a starting state to a goal state having hardware computing components including a processor and memory, the control unit comprising:
a location unit configured to identify locations within the environment, such that connections between the locations form a plurality of different paths between the starting state and the goal state;
a reward unit configured to determine a reward value for each connection from one location to another location;
a landmark unit configured to identify landmarks among the locations;
a value function unit configured to associate a value function for each connection from one landmark to another location or landmark, the value function summarizing reward values from the one landmark to the goal state; and
a navigating unit configured to navigate from the starting state to the goal state by applying a policy to information gathered by at least one sensor to select connections at each location to form a path to the goal state.
Dependent claims: 16
Specification