APPARATUS AND ALGORITHMIC PROCESS FOR AN ADAPTIVE NAVIGATION POLICY IN PARTIALLY OBSERVABLE ENVIRONMENTS

US 20120233102A1
Filed: 03/11/2011
Published: 09/13/2012
Est. Priority Date: 03/11/2011
Status: Abandoned Application



Abstract

An apparatus and method for automatic learning of high-level navigation in partially observable environments with landmarks uses full state information available at the landmark positions to determine navigation policy. Landmark Markov Decision Processes (MDPs) can be generated only for encountered parts of an environment when navigating from a starting state to a goal state within the environment, thereby reducing computational resources needed for a navigation solution that uses a fully modeled environment. An MDP policy is calculated using the SarsaLandmark algorithm, and the policy is transformed to a navigation solution based on the current position and connectivity information.
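The abstract names the SarsaLandmark algorithm but this listing does not reproduce its update rule. As a rough illustration only, the sketch below applies an ordinary Sarsa-style temporal-difference update to whole landmark-to-landmark segments, which is one plausible reading of "using full state information available at the landmark positions". The function name `landmark_sarsa_update`, the parameters `alpha` and `gamma`, and the landmark and connection labels are all assumptions made for this example, not taken from the application.

```python
from collections import defaultdict


def landmark_sarsa_update(q, landmark, action, segment_rewards,
                          next_landmark, next_action,
                          alpha=0.5, gamma=1.0, terminal=False):
    """Sarsa-style update over one landmark-to-landmark segment.

    Hypothetical simplification: value estimates are stored only at
    landmarks, and the rewards observed between two landmarks are
    folded into a single discounted return.
    """
    # Discounted sum of the rewards collected between the two landmarks.
    segment_return = 0.0
    for k, reward in enumerate(segment_rewards):
        segment_return += (gamma ** k) * reward

    # Bootstrap from the next landmark's estimate unless the goal was reached.
    if terminal:
        bootstrap = 0.0
    else:
        discount = gamma ** len(segment_rewards)
        bootstrap = discount * q[(next_landmark, next_action)]

    target = segment_return + bootstrap
    q[(landmark, action)] += alpha * (target - q[(landmark, action)])
    return q[(landmark, action)]


# Value estimates keyed by (landmark, connection); unseen pairs start at 0.
q = defaultdict(float)

# Segment from landmark "A" via connection "east" that ends at the goal.
v_a = landmark_sarsa_update(q, "A", "east", [-2.0, -3.0],
                            None, None, terminal=True)

# Segment from landmark "S" via "north" that ends at landmark "A".
v_s = landmark_sarsa_update(q, "S", "north", [-1.0], "A", "east")
```

Because entries are created only for landmark and connection pairs that are actually visited, the table grows with the encountered part of the environment, which mirrors the resource argument made in the abstract.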


16 Claims

1. A method for navigating from a starting state to a goal state in a partially-observable environment, the method comprising:
- identifying locations within the environment, such that connections between the locations form a plurality of different paths between the starting state and the goal state;
  
  determining a reward value for each connection from one location to another location;
  
  identifying landmarks among the locations;
  
  associating a value function for each connection from one landmark to another location or landmark, the value function summarizing reward values from the one landmark to the goal state; and
  
  navigating from the starting state to the goal state by applying a policy to information gathered by at least one sensor to select connections at each location to form a path to the goal state.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
  - 2. The method according to claim 1, wherein the navigating includes selecting a connection based on value functions and reward values indicated for each connection originating from an encountered landmark.
  - 3. The method according to claim 2, wherein the selection of a connection is performed only at encountered locations, during the navigating, to form the path.
  - 4. The method according to claim 3, further comprising:
    - updating a value function associated with a connection from a landmark based on changes in reward values from the landmark to the goal state via the connection, wherein the selection of a connection is based on the updated value function.
  - 5. The method according to claim 1, wherein the policy includes maximizing reward values of a path of the selected connections to the goal state.
  - 6. The method according to claim 5, wherein the reward values are negative values which have a magnitude reflecting costs associated with each connection.
  - 7. The method according to claim 6, wherein the costs include traffic information.
  - 8. The method according to claim 7, wherein the traffic information includes traffic congestion information and road speed information, and the cost for a connection increases proportional to traffic congestion and inversely proportional to road speed.
  - 9. The method according to claim 8, wherein the information gathered by the at least one sensor includes the traffic congestion information and the road speed information so that the selection of connections at each location to form the path to the goal state reflects the traffic congestion and the road speed.
  - 10. The method according to claim 9, wherein the at least one sensor gathers the traffic congestion information and the road speed information in real-time so that the traffic congestion information and the road speed information reflect the traffic congestion and the road speed in real-time.
  - 11. The method according to claim 1, further comprising:
    - selecting, by a user, a particular location or landmark for the path to include such that the selection of connections at each location to form the path to the goal state includes a connection to the particular location or landmark.
  - 12. A computer-readable storage medium storing a set of instructions which, when executed by a processor, cause the processor to perform a method according to claim 1 for navigating from a starting state to a goal state in a partially-observable environment.
  - 13. The computer-readable storage medium according to claim 12, wherein the computer-readable storage medium is a functional hardware component of an electronic control unit for a vehicle.
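
Claims 6 to 8 describe reward values as negative numbers whose magnitude is a cost that grows in proportion to traffic congestion and in inverse proportion to road speed. A minimal sketch of one function with that shape follows. The name `connection_reward`, the `scale` constant, and the units of the inputs are assumptions for illustration; the claims do not fix a particular formula.

```python
def connection_reward(congestion, road_speed, scale=1.0):
    """Return a negative reward for traversing one connection.

    Hypothetical cost model: cost = scale * congestion / road_speed,
    so cost is proportional to congestion and inversely proportional
    to road speed. The reward is the negated cost.
    """
    if road_speed <= 0:
        raise ValueError("road_speed must be positive")
    if congestion < 0:
        raise ValueError("congestion must be non-negative")
    cost = scale * congestion / road_speed
    return -cost


# Baseline connection.
base = connection_reward(congestion=4.0, road_speed=8.0)
# Doubling congestion doubles the cost.
busier = connection_reward(congestion=8.0, road_speed=8.0)
# Doubling road speed halves the cost.
faster = connection_reward(congestion=4.0, road_speed=16.0)
```

With rewards defined this way, the policy in claim 5 that maximizes the summed reward along a path is equivalent to minimizing the total cost of that path.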

14. A navigation apparatus for navigating from a starting state to a goal state, the apparatus comprising:
- means for identifying locations within the environment, such that connections between the locations form a plurality of different paths between the starting state and the goal state;
  
  means for determining a reward value for each connection from one location to another location;
  
  means for identifying landmarks among the locations;
  
  means for associating a value function for each connection from one landmark to another location or landmark, the value function summarizing reward values from the one landmark to the goal state; and
  
  means for navigating from the starting state to the goal state by applying a policy to information gathered by at least one sensor to select connections at each location to form a path to the goal state.

15. A navigation control unit for navigating from a starting state to a goal state having hardware computing components including a processor and memory, the control unit comprising:
- a location unit configured to identify locations within the environment, such that connections between the locations form a plurality of different paths between the starting state and the goal state;
  
  a reward unit configured to determine a reward value for each connection from one location to another location;
  
  a landmark unit configured to identify landmarks among the locations;
  
  a value function unit configured to associate a value function for each connection from one landmark to another location or landmark, the value function summarizing reward values from the one landmark to the goal state; and
  
  a navigating unit configured to navigate from the starting state to the goal state by applying a policy to information gathered by at least one sensor to select connections at each location to form a path to the goal state.
- Dependent claims (16)
  - 16. The navigation control unit according to claim 15, wherein the navigation control unit is installed into a vehicle and the navigating unit is configured to instruct actuators of the vehicle that control steering, throttling and braking of the vehicle.
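
The elements shared by independent claims 1, 14, and 15 (locations, rewarded connections, landmarks, per-connection value functions at landmarks, and a policy that selects connections) can be sketched on a toy graph. This is an illustrative reading, not the applicant's implementation. It assumes the whole graph is known in advance, which the partially observable setting of the application would not guarantee, and it stands in for sensor information with a fixed reward table. The names `best_returns`, `landmark_value_functions`, and `navigate` are hypothetical.

```python
def best_returns(connections, goal):
    """Best achievable summed reward from every location to the goal."""
    locations = set(connections)
    for outs in connections.values():
        locations.update(outs)
    best = {loc: float("-inf") for loc in locations}
    best[goal] = 0.0
    # Bellman-style relaxation; with negative rewards no cycle can help.
    for _ in range(len(locations)):
        for loc, outs in connections.items():
            if loc == goal:
                continue
            for nxt, reward in outs.items():
                best[loc] = max(best[loc], reward + best[nxt])
    return best


def landmark_value_functions(connections, landmarks, goal):
    """Value of each connection leaving a landmark: summed reward to goal."""
    best = best_returns(connections, goal)
    return {(lm, nxt): reward + best[nxt]
            for lm in landmarks
            for nxt, reward in connections.get(lm, {}).items()}


def navigate(connections, landmarks, start, goal, max_steps=100):
    """Select one connection per location until the goal is reached."""
    values = landmark_value_functions(connections, landmarks, goal)
    path, current = [start], start
    for _ in range(max_steps):
        if current == goal:
            return path
        outs = connections.get(current, {})
        if not outs:
            raise ValueError(f"dead end at {current!r}")
        if current in landmarks:
            # At a landmark, use the stored value functions.
            current = max(outs, key=lambda n: values[(current, n)])
        else:
            # Elsewhere, fall back on the locally observed reward only.
            current = max(outs, key=lambda n: outs[n])
        path.append(current)
    raise RuntimeError("goal not reached within max_steps")


# Toy environment: rewards are negative costs on each connection.
connections = {
    "S": {"A": -1.0, "B": -2.0},
    "A": {"G": -10.0},
    "B": {"G": -1.0},
}
landmarks = {"S"}
values = landmark_value_functions(connections, landmarks, "G")
path = navigate(connections, landmarks, "S", "G")
```

In this toy graph the connection from S to A has the better immediate reward, but the value function stored at landmark S accounts for the expensive A to G leg, so the selected path goes through B.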


Current Assignee: Toyota Motor Engineering & Manufacturing North America Incorporated (Toyota Motor Corporation)
Original Assignee: Toyota Motor Engineering & Manufacturing North America Incorporated (Toyota Motor Corporation)
Inventors: JAMES, Michael Robert

Application Number: US13/046,474
Publication Number: US 20120233102A1
US Class Current: 706/14
CPC Class Codes:
- G01C 21/3492 employing speed data or tra...
- G06N 20/00 Machine learning
