Vehicle dispatching method and system
First Claim
1. A method of optimizing vehicle dispatching in a dynamic work area resulting from a linear program comprising:
inputting a vehicle dispatching schedule for a plurality of vehicles as a state representation into a reinforcement learning algorithm, the state representation having a plurality of states, each state having a plurality of possible actions;
using a computer, the computer comprising computer readable media programmed with a set of instructions causing the computer to perform the steps of:
running a simulation of the states by selecting one of the possible actions within each state, one running of the simulation being an episode and producing a result based on a proximity to optimum performance of the plurality of vehicles;
assigning a reward value based on the result;
propagating the reward value back through the simulation with reference to time between states;
for each action in each state within the episode, determining a policy value based on at least one of the reward value, a subsequent state, a subsequent action, elapsed time in the episode at the state and elapsed time in the episode at the subsequent state;
developing a policy for each state in which the action in each state that produces a maximum policy value is a preferred action;
dispatching the preferred action to the plurality of vehicles in the dynamic work area; and
causing the plurality of vehicles to perform in accordance with the preferred action.
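The claimed loop reads much like on-policy temporal-difference learning (SARSA) in which the discount depends on elapsed time rather than on step count. The claim does not name an update rule, so what follows is only one possible reading, built under loudly stated assumptions.

- **Setting.** A single hypothetical truck makes four dispatch decisions per episode. A hypothetical LP schedule (`SCHEDULE`) names the target shovel for each decision.
- **Invented parameters.** The travel times (`TRAVEL`), `GAMMA`, `ALPHA` and `EPSILON` are invented for illustration.
- **State design.** The state records how many decisions so far followed the schedule. This keeps the end-of-episode reward Markov.
- **How the claim maps to the code.** Each update uses the reward, the subsequent state and action, and the elapsed times at both states, matching the quantities listed in the claim. The preferred action in a state is the one with the maximum policy value.

```python
import random
from collections import defaultdict

# Hypothetical toy setting (not from the patent): the LP schedule names the
# target shovel for each of four dispatch decisions made for one truck.
SCHEDULE = ["A", "B", "A", "B"]
ACTIONS = ["A", "B"]
TRAVEL = {"A": 2.0, "B": 3.0}  # assumed minutes consumed by each dispatch
GAMMA = 0.99                   # assumed discount per minute of elapsed time
ALPHA = 0.1                    # assumed learning rate
EPSILON = 0.2                  # assumed exploration rate


def choose(q, state, rng):
    """Epsilon-greedy choice of one of the possible actions in a state."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def step(state, action):
    """State = (decision index, decisions so far that followed the schedule)."""
    k, on_plan = state
    return (k + 1, on_plan + (action == SCHEDULE[k])), TRAVEL[action]


def train(episodes=20000, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # policy value for each (state, action) pair
    for _ in range(episodes):  # one pass through the simulation = one episode
        state, t = (0, 0), 0.0
        action = choose(q, state, rng)
        while True:
            nxt, dt = step(state, action)
            t_next = t + dt  # elapsed time in the episode at the subsequent state
            if nxt[0] == len(SCHEDULE):
                # Episode result: proximity to the optimum schedule, in [0, 1].
                reward = nxt[1] / len(SCHEDULE)
                target = GAMMA ** (t_next - t) * reward
                q[(state, action)] += ALPHA * (target - q[(state, action)])
                break
            nxt_action = choose(q, nxt, rng)
            # Propagate value back from the subsequent state/action,
            # discounted by the time elapsed between the two states.
            target = GAMMA ** (t_next - t) * q[(nxt, nxt_action)]
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state, action, t = nxt, nxt_action, t_next
    return q


def preferred_actions(q):
    """Policy: in each visited state, the action with the maximum policy value."""
    states = {s for s, _ in q}
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in states}


q = train()
policy = preferred_actions(q)
for k in range(len(SCHEDULE)):
    # On-plan states (k, k): the action that would be dispatched to the truck.
    print(k, policy[(k, k)])
```

In this toy, the learned preferred actions along the on-schedule path should reproduce the hypothetical schedule. "Dispatching the preferred action" then amounts to a lookup in `policy` for the truck's current state.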
Abstract
A system and method for dispatching a plurality of vehicles operating in a work area among a plurality of destination locations and a plurality of source locations includes: implementing linear programming that takes in an optimization function and constraints to generate an optimum schedule for optimum production; utilizing a reinforcement learning algorithm that takes in the schedule as input and cycles through the possible environmental states that could occur within the schedule, choosing one possible action for each possible environmental state and observing the reward obtained by taking that action; developing a policy for each possible environmental state; and providing instructions to follow the action associated with the policy.
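The first stage described in the abstract is a linear program that turns an optimization function and constraints into an optimum production schedule. It can be sketched with a small textbook simplex solver.

Everything in this example is hypothetical and is not taken from the patent:

- the two-shovel, two-dump haul network;
- the capacities, cycle times, payload and fleet size;
- the choice to maximize total tonnes per hour;
- the hand-written `simplex_max` helper.

The patent does not state which LP formulation or solver it uses.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximize c.x subject to A x <= b and x >= 0, assuming every b[i] >= 0.

    Dense tableau simplex with Bland's rule (prevents cycling) and exact
    Fraction arithmetic. Returns (x, objective value).
    """
    m, n = len(A), len(c)
    # Row i: [A_i | slack identity | b_i]; last row holds -c and the objective.
    T = [[Fraction(v) for v in A[i]]
         + [Fraction(int(i == k)) for k in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    T.append([Fraction(-v) for v in c] + [Fraction(0)] * (m + 1))
    basis = [n + i for i in range(m)]  # slack variables start as the basis
    while True:
        # Entering column: lowest index with a negative reduced cost.
        col = next((j for j in range(n + m) if T[-1][j] < 0), None)
        if col is None:
            break  # no improving column: current basis is optimal
        rows = [i for i in range(m) if T[i][col] > 0]
        if not rows:
            raise ValueError("LP is unbounded")
        # Minimum-ratio test; ties go to the smallest basic variable index.
        row = min(rows, key=lambda i: (T[i][-1] / T[i][col], basis[i]))
        pivot = T[row][col]
        T[row] = [v / pivot for v in T[row]]
        for i in range(m + 1):
            if i != row and T[i][col] != 0:
                factor = T[i][col]
                T[i] = [v - factor * p for v, p in zip(T[i], T[row])]
        basis[row] = col
    x = [Fraction(0)] * n
    for i, j in enumerate(basis):
        if j < n:
            x[j] = T[i][-1]
    return x, T[-1][-1]


# Hypothetical haul network: x = tonnes/hour on routes S1->D1, S1->D2, S2->D1, S2->D2.
PAYLOAD_T = 100               # tonnes per truckload (assumed)
CYCLE_MIN = [30, 45, 36, 24]  # round-trip minutes per route (assumed)
FLEET = 6                     # trucks available (assumed)
A = [
    [1, 1, 0, 0],             # shovel S1 dig capacity
    [0, 0, 1, 1],             # shovel S2 dig capacity
    [1, 0, 1, 0],             # dump D1 capacity
    [0, 1, 0, 1],             # dump D2 capacity
    [Fraction(t, 60 * PAYLOAD_T) for t in CYCLE_MIN],  # trucks tied up per t/h
]
b = [1000, 800, 900, 1200, FLEET]
rates, total = simplex_max([1, 1, 1, 1], A, b)
print([str(r) for r in rates], total)
```

With these made-up numbers the fleet limit is the binding constraint, so the solver favours the routes that move the most tonnes per truck. It sends 800 t/h from S2 to D2 and 560 t/h from S1 to D1, 1,360 t/h in total. A production system would more likely call an established LP solver than a hand-rolled simplex. The resulting route rates are the kind of schedule the reinforcement learning stage would take as input.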
5 Claims
Specification