Neural network model for reaching a goal state
Abstract
An object, such as a robot, is located at an initial state in a finite state space area and moves under the control of the unsupervised neural network model of the invention. The network instructs the object to move in one of several directions from the initial state. Upon reaching another state, the model again instructs the object to move in one of several directions. These instructions continue until either: a) the object has completed a cycle by ending up back at a state it has been to previously during this cycle, or b) the object has completed a cycle by reaching the goal state. If the object ends up back at a state it has been to previously during this cycle, the neural network model ends the cycle and immediately begins a new cycle from the present location. When the object reaches the goal state, the neural network model learns that this path is productive towards reaching the goal state, and is given delayed reinforcement in the form of a "reward". Upon reaching a state, the neural network model calculates a level of satisfaction with its progress towards reaching the goal state. If the level of satisfaction is low, the neural network model is more likely to override what has been learned thus far and deviate from a path known to lead to the goal state to experiment with new and possibly better paths.
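The cycle rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented network. It assumes a 4-connected grid as the finite state space. `MOVES`, `neighbors`, and `run_cycle` are hypothetical names, and the `choose` callback stands in for the network's choice of direction. A cycle ends as soon as a state repeats, so every cycle finishes in at most as many moves as there are states.

```python
# Hypothetical 4-connected grid; the patent does not fix the layout of the state space.
MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def neighbors(state, width, height):
    """States reachable in one move from `state` without leaving the grid."""
    x, y = state
    return [(x + dx, y + dy) for dx, dy in MOVES
            if 0 <= x + dx < width and 0 <= y + dy < height]


def run_cycle(start, goal, width, height, choose):
    """Move from `start` until the goal is reached or a state repeats.

    Returns (path, reached_goal). `choose(state, options)` is an assumed
    stand-in for the network's choice of direction.
    """
    path = [start]
    visited = {start}
    current = start
    while current != goal:
        current = choose(current, neighbors(current, width, height))
        path.append(current)
        if current in visited:
            # Back at an earlier state: this cycle ends, and path[-1] is the
            # present location from which the next cycle would begin.
            return path, False
        visited.add(current)
    return path, True
```

For random exploration, `run_cycle((0, 0), (2, 2), 3, 3, lambda s, ns: random.choice(ns))` runs one cycle on a 3 x 3 grid after `import random`.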
Claims
1. A neural network model for determining the best path amongst a plurality of states from a start state to a goal state, said model comprising:
means for learning a first path among a first subset of said plurality of states to said goal state;
means for initiating a level of satisfaction upon reaching said goal state;
returning means for returning to said start state;
creating means for creating a current path by repeatedly moving to a current state until said goal state is reached, wherein said current state is one of said plurality of states;
reducing means for reducing said level of satisfaction if said current state is a non-goal state;
increasing means for increasing said level of satisfaction if said current state is said goal state;
indicating means for indicating that said current path is the best path if said current path is better than said first path and any other previously known paths;
repeating means for repeating said returning means and said creating means if said current state is said goal state;
means for raising the likelihood that said current path will deviate from the best path determined by said indicating means when said level of satisfaction is low; and
means for lowering the likelihood that said current path will deviate from the best path determined by said indicating means when said level of satisfaction is high.
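One way to picture the reducing, increasing, raising, and lowering means of claim 1 is a bounded scalar. It falls by a fixed amount on each non-goal state, rises on reaching the goal, and sets the probability of leaving the best known path. The class and function names, the linear mapping from satisfaction to deviation probability, and all constants below are assumptions for illustration only. The claim does not specify them.

```python
import random
from dataclasses import dataclass


@dataclass
class Satisfaction:
    """Hypothetical satisfaction tracker; the constants are illustrative only."""
    level: float = 1.0   # value set when the goal state is first reached
    decay: float = 0.05  # reduction applied for each non-goal state
    reward: float = 1.0  # increase applied on reaching the goal state
    floor: float = 0.0
    ceiling: float = 1.0

    def update(self, is_goal: bool) -> None:
        """Reduce satisfaction on a non-goal state, increase it on the goal."""
        if is_goal:
            self.level = min(self.ceiling, self.level + self.reward)
        else:
            self.level = max(self.floor, self.level - self.decay)

    def deviation_probability(self) -> float:
        """Map satisfaction linearly to a deviation chance: low level, high chance."""
        return (self.ceiling - self.level) / (self.ceiling - self.floor)


def next_state(best_next, alternatives, sat, rng):
    """Take the best path's next state unless a satisfaction-weighted draw says deviate."""
    if alternatives and rng.random() < sat.deviation_probability():
        return rng.choice(alternatives)
    return best_next
```

At full satisfaction the sketch never deviates. At the floor, it always picks one of the alternatives. Between those extremes, the chance of experimenting grows with every non-goal state visited.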
2. A method for determining the best path amongst a plurality of states from a start state to a goal state, said method comprising the steps of:
learning a first path among a first subset of said plurality of states to said goal state;
initializing a level of satisfaction upon reaching said goal state;
returning to said start state;
creating a current path by repeatedly moving to a current state until said goal state is reached, wherein said current state is one of said plurality of states;
reducing said level of satisfaction if said current state is a non-goal state;
indicating that said current path is the best path if said current path is better than said first path or any other previously known paths;
repeating said returning step and said creating step if said current state is said goal state;
raising the likelihood that said current path will deviate from the best path indicated by said indicating step when said level of satisfaction is low; and
lowering the likelihood that said current path will deviate from the best path indicated by said indicating step when said level of satisfaction is high.
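The steps of claim 2 can be combined into a small loop over cycles. This sketch is hedged in several ways:

- It uses a plain adjacency dictionary instead of a neural network.
- It treats "better" as "fewer moves".
- It borrows the abstract's rule of abandoning a cycle that revisits a state.
- It borrows the increasing step from claim 1.
- It returns to the start state after every cycle.

`find_best_path` and its parameters are hypothetical names. The sketch also assumes every state has at least one neighbor.

```python
import random


def find_best_path(graph, start, goal, episodes=200, decay=0.1, reward=1.0, seed=0):
    """Illustrative sketch of claim 2 on an adjacency-dict graph (assumptions above)."""
    rng = random.Random(seed)
    best = None   # best known path, as a list of states
    level = None  # satisfaction level, initialized when the goal is first reached

    for _ in range(episodes):
        current, path = start, [start]  # return to the start state
        while current != goal:
            on_best = best is not None and current in best[:-1]
            # Follow the best path with probability `level`; otherwise deviate.
            if on_best and rng.random() < level:
                current = best[best.index(current) + 1]
            else:
                current = rng.choice(graph[current])
            if current in path:
                break  # revisited a state: abandon this cycle
            path.append(current)
            if level is not None:
                # Reduce satisfaction on non-goal states; increase it on the goal.
                if current == goal:
                    level = min(1.0, level + reward)
                else:
                    level = max(0.0, level - decay)
        if current != goal:
            continue  # abandoned cycle; satisfaction stays where it fell
        if level is None:
            level = 1.0  # first path learned: initialize satisfaction
        if best is None or len(path) < len(best):
            best = path  # indicate the current path as the best path
    return best
```

An abandoned cycle leaves satisfaction low. The next cycle is therefore more likely to leave the best known path early, which matches the exploration behavior described in the abstract.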
Specification