Neural network model for reaching a goal state
First Claim
1. A neural network model having an input line for receiving state information for a plurality of states, and an output generator for controlling the movement of an object along a path of selected states among said plurality of states, said neural network model comprising:
a satisfaction unit, comprising:
a satisfaction index;
means for detecting a first state, wherein said first state is a current state;
first determining means for determining that said current state is a non-goal state;
first modifying means, responsive to said first determining means, for modifying said satisfaction index to indicate a reduced level of satisfaction;
second determining means for determining that said current state is a goal state;
second modifying means, responsive to said second determining means, for modifying said satisfaction index to indicate an increased level of satisfaction;
at least three action units corresponding to at least three directions of movement, each of said action units comprising:
means for increasing a randomness factor if said satisfaction index indicates a low level of satisfaction;
means for decreasing said randomness factor if said satisfaction index indicates a high level of satisfaction;
means for randomly selecting by said randomness factor a temporary weight from a temporary weight range;
means for adding a permanent weight to said temporary weight to achieve an effective weight; and
sending means for sending an indication to move said object in the direction of movement that corresponds to said action unit to said output generator if said effective weight exceeds a predetermined value.
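The action-unit logic recited in claim 1 can be sketched in code. This is a minimal illustrative sketch, not the patent's implementation: the class name, the specific thresholds, the step sizes, and the temporary-weight range are all assumptions made for the example.

```python
import random

class ActionUnit:
    """Illustrative sketch of one action unit. Each unit corresponds to
    one direction of movement and signals a move when its effective
    weight exceeds a predetermined value (the threshold)."""

    def __init__(self, direction, permanent_weight=0.0, threshold=1.0):
        self.direction = direction
        self.permanent_weight = permanent_weight  # learned weight
        self.threshold = threshold                # "predetermined value"
        self.randomness_factor = 0.5              # scales exploration

    def update_randomness(self, satisfaction_index, low=0.3, high=0.7):
        # Low satisfaction increases the randomness factor (more
        # exploration); high satisfaction decreases it (less).
        if satisfaction_index < low:
            self.randomness_factor = min(1.0, self.randomness_factor + 0.1)
        elif satisfaction_index > high:
            self.randomness_factor = max(0.0, self.randomness_factor - 0.1)

    def propose_move(self):
        # Randomly select a temporary weight from a range scaled by the
        # randomness factor, add the permanent weight to obtain the
        # effective weight, and signal a move if it exceeds the threshold.
        temporary_weight = random.uniform(0.0, self.randomness_factor)
        effective_weight = self.permanent_weight + temporary_weight
        if effective_weight > self.threshold:
            return self.direction
        return None
```

In this sketch, a unit whose learned permanent weight already exceeds the threshold always fires, while a unit with zero permanent weight can fire only through the random temporary weight, i.e., through exploration.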
Abstract
An object, such as a robot, starts at an initial state in a finite state space and moves under the control of the invention's unsupervised neural network model. The network instructs the object to move in one of several directions from the initial state. Upon reaching another state, the model again instructs the object to move in one of several directions. These instructions continue until either: a) the object completes a cycle by revisiting a state it has already occupied during the cycle, or b) the object completes a cycle by reaching the goal state. If the object revisits a state, the neural network model ends the cycle and immediately begins a new cycle from the present location. When the object reaches the goal state, the neural network model learns that this path is productive and receives delayed reinforcement in the form of a "reward". Upon reaching each state, the neural network model calculates a level of satisfaction with its progress toward the goal state. If the level of satisfaction is low, the model is more likely to override what it has learned thus far and deviate from a path known to lead to the goal state in order to experiment with new and possibly better paths. If the level of satisfaction is high, the model is much less likely to experiment with new paths. The object is guaranteed to eventually find the best path to the goal state from any starting location, provided the level of satisfaction does not exceed the threshold at which learning ceases.
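The cycle structure described in the abstract can be sketched as a small loop. This is a hypothetical helper for illustration only: the function name `run_cycle`, the `step_fn` callback, and the `max_steps` guard are assumptions, not part of the patent.

```python
def run_cycle(start, goal, step_fn, max_steps=1000):
    """Run one cycle: move until the object revisits a state (cycle ends
    with no reward) or reaches the goal (cycle ends with delayed reward
    for the whole path). step_fn(state) returns the next state chosen by
    the network's action units."""
    path = [start]
    visited = {start}
    state = start
    for _ in range(max_steps):
        state = step_fn(state)
        if state == goal:
            path.append(state)
            return path, True   # success: reward the path taken
        if state in visited:
            return path, False  # revisit: end cycle, no reward
        visited.add(state)
        path.append(state)
    return path, False
```

On a successful cycle the returned path is the candidate for reinforcement (increasing the permanent weights of the action units used along it); a failed cycle simply restarts from the present location.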
Specification