Neural network element with reinforcement/attenuation learning
First Claim
1. An action learning control system capable of learning an input-output relationship according to own action of the system, comprising:
a sensor configured to obtain information from an external environment and to output the obtained information;
a sensory evaluation module configured to receive information from the sensor, to receive an action policy, to determine whether a state of a controlled object is stable or not based on the received information, and to output a reinforcement signal according to the determined result;
a sensor information state separating module for performing reinforcement learning, configured to receive information from the sensor, to receive the reinforcement signal from the sensory evaluation module, to receive the action policy, to give heavier weight to sensor information having higher sensory evaluation, to classify sensor information into a low-dimensioned state, and to output the state;
an action learning module, configured to receive the state from the sensor information state separating module and to output a corresponding action control command, for learning a relationship between the state and the action control command;
an attention controller configured to receive information from the sensor, to receive the reinforcement signal from the sensory evaluation module, to receive the action control command from the action learning module, and to send the action policy to the sensory evaluation module and to the sensor information state separating module;
an action sequence storing and refining module configured to receive information from the sensor, to receive the reinforcement signal from the sensory evaluation module, to receive the action control command from the action learning module, to determine a refined action control command based on the received sensor information and based on the received action control command and based on stored temporal information, and to output the refined action control command; and
an output module configured to receive the refined action control command from the action sequence storing and refining module and to output the refined action control command.
Abstract
A neural network element, outputting an output signal in response to a plurality of input signals, comprises a history memory for accumulating and storing the plurality of input signals in a temporal order as history values. It also includes an output module for outputting the output signal when an internal state exceeds a predetermined threshold value, the internal state being based on a sum of the product of a plurality of input signals and corresponding coupling coefficients. The history values depend on change of the internal state. The neural network element is configured to subtract a predetermined value from the internal state immediately after the output module fires and performs learning for reinforcing or attenuating the coupling coefficient according to the history values after the output module fires.
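The element described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `HistoryNeuron`, the exponential decay used for the history values, the `history_cutoff` rule for choosing between reinforcement and attenuation, and every numeric default are assumptions introduced here. In particular, the sketch derives history values from the inputs alone and does not model their dependence on the change of the internal state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HistoryNeuron:
    """Toy threshold element with a history trace per input (hypothetical)."""
    weights: List[float]          # coupling coefficients, one per input
    threshold: float = 1.0        # predetermined firing threshold
    reset_amount: float = 1.0     # predetermined value subtracted after firing
    history_decay: float = 0.5    # assumption: older inputs fade geometrically
    learning_rate: float = 0.1    # assumption
    history_cutoff: float = 0.5   # assumption: above -> reinforce, below -> attenuate
    state: float = 0.0            # internal state, carried across steps
    history: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.history:
            self.history = [0.0] * len(self.weights)

    def step(self, inputs: List[float]) -> bool:
        """Process one set of inputs; return True if the element fires."""
        # Accumulate inputs in temporal order as decaying history values.
        self.history = [self.history_decay * h + x
                        for h, x in zip(self.history, inputs)]
        # Internal state grows by the sum of input * coupling coefficient.
        self.state += sum(w * x for w, x in zip(self.weights, inputs))
        if self.state <= self.threshold:
            return False
        # Fire: subtract the predetermined value, then learn from history.
        self.state -= self.reset_amount
        self.weights = [w + self.learning_rate * (h - self.history_cutoff)
                        for w, h in zip(self.weights, self.history)]
        return True
```

With weights `[0.6, 0.0]` and the input `[1.0, 0.0]` presented twice, the element stays silent on the first step and fires on the second. The coefficient of the recently active input is then reinforced, while the coefficient of the silent input is attenuated.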
3 Claims
3. A computer-implemented method of determining an action control command using reinforcement learning, comprising:
obtaining information from an external environment using one or more sensors;
determining whether a state of a controlled object is stable or not based on information from the sensors;
outputting a reinforcement signal according to the stability determination;
generating an action policy;
adjusting the state separation and the reinforcement signal generation based on the action policy;
performing reinforcement learning based on the reinforcement signal, comprising:
giving heavier weight to sensor information having higher sensory evaluation; and
classifying sensor information into a low-dimensioned state;
learning a relationship between the classified state and a corresponding action control command based on the reinforcement signal;
outputting a first action control command;
storing and modifying an action sequence;
determining a second action control command based on the obtained sensor information and based on the first action control command and based on stored temporal information; and
outputting the second action control command.
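One pass through the method steps above might look like the toy sketch below. It is an illustration under heavy assumptions, not the claimed method: the names `ACTIONS`, `classify`, and `control_step`, the `policy` dictionary and its keys, the three-state discretization, the value-table update, and the moving-average refinement are all hypothetical stand-ins. A single scalar reading replaces the sensor data, a single policy-supplied weight replaces the per-sensor weighting, and the update credits the command chosen in the same step, which is a simplification of reinforcement learning proper.

```python
from collections import deque

ACTIONS = (-1.0, 0.0, 1.0)  # hypothetical command set


def classify(reading, weight):
    """Map a weighted sensor reading to one of three coarse states."""
    scaled = reading * weight
    if scaled < -0.3:
        return 0
    if scaled > 0.3:
        return 2
    return 1


def control_step(reading, q_table, recent, policy, lr=0.5):
    """Run one sense -> evaluate -> learn -> refine cycle (toy version)."""
    # Stability determination and reinforcement signal; the threshold
    # comes from the action policy (assumed key name).
    stable = abs(reading) < policy["stability_limit"]
    reinforcement = 1.0 if stable else -1.0

    # State separation: weight the reading, then reduce it to a coarse state.
    state = classify(reading, policy["sensor_weight"])

    # First action control command: greedy choice from the learned table.
    values = q_table[state]
    idx = max(range(len(ACTIONS)), key=lambda a: values[a])
    first = ACTIONS[idx]

    # Learn the state -> command relationship from the reinforcement signal.
    values[idx] += lr * (reinforcement - values[idx])

    # Store the action sequence and refine the command using the stored
    # temporal information (here: the mean of the recent commands).
    recent.append(first)
    second = sum(recent) / len(recent)
    return first, second, reinforcement
```

A caller would keep `q_table` (one row of values per state) and `recent` (for example `deque(maxlen=3)`) alive between calls, so that both the learned values and the stored command sequence persist across steps.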
Specification