Dynamic programming network
Abstract
A dynamic programming network integrating sensor management, sensor fusion, and an application in a seamless structure in which these functions are mutually dependent and develop autonomously and concomitantly with experience. The dynamic programming network autonomously divides these functions into multiple subtasks that it can assign to the processors of a fine-grained parallel computer. As the number of processors available for these subtasks increases, the network may attain its objective more efficiently. This architecture confers the greatest advantage in feature-rich applications such as identification of targets in synthetic aperture radar, visual, and infrared images. The design can be extended, however, to such diverse and general applications as control problems and machine intelligence. For the pattern recognition applications, the dynamic programming network detects, selects, and identifies features and patterns comprising those features via a series of observations rather than processing all data available in each image, thereby minimizing sensor usage and volume of data processed. The network remembers similar features contained in many images instead of many images containing similar features, thus conserving memory and facilitating data retrieval.
17 Claims
1. An efficient, memory conserving, application integrating dynamic programming method comprising the steps of:
establishing a prototype element of a network, said establishing comprising the steps of:
assigning a table or function approximator for maintaining state values;
identifying a method for determining element state based on state values maintained from said assigning step;
applying a process for dynamically programming said element's state values based on succeeding state values resulting from said element's state from said identifying step;
connecting a plurality of elements from said establishing step to form a network;
coupling signal transmitting sensors to elements from said connecting step;
coupling elements from said connecting step to effectors;
maintaining within each element a running average of values for the state of an element in a cycle after such value occurs;
cycling said network by determining the state of all elements and sensors therein, electing as each element's state the highest running average value from said maintaining step;
sending an output signal to network effectors; and
presenting to said sensors a pattern based on a state that results from effector activity from said sending step.
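Read as an algorithm, claim 1 describes elements that each keep running-average state values, a cycle that elects each element's highest-valued state, and effectors driven by the result. A minimal illustrative sketch, not the patented embodiment; the class names, the scalar reward signal, and the exponential running-average form are assumptions:

```python
import random

class Element:
    """A network element maintaining a table of running-average state values."""
    def __init__(self, states, alpha=0.1):
        self.alpha = alpha                       # learning rate
        self.values = {s: 0.0 for s in states}   # table of state values
        self.state = random.choice(states)

    def elect_state(self):
        # Elect as the element's state the one with the highest running average.
        self.state = max(self.values, key=self.values.get)
        return self.state

    def update(self, reward):
        # Maintain a running average for the state just taken.
        v = self.values[self.state]
        self.values[self.state] = (1 - self.alpha) * v + self.alpha * reward

class Network:
    def __init__(self, n_elements, states):
        self.elements = [Element(states) for _ in range(n_elements)]

    def cycle(self, sensor_pattern):
        # Determine the state of all elements; the result drives the effectors.
        # (A full implementation would let sensor_pattern bias element inputs.)
        return [e.elect_state() for e in self.elements]

    def learn(self, reward):
        for e in self.elements:
            e.update(reward)
```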
where α represents the learning rate, β (e.g., 1-α) represents an averaging factor, t represents the number of the current cycle, (C) represents costs, (P) represents penalties levied, (R) represents rewards conferred, (γ) represents a discounting factor, and (V) represents the highest value among the element's possible states (μ).
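The equation this legend accompanies does not survive in this extraction. A plausible form, consistent with the symbols defined above and with standard discounted running-average updates, would be the following; this reconstruction is an assumption, not the claim's actual equation:

```latex
% Assumed update: exponential running average of the net reinforcement
V_t(\mu) = \beta\, V_{t-1}(\mu) + \alpha \bigl( R - P - C + \gamma V \bigr),
\qquad \beta = 1 - \alpha
```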
6. The efficient, memory conserving, sensor management, sensor fusion and application integrating dynamic programming network of claim 1 wherein said maintaining step further comprises the use of other forms of dynamic programming from the group consisting of Q-learning, TD-lambda, value iteration, and advantage learning.
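Claim 6 allows the running averages to be maintained by other dynamic-programming forms. As one example from the recited group, a single tabular Q-learning step can be sketched as follows; the `(state, action)` dictionary encoding and the default rates are assumptions:

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; q maps (state, action) pairs to values."""
    # Bootstrap from the best value available in the succeeding state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward reward + discounted future value.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```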
7. The efficient, memory conserving, application integrating dynamic programming method of claim 1 further including, after said cycling step, the step of running a trial of one or more cycles through said network.
8. The efficient, memory conserving, application integrating dynamic programming method of claim 1, said presenting step further comprising the step of multiplexing binary states of said identifiers to determine network response.
9. The efficient, memory conserving, application integrating dynamic programming method of claim 1 further including, after said presenting step, the step of
engaging said network in a training exercise comprising the steps of:
applying the network to a simulated or real task;
cycling said network, allowing effectors to act upon the state of the application; and
rewarding or penalizing the network based on said effector actions.
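The training exercise of claim 9 amounts to a reinforcement-learning loop: apply the network to a task, cycle it, and reward or penalize based on the effector actions. A hedged sketch, in which the `ToyTask` stand-in and the ±1 reward rule are assumptions:

```python
class ToyTask:
    """Stand-in application: succeeds whenever the action list is all 'on'."""
    def observe(self):
        return None                       # pattern presented to the sensors
    def step(self, action):
        return all(a == "on" for a in action)

def train(cycle, learn, task, n_trials=10):
    """Run a trial of cycles: cycle the network, let effectors act,
    then reward or penalize the network based on the outcome."""
    rewards = []
    for _ in range(n_trials):
        pattern = task.observe()          # sensors read the simulated or real task
        action = cycle(pattern)           # network cycles; effectors act on the task
        reward = 1.0 if task.step(action) else -1.0
        learn(reward)                     # reward or penalize based on effector actions
        rewards.append(reward)
    return rewards
```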
10. An efficient, memory conserving, application integrating dynamic programming network comprising:
a prototype element of said network comprising:
a state value maintaining function approximator;
an element state determining program based on said state values;
a process for dynamically programming said element's state values based on succeeding state values resulting from said element's states;
a network forming plurality of said prototype elements connected together; signal transmitting sensors connected to said elements;
said elements connected to effectors;
a plurality of running average values for a state of each of said elements, said running average value maintained within each element and based on a state of an element in a cycle after such value occurs;
highest running average value selecting means for each of said elements; and
a pattern output from said network to network effectors based on said highest running average value of said elements.
where α represents the learning rate, β (e.g., 1-α) represents an averaging factor, t represents the number of the current cycle, (C) represents costs, (P) represents penalties levied, (R) represents rewards conferred, (γ) represents a discounting factor, and (V) represents the highest value among the element's possible states (μ).
14. The efficient, memory conserving, application integrating dynamic programming network of claim 10 wherein said plurality of running average values for a state of each of said elements is maintained by a method from the group consisting of Q-learning, TD-lambda, value iteration, and advantage learning.
15. The efficient, memory conserving, application integrating dynamic programming network of claim 10 further including a reward applied to said state value of said element for successfully achieving a goal, said reward increasing both said highest running average value for each element and associated chances of selection by said network.
16. The efficient, memory conserving, application integrating dynamic programming network of claim 10 further including a penalty applied to said state value of said element(s) for failing to achieve a goal, said penalty decreasing both said highest running average value for each element and associated chances of selection by said network.
17. The efficient, memory conserving, application integrating dynamic programming network of claim 10 further including a cost applied to said state value of said element to signify the energy or other expenditure to move from one task state to another.
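Claims 15-17 layer rewards, penalties, and transition costs onto an element's state value: a reward raises the running average and the element's chances of selection, while a penalty or cost lowers them. A combined sketch, in which the exponential-average form and the net-signal combination are assumptions consistent with the symbol legend above:

```python
def adjust_value(value, reward=0.0, penalty=0.0, cost=0.0, alpha=0.1):
    """Fold a reward (claim 15), penalty (claim 16), and transition cost
    (claim 17) into a state value via an exponential running average."""
    signal = reward - penalty - cost   # net reinforcement for this cycle
    return (1 - alpha) * value + alpha * signal
```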
Specification