Learning controller with advantage updating algorithm

  • US 5,608,843 A
  • Filed: 08/01/1994
  • Issued: 03/04/1997
  • Est. Priority Date: 08/01/1994
  • Status: Expired due to Fees
First Claim

1. A learning controller comprising:

  • means for storing a value function V and an advantage function A in a function approximation memory system;

    means for updating said value function V and said advantage function A according to reinforcements received from an environment;

    said means for updating including learning means for performing an action $u_t$ in a state $x_t$, leading to a state $x_{t+\Delta t}$ and a reinforcement $R_{\Delta t}(x_t, u_t)$;

    said means for updating also including means for updating said advantage function A, and changing a maximum value, Amax, thereof;

    said means for updating also including means for updating said value function V in response to said Amax value change;

    means for normalizing update of said advantage function A, by choosing an action u randomly, with uniform probability; and

    means for performing said action u and said normalizing update of said advantage function A in a state x;

    said learning means and said normalizing update functioning according to an algorithm of;

    ##EQU32## where said ##EQU33## symbology represents a function-approximating supervised learning system, generating an output of X, being trained to generate a desired output of Y at a learning rate of a.
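
The equations behind the ##EQU32## and ##EQU33## placeholders are not reproduced in this extract, so the following is only a minimal sketch of the structure the claim describes, not the patent's exact algorithm. It assumes the commonly published form of advantage updating and uses a lookup table in place of the function approximation memory. The class name `AdvantageUpdatingTable`, the discount factor `gamma`, the time step `dt`, the scaling constant `k`, and the separate learning rates `alpha`, `beta`, and `omega` are all illustrative assumptions that do not appear in the claim text above.

```python
import random


class AdvantageUpdatingTable:
    """Tabular stand-in for the claim's function approximation memory.

    Assumption: update targets follow the commonly published form of
    advantage updating; the claim's own equations are elided in the source.
    """

    def __init__(self, n_states, n_actions, alpha=0.5, beta=0.5, omega=0.5,
                 gamma=0.9, dt=1.0, k=1.0):
        self.n_actions = n_actions
        self.V = [0.0] * n_states                               # value function V
        self.A = [[0.0] * n_actions for _ in range(n_states)]   # advantage function A
        self.alpha, self.beta, self.omega = alpha, beta, omega  # learning rates (assumed)
        self.gamma, self.dt, self.k = gamma, dt, k              # assumed constants

    @staticmethod
    def train(output, desired, rate):
        """Move a stored output toward a desired output at the given rate."""
        return output + rate * (desired - output)

    def a_max(self, x):
        """Maximum advantage over all actions in state x (Amax)."""
        return max(self.A[x])

    def learning_update(self, x, u, reinforcement, x_next):
        """Update A for the performed action, then V from the change in Amax."""
        a_max_old = self.a_max(x)
        td = reinforcement + self.gamma ** self.dt * self.V[x_next] - self.V[x]
        a_target = a_max_old + td / (self.dt * self.k)
        self.A[x][u] = self.train(self.A[x][u], a_target, self.alpha)
        a_max_new = self.a_max(x)
        v_target = self.V[x] + (a_max_new - a_max_old) / self.alpha
        self.V[x] = self.train(self.V[x], v_target, self.beta)

    def normalizing_update(self, x, rng=random):
        """Pick an action uniformly at random and pull A toward A - Amax."""
        u = rng.randrange(self.n_actions)
        a_target = self.A[x][u] - self.a_max(x)
        self.A[x][u] = self.train(self.A[x][u], a_target, self.omega)
        return u


table = AdvantageUpdatingTable(n_states=2, n_actions=2)
table.learning_update(x=0, u=1, reinforcement=1.0, x_next=1)
chosen = table.normalizing_update(x=0)
```

In this sketch the `train` helper plays the role of the arrow symbology in the claim: a stored output is nudged toward a desired output at a given learning rate. A real controller would replace the lookup table with whatever supervised function approximator it uses.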
