Adaptive autonomous agent with verbal learning
Abstract
An autonomous adaptive agent that can learn verbal as well as nonverbal behavior. The primary object of the system is to optimize a primary value function over time by continuously learning how to behave in an environment (which may be physical or electronic). Inputs may include verbal advice or information from sources of varying reliability, as well as direct or preprocessed environmental inputs. Desired agent behavior may include motor actions and verbal behavior, which may constitute a system output (and which may also function “internally” to guide external actions). A further aspect involves an efficient “training” process by which the agent can be taught to use verbal advice and information along with environmental inputs.
19 Citations
5 Claims
1. A method for training an artificial neural network, comprising the steps of:
a first step of presenting a first stimulus to the network to produce a network response;
a second step of delivering a positive consequence when the network response is a correct response to the first stimulus, wherein said first and second steps are repeated until the network response to the first stimulus is correct a number of times whereupon said first stimulus and the correct response become a learned stimulus-response pair;
a third step of selecting a new stimulus and presenting said new stimulus a number of times to the network to evoke a new response;
a fourth step of presenting at least two stimuli from the learned stimulus-response pairs and delivering a positive consequence when the network response is correct;
a fifth step of presenting the new stimulus again more than two times;
a sixth step of presenting a number of stimuli randomly-sampled from the previously-learned stimulus-response pairs and reinforcing correct network responses;
a seventh step of repeatedly presenting the new stimulus while providing a positive consequence each time the network response is correct until the network response has been correct a number of times;
an eighth step of presenting the new stimulus and stimuli selected from the previously-learned stimulus-response pairs in random order until a predetermined number of correct network responses have been produced, whereupon the new stimulus and the correct response become a learned stimulus-response pair; and
a ninth step of continuing steps one through eight until all desired stimulus-response pairs have become learned stimulus-response pairs.
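The nine-step schedule of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the network is replaced by a hypothetical TabularLearner (one weight per stimulus-response pair, epsilon-greedy responding), and the repetition counts (n_correct, n_review, n_mixed, and three presentations in steps 3 and 5) are illustrative values, since the claim leaves them open. Steps 3 and 5 deliver no consequence because the claim states none.

```python
import random
from collections import defaultdict


class TabularLearner:
    """Hypothetical stand-in for the network: one weight per
    (stimulus, response) pair and epsilon-greedy response selection."""

    def __init__(self, responses, epsilon=0.2, seed=0):
        self.responses = list(responses)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.w = defaultdict(float)

    def respond(self, stimulus):
        # Occasional random responses let an untrained stimulus evoke the
        # correct response so that it can be reinforced.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.responses)
        return max(self.responses, key=lambda r: self.w[(stimulus, r)])

    def reinforce(self, stimulus, response, amount=1.0):
        # "Positive consequence": strengthen the emitted response.
        self.w[(stimulus, response)] += amount


def present(net, stimulus, target, reinforce=True):
    """Present one stimulus; reinforce the response only if it is correct."""
    response = net.respond(stimulus)
    correct = response == target
    if correct and reinforce:
        net.reinforce(stimulus, response)
    return correct


def train_until(net, stimulus, target, n_correct):
    """Present and reinforce until the response has been correct n_correct times."""
    hits = 0
    while hits < n_correct:
        hits += present(net, stimulus, target)


def train_curriculum(net, pairs, n_correct=3, n_review=4, n_mixed=6, seed=0):
    """Teach every pair in `pairs` (stimulus -> correct response), one at a time."""
    rng = random.Random(seed)
    items = list(pairs.items())
    first_stim, first_target = items[0]
    train_until(net, first_stim, first_target, n_correct)       # steps 1-2
    learned = {first_stim: first_target}

    for stim, target in items[1:]:                              # step 9: next pair
        for _ in range(3):                                      # step 3 (no consequence stated)
            present(net, stim, target, reinforce=False)
        for s in rng.choices(sorted(learned), k=2):             # step 4: two review trials
            present(net, s, learned[s])
        for _ in range(3):                                      # step 5: "more than two times"
            present(net, stim, target, reinforce=False)
        for s in rng.choices(sorted(learned), k=n_review):      # step 6: random review
            present(net, s, learned[s])
        train_until(net, stim, target, n_correct)               # step 7
        pool = [stim] + sorted(learned)                         # step 8: mixed random order
        hits = 0
        while hits < n_mixed:
            s = rng.choice(pool)
            hits += present(net, s, pairs[s])
        learned[stim] = target                                  # new pair is now learned
    return learned
```

For example, train_curriculum(TabularLearner(["stop", "go", "slow"]), {"red": "stop", "green": "go", "yellow": "slow"}) interleaves review trials on already-learned pairs with each new pair, so earlier associations keep being reinforced while a new one is acquired. The epsilon-greedy choice is only a stand-in for whatever mechanism evokes the new response in step 3.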
3. A method of training an adaptive critic-type artificial neural network to produce a sequence of responses when presented with a selected stimulus, the method comprising the steps of:
providing an adaptive critic-type artificial neural network with minimal repertoire training enabling it to output a plurality of responses R1-RN in response to prompt stimuli P1-PN;
initializing the training by:
a) setting reinforcer values associated with each response R1-RN; and
b) setting initial prompt strengths for each of prompts P1-PN;
presenting the selected stimulus to input nodes of the network while sequentially and cyclically presenting the prompts P1-PN until a predetermined training criterion is met, wherein the selected stimulus calls for a trained response of a sequential output of responses R1-RN;
after presentation of each individual prompt Pi, determining a learning signal value;
after presentation of each individual prompt Pi, determining if the current network response Ri is correct;
if the response Ri is correct, performing the steps of:
a) reducing Pi by a selected amount;
b) delivering the set reinforcer value associated with the response Ri; and
c) if the learning signal is outside of a predetermined acceptable range, adjusting the reinforcer associated with Ri;
if the response Ri is incorrect, performing the steps of:
a) if Ri is not one of the responses R1-RN, gradually increasing the value of Pi and presenting the selected stimulus together with the increased Pi until Ri is one of the responses R1-RN; and
b) if Ri is one of the responses R1-RN and not the correct response, performing the steps of:
i) delivering a reinforcer value that causes a negative learning signal;
ii) sequentially presenting each prompt P1-Pi until the network responses R1-Ri are correct and, upon each correct network response Ri, changing the reinforcer value for that Ri to cause a positive learning signal;
iii) after the step of sequentially presenting, delivering the set reinforcer value associated with the response Ri; and
iv) if the learning signal is outside of the predetermined acceptable range, adjusting the reinforcer associated with Ri.
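The prompt-fading procedure of claim 3 can be sketched in the same spirit. The claim does not say how the learning signal is computed, so this sketch assumes the temporal-difference error r + gamma * V(next) - V(current) commonly used in adaptive critic designs. Everything else is hypothetical: the PromptedChain class, the threshold response model (a prompt of strength p adds p to the score of Ri, and no score reaching the threshold means no response from R1-RN), the rule for adjusting a reinforcer whose signal falls outside [lo, hi], the training criterion (one full cycle in which every response is correct with all prompts at zero), and all parameter values.

```python
class PromptedChain:
    """Hypothetical actor-critic stand-in for claim 3. Position i (0-based)
    should emit response i; a prompt of strength p adds p to that response's score."""

    def __init__(self, n, threshold=1.0, alpha=0.2, beta=0.5, gamma=0.9):
        self.n, self.threshold = n, threshold
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.actor = [[0.0] * n for _ in range(n)]  # actor[i][j]: drive for R(j) at position i
        self.V = [0.0] * (n + 1)                    # critic values; V[n] stays 0 (chain done)

    def respond(self, i, prompt):
        scores = [a + (prompt if j == i else 0.0) for j, a in enumerate(self.actor[i])]
        best = max(range(self.n), key=scores.__getitem__)
        return best if scores[best] >= self.threshold else None  # None: not in R1..RN

    def deliver(self, i, response, reinforcer):
        """Deliver a reinforcer, update critic and actor, return the learning signal."""
        # Assumed learning signal: TD error r + gamma * V(next) - V(current).
        delta = reinforcer + self.gamma * self.V[i + 1] - self.V[i]
        self.V[i] += self.alpha * delta
        self.actor[i][response] += self.beta * delta
        return delta


def train_sequence(net, r0=1.0, p0=2.0, fade=0.25, step=0.25,
                   lo=0.05, hi=1.0, neg=1.0, max_cycles=1000):
    n = net.n
    r = [r0] * n  # reinforcer value for each response R1..RN
    p = [p0] * n  # prompt strength for each prompt P1..PN

    def adjust(i, delta):
        # Hypothetical rule: shift r[i] by the amount delta fell outside [lo, hi].
        if delta < lo:
            r[i] += lo - delta
        elif delta > hi:
            r[i] -= delta - hi

    for cycle in range(1, max_cycles + 1):
        clean = True  # criterion: one cycle of correct responses with every prompt at 0
        for i in range(n):
            clean = clean and p[i] == 0.0
            resp = net.respond(i, p[i])
            while resp is None:                  # no response from R1..RN: raise Pi
                clean = False
                p[i] += step
                resp = net.respond(i, p[i])
            if resp == i:                        # correct
                p[i] = max(0.0, p[i] - fade)     # a) fade the prompt
                adjust(i, net.deliver(i, i, r[i]))  # b) reinforce, c) adjust
            else:                                # a wrong response from R1..RN
                clean = False
                # i) choose the reinforcer so that the learning signal equals -neg
                wrong_r = -neg - net.gamma * net.V[i + 1] + net.V[i]
                net.deliver(i, resp, wrong_r)
                for k in range(i + 1):           # ii) re-prompt R1..Ri in order
                    while net.respond(k, p[k]) != k:
                        p[k] += step
                    # raise r[k] if needed so its learning signal is at least lo
                    r[k] = max(r[k], net.V[k] - net.gamma * net.V[k + 1] + lo)
                    net.deliver(k, k, r[k])
                adjust(i, net.deliver(i, i, r[i]))  # iii) and iv)
        if clean:
            return cycle, p, r
    raise RuntimeError("training criterion not met within max_cycles")
```

As a usage example, giving a fresh PromptedChain(3) an initial bias toward R1 at the second position (actor[1][0] = 3.0) exercises the incorrect-response branch: the wrong response receives a negative signal, prompts P1-P2 are raised until R1-R2 are emitted, and training takes more cycles before all prompts have faded to zero.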
Specification