Adaptive autonomous agent with verbal learning

US 6,754,644 B1
Filed: 03/06/2000
Issued: 06/22/2004
Est. Priority Date: 03/06/2000
Status: Expired due to Fees

Abstract

An autonomous adaptive agent which can learn verbal as well as nonverbal behavior. The primary object of the system is to optimize a primary value function over time by continuously learning how to behave in an environment (which may be physical or electronic). Inputs may include verbal advice or information from sources of varying reliability as well as direct or preprocessed environmental inputs. Desired agent behavior may include motor actions and verbal behavior which may constitute a system output (and which may also function “internally” to guide external actions). A further aspect involves an efficient “training” process by which the agent can be taught to utilize verbal advice and information along with environmental inputs.

19 Citations

5 Claims

1. A method for training an artificial neural network, comprising the steps of:
- a first step of presenting a first stimulus to the network to produce a network response;
  
  a second step of delivering a positive consequence when the network response is a correct response to the first stimulus, wherein said first and second steps are repeated until the network response to the first stimulus is correct a number of times whereupon said first stimulus and the correct response become a learned stimulus-response pair;
  
  a third step of selecting a new stimulus and presenting said new stimulus a number of times to the network to evoke a new response;
  
  a fourth step of presenting at least two stimuli from the learned stimulus-response pairs and delivering a positive consequence when the network response is correct;
  
  a fifth step of presenting the new stimulus again more than two times;
  
  a sixth step of presenting a number of stimuli randomly-sampled from the previously-learned stimulus-response pairs and reinforcing correct network responses;
  
  a seventh step of repeatedly presenting the new stimulus while providing a positive consequence each time the network response is correct until the network response has been correct a number of times;
  
  an eighth step of presenting the new stimuli and stimuli selected from the previously-learned stimulus-response pairs in random order until a predetermined number of correct network responses have been produced, whereupon the new stimulus and the correct response become a learned stimulus-response pair; and
  
  a ninth step of continuing steps one through eight until all desired stimulus-response pairs have become learned stimulus-response pairs.

2. A method as in claim 1 wherein relationships between at least one stimulus and response are supplied by direct programming.
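
The nine steps of claim 1 can be read as an interleaved curriculum: drill one pair, then alternate presentations of each new stimulus with review of pairs already learned. The following is a minimal sketch of that schedule, not the patented implementation. `ToyLearner` is a hypothetical table of response strengths standing in for the neural network, and the counts `n_correct`, `n_present`, and `n_review` are assumed values for the claim's "a number of times". The claim states no consequence for steps three and five, so the sketch delivers none there, and it samples review stimuli with replacement so that two can be presented even when only one pair has been learned.

```python
import random


class ToyLearner:
    """Hypothetical stand-in for the network: a table of response strengths."""

    def __init__(self, responses, rng):
        self.responses = list(responses)
        self.rng = rng
        self.strength = {}  # (stimulus, response) -> strength, default 1.0

    def respond(self, stimulus):
        # Sample a response with probability proportional to its strength.
        weights = [self.strength.get((stimulus, r), 1.0) for r in self.responses]
        return self.rng.choices(self.responses, weights=weights)[0]

    def reinforce(self, stimulus, response, amount=5.0):
        # A positive consequence strengthens the response just emitted.
        key = (stimulus, response)
        self.strength[key] = self.strength.get(key, 1.0) + amount


def trial(net, pairs, stimulus, consequence=True):
    """Present one stimulus; optionally reinforce a correct response."""
    response = net.respond(stimulus)
    correct = response == pairs[stimulus]
    if correct and consequence:
        net.reinforce(stimulus, response)
    return correct


def train_curriculum(net, pairs, rng, n_correct=5, n_present=3, n_review=2):
    stimuli = list(pairs)
    first, learned = stimuli[0], []
    hits = 0
    while hits < n_correct:                 # steps 1-2: drill the first pair
        hits += trial(net, pairs, first)
    learned.append(first)
    for new in stimuli[1:]:                 # step 9: repeat for each remaining pair
        for _ in range(n_present):          # step 3: present the new stimulus
            trial(net, pairs, new, consequence=False)
        for s in rng.choices(learned, k=n_review):   # step 4: review learned pairs
            trial(net, pairs, s)
        for _ in range(n_present):          # step 5: new stimulus, more than twice
            trial(net, pairs, new, consequence=False)
        for s in rng.choices(learned, k=n_review):   # step 6: random review
            trial(net, pairs, s)
        hits = 0
        while hits < n_correct:             # step 7: reinforce the new stimulus
            hits += trial(net, pairs, new)
        hits = 0
        while hits < n_correct:             # step 8: new and learned, random order
            block = [new] + rng.choices(learned, k=n_review)
            rng.shuffle(block)
            hits += sum(trial(net, pairs, s) for s in block)
        learned.append(new)
    return learned


rng = random.Random(0)
pairs = {"red": "stop", "green": "go", "amber": "wait"}
net = ToyLearner(["stop", "go", "wait"], rng)
print(train_curriculum(net, pairs, rng))    # pairs in the order they were learned
```

Interleaving review trials with the new stimulus is what separates this schedule from training each pair in isolation: every earlier pair keeps receiving reinforced presentations while the new one is acquired.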

3. A method of training an adaptive critic-type artificial neural network to produce a sequence of responses when presented with a selected stimulus, the method comprising the steps of:
- providing an adaptive critic-type artificial neural network with minimal repertoire training enabling it to output a plurality of responses R₁-R_N in response to prompt stimuli P₁-P_N;
  
  initializing the training by;
  
  a) setting reinforcer values associated with each response R₁-R_N; and
  
  b) setting initial prompt strengths for each of prompts P₁-P_N;
  
  presenting the selected stimulus to input nodes of the network while sequentially and cyclically presenting the prompts P₁-P_N until a predetermined training criteria is met, wherein the selected stimulus calls for a trained response of a sequential output of responses R₁-R_N;
  
  after presentation of each individual prompt P_i, determining a learning signal value;
  
  after presentation of each individual prompt P_i, determining if the current network response R_i is correct;
  
  if the response R_i is correct, performing the steps of;
  
  a) reducing P_i by a selected amount;
  
  b) delivering the set reinforcer value associated with the response R_i; and
  
  c) if the learning signal is outside of a predetermined acceptable range, adjusting the reinforcer associated with R_i;
  
  if the response R_i is incorrect, performing the steps of;
  
  a) if R_i is not one of the responses R₁-R_N, gradually increasing the value of P_i and presenting the selected stimulus together with the increased P_i until R_i is one of the responses R₁-R_N; and
  
  b) if R_i is one of the responses R₁-R_N and not the correct response, performing the steps of;
  
  i) delivering a reinforcer value that causes a negative learning signal;
  
  ii) sequentially presenting each prompt P₁-P_i until the network responses R₁-R_i are correct and upon each correct network response R_i changing the reinforcer value for that R_i to cause a positive learning signal;
  
  iii) after the step of sequentially presenting, delivering the set reinforcer value associated with the response R_i; and
  
  iv) if the learning signal is outside of the predetermined acceptable range, adjusting the reinforcer associated with R_i.

4. The method of claim 3 wherein the step of determining a learning signal value is based upon both a change in a value of the response R_i and a change in the costs of the network response.

5. The method of claim 3 wherein each response R₁-R_N may comprise multiple responses.
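
The core of claim 3 is a prompt-fading loop: each prompt P_i is weakened after a correct response and strengthened when the network fails, until the selected stimulus alone evokes the whole sequence. The sketch below is a simplified illustration under stated assumptions, not the claimed adaptive critic. The network is a hypothetical table `weights[i][r]`, and the learning signal is a plain prediction error (delivered reinforcer minus stored strength) because the claims give no formula. The re-presentation of prompts P₁-P_i in step b(ii) and the reinforcer adjustments in steps c) and b(iv) are omitted. All parameter names and values (`rate`, `step`, `threshold`) are invented for the example.

```python
def respond(weights, prompt, i, threshold=0.5):
    """Index of the strongest response at sequence position i, or None when
    nothing reaches the threshold (a response outside R_1..R_N)."""
    scores = [w + (prompt if r == i else 0.0) for r, w in enumerate(weights[i])]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None


def train_sequence(n, initial_prompt=1.0, weights=None, reinforcer=1.0,
                   rate=0.2, step=0.1, max_cycles=1000):
    # weights[i][r]: learned strength of response r at position i (assumed model)
    weights = [row[:] for row in weights] if weights else [[0.0] * n for _ in range(n)]
    prompts = [initial_prompt] * n      # initial prompt strengths P_1..P_N
    reinforcers = [reinforcer] * n      # reinforcer value set for each response
    for cycle in range(1, max_cycles + 1):
        for i in range(n):              # present the prompts sequentially
            r = respond(weights, prompts[i], i)
            while r is None:            # incorrect (a): raise P_i until a response occurs
                prompts[i] += step
                r = respond(weights, prompts[i], i)
            # Assumed learning signal: delivered reinforcer minus predicted value.
            delivered = reinforcers[i] if r == i else -reinforcers[i]
            signal = delivered - weights[i][r]
            weights[i][r] += rate * signal
            if r == i:                  # correct: fade the prompt
                prompts[i] = max(0.0, prompts[i] - step)
            else:                       # incorrect (b): wrong response, restore prompt
                prompts[i] += step
        # Assumed training criterion: all prompts faded and sequence still correct.
        faded = all(p == 0.0 for p in prompts)
        if faded and all(respond(weights, 0.0, i) == i for i in range(n)):
            return weights, prompts, cycle
    raise RuntimeError("training criterion not met")


weights, prompts, cycles = train_sequence(3)
print(cycles, prompts)
```

Passing a `weights` table with a strong wrong entry, together with a weaker `initial_prompt`, exercises the incorrect-response branch: the wrong response receives a negative signal and the prompt is raised before fading resumes.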

Current Assignee: William R. Hutchison
Original Assignee: William R. Hutchison
Inventors: Hutchison, William R.
Primary Examiner(s): Khatri, Anil
Assistant Examiner(s): Holmes, Michael B.
Application Number: US09/520,030
Time in Patent Office: 1,569 Days
Field of Search: 706/15-44
US Class Current: 706/20
CPC Class Codes: G06N 3/08 Learning methods