Apparatus and methods for reinforcement-guided supervised learning

US 9,008,840 B1
Filed: 04/19/2013
Issued: 04/14/2015
Est. Priority Date: 04/19/2013
Status: Active Grant


2 Assignments

0 Petitions

Abstract

A framework may be implemented for transferring knowledge from an external agent to a robotic controller. In an obstacle avoidance/target approach application, the controller may be configured to determine a teaching signal based on a sensory input indicative of the target or obstacle; the teaching signal conveys information associated with a target action consistent with that sensory input. The controller may be configured to determine a control signal based on the sensory input, the control signal conveying information associated with the target approach or avoidance action. The controller may also determine a predicted control signal based on the sensory input and the teaching signal, the predicted control signal conveying information associated with the target action. The control signal may be combined with the predicted control signal in order to cause the robotic apparatus to execute the target action.
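
The signal flow in the abstract and in claim 1 can be illustrated with the minimal Python sketch below. It covers the controller output, the predictor output, the combiner, and a teaching signal equal to the combined output. It is not the patented implementation. The linear predictor with a delta (LMS) update, the fixed proportional `controller` standing in for an already-trained reinforcement learner, and the override-style `combine` rule (the control output wins unless it is the zero signal, as in claim 9) are all illustrative assumptions, and every name is hypothetical.

```python
import random

def controller(offset):
    """Hypothetical stand-in for the reinforcement-trained controller:
    steer toward the target in proportion to its horizontal offset."""
    return -0.8 * offset

def combine(u, u_pred, zero=0.0):
    """Override transform (assumed): pass the control output through when it
    differs from the zero signal, otherwise pass the predicted output."""
    return u if u != zero else u_pred

class Predictor:
    """Linear predictor adapted by a supervised delta (LMS) rule."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, teaching):
        # Error between the teaching signal and the current prediction.
        err = teaching - self.predict(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
        return err

def run(trials, controller_active, predictor=None, seed=0):
    rng = random.Random(seed)
    if predictor is None:
        predictor = Predictor(n_features=2)
    for _ in range(trials):
        offset = rng.uniform(-1.0, 1.0)      # sensory input: target offset
        x = [offset, 1.0]                    # features plus a bias term
        u = controller(offset) if controller_active else 0.0
        u_pred = predictor.predict(x)
        u_hat = combine(u, u_pred)           # combined output sent to the robot
        predictor.update(x, teaching=u_hat)  # teaching signal = combined output
    return predictor

p = run(trials=500, controller_active=True)
print(p.w)  # close to [-0.8, 0.0]: the predictor has copied the controller
frozen = list(p.w)
run(trials=100, controller_active=False, predictor=p)
print(p.w == frozen)  # True: with the controller silent, teaching equals prediction
```

While the controller is active, the predictor learns to copy it. Once the controller falls silent, the teaching signal equals the prediction, the error is zero, and the learned behaviour is retained. An additive combiner with the same teaching rule would feed the predictor's own output back into its target, which is why this sketch uses the override transform.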

20 Claims

1. A method of generating a predicted control output by an adaptive controller of a robotic apparatus comprising a predictor and a combiner, the method comprising:
- operating the adaptive controller in accordance with a reinforcement learning process based on a reinforcement signal, the reinforcement signal being based on a performance measure associated with the reinforcement learning process;
  
  operating the predictor in accordance with a supervised learning process based on a teaching signal, the teaching signal conveying information related to a target output of the predictor;
  
  generating a control output via the adaptive controller based on a sensory input and the reinforcement signal, the sensory input including information associated with an environment of the robotic apparatus;
  
  determining a predicted control output via the predictor based on the sensory input and the teaching signal;
  
  determining a combined output via the combiner based on the control output and the predicted control output, the combined output being characterized by a transform function; and
  
  providing the combined output via the adaptive controller to the robotic apparatus, the combined output causing the robotic apparatus to execute a maneuver in accordance with the sensory input;
  
  wherein the teaching signal comprises the combined control output.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
  - 2. The method of claim 1, wherein:
    - the sensory input comprises a representation of an object being present in the sensory input; and
      
      the execution of the maneuver in accordance with the sensory input comprises one or both of approaching the object or avoiding the object.
  - 3. The method of claim 2, wherein:
    - the sensory input comprises a stream of digitized frames of pixels; and
      
      the representation of the object is determined based on a spatial configuration of two or more pixels within at least one frame of the stream of digitized frames.
  - 4. The method of claim 1, wherein:
    - the reinforcement learning process is characterized by a learning parameter;
      
      the reinforcement signal is configured to cause an adjustment of the learning parameter based on a value of the performance measure;
      
      the control output is determined based on the learning parameter;
      
      the process performance is determined based on a quantity determined based on the control output and target control output; and
      
      the adjusting of the learning parameter causes generation of a second control output, the second output being characterized by a reduced value of the quantity for the sensory input.
  - 5. The method of claim 4, wherein:
    - the reinforcement signal comprises positive reinforcement responsive to the second output being closer to the target control output relative to the control output; and
      
      the reinforcement signal comprises negative reinforcement responsive to the second output being farther away from the target control output relative to the control output.
  - 6. The method of claim 1, wherein the transform function combines the predicted output and the control output via one or more operations including an additive operation.
  - 7. The method of claim 1, wherein the transform function combines the predicted output and the control output via one or more operations including a union operation.
  - 8. The method of claim 1, wherein the predicted control output comprises a signal causing the robotic apparatus to execute a portion of the maneuver.
  - 9. The method of claim 1, wherein the transform function provides the predicted control output responsive to the control output comprising a zero signal, the zero signal corresponding to a base state of the control output.
  - 10. The method of claim 9, wherein:
    - the transform function provides the control output responsive to the predicted control output comprising the zero signal;
      
      the control output, the combined output, and the predicted control output each comprise a spiking signal characterized by spike rate;
      
      the zero signal corresponds to a base spike rate; and
      
      a non-zero signal is characterized by a spike rate substantially different from the base spike rate.
  - 11. The method of claim 1, wherein the transform function is characterized by a delay parameter such that the combined output at a first time instance is based on the control output at a second time instance, the second time instance preceding the first time instance by a current value of the delay parameter.
  - 12. The method of claim 1, wherein:
    - the reinforcement learning process is based on a network of computerized neurons adapted in accordance with the sensory input and the reinforcement signal;
      
      multiple ones of the computerized neurons are interconnected by connections characterized by connection efficacy; and
      
      the adaptation comprises adapting the connection efficacy of individual connections based on the sensory input and the reinforcement signal.
  - 13. The method of claim 1, wherein:
    - the supervised learning process is based on a network of computerized neurons adapted in accordance with the sensory input and the teaching signal;
      
      multiple ones of the computerized neurons are interconnected by connections characterized by connection efficacy; and
      
      the supervised learning process adaptation comprises adapting the connection efficacy of individual connections based on the sensory input and the teaching signal.
  - 14. The method of claim 13, wherein:
    - the supervised learning process is updated at time intervals; and
      
      the adaptation is based on an error measure between (i) the predicted output generated at a given time instance and (ii) the teaching signal determined at another given time instance prior to the given time instance, the given time instance and the other given time instance separated by one of the time intervals.
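
Claims 6, 9, 10 and 11 describe alternative transform functions for the combiner. These are an additive combination, a pass-through in which a non-zero signal wins over the zero (base-state) signal, which for spiking signals is a base spike rate, and a delay parameter applied to the control output. The Python sketch below expresses those options in a single object. The class name, the `mode` strings, the baseline-subtracting form of the additive rule, and the choice to let the control output win when both signals are non-zero are assumptions not specified in the claims.

```python
from collections import deque

class Combiner:
    """Hypothetical combiner with an additive or an override transform,
    an optional delay on the control output, and a configurable zero signal
    (e.g. a base spike rate for rate-coded spiking signals)."""

    def __init__(self, mode="additive", delay=0, zero=0.0):
        if mode not in ("additive", "override"):
            raise ValueError(f"unknown mode: {mode!r}")
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.mode = mode
        self.zero = zero
        # Holds the last delay+1 control outputs; after each append, buf[0]
        # is the one from `delay` steps ago (pre-filled with the zero signal).
        self.buf = deque([zero] * delay, maxlen=delay + 1)

    def __call__(self, u, u_pred):
        self.buf.append(u)
        u_delayed = self.buf[0]
        if self.mode == "additive":
            # Add deviations from the zero signal, then restore the baseline.
            return self.zero + (u_delayed - self.zero) + (u_pred - self.zero)
        # Override: a zero (base-state) control output defers to the predictor;
        # otherwise the control output is passed through.
        return u_delayed if u_delayed != self.zero else u_pred

delayed = Combiner(mode="override", delay=2)
print([delayed(u, 9.0) for u in (1.0, 2.0, 3.0, 4.0)])  # [9.0, 9.0, 1.0, 2.0]

spiking = Combiner(mode="override", zero=5.0)  # 5 Hz is an assumed base rate
print(spiking(5.0, 12.0), spiking(20.0, 5.0))  # 12.0 20.0
```

Subtracting the zero signal before adding keeps a base-rate input from doubling the baseline. As a result, a base-state control output also leaves the predicted output unchanged in additive mode.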

15. A computerized controller apparatus of a robot, the apparatus comprising:
- a controller block;
  
  a predictor block; and
  
  one or more processors configured to execute computer program modules to perform a method of transferring information related to execution of a control task associated with a sensory context by the robot from the controller block to the predictor block, the method comprising:
  
  configuring the predictor block to operate in accordance with a supervised learning process based on a teaching input, the teaching input being provided by the control block based on a reinforcement learning process configured to be adapted based on the sensory context and a reinforcement signal, the reinforcement learning process adaptation being configured to occur during one or more trials effectuated prior to the provision of the teaching input; and
  
  based on the sensory context, causing the predictor block to generate a predicted control output that causes the execution of the control task.
- Dependent Claims (16, 17)
  - 16. The apparatus of claim 15, wherein the reinforcement learning process adaptation is configured to cause generation of a control output by the control block prior to the provision of the teaching input, the control output configured to cause the execution of the control task.
  - 17. The apparatus of claim 16, wherein the predicted control output generation is based on an adaptation of the supervised learning process responsive to the teaching input, the adaptation of the supervised learning process effectuated during two or more successive training epochs such that there exists at least one epoch of the two or more training epochs wherein output of the predictor block is incapable of causing the execution of the control task.
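
Claims 15 to 17 describe transferring a task from the reinforcement-trained controller block to the predictor block over successive training epochs. They require at least one early epoch in which the predictor alone cannot yet perform the task. The Python sketch below illustrates that epoch structure under several assumptions:

- The controller block is represented by a fixed, already-adapted `teacher` function.
- The predictor is a two-weight linear model trained by the delta rule.
- "Capable of executing the task" is approximated by a hypothetical tolerance test, `task_succeeds`, applied to a few probe inputs.

```python
import random

def teacher(offset):
    """Hypothetical controller block after its reinforcement trials:
    its output serves as the teaching input for the predictor."""
    return -0.8 * offset

def task_succeeds(w, tol=0.05, probes=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Assumed success test: the predictor alone stays within `tol`
    of the teaching input on every probe."""
    return all(abs(w[0] * o + w[1] - teacher(o)) <= tol for o in probes)

def train_epochs(n_epochs=60, steps_per_epoch=10, lr=0.05, seed=1):
    rng = random.Random(seed)
    w = [0.0, 0.0]              # predictor weights: gain and bias
    history = []                # per-epoch record of task success
    for _ in range(n_epochs):
        for _ in range(steps_per_epoch):
            o = rng.uniform(-1.0, 1.0)
            x = (o, 1.0)
            err = teacher(o) - (w[0] * x[0] + w[1] * x[1])
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
        history.append(task_succeeds(w))
    return w, history

w, history = train_epochs()
print(history.index(True))  # first epoch in which the predictor succeeds alone
```

In this sketch the first epochs fail the test, which corresponds to the condition in claim 17. Later epochs pass it, after which the predictor's output alone is sufficient for this context.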

18. A computerized robotic control apparatus, comprising:
- one or more processors configured by machine-readable instructions to:
  
  determine a teaching signal based on a sensory input, the teaching signal conveying information associated with a target action consistent with the sensory input, the sensory input being indicative of at least one object in an environment of the robotic apparatus;
  
  determine a control signal based on the sensory input, the control signal conveying information associated with the target action;
  
  determine a predicted control signal based on the sensory input and the teaching signal, the predicted control conveying information associated with the target action; and
  
  combine the control signal and the predicted control signal into a combined control output, the combined control output causing the robotic apparatus to execute a maneuver, the target action comprising the maneuver.
- Dependent Claims (19, 20)
  - 19. The apparatus of claim 18, wherein the one or more processors are further configured by machine-readable instructions to:
    - adapt a supervised learning process based on the sensory input and the teaching signal; and
      
      determine the teaching signal and the control signal in accordance with a reinforcement learning process based on the sensory input and a reinforcement signal provided by an external agent, the reinforcement learning process being configured to cause the determination of the control signal, the reinforcement signal being based on a performance measure associated with the reinforcement learning process.
  - 20. The apparatus of claim 19, wherein the external agent is either a human operator or a computerized apparatus configured to generate the reinforcement signal based on a performance measure associated with the execution of the maneuver.
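
Claims 4, 5, 19 and 20 tie the reinforcement signal to a performance measure. Reinforcement is positive when an adjusted learning parameter moves the control output closer to a target control output and negative when it moves the output farther away. The signal is supplied by a human operator or a computerized agent. The claims do not fix a particular learning rule. The Python sketch below pairs that reward definition with a simple perturb-and-keep rule (stochastic hill climbing), which is an illustrative choice. The gain `theta`, the `target_gain` used by the simulated computerized agent, and the step size are hypothetical.

```python
import random

def reinforcement(prev_out, new_out, target):
    """Simulated computerized agent: +1 if the new control output is closer
    to the target control output, -1 if it is farther, 0 if unchanged."""
    d_prev, d_new = abs(prev_out - target), abs(new_out - target)
    if d_new < d_prev:
        return 1
    if d_new > d_prev:
        return -1
    return 0

def train_gain(target_gain=-0.8, trials=300, step=0.05, seed=2):
    rng = random.Random(seed)
    theta = 0.0  # learning parameter: controller gain (hypothetical)
    for _ in range(trials):
        offset = rng.uniform(0.1, 1.0)            # sensory input
        target = target_gain * offset             # target control output
        u = theta * offset                        # current control output
        candidate = theta + rng.choice((-step, step))
        u_new = candidate * offset                # second control output
        if reinforcement(u, u_new, target) > 0:
            theta = candidate                     # positive: keep the adjustment
        # Negative or zero reinforcement: the adjustment is discarded.
    return theta

print(round(train_gain(), 6))  # -0.8
```

Replacing `reinforcement` with a human operator's +1/-1 judgment corresponds to the other option in claim 20.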

Current Assignee
Brain Corporation
Original Assignee
Brain Corporation
Inventors
Ponulak, Filip; Passot, Jean-Baptiste; Izhikevich, Eugene; Coenen, Olivier
Primary Examiner(s)
Tran, Khoi
Assistant Examiner(s)
Mott, Adam R

Application Number

US13/866,975
Time in Patent Office

725 Days
Field of Search

700/250
US Class Current

700/250
CPC Class Codes

B25J 9/161   Hardware, e.g. neural netwo...

B25J 9/163   learning, adaptive, model b...

G05B 13/0265   the criterion being a learn...

G06N 20/00   Machine learning

G06N 3/008   based on physical entities ...

G06N 3/02   Neural networks

G06N 3/049   Temporal neural networks, e...
