Apparatus and methods for training of robotic control arbitration

US 9,579,789 B2
Filed: 09/27/2013
Issued: 02/28/2017
Est. Priority Date: 09/27/2013
Status: Active Grant


1 Assignment

0 Petitions

Abstract

Apparatus and methods for arbitration of control signals for robotic devices. A robotic device may comprise an adaptive controller comprising a plurality of predictors configured to provide multiple predicted control signals based on one or more of teaching input, sensory input, and/or performance. The predicted control signals may be configured to cause two or more actions that may conflict with one another and/or utilize a shared resource. An arbitrator may be employed to select one of the actions. The selection process may utilize winner-take-all (WTA), reinforcement, and/or supervisory mechanisms in order to inhibit one or more predicted signals. The arbitrator output may comprise target state information that may be provided to the predictor block. Prior to arbitration, the predicted control signals may be combined with inputs provided by an external control entity in order to reduce learning time.
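
The arbitration flow described in the abstract (predictors produce competing control signals, those signals are optionally combined with an external teaching input, and an arbitrator lets one action win) can be sketched in Python. This is a minimal illustration under stated assumptions, not Brain Corporation's implementation: the linear predictors, the additive combiner, and the rule "largest combined signal wins, all others are zeroed" are assumptions, and `LinearPredictor`, `combine`, `winner_take_all`, and `arbitrate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class LinearPredictor:
    """Hypothetical predictor: maps a sensory feature vector to one scalar control signal."""
    weights: List[float]

    def predict(self, sensory: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, sensory))


def combine(predicted: float, teaching: Optional[float]) -> float:
    """Assumed additive combiner: add the external teaching input when one is present."""
    return predicted if teaching is None else predicted + teaching


def winner_take_all(signals: Sequence[float]) -> List[float]:
    """Pass the largest signal through unchanged and inhibit (zero) all the others."""
    winner = max(range(len(signals)), key=lambda i: signals[i])
    return [s if i == winner else 0.0 for i, s in enumerate(signals)]


def arbitrate(predictors: Sequence[LinearPredictor], sensory: Sequence[float],
              teaching: Optional[Sequence[Optional[float]]] = None) -> List[float]:
    """Predict, combine with optional teaching input, then let one action win."""
    if teaching is None:
        teaching = [None] * len(predictors)
    combined = [combine(p.predict(sensory), t) for p, t in zip(predictors, teaching)]
    return winner_take_all(combined)


approach = LinearPredictor([1.0, 0.0])  # hypothetical "approach target" channel
avoid = LinearPredictor([0.0, 1.0])     # hypothetical "avoid obstacle" channel
print(arbitrate([approach, avoid], [0.8, 0.3]))                        # [0.8, 0.0]
print(arbitrate([approach, avoid], [0.8, 0.3], teaching=[None, 1.0]))  # [0.0, 1.3]
```

With sensory input `[0.8, 0.3]` the approach signal wins and the avoidance signal is inhibited; adding a teaching input of 1.0 to the avoidance channel raises it to 1.3, so avoidance wins instead. This shows how an external control entity can steer arbitration before the predictors themselves have learned the preferred behavior.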

20 Claims

1. A processor-implemented method of learning arbitration for two physical tasks by a controller of a robot, the method being performed by one or more processors configured to execute computer program modules, the method comprising:
- during a given training trial of a plurality of trials:
  
  receiving a control signal configured to indicate a simultaneous execution of two physical tasks by the robot;
  
  selecting one of the two physical tasks;
  
  evaluating an error measure determined based on a target physical task and an execution of the selected one of the two physical tasks by the robot, the two physical tasks comprising a first physical task and a second physical task;
  
  based on the error measure being within a target range from a previous error measure obtained during a previous training trial of the plurality of trials and prior to the given training trial, receiving a reinforcement signal comprising information associated with the target physical task, and associating the target physical task with the selected one of the two physical tasks; and
  
  during a subsequent training trial of the plurality of trials:
  
  based on the reinforcement signal, determining an association between a sensory context and the target physical task, and when the association is determined, executing the target physical task via the robot based on (1) an occurrence of the sensory context after the given training trial during the subsequent training trial of the plurality of trials, and (2) an absence of receiving the reinforcement signal during the subsequent training trial.
- Dependent claims 2-19:
  - 2. The method of claim 1, wherein:
    - execution of the first task is based on a first predicted control signal and execution of the second task is based on a second predicted control signal, the first predicted control signal and the second predicted control signal being determined based on the sensory context;
      
      the execution of the first task obtains a first outcome;
      
      the execution of the second task obtains a second outcome that is distinct from the first outcome; and
      
      the first predicted control signal and the second predicted control signal are both configured to activate a same controllable resource of the robot such that the executions of the first and second tasks are mutually exclusive.
  - 3. The method of claim 2, wherein:
    - the association between the sensory context and the target task comprises a basis for selecting the target task from the two tasks, the selecting the target task being based on a learning process characterized by a competition between (i) a first process associated with the first predicted control signal, and (ii) a second process associated with the second predicted control signal;
      
      a first selection of the first task as the target task is configured to oppose a second selection of the second task as the target task; and
      
      the reinforcement signal is configured to increase the competition.
  - 4. The method of claim 3, wherein:
    - the opposition is configured based on a selectivity range; and
      
      the target task corresponds to the one of the two tasks based on (i) one of the first and the second predicted control signals being within the selectivity range, and (ii) the other of the first and the second predicted control signals being outside the selectivity range.
  - 5. The method of claim 4, wherein:
    - the first and the second predicted control signals are each characterized by one or more of a signal time of occurrence, a signal magnitude, a signal frequency, or a signal phase;
      
      the selectivity range corresponds to a range of values of one or more of the signal time of occurrence, the signal magnitude, the signal frequency, or the signal phase being evaluated as a part of the determining the association; and
      
      the method further comprises increasing the opposition based on a reduction of the selectivity range.
  - 6. The method of claim 4, wherein the increasing the opposition results in the first selection of the first task of the two tasks based on one or more of:
    - (i) the first predicted control signal occurring prior to the second predicted control signal, and (ii) the first predicted control signal having a greater magnitude relative to the second predicted control signal.
  - 7. The method of claim 3, wherein:
    - the learning process comprises a reinforcement learning process configured to generate the reinforcement signal based on a comparison of the first task, the second task, and the target physical task;
      
      the reinforcement signal is configured to promote the first process relative to the second process when the first task corresponds to the target physical task; and
      
      the reinforcement signal is configured to demote the first process relative to the second process when the second task corresponds to the target physical task.
  - 8. The method of claim 7, wherein:
    - the two tasks comprise a target approach task and an obstacle avoidance task;
      
      the target task comprises the obstacle avoidance task; and
      
      the reinforcement signal is provided based on a collision indication associated with the robot colliding with an obstacle.
  - 9. The method of claim 3, wherein:
    - the learning process comprises a supervised learning process; and
      
      a supervisor signal associated with the supervised learning process is configured to increase a probability of one of the first process or the second process winning the competition over the other one of the first process or the second process responsive to one of the two tasks associated with either the first process or the second process corresponding to the target task.
  - 10. The method of claim 9, wherein:
    - for a number of training trials of the plurality of trials, the learning process is configured to cause execution of a task other than the target task; and
      
      the supervisor signal is configured to enable the selection of the target task from the two tasks during a training trial of the plurality of trials occurring subsequent to a last-performed training trial of the number of training trials.
  - 11. The method of claim 9, wherein:
    - the first and the second predicted control signals each comprise an output of a predictor module configured based on a reinforcement learning process; and
      
      the reinforcement learning process is configured based on the sensory context and the reinforcement signal configured based on another output of the predictor module determined at another training trial of the plurality of trials occurring prior to the given training trial.
  - 12. The method of claim 9, wherein:
    - the first and the second predicted control signals each comprise an output of a predictor module operable in accordance with a reinforcement learning process;
      
      the reinforcement learning process is configured based on the sensory context and a reinforcement signal configured based on an output of a combiner module determined at another training trial occurring prior to the given training trial; and
      
      the output of the combiner module is determined based on a combination of (i) another output of the predictor module determined at the another training trial, and (ii) a control input communicating information relating to the target task.
  - 13. The method of claim 3, wherein individual control signals are outputs of a controller programmed in advance prior to the given training trial.
  - 14. The method of claim 2, wherein the execution of the first task is based on a combined output configured based on the reinforcement signal and the first predicted control signal, the combined output being characterized by a transform function.
  - 15. The method of claim 14, wherein:
    - the first predicted control signal is determined based on the plurality of trials, the plurality of trials involving the reinforcement signal;
      
      the reinforcement signal for a given trial is configured based on the combined output from a prior trial; and
      
      an error measure for the given trial is configured based on a difference between a predicted control output and the reinforcement signal.
  - 16. The method of claim 14, wherein the execution of the second task is based on another combined output configured based on the reinforcement signal and the second predicted control signal, the another combined output being configured based on the transform function.
  - 17. The method of claim 14, wherein the transform function comprises an overriding transformation configured such that for a non-zero teaching signal the combined output is configured regardless of the first predicted control signal.
  - 18. The method of claim 14, wherein the transform function comprises an additive transformation configured such that the combined output comprises a linear combination of the reinforcement signal and the first predicted control signal.
  - 19. The method of claim 14, further comprising:
    - based on the error measure, withdrawing the reinforcement signal from the combined output to produce the first predicted control signal; and
      
      providing the first predicted control output to the robot, the first predicted control output being capable of causing the execution of the first task by the robot;
      
      wherein the error measure for the given trial is configured based on a difference between the first predicted control output from the previous training trial and the reinforcement signal.
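
Claims 14 to 19 describe a combiner whose transform function either overrides the prediction with any non-zero teaching signal (claim 17) or forms a linear combination of the two (claim 18), together with an error measure based on the difference between the predicted output and the teaching signal (claims 15 and 19). The Python sketch below illustrates these pieces for a single scalar control channel. It is an illustration under assumptions, not the patented implementation: the delta-rule update, the learning rate, the unit weights of the additive combiner, and all function names are hypothetical.

```python
def override_combiner(predicted: float, teaching: float) -> float:
    """Overriding transform (cf. claim 17): a non-zero teaching signal replaces the prediction."""
    return teaching if teaching != 0.0 else predicted


def additive_combiner(predicted: float, teaching: float,
                      w_teach: float = 1.0, w_pred: float = 1.0) -> float:
    """Additive transform (cf. claim 18): linear combination of teaching and predicted signals."""
    return w_teach * teaching + w_pred * predicted


def error_measure(predicted: float, teaching: float) -> float:
    """Error measure (cf. claim 15): absolute difference between prediction and teaching signal."""
    return abs(predicted - teaching)


def train_predictor(teaching: float, trials: int, rate: float = 0.5):
    """Train one scalar prediction toward the combined output over repeated trials.

    Assumption: a simple delta rule; the patent text does not prescribe this update.
    """
    predicted = 0.0
    errors = []
    for _ in range(trials):
        combined = override_combiner(predicted, teaching)  # signal actually sent to the robot
        errors.append(error_measure(predicted, teaching))
        predicted += rate * (combined - predicted)  # move prediction toward the combined output
    return predicted, errors


# Teaching input present for three trials, then withdrawn (cf. claim 19).
predicted, errors = train_predictor(teaching=1.0, trials=3)
print(errors)                             # [1.0, 0.5, 0.25]
print(override_combiner(predicted, 0.0))  # 0.875: the prediction alone now drives the robot
```

The error halves on each trial as the prediction converges toward the teaching signal; once the teaching signal is withdrawn, the overriding transform passes the learned prediction through unchanged, so the robot keeps performing the taught behavior without supervision.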

20. A computerized system for learning task arbitration by a robot, the system comprising:
- an interface configured to detect a reinforcement signal;
  
  a processing component; and
  
  a non-transitory memory configured to store a plurality of computer instructions that, when executed by the processing component, are configured to cause the computerized system to:
  
  during a given training trial of a plurality of training trials:
  
  receive a control signal configured to indicate a simultaneous execution of two physical tasks by the robot;
  
  select one task of the two physical tasks based on a selection signal associated with the selected one task;
  
  determine an error measure based on a target physical task and an execution of the selected one task of the two physical tasks by the robot, the two physical tasks comprising a first physical task and a second physical task;
  
  based on the error measure being within a desired range from a previous error measure obtained during another training trial of the plurality of training trials and prior to the given training trial, evaluate the reinforcement signal comprising information associated with the target physical task, the target physical task being associated with one of the two physical tasks; and
  
  responsive to the evaluation of the reinforcement signal, determine an association between a sensory context and the target physical task, and execute the target physical task via the robot based on (1) an occurrence of the sensory context after the given training trial during a subsequent training trial of the plurality of training trials, (2) an absence of a receipt of the reinforcement signal during the subsequent training trial, and (3) the determined association.
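
Claims 1, 3, 7, 8, and 20 describe learning which of two conflicting tasks to execute in a given sensory context: during training, a reinforcement signal identifies the target task and strengthens its process relative to the competing one, and in later trials the robot executes the target task in that context without further reinforcement. The Python sketch below is a simplified illustration, assuming a tabular preference per (context, task) pair, a fixed-step promote/demote update, a binary error measure, and winner-take-all selection; it omits the claim 1 condition that compares the current error measure with the previous trial's. `TaskArbitrator`, `train_arbitration`, and the context labels are hypothetical.

```python
from collections import defaultdict
from typing import List

# Hypothetical task pair, following the example of claim 8.
TASKS = ("approach", "avoid")


class TaskArbitrator:
    """Hypothetical arbitrator holding one preference value per (sensory context, task) pair."""

    def __init__(self, rate: float = 0.25):
        self.rate = rate
        self.pref = defaultdict(lambda: dict.fromkeys(TASKS, 0.0))

    def select(self, context: str) -> str:
        # Winner-take-all over the competing processes; max() returns the first task on ties.
        prefs = self.pref[context]
        return max(TASKS, key=lambda t: prefs[t])

    def reinforce(self, context: str, target: str) -> None:
        # Promote the target task's process and demote its competitor (the opposition).
        for task in TASKS:
            self.pref[context][task] += self.rate if task == target else -self.rate


def train_arbitration(arb: TaskArbitrator, context: str, target: str, trials: int) -> List[float]:
    """Run training trials in which a teacher supplies reinforcement naming the target task."""
    errors = []
    for _ in range(trials):
        selected = arb.select(context)
        errors.append(0.0 if selected == target else 1.0)  # assumed binary error measure
        arb.reinforce(context, target)
    return errors


arb = TaskArbitrator()
print(train_arbitration(arb, context="obstacle_ahead", target="avoid", trials=3))  # [1.0, 0.0, 0.0]
print(arb.select("obstacle_ahead"))  # avoid (no reinforcement needed)
print(arb.select("clear_path"))      # approach (untrained context keeps the default)
```

After a single reinforced trial, the avoidance process wins the competition in the `obstacle_ahead` context, and a later query in that context selects it with no reinforcement present; a context that was never trained falls back to the default winner.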

Current Assignee
Brain Corporation
Original Assignee
Brain Corporation
Inventors
Passot, Jean-Baptiste; Laurent, Patryk; Izhikevich, Eugene
Primary Examiner(s)
Tran, Khoi
Assistant Examiner(s)
Rink, Ryan J

Application Number

US14/040,520
Publication Number

US 20150094850A1
Time in Patent Office

1,250 Days
Field of Search
US Class Current

1/1
CPC Class Codes

B25J 9/163   learning, adaptive, model b...

G05B 2219/39271   Ann artificial neural netwo...

G05B 2219/39307   Multiple ann, trajectory co...

G05B 2219/40499   Reinforcement learning algo...

G06N 3/008   based on physical entities ...

G06N 3/049   Temporal neural networks, e...

Y10S 901/03   Teaching system
