Apparatus and methods for online training of robots

US 9,463,571 B2
Filed: 11/01/2013
Issued: 10/11/2016
Est. Priority Date: 11/01/2013
Status: Expired due to Fees


1 Assignment


0 Petitions


Abstract

Robotic devices may be trained by a user guiding the robot along a target trajectory using a correction signal. A robotic device may comprise an adaptive controller configured to generate control commands based on one or more of the trainer input, sensory input, and/or performance measure. Training may comprise a plurality of trials. During an initial portion of a trial, the trainer may observe robot's operation and refrain from providing the training input to the robot. Upon observing a discrepancy between the target behavior and the actual behavior during the initial trial portion, the trainer may provide a teaching input (e.g., a correction signal) configured to affect robot's trajectory during subsequent trials. Upon completing a sufficient number of trials, the robot may be capable of navigating the trajectory in absence of the training input.

325 Citations

20 Claims

1. A robotic apparatus, comprising:
- a controllable actuator;
  
  a sensor module configured to provide information related to an environment surrounding the robotic apparatus; and
  
  an adaptive controller configured to produce a control instruction for the controllable actuator in accordance with the information provided by the sensor module, the control instruction being configured to cause the robotic apparatus to execute a target task;
  
  wherein:
  
  execution of the target task is characterized by the robotic apparatus traversing a trajectory of a first trajectory and a second trajectory;
  
  the first trajectory and the second trajectory each having at least one different parameter associated with the environment;
  
  the adaptive controller is operable in accordance with a supervised learning process configured based on a training signal and a plurality of trials;
  
  at a given trial of the plurality of trials, the control instruction is configured to cause the robot to traverse one of the first trajectory and the second trajectory;
  
  the training signal is generated based on the control instruction;
  
  the training signal is configured to strengthen a trajectory selection by the controller with an effectiveness value such that, based on one of the first and second trajectory being selected for a first trial, the selected one of the first and second trajectory is more likely to be selected during one or more trials subsequent to the first trial; and
  
  the effectiveness value of the training signal on the training process is reduced after a threshold number of trials of the plurality of trials.
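
The selection-strengthening behavior recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation: the softmax selection rule, the geometric decay, and every name and default value below (`effectiveness`, `selection_probs`, `run_trials`, `threshold=5`, `decay=0.5`) are assumptions chosen only to show a training signal that reinforces the selected trajectory and loses effectiveness after a threshold number of trials.

```python
import math
import random


def effectiveness(trial, base=1.0, threshold=5, decay=0.5):
    """Effectiveness of the training signal at a 0-indexed trial.

    Assumption: full strength for the first `threshold` trials,
    then geometric decay by `decay` per additional trial.
    """
    if trial < threshold:
        return base
    return base * decay ** (trial - threshold + 1)


def selection_probs(prefs):
    """Softmax over trajectory preferences (an assumed selection rule)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_trials(n_trials, seed=0):
    """Run trials over two trajectories, reinforcing whichever was selected."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]  # preference for the first and second trajectory
    history = []
    for trial in range(n_trials):
        probs = selection_probs(prefs)
        chosen = 0 if rng.random() < probs[0] else 1
        # The training signal is derived from the selection itself and
        # strengthens it, scaled by the current effectiveness value.
        prefs[chosen] += effectiveness(trial)
        history.append((chosen, probs))
    return prefs, history


if __name__ == "__main__":
    prefs, history = run_trials(10)
    for trial, (chosen, probs) in enumerate(history):
        print(trial, chosen, [round(p, 3) for p in probs])
```

In this sketch, early selections shift the preferences by the full effectiveness value, so the trajectory selected in the first trial becomes more likely in later trials, while selections made after the threshold move the preferences progressively less.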

2. An adaptive controller apparatus, comprising:
- one or more processors configured to execute computer program instructions that, when executed, cause a robot to:
  
  at a first time instance, execute a first action in accordance with a sensory context and a random choice;
  
  at a second time instance subsequent to the first time instance, determine whether to execute the first action based on the sensory context and a teaching input received during the first time instance, the teaching input being received based on the first action in accordance with the sensory context and the random choice; and
  
  execute the first action in accordance with the determination;
  
  wherein:
  
  a target task comprises at least the first action; and
  
  the teaching input is configured to increase or decrease a probability of execution of the first action, the teaching input having an effectiveness value determined from the execution of the first action at one or more time instances, where the effectiveness value is reduced after a threshold number of the one or more time instances.
  - 3. The adaptive controller apparatus of claim 2, further comprising computer program instructions that, when executed, cause the robot to:
    - at a given time instance, determine whether to execute one of the first action or a second action;
      
      wherein the execution of the first action at the given time instance is configured to increase the probability of execution of the first action at a subsequent time instance.
  - 4. The adaptive controller apparatus of claim 3, wherein the probability of execution of the first action is increased relative to a probability of execution of the second action at the second time instance.
  - 5. The adaptive controller apparatus of claim 3, wherein the teaching input is configured to reduce a probability of the robot executing a composite action at the second time instance, the composite action comprising the first action and the second action.
  - 6. The adaptive controller apparatus of claim 2, further comprising a computer-readable medium comprising a plurality of instructions that, when executed, cause the robot to:
    - receive a first control signal and a second control signal via a supervised learning process;
      
      wherein the first action execution at the first time instance and the second time instance is based on the first control signal and the second control signal, respectively, received by the supervised learning process; and
      
      responsive to the receipt of the teaching input, associate a sensory context to the first action.
  - 7. The adaptive controller apparatus of claim 6, wherein:
    - the supervised learning process is configured based on a neuron network comprising a plurality of neurons communicating via a plurality of connections;
      
      one or more individual connections of the plurality of connections provide an input into a given one of the plurality of neurons that is characterized by a connection efficacy configured to affect operation of the given one of the plurality of neurons; and
      
      the association of the sensory context to the first action comprises an adjustment of the connection efficacy based on the teaching input and the first control signal.
  - 8. The adaptive controller apparatus of claim 2, wherein:
    - the first action and a second action are characterized by a different value of a state parameter associated with an environment; and
      
      the state parameter is selected from a group consisting of a spatial coordinate, a robot's velocity, a robot's orientation, and a robot's position.
  - 9. The adaptive controller apparatus of claim 2, wherein:
    - the adaptive controller apparatus is embodied in the robot; and
      
      responsive to the sensory context comprising a representation of an obstacle, the target task comprises an avoidance maneuver executed by the robot; and
      
      responsive to the sensory context comprising a representation of a target, the target task comprises an approach maneuver executed by the robot.
  - 10. The adaptive controller apparatus of claim 2, wherein:
    - the execution of the first action is configured based on a control signal, the control signal being updated at time intervals shorter than one second; and
      
      the first time instance and the second time instance are separated by an interval that is no shorter than one second.
  - 11. The adaptive controller apparatus of claim 2, wherein the teaching input is provided by a computerized entity via a wireless interface.
  - 12. The adaptive controller apparatus of claim 2, wherein:
    - the robot comprises an autonomous platform;
      
      the controller apparatus is embodied on the autonomous platform; and
      
      the teaching input is provided by a computerized module comprising a proximity indicator configured to generate a proximity indicator signal based on an object being within a given range from the platform.
  - 13. The adaptive controller apparatus of claim 2, wherein:
    - the adaptive controller apparatus is operable in accordance with a supervised learning process configured based on the teaching signal;
      
      the sensory context comprises information indicative of an object within an environment of the robot;
      
      the execution of the first action is based on a first predicted control output of the supervised learning process configured in accordance with the sensory context; and
      
      execution of a second action is based on a second predicted control output of the supervised learning process configured in accordance with the sensory context and the teaching input.
  - 14. The adaptive controller apparatus of claim 13, wherein:
    - the first and the second predicted control output are determined based on an output of an adaptive predictor module operable in accordance with the supervised learning process configured in accordance with the teaching input;
      
      the supervised learning process is configured to combine the teaching signal with the first predicted control output at the first time instance to produce a combined signal; and
      
      the teaching input at the second time instance is configured based on the combined signal.
  - 15. The adaptive controller apparatus of claim 14, wherein:
    - the supervised learning process is configured based on a backward propagation of an error; and
      
      the combined signal is determined based on a transform function configured based on a union operation.
  - 16. The adaptive controller apparatus of claim 14, wherein:
    - the combined signal is determined based on a transform function configured based on one or more operations including an additive operation characterized by a first weight and a second weight;
      
      the first weight is configured to be applied to a predictor output; and
      
      the second weight is configured to be applied to the teaching input.
  - 17. The adaptive controller apparatus of claim 16, wherein:
    - a value of the first weight at the first time instance is greater than the value of the first weight at the second time instance; and
      
      a value of the second weight at the first time instance is lower than the value of the second weight at the second time instance.
  - 18. The adaptive controller apparatus of claim 2, wherein:
    - the robot comprises a mobile platform;
      
      the adaptive controller apparatus is configured to be embodied on the mobile platform; and
      
      the sensory context is based on a visual input provided by a camera disposed on the mobile platform.
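
The additive combiner of claims 16 and 17 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented transform function: the linear weight schedule, the choice to make the two weights sum to one, and the names `combine` and `weights_at` with their default values are all hypothetical. The claims only require that the weight on the predictor output be greater at the first time instance than at the second, and that the weight on the teaching input be lower at the first time instance than at the second.

```python
def combine(predicted, teaching, w_pred, w_teach):
    """Additive combination of predictor output and teaching input."""
    return w_pred * predicted + w_teach * teaching


def weights_at(step, total_steps, w_pred_start=0.8, w_pred_end=0.4):
    """Return (predictor weight, teaching weight) at a 0-indexed step.

    Assumptions: the predictor weight falls linearly from `w_pred_start`
    to `w_pred_end`, and the teaching weight is its complement.
    """
    frac = step / max(total_steps - 1, 1)
    w_pred = w_pred_start + (w_pred_end - w_pred_start) * frac
    return w_pred, 1.0 - w_pred


if __name__ == "__main__":
    predicted, teaching = 0.2, 1.0  # hypothetical control values
    for step in range(2):
        w_pred, w_teach = weights_at(step, 2)
        print(step, combine(predicted, teaching, w_pred, w_teach))
```

With two time instances, the predictor weight drops and the teaching weight rises between the first and second instance, which matches the ordering recited in claim 17.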

19. A method of increasing a probability of action execution by a robotic apparatus, comprising:
- receiving a sensory context from a sensor;
  
  at a first time instance, executing a first action with the robotic apparatus in accordance with the sensory context;
  
  at a second time instance subsequent to the first time instance, determining with an adaptive controller whether to execute the first action based on the sensory context received from the sensor and a teaching input received from a user interface during the first time instance; and
  
  executing the first action with the robotic apparatus in accordance with the determination of the adaptive controller;
  
  wherein:
  
  a target task comprises at least the first action; and
  
  increasing or decreasing a probability of execution of the first action is based on the teaching input, the teaching input having an effectiveness value determined by the adaptive controller from the execution of the first action at one or more time instances, where the effectiveness value is reduced after a threshold number of the one or more time instances.
  - 20. The method of claim 19, wherein the determining whether to execute the first action further comprises determining whether to execute a second action by the adaptive controller; and the method further comprises executing the second action with the robotic apparatus in accordance with the determination of whether to execute the second action.


Current Assignee: Brain Corporation
Original Assignee: Brain Corporation
Inventors: Passot, Jean-Baptiste; Izhikevich, Eugene; Sinyavskiy, Oleg
Primary Examiner(s): Tran, Khoi
Assistant Examiner(s): Rink, Ryan J
Application Number: US14/070,114
Publication Number: US 20150127149A1
Time in Patent Office: 1,075 Days
US Class Current: 1/1

CPC Class Codes:
- B25J 9/163: learning, adaptive, model b...
- G05B 2219/33056: Reinforcement learning, age...
- G05B 2219/40499: Reinforcement learning algo...
- G05D 1/0088: characterized by the autono...
- G05D 1/0221: involving a learning process
- G06N 20/00: Machine learning
- G06N 3/008: based on physical entities ...
- G06N 3/049: Temporal neural networks, e...
- Y10S 901/03: Teaching system
