Apparatus and methods for online training of robots
First Claim
1. A robotic apparatus, comprising:
a controllable actuator;
a sensor module configured to provide information related to an environment surrounding the robotic apparatus; and
an adaptive controller configured to produce a control instruction for the controllable actuator in accordance with the information provided by the sensor module, the control instruction being configured to cause the robotic apparatus to execute a target task;
wherein:
execution of the target task is characterized by the robotic apparatus traversing a trajectory of a first trajectory and a second trajectory;
the first trajectory and the second trajectory each having at least one different parameter associated with the environment;
the adaptive controller is operable in accordance with a supervised learning process configured based on a training signal and a plurality of trials;
at a given trial of the plurality of trials, the control instruction is configured to cause the robot to traverse one of the first trajectory and the second trajectory;
the training signal is generated based on the control instruction;
the training signal is configured to strengthen a trajectory selection by the controller with an effectiveness value such that, based on one of the first and second trajectory being selected for a first trial, the selected one of the first and second trajectory is more likely to be selected during one or more trials subsequent to the first trial; and
the effectiveness value of the training signal on the training process is reduced after a threshold number of trials of the plurality of trials.
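As an illustration only, the reinforcement behavior recited in claim 1 can be sketched in Python. The class name `TrajectorySelector`, the update rule, the `decay` factor, and all numeric defaults are assumptions made for this sketch; the claim does not specify any of them.

```python
import random


class TrajectorySelector:
    """Hypothetical sketch: picks one of two trajectories per trial."""

    def __init__(self, effectiveness=0.2, threshold=5, decay=0.5, seed=0):
        self.p_first = 0.5                  # probability of selecting the first trajectory
        self.effectiveness = effectiveness  # assumed initial strength of the training signal
        self.threshold = threshold          # number of trials at full effectiveness
        self.decay = decay                  # assumed reduction factor after the threshold
        self.trials = 0
        self.rng = random.Random(seed)

    def current_effectiveness(self):
        # Full effectiveness before the threshold, reduced effectiveness afterwards.
        if self.trials < self.threshold:
            return self.effectiveness
        return self.effectiveness * self.decay

    def reinforce(self, choice):
        # Training signal derived from the selection itself: move the
        # selection probability toward the trajectory that was just chosen.
        target = 1.0 if choice == "first" else 0.0
        self.p_first += self.current_effectiveness() * (target - self.p_first)
        self.trials += 1

    def run_trial(self):
        # Select a trajectory, then strengthen that selection for later trials.
        choice = "first" if self.rng.random() < self.p_first else "second"
        self.reinforce(choice)
        return choice
```

Under these assumptions, early selections shift the probability quickly, while selections made after the threshold shift it by a smaller amount.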
Abstract
Robotic devices may be trained by a user guiding the robot along a target trajectory using a correction signal. A robotic device may comprise an adaptive controller configured to generate control commands based on one or more of a trainer input, a sensory input, and a performance measure. Training may comprise a plurality of trials. During an initial portion of a trial, the trainer may observe the robot's operation and refrain from providing the training input to the robot. Upon observing a discrepancy between the target behavior and the actual behavior during the initial trial portion, the trainer may provide a teaching input (e.g., a correction signal) configured to affect the robot's trajectory during subsequent trials. Upon completing a sufficient number of trials, the robot may be capable of navigating the trajectory in the absence of the training input.
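The trial-by-trial training described in the abstract can be illustrated with a minimal Python sketch. It assumes a single scalar control command, a trainer who corrects only when the observed discrepancy exceeds a tolerance, and a simple proportional update; the function name `train` and every parameter are hypothetical and are not taken from the patent.

```python
def train(target=1.0, learning_rate=0.5, tolerance=0.05, max_trials=20):
    """Hypothetical sketch of trainer-in-the-loop learning over several trials."""
    command = 0.0      # the robot's current learned control command
    corrections = []   # teaching input supplied on each trial (0.0 means none)
    for _ in range(max_trials):
        # The trainer observes the robot and compares actual and target behavior.
        discrepancy = target - command
        if abs(discrepancy) <= tolerance:
            # Behavior is close enough: the trainer refrains from any input.
            corrections.append(0.0)
            continue
        # Otherwise the trainer supplies a correction that shapes later trials.
        corrections.append(discrepancy)
        command += learning_rate * discrepancy
    return command, corrections
```

With the default values assumed here, the corrections shrink from trial to trial and stop once the command is within the tolerance, after which the robot proceeds without teaching input.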
325 Citations
20 Claims
2. An adaptive controller apparatus, comprising:
one or more processors configured to execute computer program instructions that, when executed, cause a robot to:
at a first time instance, execute a first action in accordance with a sensory context and a random choice;
at a second time instance subsequent to the first time instance, determine whether to execute the first action based on the sensory context and a teaching input received during the first time instance, the teaching input being received based on the first action in accordance with the sensory context and the random choice; and
execute the first action in accordance with the determination;
wherein:
a target task comprises at least the first action; and
the teaching input is configured to increase or decrease a probability of execution of the first action, the teaching input having an effectiveness value determined from the execution of the first action at one or more time instances, where the effectiveness value is reduced after a threshold number of the one or more time instances.
Dependent claims: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
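One possible reading of claim 2 is sketched below in Python, for illustration only. It assumes that the teaching input is a signed value (positive to encourage the action, negative to discourage it), that the execution probability is stored per sensory context, and that the effectiveness drops to a fixed lower value once the action has been executed a threshold number of times. The class `ActionPolicy` and all of its parameters are hypothetical.

```python
import random
from collections import defaultdict


class ActionPolicy:
    """Hypothetical sketch: probability of executing one action per sensory context."""

    def __init__(self, effectiveness=0.3, threshold=4, reduced=0.1, seed=0):
        self.prob = defaultdict(lambda: 0.5)  # execution probability per context
        self.executions = defaultdict(int)    # times the action was executed per context
        self.effectiveness = effectiveness    # assumed initial teaching strength
        self.threshold = threshold            # executions allowed at full strength
        self.reduced = reduced                # assumed strength after the threshold
        self.rng = random.Random(seed)

    def decide(self, context):
        # Random choice weighted by the learned probability for this context.
        execute = self.rng.random() < self.prob[context]
        if execute:
            self.executions[context] += 1
        return execute

    def teach(self, context, teaching_input):
        # Positive input raises the execution probability, negative input lowers it.
        if self.executions[context] < self.threshold:
            eff = self.effectiveness
        else:
            eff = self.reduced
        target = 1.0 if teaching_input > 0 else 0.0
        self.prob[context] += eff * (target - self.prob[context])
```

In this sketch the first decision in an unseen context is an even random choice, and each teaching input received afterwards biases later decisions in the same context, with less influence once the threshold has been passed.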
19. A method of increasing a probability of action execution by a robotic apparatus, comprising:
receiving a sensory context from a sensor;
at a first time instance, executing a first action with the robotic apparatus in accordance with the sensory context;
at a second time instance subsequent to the first time instance, determining with an adaptive controller whether to execute the first action based on the sensory context received from the sensor and a teaching input received from a user interface during the first time instance; and
executing the first action with the robotic apparatus in accordance with the determination of the adaptive controller;
wherein:
a target task comprises at least the first action; and
increasing or decreasing a probability of execution of the first action is based on the teaching input, the teaching input having an effectiveness value determined by the adaptive controller from the execution of the first action at one or more time instances, where the effectiveness value is reduced after a threshold number of the one or more time instances.
Dependent claims: 20
Specification