APPARATUS AND METHODS FOR OPERATING ROBOTIC DEVICES USING SELECTIVE STATE SPACE TRAINING

US 20150127155A1
Filed: 11/01/2013
Published: 05/07/2015
Est. Priority Date: 06/02/2011
Status: Active Grant


Abstract

Apparatus and methods for training and controlling, e.g., robotic devices. In one implementation, a robot may be utilized to perform a target task characterized by a target trajectory. The robot may be trained by a user using supervised learning. The user may interface to the robot, such as via a control apparatus configured to provide a teaching signal to the robot. The robot may comprise an adaptive controller comprising a neuron network, which may be configured to generate actuator control commands based on the user input and output of the learning process. During one or more learning trials, the controller may be trained to navigate a portion of the target trajectory. Individual trajectory portions may be trained during separate training trials. Some portions may be associated with the robot executing complex actions and may require additional training trials and/or denser training input compared to simpler trajectory actions.


25 Claims

1. A method of operating a robotic controller apparatus, the method comprising:
- determining a current controller performance associated with performing a target task;
  
  determining a difficult portion of a target trajectory associated with the target task, the difficult portion characterized by an extent of a state space; and
  
  providing a training input for navigating the difficult portion, the training input configured to transition the current performance towards the target trajectory;
  
  wherein:
  
  the difficult portion of the target trajectory is determined based at least on the current performance being outside a range from the target trajectory;
  
  the state space is associated with performing of the target task by the controller; and
  
  performing by the controller of a portion of the target task outside the extent is configured based on autonomous controller operation.
- - 2. The method of claim 1, wherein:
    - the controller is operable in accordance with a supervised learning process configured based on the teaching input, the learning process being adapted based on the current performance; and
      
      the navigating of the difficult portion is based at least in part on a combination of the teaching input and an output of the controller learning process.
  - 3. The method of claim 1, wherein:
    - the extent is characterized by a first dimension having a first value, and the state space is characterized by a second dimension having a second value; and
      
      the first value is less than one-half (½) of the second value.
  - 4. The method of claim 3, wherein:
    - the first dimension and the second values each comprise a temporal value;
      
      the supervised learning process configured based on a plurality of training trials, individual trials having a duration associated therewith;
      
      navigating the difficult trajectory portion during a trial is characterized by the duration equal to the first value;
      
      navigating the full trajectory portion during a trial is characterized by the duration equal to the second value; and
      
      navigating the difficult trajectory portion in lieu of the full trajectory is configured to attain the target performance in a shorter time compared to navigation of the full trajectory.
  - 5. The method of claim 3, wherein:
    - performing of the target task by the controller comprises provision of a control signal by the controller to a robotic platform; and
      
      the first dimension is selected from the group consisting of spatial coordinate, velocity, acceleration, and orientation of the platform.
  - 6. The method of claim 3, wherein:
    - the difficult trajectory portion determination is based at least on the first dimension being outside a target range of at least one of the state space parameters.
  - 7. The method of claim 3, wherein:
    - the controller is operable in accordance with a supervised learning process configured based at least on the teaching input and a plurality of training trials, the learning process being adapted based on the current performance; and
      
      for a given trial of the plurality of trials, the training input is configured to reduce the first dimension during a subsequent trial.
  - 8. The method of claim 7, wherein:
    - the first dimension reduction is configured to eliminate the extent so as to enable autonomous operation of the controller, the autonomous controller operation characterized by the controller being capable of navigating the target trajectory with the target performance in absence of the teaching input.
  - 9. The method of claim 1, wherein:
    - the controller is operable in accordance with a supervised learning process configured based on the teaching input and a plurality of training trials, the learning process being adapted based on the current performance; and
      
      the difficult trajectory portion determination is based at least on a number of trials within the plurality of trials required to attain the target performance.
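The selection logic recited in claims 1 and 3 can be illustrated with a minimal sketch, assuming a one-dimensional trajectory sampled at fixed steps. The function names, the absolute-deviation performance measure, and the `tolerance` parameter are hypothetical choices made for illustration; the claims do not prescribe a specific measure or data layout.

```python
def difficult_portion(target, actual, tolerance):
    """Return sample indices where the actual trajectory leaves the allowed range.

    Assumption: performance is measured as the absolute deviation from the
    target at each sample. This is an illustrative choice only.
    """
    if len(target) != len(actual):
        raise ValueError("trajectories must have the same number of samples")
    return [i for i, (t, a) in enumerate(zip(target, actual))
            if abs(a - t) > tolerance]


def extent_fraction(indices, n_samples):
    """Fraction of the whole trajectory covered by the difficult portion."""
    return len(indices) / n_samples


def training_schedule(target, actual, tolerance):
    """Label each sample: training input inside the extent, autonomous outside."""
    hard = set(difficult_portion(target, actual, tolerance))
    return ["train" if i in hard else "autonomous" for i in range(len(target))]


# Hypothetical data: the controller drifts in the middle of the trajectory.
target = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
actual = [0.0, 1.1, 2.9, 3.8, 4.1, 5.0]

hard = difficult_portion(target, actual, tolerance=0.5)
print(hard)
print(extent_fraction(hard, len(target)) < 0.5)  # claim 3: less than one-half
print(training_schedule(target, actual, tolerance=0.5))
```

In this sketch the trainer would supply input only for the samples labeled `train`, and the controller would run on its own elsewhere, which mirrors the split between the difficult extent and autonomous operation in claim 1.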

10. An adaptive controller apparatus comprising a plurality of computer readable instructions configured to, when executed, cause performing of a target task by at least:
- during a first training trial, determining a predicted signal configured in accordance with a sensory input, the predicted signal configured to cause execution of an action associated with the target task, the action execution being characterized by a first performance;
  
  during a second training trial, based on a teaching input and the predicted signal, determining a combined signal configured to cause execution of the action, the action execution during the second training trial being characterized by a second performance; and
  
  adjusting a learning parameter of the controller based on the first performance and the second performance.
- - 11. The apparatus of claim 10, wherein:
    - the execution of the target task comprises execution of the action and at least one other action;
      
      the adjusting of the learning parameter is configured to enable the controller to determine, during a third training trial, another predicted signal configured in accordance with the sensory input; and
      
      the execution, based on the another predicted signal, of the action during the third training trial is characterized by a third performance that is closer to the target task compared to the first performance.
  - 12. The apparatus of claim 11, wherein:
    - execution of the target task is characterized by a target trajectory in a state space;
      
      execution of the action is characterized by a portion of the target trajectory having a state space extent associated therewith; and
      
      the state space extent occupies a minority fraction of the state space.
  - 13. The apparatus of claim 11, wherein:
    - the second trial is configured to occur subsequent to the first trial and prior to the third trial; and
      
      the combination is effectuated based at least on a transform function comprising one or more operations including an additive operation.
  - 14. The apparatus of claim 11, wherein:
    - the combination is effectuated based at least on a transform function comprising one or more operations including a union operation, and the transform function is configured based at least on a gating signal configured to toggle a state of the transform function between:
      
      (i) a transform state configured to produce the combined signal; and
      
      (ii) a bypass state configured to produce the transform function output consisting of the teaching input and independent of the predicted signal.
  - 15. The apparatus of claim 14, wherein:
    - the transform function bypass state is effectuated responsive to one or more of (a) a zero weight being assigned to the predicted signal, or (b) a zero signal being assigned to the predicted signal, the zero signal comprising a pre-defined value.
  - 16. The apparatus of claim 11, wherein:
    - the predicted control output is generated based at least on a learning process configured to be adapted at time intervals in accordance with the sensory input and a feedback; and
      
      the adaptation is based at least on an error measure between (i) the predicted signal generated at a given time interval and (ii) the feedback signal determined at another time interval prior to the given time interval.
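The combiner behavior described in claims 10 and 13 to 15 might be sketched as follows, assuming scalar signals and an additive transform. The function `combine` and its parameters `predicted_weight` and `bypass` are hypothetical names introduced here; the patent does not specify an interface.

```python
def combine(predicted, teaching, predicted_weight=1.0, bypass=False):
    """Combine a predicted control signal with a teaching input.

    Assumptions (illustrative only):
    - signals are scalars and the transform is additive (claim 13);
    - `bypass` stands in for the gating signal of claim 14, producing an
      output that consists of the teaching input alone;
    - a zero `predicted_weight` has the same effect as the bypass state,
      in the spirit of claim 15, option (a).
    """
    if bypass:
        return teaching
    return predicted_weight * predicted + teaching


# Transform state: both signals contribute to the combined output.
print(combine(0.25, 0.5))

# Bypass state: the output is independent of the predicted signal.
print(combine(0.25, 0.5, bypass=True))

# A zero weight on the predicted signal also yields the teaching input.
print(combine(0.25, 0.5, predicted_weight=0.0))
```

The explicit `bypass` flag and the zero weight are kept as separate paths here only to show that both routes lead to the same output.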

17. A robotic apparatus comprising:
- a platform characterized by first and second degrees of freedom;
  
  a sensor module configured to provide information related to the platform's environment; and
  
  an adaptive controller apparatus configured to determine first and second control signals to facilitate operation of the first and the second degrees of freedom, respectively;
  
  wherein:
  
  the first and the second control signals are configured to cause the platform to perform a target action;
  
  the first control signal is determined in accordance with the information and a teaching input;
  
  the second control signal is determined in an absence of the teaching input and in accordance with the information and a configuration of the controller, and the configuration is determined based at least on an outcome of training of the controller to operate the second degree of freedom.
- - 18. The apparatus of claim 17, wherein:
    - the determination of the first control signal is effectuated based at least on a supervised learning process characterized by multiple iterations; and
      
      performance of the target action in accordance with the first control signal at a given iteration is characterized by a first performance.
  - 19. The apparatus of claim 18, wherein:
    - the adaptive controller is configured to modify the configuration based at least on the teaching input, thereby enabling the controller to produce another version of the first control signal at another iteration subsequent to the given iteration and in an absence of the teaching input; and
      
      performance of the target action in accordance with the another version of the first control signal is characterized by a second performance that is closer, relative to the first performance, to a target performance associated with the target action.
  - 20. The apparatus of claim 19, wherein:
    - the training input is associated with the first degree of freedom operation; and
      
      a third performance associated with performing of the target task at the given iteration absent the training input is lower compared to the first performance.
  - 21. The apparatus of claim 18, wherein:
    - the target action is characterized by a trajectory having a duration associated therewith;
      
      provision of the training input is characterized by a time interval configured to be shorter as compared to the duration;
      
      the information comprises a characteristic of an object within the environment; and
      
      the target action is configured based on the characteristic of the object.
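The division of labor in claim 17, where one degree of freedom still receives teaching input while the other runs from a previously trained configuration, could look roughly like the sketch below. The `ControllerConfig` fields, the linear gains, and the names `throttle_gain` and `steering_gain` are all hypothetical; the claim states only that the second signal depends on the sensory information and a trained configuration.

```python
from dataclasses import dataclass


@dataclass
class ControllerConfig:
    """Hypothetical learned parameters, one per degree of freedom."""
    throttle_gain: float  # first degree of freedom, still being trained
    steering_gain: float  # second degree of freedom, already trained


def control_signals(sensor_value, teaching_input, config):
    """Return (first, second) control signals for a two-DOF platform.

    Assumption: each signal is a linear function of a scalar sensor
    reading. The first signal adds the trainer's correction; the second
    uses only the sensor reading and the trained configuration.
    """
    first = config.throttle_gain * sensor_value + teaching_input
    second = config.steering_gain * sensor_value
    return first, second


config = ControllerConfig(throttle_gain=0.5, steering_gain=0.25)
first, second = control_signals(sensor_value=2.0, teaching_input=0.5, config=config)
print(first, second)
```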

22. A method of optimizing the operation of a robotic controller apparatus, the method comprising:
- determining a current controller performance associated with performing a target task, the current performance being non-optimal for accomplishing the task; and
  
  for at least a selected first portion of a target trajectory associated with the target task, the first portion characterized by an extent of a state space, providing a training input that facilitates navigation of the first portion, the training input configured to transition the current performance towards the target trajectory;
  
  wherein the first portion of the target trajectory is selected based at least on the current performance not meeting at least one prescribed criterion with respect to the target trajectory.
- - 23. The method of claim 22, wherein the at least one prescribed criterion comprises the current performance exceeding a disparity from, or range associated with, an acceptable performance.
  - 24. The method of claim 22, wherein a performance by the controller of a portion of the target task outside the extent is effectuated in the absence of the training input.
  - 25. The method of claim 22, wherein:
    - the controller is configured to be trained to perform the target task using multiple iterations; and
      
      for a given iteration of the multiple iterations, the selected first portion comprises a portion with a higher rate of non-optimal performance determined based on one or more prior iterations of the multiple iterations.
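One way to read the selection rule of claim 25 is as a per-segment failure rate computed over prior iterations, with the worst segment chosen for the next round of training. The sketch below assumes the trajectory is split into fixed segments and that each prior iteration records, per segment, whether the prescribed criterion was missed; the function `select_portion` and this data layout are hypothetical.

```python
def select_portion(history):
    """Pick the trajectory segment with the highest rate of missed criteria.

    `history` is a list of prior iterations; each iteration is a list of
    booleans, one per segment, where True means the controller's
    performance on that segment did not meet the criterion.
    Returns (index of the selected segment, per-segment failure rates).
    """
    if not history:
        raise ValueError("at least one prior iteration is required")
    n_segments = len(history[0])
    if any(len(iteration) != n_segments for iteration in history):
        raise ValueError("all iterations must cover the same segments")
    rates = [sum(iteration[s] for iteration in history) / len(history)
             for s in range(n_segments)]
    worst = max(range(n_segments), key=lambda s: rates[s])
    return worst, rates


# Hypothetical record of three prior iterations over three segments.
history = [
    [False, True, False],
    [False, True, True],
    [False, True, False],
]
segment, rates = select_portion(history)
print(segment, rates)
```

When several segments tie, `max` returns the first of them; a real trainer might prefer a different tie-break, which the claim leaves open.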


Current Assignee
Brain Corporation
Original Assignee
Brain Corporation
Inventors
Passot, Jean-Baptiste; Sinyavskiy, Oleg; Izhikevich, Eugene

Granted Patent

US 9,566,710 B2
US Class Current

700/257
CPC Class Codes

B25J 9/161   Hardware, e.g. neural netwo...

B25J 9/163   learning, adaptive, model b...

G05B 2219/33034   Online learning, training

G05B 2219/39289   Adaptive ann controller

G05B 2219/39298   Trajectory learning

G06N 20/00   Machine learning

G06N 3/008   based on physical entities ...

G06N 3/049   Temporal neural networks, e...

G06N 3/08   Learning methods

Y10S 901/03   Teaching system
