APPARATUS AND METHODS FOR CONTROL OF ROBOT ACTIONS BASED ON CORRECTIVE USER INPUTS

US 20150217449A1
Filed: 02/03/2014
Published: 08/06/2015
Est. Priority Date: 02/03/2014
Status: Active Grant

Abstract

Robots have the capacity to perform a broad range of useful tasks, such as factory automation, cleaning, delivery, assistive care, environmental monitoring and entertainment. Enabling a robot to perform a new task in a new environment typically requires a large amount of new software to be written, often by a team of experts. It would be valuable if future technology could empower people, who may have limited or no understanding of software coding, to train robots to perform custom tasks. Some implementations of the present invention provide methods and systems that respond to users' corrective commands to generate and refine a policy for determining appropriate actions based on sensor-data input. Upon completion of learning, the system can generate control commands by deriving them from the sensory data. Using the learned control policy, the robot can behave autonomously.
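
The correct-and-refine loop described in the abstract can be illustrated with a minimal sketch. This is not the patent's implementation: it assumes a hypothetical linear policy (`LinearPolicy`) that maps a context vector to a single scalar action, and it uses a least-mean-squares update as a stand-in for whatever learning model a real system would use. The class name, the learning rate, and the update rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LinearPolicy:
    """Hypothetical policy: action = dot(weights, context)."""
    weights: List[float]
    learning_rate: float = 0.5  # assumed value, chosen only for illustration

    def act(self, context: Sequence[float]) -> float:
        """Map a context-variable value to a robot action value."""
        return sum(w * c for w, c in zip(self.weights, context))

    def correct(self, context: Sequence[float], corrective_action: float) -> None:
        """Nudge the policy toward the action the user asked for (LMS step)."""
        error = corrective_action - self.act(context)
        self.weights = [
            w + self.learning_rate * error * c
            for w, c in zip(self.weights, context)
        ]


# One pass through the loop: act, receive a correction, act again.
policy = LinearPolicy(weights=[0.0, 0.0])
context = [1.0, 0.0]                  # first context-variable value
first_action = policy.act(context)    # action chosen by the current policy
policy.correct(context, corrective_action=1.0)  # user's corrective command
second_action = policy.act(context)   # action chosen by the modified policy
```

In this sketch the second action lies closer to the corrective action than the first one did, which is the behavior the abstract attributes to a policy refined by corrective commands.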

225 Citations

20 Claims

1. A method for controlling actions of robots, the method comprising:
- identifying, at a device that includes a processor, a first context-variable value for a context variable detected by a robot at a sensory-detection time;
  
  accessing, at the device, a policy that maps the context variable to a robot action variable;
  
  determining that a first robot action characterized by a first value of the robot action variable was performed at an action time in response to detection of the first context-variable value, the first robot action being in accordance with application of the policy;
  
  determining that a user input was received at an input time corresponding to the action time, wherein user input data derived from the user input at least partly defines a corrective command that specifies a corrective robot action for physical performance, and the user input being indicative of at least partial dissatisfaction with the robot action;
  
  modifying the policy based on the correction command and the context-variable value; and
  
  causing the modified policy to be used to determine a second value of the robot action variable based on a second context-variable value for the context variable detected at a second sensory-detection time; and
  
  initiate performance of a second robot action performance in accordance with the second value of the action variable.
  - 2. The method of claim 1, further comprising:
    - identifying a third context-variable value for the context variable, the third context-variable value being detected at a third sensory-detection time that is after the before the third sensory-detection time;
      
      determining that the robot performed a third action in response to the third context-variable value, the third action be in accordance with application of the accessed policy; and
      
      inferring that the third action was satisfactory based on a lack of input data at least partly defining a correction command corresponding to the third action, wherein the modification of the policy is further based on the third context-variable value.
  - 3. The method of claim 1, further comprising:
    - identifying initial user input data derived from an initial user input received, the initial user input data at least partly defining a command that specifies an initial robot action for a robot to physically perform;
      
      identifying an initial context-variable value for a context variable detected by the robot at an initial sensory-detection time that corresponds to the initial input time; and
      
      determining the accessed policy based on the command and the first context-variable value for the context variable.
  - 4. The method of claim 1, further comprising:
    - determining the first value of the robot action variable based on the first context-variable value for the context variable; and
      
      initiating the robot action in accordance with the first value of the robot action variable.
  - 5. The method of claim 1, wherein the policy is modified using a learning model.
  - 6. The method of claim 1, wherein the corrective action is indicative of a magnitude of action.
  - 7. The method of claim 1, wherein the robot includes the device and further includes a motor used to perform at least part of the first robot action or the second robot action.
  - 8. The method of claim 1, wherein the user input includes input received at an interface at a user device remote from the robot.
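
Claims 1 and 2 together imply a labeling rule: an action followed by a corrective command is relabeled with the corrected action, while an action with no corresponding correction is treated as satisfactory. The sketch below shows one way such a rule could be written. The names `ActionEvent`, `Correction`, and `build_training_set`, and the fixed time `window` used to decide that an input time "corresponds to" an action time, are hypothetical choices that the claims do not specify.

```python
from typing import List, NamedTuple, Sequence, Tuple


class ActionEvent(NamedTuple):
    time: float                 # action time
    context: Tuple[float, ...]  # context-variable value that triggered the action
    action: float               # value of the robot action variable


class Correction(NamedTuple):
    time: float    # input time of the user's corrective command
    action: float  # corrective robot action the user specified


def build_training_set(
    events: Sequence[ActionEvent],
    corrections: Sequence[Correction],
    window: float = 1.0,  # assumed matching window, in the same units as `time`
) -> List[Tuple[Tuple[float, ...], float, bool]]:
    """Return (context, target_action, was_corrected) for every action event."""
    examples = []
    for ev in events:
        # A correction "corresponds" to an action if it arrives within the window.
        match = next(
            (c for c in corrections if 0.0 <= c.time - ev.time <= window),
            None,
        )
        # No matching correction: infer the performed action was satisfactory.
        target = match.action if match is not None else ev.action
        examples.append((ev.context, target, match is not None))
    return examples


events = [ActionEvent(0.0, (1.0,), 0.2), ActionEvent(5.0, (2.0,), 0.4)]
corrections = [Correction(0.3, 0.8)]
training_set = build_training_set(events, corrections)
```

Here the first event is relabeled with the corrected action and the second keeps the action the robot actually performed. Either kind of example could then be fed to whichever learning model modifies the policy.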

9. A system, comprising:
- one or more data processors; and
  
  a non-transitory computer readable storage medium containing instructions which when executed on the one or more data processors, cause the processor to perform operations including;
  
  identifying a first context-variable value for a context variable detected by a robot at a sensory-detection time;
  
  accessing a policy that maps the context variable to a robot action variable;
  
  determining that a robot action characterized by a first value of the robot action variable was performed at an action time in response to detection of the first context-variable value, the first robot action being in accordance with application of the policy;
  
  determining that a user input was received at an input time corresponding to the action time, wherein user input data derived from the user input at least partly defines a corrective command that specifies a corrective robot action for physical performance, and the user input being indicative of at least partial dissatisfaction with the robot action;
  
  modifying the policy based on the correction command and the context-variable value; and
  
  causing the modified policy to be used to determine a second value of the robot action variable based on a second context-variable value for the context variable detected at a second sensory-detection time; and
  
  initiate performance of a second robot action performance in accordance with the second value of the action variable.
  - 10. The system of claim 9, wherein the operations further comprise:
    - identifying a third context-variable value for the context variable, the third context-variable value being detected at a third sensory-detection time that is after the before the third sensory-detection time;
      
      determining that the robot performed a third action in response to the third context-variable value, the third action be in accordance with application of the accessed policy; and
      
      inferring that the third action was satisfactory based on a lack of input data at least partly defining a correction command corresponding to the third action, wherein the modification of the policy is further based on the third context-variable value.
  - 11. The system of claim 9, wherein the operations further comprise:
    - identifying initial user input data derived from an initial user input received, the initial user input data at least partly defining a command that specifies an initial robot action for a robot to physically perform;
      
      identifying an initial context-variable value for a context variable detected by the robot at sensory-detection time that corresponds to the initial input time; and
      
      determining the accessed policy based on the command and the first context-variable value for the context variable.
  - 12. The system of claim 9, wherein the operations further comprise:
    - determining the first value of the robot action variable based on the first context-variable value for the context variable; and
      
      initiating the robot action in accordance with the first value of the robot action variable.
  - 13. The system of claim 9, wherein the policy is modified using a learning model.
  - 14. The system of claim 9, wherein the corrective action is indicative of a magnitude of action.
  - 15. The system of claim 9, wherein the robot includes the computing system and further includes a motor used to perform at least part of the first robot action or the second robot action.
  - 16. The system of claim 9, wherein the user input includes input received at an interface at a user device remote from the computing system.

17. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform operations including:
- identifying a first context-variable value for a context variable detected by a robot at a sensory-detection time;
  
  accessing a policy that maps the context variable to a robot action variable;
  
  determining that a robot action characterized by a first value of the robot action variable was performed at an action time in response to detection of the first context-variable value, the first robot action being in accordance with application of the policy;
  
  determining that a user input was received at an input time corresponding to the action time, wherein user input data derived from the user input at least partly defines a corrective command that specifies a corrective robot action for physical performance, and the user input being indicative of at least partial dissatisfaction with the robot action;
  
  modifying the policy based on the correction command and the context-variable value; and
  
  causing the modified policy to be used to determine a second value of the robot action variable based on a second context-variable value for the context variable detected at a second sensory-detection time; and
  
  initiate performance of a second robot action performance in accordance with the second value of the action variable.
  - 18. The computer-program product of claim 17, wherein the operations further comprise:
    - identifying a third context-variable value for the context variable, the third context-variable value being detected at a third sensory-detection time that is after the before the third sensory-detection time;
      
      determining that the robot performed a third action in response to the third context-variable value, the third action be in accordance with application of the accessed policy; and
      
      inferring that the third action was satisfactory based on a lack of input data at least partly defining a correction command corresponding to the third action, wherein the modification of the policy is further based on the third context-variable value.
  - 19. The computer-program product of claim 17, wherein the operations further comprise:
    - identifying initial user input data derived from an initial user input received, the initial user input data at least partly defining a command that specifies an initial robot action for a robot to physically perform;
      
      identifying an initial context-variable value for a context variable detected by the robot at an initial sensory-detection time that corresponds to the initial input time; and
      
      determining the accessed policy based on the command and the first context-variable value for the context variable.
  - 20. The computer-program product of claim 17, wherein the operations further comprise:
    - determining the first value of the robot action variable based on the first context-variable value for the context variable; and
      
      initiating the robot action in accordance with the first value of the robot action variable.

Current Assignee: Brain Corporation
Original Assignee: Brain Corporation
Inventors: Passot, Jean-Baptiste; Laurent, Patryk; Sinyavskiy, Oleg; Izhikevich, Eugene; Meier, Philip; Ibarz Gabardos, Borja; O'Connor, Peter

Granted Patent: US 9,358,685 B2
US Class Current: 1/1
CPC Class Codes

B25J 9/1602   characterised by the contro...

B25J 9/161   Hardware, e.g. neural netwo...

B25J 9/163   learning, adaptive, model b...

B25J 9/1656   characterised by programmin...

G05B 13/027   using neural networks only

G05B 2219/40116   Learn by operator observati...

G05D 1/0033   by having the operator trac...

G05D 1/0088   characterized by the autono...

G06N 20/00   Machine learning

G06N 3/008   based on physical entities ...

Y10S 901/01   Mobile robot

Y10S 901/46   Sensing device
