Apparatus and methods for control of robot actions based on corrective user inputs
First Claim
1. A method for controlling actions of robots, the method comprising:
identifying, at a device that includes a processor, a first context-variable value for a context variable detected by a robot at a first sensory-detection time;
accessing, at the device, a policy comprising one or more parameters configured to map the context variable to a robot action variable;
determining that a first robot action characterized by a first value of the robot action variable was performed at an action time in response to detection of the first context-variable value, the first robot action being in accordance with application of the policy;
determining that a user input was received at an input time corresponding to the action time, wherein user input data derived from the user input at least partly defines a corrective command that specifies a corrective robot action for physical performance, the user input being indicative of at least partial dissatisfaction with the first robot action;
modifying the policy based on the corrective command and the first context-variable value; and
causing the modified policy to be used to:
determine a second robot action characterized by a second value of the robot action variable based on a second context-variable value for the context variable detected at a second sensory-detection time; and
initiate performance of the second robot action in accordance with the second value of the robot action variable.
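The steps of claim 1 can be illustrated with a minimal sketch. The claim does not fix the form of the policy or the update rule; the one-dimensional linear policy, the gradient-style update, the learning rate, and every name below (`LinearPolicy`, `act`, `correct`) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LinearPolicy:
    """Hypothetical policy: action = weight * context + bias.

    The parameters (weight, bias) play the role of the "one or more
    parameters" that map a context variable to a robot action variable.
    """
    weight: float = 0.0
    bias: float = 0.0
    learning_rate: float = 0.1  # assumed value, not taken from the source

    def act(self, context: float) -> float:
        """Apply the policy to a context-variable value."""
        return self.weight * context + self.bias

    def correct(self, context: float, corrective_action: float) -> None:
        """Modify the policy from a corrective command and the context
        value at which the unsatisfactory action was taken.

        Assumed rule: one gradient step on the squared difference between
        the corrective action and the policy's own output.
        """
        error = corrective_action - self.act(context)
        self.weight += self.learning_rate * error * context
        self.bias += self.learning_rate * error


policy = LinearPolicy()

# First sensory-detection time: the policy produces the first action.
first_context = 2.0
first_action = policy.act(first_context)

# The user signals dissatisfaction by commanding a corrective action.
corrective_action = 1.0
policy.correct(first_context, corrective_action)

# Second sensory-detection time: the modified policy produces the
# second action, which lies closer to the commanded correction.
second_context = 2.0
second_action = policy.act(second_context)
```

With these assumed numbers, the second action for the same context moves from the first action toward the corrective action, which is the behavior the claim describes in general terms.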
Abstract
Robots have the capacity to perform a broad range of useful tasks, such as factory automation, cleaning, delivery, assistive care, environmental monitoring and entertainment. Enabling a robot to perform a new task in a new environment typically requires a large amount of new software to be written, often by a team of experts. It would be valuable if future technology could empower people, who may have limited or no understanding of software coding, to train robots to perform custom tasks. Some implementations of the present invention provide methods and systems that respond to users' corrective commands to generate and refine a policy for determining appropriate actions based on sensor-data input. Upon completion of learning, the system can generate control commands by deriving them from the sensory data. Using the learned control policy, the robot can behave autonomously.
20 Claims
1. A method for controlling actions of robots, the method comprising:
identifying, at a device that includes a processor, a first context-variable value for a context variable detected by a robot at a first sensory-detection time;
accessing, at the device, a policy comprising one or more parameters configured to map the context variable to a robot action variable;
determining that a first robot action characterized by a first value of the robot action variable was performed at an action time in response to detection of the first context-variable value, the first robot action being in accordance with application of the policy;
determining that a user input was received at an input time corresponding to the action time, wherein user input data derived from the user input at least partly defines a corrective command that specifies a corrective robot action for physical performance, the user input being indicative of at least partial dissatisfaction with the first robot action;
modifying the policy based on the corrective command and the first context-variable value; and
causing the modified policy to be used to:
determine a second robot action characterized by a second value of the robot action variable based on a second context-variable value for the context variable detected at a second sensory-detection time; and
initiate performance of the second robot action in accordance with the second value of the robot action variable.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system, comprising:
one or more data processors; and
a non-transitory computer readable storage medium containing instructions which when executed on the one or more data processors, cause the processor to:
identify a first context-variable value for a context variable detected by a robot at a first sensory-detection time;
access a policy comprising one or more parameters configured to map the context variable to a robot action variable;
determine that a first robot action characterized by a first value of the robot action variable was performed at an action time in response to detection of the first context-variable value, the first robot action being in accordance with application of the policy;
determine that a user input was received at an input time configured to correspond to the action time, wherein user input data derived from the user input at least partly defines a corrective command that specifies a corrective robot action for physical performance, the user input being indicative of at least partial dissatisfaction with the first robot action, wherein the corrective command defined by the user input data is configured to minimize an error associated with the robot action;
modify the policy based on the corrective command and the first context-variable value; and
cause the modified policy to be used to:
determine a second robot action characterized by a second value of the robot action variable based on a second context-variable value for the context variable detected at a second sensory-detection time; and
initiate performance of the second robot action in accordance with the second value of the robot action variable.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
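Claim 9 requires the input time to correspond to the action time, but the claim text here does not say how that correspondence is established. One possible reading, sketched below, pairs a user input with the most recent action performed within a fixed window before the input. The function name `match_input_to_action`, the `(action_time, action_value)` log format, and the one-second window are all hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Each logged action is (action_time_seconds, action_value); this log
# format is an assumption made for the sketch.
ActionLog = Sequence[Tuple[float, float]]


def match_input_to_action(
    input_time: float,
    actions: ActionLog,
    max_delay: float = 1.0,  # assumed correspondence window, in seconds
) -> Optional[int]:
    """Return the index of the latest action performed at or before
    input_time and no more than max_delay seconds earlier, or None if
    no logged action falls inside that window."""
    best: Optional[int] = None
    for index, (action_time, _value) in enumerate(actions):
        delay = input_time - action_time
        if 0.0 <= delay <= max_delay:
            if best is None or action_time > actions[best][0]:
                best = index
    return best


action_log = [(0.0, 0.1), (1.0, 0.4), (2.0, 0.7)]

# A correction arriving 0.3 s after the third action is attributed to it.
matched = match_input_to_action(2.3, action_log)

# A correction arriving long after any action is attributed to none.
unmatched = match_input_to_action(5.0, action_log)
```

Only a matched input would then be treated as a corrective command for the paired action and its context-variable value; an unmatched input would be ignored under this assumed scheme.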
17. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to:
identify a first context-variable value for a context variable detected by a robot at a first sensory-detection time;
access a policy comprising one or more parameters configured to map the context variable to a robot action variable;
determine that a first robot action characterized by a first value of the robot action variable was performed at an action time in response to detection of the first context-variable value, the first robot action being in accordance with application of the policy;
determine that a user input was received at an input time corresponding to the action time, wherein user input data derived from the user input at least partly defines a corrective command that specifies a corrective robot action for physical performance, the user input being indicative of at least partial dissatisfaction with the first robot action;
modify the policy based on the corrective command and the first context-variable value; and
cause the modified policy to be used to:
determine a second robot action characterized by a second value of the robot action variable based on a second context-variable value for the context variable detected at a second sensory-detection time; and
initiate performance of the second robot action in accordance with the second value of the robot action variable;
wherein the second value of the robot action variable comprises a combination of the first robot action and the corrective action defined by the user input data, the combination being configured to result in a desired robot action.
Dependent claims: 18, 19, 20
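The final limitation of claim 17 describes the second action value as a combination of the first action and the corrective action. The claim does not state the combining rule. The sketch below assumes the simplest case, an additive correction clamped to actuator limits; `combine_actions` and the limits of -1.0 and 1.0 are hypothetical.

```python
def combine_actions(
    first_action: float,
    corrective_action: float,
    low: float = -1.0,   # assumed lower actuator limit
    high: float = 1.0,   # assumed upper actuator limit
) -> float:
    """Combine the policy's first action with the user's corrective
    action by addition (an assumed rule), then clamp the result to the
    actuator range so the combined command stays physically valid."""
    combined = first_action + corrective_action
    return max(low, min(high, combined))


# The policy steered at 0.2 and the user nudged by a further 0.3,
# so the desired action under this rule is their sum.
desired = combine_actions(0.2, 0.3)

# A large nudge saturates at the assumed upper limit.
saturated = combine_actions(0.9, 0.5)
```

Under this reading the corrective command acts as an increment on top of what the policy already produced, rather than as an absolute replacement for it.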
Specification