Apparatus and methods for control of robot actions based on corrective user inputs
First Claim
1. A method for controlling a robot, comprising:
receiving a stream of data from a sensor coupled to the robot, the stream of data comprising a first portion and a second portion, the first portion comprising sensor data collected from an environment of the robot during a first session and the second portion comprising sensor data collected from the environment during a second session;
receiving a first command from a user after the first portion of the stream of data, the first command corresponding to movement of the robot in the environment;
generating a policy based on the first portion and the first command, the policy comprising an algorithm configured to receive the stream of data and output trajectory of the robot in the environment;
determining receipt of a second command from the user after accessing the second portion of the stream data, the second command corresponding to a corrective command specifying a corrective action for the robot in the environment;
parsing the stream of data to identify pertinent second sensor data, pertinent second sensor data corresponding to identified features of the second portion of the stream data;
associating the second command with the pertinent second sensor data;
modifying the policy based on the correction command; and
causing the modified policy to be used to initiate performance of the robot based on relationship associated between the second command and the pertinent second sensor data.
1 Assignment
0 Petitions
Abstract
Robots have the capacity to perform a broad range of useful tasks, such as factory automation, cleaning, delivery, assistive care, environmental monitoring and entertainment. Enabling a robot to perform a new task in a new environment typically requires a large amount of new software to be written, often by a team of experts. It would be valuable if future technology could empower people, who may have limited or no understanding of software coding, to train robots to perform custom tasks. Some implementations of the present invention provide methods and systems that respond to users' corrective commands to generate and refine a policy for determining appropriate actions based on sensor-data input. Upon completion of learning, the system can generate control commands by deriving them from the sensory data. Using the learned control policy, the robot can behave autonomously.
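The loop described in the abstract (learn a policy from sensor data and user commands, then refine it when the user issues a correction) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the nearest-neighbor lookup stands in for whatever learning algorithm an implementation might use, and the names `extract_features`, `CorrectablePolicy`, `train`, and `act`, as well as the two-value sensor frames and command strings, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Features = Tuple[float, ...]


def extract_features(frame: Sequence[float]) -> Features:
    """Hypothetical feature extractor: rounds raw sensor readings."""
    return tuple(round(x, 1) for x in frame)


@dataclass
class CorrectablePolicy:
    """Toy policy: stores (features, command) pairs and answers by nearest neighbor."""
    examples: List[Tuple[Features, str]] = field(default_factory=list)

    def train(self, frame: Sequence[float], command: str) -> None:
        """Associate a user command with the features of a sensor frame.

        A later command for the same features replaces the earlier one,
        which is how a corrective command overrides the learned behavior.
        """
        feats = extract_features(frame)
        self.examples = [(f, c) for f, c in self.examples if f != feats]
        self.examples.append((feats, command))

    def act(self, frame: Sequence[float]) -> str:
        """Return the command whose stored features are closest to this frame."""
        if not self.examples:
            raise ValueError("policy has no training examples")
        feats = extract_features(frame)

        def squared_distance(stored: Features) -> float:
            return sum((a - b) ** 2 for a, b in zip(stored, feats))

        return min(self.examples, key=lambda e: squared_distance(e[0]))[1]


# First session: the user demonstrates two commands.
policy = CorrectablePolicy()
policy.train([0.9, 0.1], "turn_left")
policy.train([0.1, 0.9], "turn_right")
before_correction = policy.act([0.5, 0.6])

# Second session: the user corrects the robot for this sensor reading.
policy.train([0.5, 0.6], "stop")
after_correction = policy.act([0.5, 0.6])
```

In this sketch the correction changes the output only near the corrected sensor reading; readings close to the original demonstrations still map to the originally demonstrated commands.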
64 Citations
20 Claims
1. A method for controlling a robot, comprising:
receiving a stream of data from a sensor coupled to the robot, the stream of data comprising a first portion and a second portion, the first portion comprising sensor data collected from an environment of the robot during a first session and the second portion comprising sensor data collected from the environment during a second session;
receiving a first command from a user after the first portion of the stream of data, the first command corresponding to movement of the robot in the environment;
generating a policy based on the first portion and the first command, the policy comprising an algorithm configured to receive the stream of data and output trajectory of the robot in the environment;
determining receipt of a second command from the user after accessing the second portion of the stream data, the second command corresponding to a corrective command specifying a corrective action for the robot in the environment;
parsing the stream of data to identify pertinent second sensor data, pertinent second sensor data corresponding to identified features of the second portion of the stream data;
associating the second command with the pertinent second sensor data;
modifying the policy based on the correction command; and
causing the modified policy to be used to initiate performance of the robot based on relationship associated between the second command and the pertinent second sensor data.
View Dependent Claims (2, 3, 4, 5, 6, 7, 18)
8. A system for controlling a robot, comprising:
one or more processing devices; and
a non-transitory computer readable storage medium comprising computer readable instructions stored thereon which when executed by the one or more processing devices, causes the processing devices to,
receive a stream of data from a sensor coupled to the robot, the stream of data comprising a first portion and a second portion, the first portion comprising sensor data collected from an environment of the robot during a first session and the second portion comprising sensor data collected from the environment during a second session,
receive a first command from a user after the first portion of the stream of data, the first command corresponding to movement of the robot in the environment,
generate a policy based on the first portion and the first command, the policy comprising an algorithm configured to receive the stream of data and output trajectory of the robot in the environment,
parse the stream of data to identify pertinent second sensor data, pertinent second sensor data corresponding to identified features of the second portion of the stream data;
determine receipt of a second command from the user after accessing the second portion of the stream data, the second command corresponding to a corrective command specifying a corrective action for the robot in the environment,
associating the second command with the pertinent second sensor data,
modifying the policy based on the correction command, and
causing the modified policy to be used to initiate performance of the robot based on relationship associated between the second command and the pertinent second sensor data.
View Dependent Claims (9, 10, 11, 12, 13, 14, 19)
15. A non-transitory machine-readable storage medium comprising computer readable instructions stored thereon that when executed by a processing device configure the processing device to,
receive a stream of data from a sensor coupled to the robot, the stream of data comprising a first portion and a second portion, the first portion comprising sensor data collected from an environment of the robot during a first session and the second portion comprising sensor data collected from the environment during a second session,
receive a first command from a user after the first portion of the stream of data, the first command corresponding to movement of the robot in the environment,
generate a policy based on the first portion and the first command, the policy comprising an algorithm configured to receive the stream of data and output trajectory of the robot in the environment,
determine receipt of a second command from the user after accessing the second portion of the stream data, the second command corresponding to a corrective command specifying a corrective action for the robot in the environment;
parse the stream of data to identify pertinent second sensor data, pertinent second sensor data corresponding to identified features of the second portion of the stream data;
associating the second command with the pertinent second sensor data;
modifying the policy based on the correction command; and
causing the modified policy to be used to initiate performance of the robot based on relationship associated between the second command and the pertinent second sensor data.
View Dependent Claims (16, 17, 20)
Specification