Apparatus and methods for control of robot actions based on corrective user inputs

US 10,843,338 B2
Filed: 05/03/2019
Issued: 11/24/2020
Est. Priority Date: 02/03/2014
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Robots have the capacity to perform a broad range of useful tasks, such as factory automation, cleaning, delivery, assistive care, environmental monitoring and entertainment. Enabling a robot to perform a new task in a new environment typically requires a large amount of new software to be written, often by a team of experts. It would be valuable if future technology could empower people, who may have limited or no understanding of software coding, to train robots to perform custom tasks. Some implementations of the present invention provide methods and systems that respond to users' corrective commands to generate and refine a policy for determining appropriate actions based on sensor-data input. Upon completion of learning, the system can generate control commands by deriving them from the sensory data. Using the learned control policy, the robot can behave autonomously.
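As a rough illustration of the idea in the abstract, the sketch below shows a policy that maps sensor features to commands and is refined by user corrections. It is not the patented implementation: the class `CorrectivePolicy`, the nearest-neighbour lookup, the two-element feature vectors (assumed left and right range readings) and the two-element commands (assumed linear and angular velocity) are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CorrectivePolicy:
    """Hypothetical nearest-neighbour policy: sensor features -> command."""
    examples: list = field(default_factory=list)  # (features, command) pairs

    def act(self, features, default=(0.0, 0.0)):
        # With no training yet, fall back to a neutral command.
        if not self.examples:
            return default
        # Return the command stored with the closest known feature vector.
        nearest = min(self.examples, key=lambda ex: math.dist(ex[0], features))
        return nearest[1]

    def correct(self, features, command):
        # Associate a user command with the sensor features seen at that
        # moment; later queries near those features reproduce the command.
        self.examples.append((tuple(features), tuple(command)))


policy = CorrectivePolicy()
# First session (assumed data): the user drives forward on a clear path.
policy.correct(features=(0.9, 0.9), command=(1.0, 0.0))
# Second session (assumed data): an obstacle appears on the left and the
# user issues a corrective "slow down and turn right" command.
policy.correct(features=(0.2, 0.9), command=(0.5, -0.8))

clear_path = policy.act((0.85, 0.95))
obstacle_left = policy.act((0.25, 0.8))
print(clear_path, obstacle_left)
```

After the two corrections, new sensor readings close to either stored situation produce the associated command without further user input, which is the autonomous behaviour the abstract describes. A real system would likely use a learned model rather than a lookup over stored examples.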

64 Citations

20 Claims

1. A method for controlling a robot, comprising:
- receiving a stream of data from a sensor coupled to the robot, the stream of data comprising a first portion and a second portion, the first portion comprising sensor data collected from an environment of the robot during a first session and the second portion comprising sensor data collected from the environment during a second session;
  
  receiving a first command from a user after the first portion of the stream of data, the first command corresponding to movement of the robot in the environment;
  
  generating a policy based on the first portion and the first command, the policy comprising an algorithm configured to receive the stream of data and output trajectory of the robot in the environment;
  
  determining receipt of a second command from the user after accessing the second portion of the stream data, the second command corresponding to a corrective command specifying a corrective action for the robot in the environment;
  
  parsing the stream of data to identify pertinent second sensor data, pertinent second sensor data corresponding to identified features of the second portion of the stream data;
  
  associating the second command with the pertinent second sensor data;
  
  modifying the policy based on the correction command; and
  
  causing the modified policy to be used to initiate performance of the robot based on relationship associated between the second command and the pertinent second sensor data.
- Dependent claims (2, 3, 4, 5, 6, 7, 18):
  - 2. The method of claim 1, wherein the policy is modified in real time based on the receipt of the corrective command.
  - 3. The method of claim 1, wherein, the first command at least partly defining a command that specifies an initial robot action for the robot to physically perform in the environment.
  - 4. The method of claim 1, wherein the policy is modified using a learning model.
  - 5. The method of claim 1, wherein the corrective action corresponds to a magnitude of action.
  - 6. The method of claim 1, wherein the robot comprises a motor for performing at least part of the corrective action.
  - 7. The method of claim 1, wherein the first and second commands are received via an interface of a user device remote to the robot.
  - 18. The method of claim 1, wherein, the pertinent second sensor data is identified based on at least one of a fixed time period before the second user command is identified, identification of object of interest, location of the robot in its environment, a clustering analysis, or optic flow.
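One of the options listed in claim 18, selecting sensor data from a fixed time period before the corrective command, can be sketched as follows. This is only an illustrative reading of that option: the function `pertinent_window`, the `window_s` parameter, and the sample data are assumptions, not details from the patent.

```python
from bisect import bisect_left, bisect_right


def pertinent_window(timestamps, samples, command_time, window_s=2.0):
    """Return samples recorded in [command_time - window_s, command_time].

    `timestamps` must be sorted ascending and aligned with `samples`.
    """
    start = bisect_left(timestamps, command_time - window_s)
    end = bisect_right(timestamps, command_time)
    return samples[start:end]


# Assumed sensor stream: one sample every 0.5 s.
timestamps = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
samples = ["s0", "s1", "s2", "s3", "s4", "s5", "s6"]

# A corrective command arrives at t = 2.6 s; keep the preceding 1.0 s.
selected = pertinent_window(timestamps, samples, command_time=2.6, window_s=1.0)
print(selected)
```

The selected samples are the ones that would then be associated with the corrective command. The other options in the claim (object of interest, robot location, clustering analysis, optic flow) would replace the time-based filter with a different selection criterion.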

8. A system for controlling a robot, comprising:
- one or more processing devices; and
  
  a non-transitory computer readable storage medium comprising computer readable instructions stored thereon which when executed by the one or more processing devices, causes the processing devices to,
  
  receive a stream of data from a sensor coupled to the robot, the stream of data comprising a first portion and a second portion, the first portion comprising sensor data collected from an environment of the robot during a first session and the second portion comprising sensor data collected from the environment during a second session,
  
  receive a first command from a user after the first portion of the stream of data, the first command corresponding to movement of the robot in the environment,
  
  generate a policy based on the first portion and the first command, the policy comprising an algorithm configured to receive the stream of data and output trajectory of the robot in the environment,
  
  parse the stream of data to identify pertinent second sensor data, pertinent second sensor data corresponding to identified features of the second portion of the stream data;
  
  determine receipt of a second command from the user after accessing the second portion of the stream data, the second command corresponding to a corrective command specifying a corrective action for the robot in the environment,
  
  associating the second command with the pertinent second sensor data,
  
  modifying the policy based on the correction command, and
  
  causing the modified policy to be used to initiate performance of the robot based on relationship associated between the second command and the pertinent second sensor data.
- Dependent claims (9, 10, 11, 12, 13, 14, 19):
  - 9. The system of claim 8, wherein the policy is modified in real time based on the receipt of the corrective command.
  - 10. The system of claim 8, wherein first command at least partly defining a command that specifies an initial robot action for the robot to physically perform in the environment.
  - 11. The system of claim 8, wherein the policy is modified using a learning model.
  - 12. The system of claim 8, wherein the corrective action corresponds to a magnitude of action.
  - 13. The system of claim 8, wherein the robot comprises a motor for performing at least part of the corrective action.
  - 14. The system of claim 8, wherein the first and second commands are received via an interface of a user device remote to the robot.
  - 19. The system of claim 8, wherein, the pertinent second sensor data is identified based on at least one of a fixed time period before the second user command is identified, identification of object of interest, location of the robot in its environment, a clustering analysis, or optic flow.

15. A non-transitory machine-readable storage medium comprising computer readable instructions stored thereon that when executed by a processing device configure the processing device to,
- receive a stream of data from a sensor coupled to the robot, the stream of data comprising a first portion and a second portion, the first portion comprising sensor data collected from an environment of the robot during a first session and the second portion comprising sensor data collected from the environment during a second session,
  
  receive a first command from a user after the first portion of the stream of data, the first command corresponding to movement of the robot in the environment,
  
  generate a policy based on the first portion and the first command, the policy comprising an algorithm configured to receive the stream of data and output trajectory of the robot in the environment,
  
  determine receipt of a second command from the user after accessing the second portion of the stream data, the second command corresponding to a corrective command specifying a corrective action for the robot in the environment;
  
  parse the stream of data to identify pertinent second sensor data, pertinent second sensor data corresponding to identified features of the second portion of the stream data;
  
  associating the second command with the pertinent second sensor data;
  
  modifying the policy based on the correction command; and
  
  causing the modified policy to be used to initiate performance of the robot based on relationship associated between the second command and the pertinent second sensor data.
- Dependent claims (16, 17, 20):
  - 16. The non-transitory machine-readable storage medium of claim 15, wherein the policy is modified in real time based on the receipt of the corrective command.
  - 17. The non-transitory machine-readable storage medium of claim 15, wherein, the first command at least partly defining a command that specifies an initial robot action for the robot to physically perform in the environment.
  - 20. The non-transitory computer readable storage medium of claim 15, wherein, the pertinent second sensor data is identified based on at least one of a fixed time period before the second user command is identified, identification of object of interest, location of the robot in its environment, a clustering analysis, or optic flow.

Current Assignee
Brain Corporation
Original Assignee
Brain Corporation
Inventors
Meier, Philip; Passot, Jean-Baptiste; Gabardos, Borja Ibarz; Laurent, Patryk; Sinyavskiy, Oleg; O'Connor, Peter; Izhikevich, Eugene
Primary Examiner(s)
Sample, Jonathan L

Application Number
US16/402,758
Publication Number
US 20190321973A1
Time in Patent Office
571 Days
Field of Search
700/245-264, 318/567-569, 414/783, 901/1, 901/2
US Class Current
CPC Class Codes

B25J 9/1602   characterised by the contro...

B25J 9/161   Hardware, e.g. neural netwo...

B25J 9/163   learning, adaptive, model b...

B25J 9/1656   characterised by programmin...

G05B 13/027   using neural networks only

G05B 2219/40116   Learn by operator observati...

G05D 1/0033   by having the operator trac...

G05D 1/0088   characterized by the autono...

G06N 20/00   Machine learning

G06N 3/008   based on physical entities ...

Y10S 901/01   Mobile robot

Y10S 901/46   Sensing device
