Deep machine learning methods and apparatus for robotic grasping
Abstract
Deep machine learning methods and apparatus related to manipulation of an object by an end effector of a robot. Some implementations relate to training a semantic grasping model to predict a measure that indicates whether motion data for an end effector of a robot will result in a successful grasp of an object; and to predict an additional measure that indicates whether the object has desired semantic feature(s). Some implementations are directed to utilization of the trained semantic grasping model to servo a grasping end effector of a robot to achieve a successful grasp of an object having desired semantic feature(s).
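As a rough, non-authoritative sketch of the two-branch semantic grasping model the abstract describes, the PyTorch-style module below pairs a grasp-success prediction (conditioned on the current image and a candidate end effector motion vector) with a semantic-feature prediction over a transformed image crop. All layer sizes, the 5-dimensional motion vector, and the module names are assumptions for illustration, not taken from the patent.

```python
import torch
import torch.nn as nn

class SemanticGraspingModel(nn.Module):
    """Illustrative two-branch model: one branch scores grasp success for a
    candidate end effector motion, the other scores whether a desired
    semantic feature is present. Architecture and sizes are hypothetical."""

    def __init__(self, num_semantic_features: int = 16, motion_dim: int = 5):
        super().__init__()
        # Grasp branch: convolutional trunk over the current image.
        self.grasp_trunk = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # The candidate end effector motion vector is fused with image features.
        self.grasp_head = nn.Sequential(
            nn.Linear(64 + motion_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),  # measure of successful grasp
        )
        # Semantic branch: classifies a (spatially transformed) image crop.
        self.semantic_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_semantic_features),  # per-feature logits
        )

    def forward(self, image, motion_vector, transformed_crop):
        feats = self.grasp_trunk(image)
        grasp_measure = self.grasp_head(torch.cat([feats, motion_vector], dim=1))
        semantic_logits = self.semantic_net(transformed_crop)
        return grasp_measure, semantic_logits
```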
18 Claims
1. A method implemented by one or more processors, comprising:
generating a candidate end effector motion vector defining motion to move a grasping end effector of a robot from a current pose to an additional pose;
identifying a current image captured by a vision sensor associated with the robot, the current image capturing the grasping end effector and at least one object in an environment of the robot;
applying the current image and the candidate end effector motion vector as input to a trained grasp convolutional neural network;
generating, over the trained grasp convolutional neural network, a measure of successful grasp of the object with application of the motion, the measure being generated based on the application of the image and the end effector motion vector to the trained grasp convolutional neural network;
identifying a desired object semantic feature;
applying, as input to a semantic convolutional neural network, a spatial transformation of the current image or of an additional image captured by the vision sensor;
generating, over the semantic convolutional neural network based on the spatial transformation, an additional measure that indicates whether the desired object semantic feature is present in the spatial transformation;
generating an end effector command based on the measure of successful grasp and the additional measure that indicates whether the desired object semantic feature is present; and
providing the end effector command to one or more actuators of the robot.
Dependent claims: 2–14.
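A minimal sketch of how the two measures recited in claim 1 might be combined into an end effector command. The thresholds and the command vocabulary below are hypothetical; the claim itself does not prescribe them.

```python
def generate_end_effector_command(grasp_measure: float,
                                  semantic_measure: float,
                                  candidate_motion,
                                  grasp_threshold: float = 0.8,
                                  semantic_threshold: float = 0.5):
    """Combine the grasp-success measure and the semantic-feature measure
    into a command. Thresholds and command names are assumptions."""
    if grasp_measure >= grasp_threshold and semantic_measure >= semantic_threshold:
        # Likely a successful grasp of an object with the desired feature:
        # execute the candidate motion and close the gripper.
        return {"motion": candidate_motion, "gripper": "close"}
    if semantic_measure < semantic_threshold:
        # The object in view likely lacks the desired feature: keep the
        # gripper open and continue searching.
        return {"motion": candidate_motion, "gripper": "open"}
    # Right object, but grasp not yet promising: keep servoing toward
    # a better grasp pose.
    return {"motion": candidate_motion, "gripper": "hold"}
```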
15. A method implemented by one or more processors, comprising:
identifying a current image captured by a vision sensor associated with a robot;
generating, over a grasp convolutional neural network based on application of the current image to the grasp convolutional neural network: a measure of successful grasp, by a grasping end effector of the robot, of an object captured in the current image, and spatial transformation parameters;
generating, over a spatial transformer network, a spatial transformation based on the spatial transformation parameters, the spatial transformation being of the current image or an additional image captured by the vision sensor;
applying the spatial transformation as input to a semantic convolutional neural network;
generating, over the semantic convolutional neural network based on the spatial transformation, an additional measure that indicates whether a desired object semantic feature is present in the spatial transformation;
generating an end effector command based on the measure and the additional measure; and
providing the end effector command to one or more actuators of the robot.
Dependent claims: 16–18.
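Claim 15 differs from claim 1 in that the grasp network also predicts spatial transformation parameters, which a spatial transformer network uses to warp the image before the semantic network sees it. Below is a hedged sketch of that step using the standard affine spatial transformer formulation; the 2x3 affine parameterization, output size, and helper names are assumptions, not taken from the claim.

```python
import torch
import torch.nn.functional as F

def spatial_transform(image: torch.Tensor, theta: torch.Tensor,
                      out_size=(64, 64)) -> torch.Tensor:
    """Spatial transformer step: `theta` holds per-example 2x3 affine
    matrices (e.g., predicted alongside the grasp measure); the result is
    a warped crop suitable as input to the semantic network."""
    n = image.shape[0]
    grid = F.affine_grid(theta.view(n, 2, 3),
                         (n, image.shape[1], *out_size),
                         align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)

# Usage sketch (names are hypothetical):
# grasp_measure, theta = grasp_cnn(current_image)
# crop = spatial_transform(current_image, theta)
# semantic_measure = semantic_cnn(crop)
```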
Specification