Deep machine learning methods and apparatus for robotic grasping
First Claim
1. A method, comprising:
generating, by one or more processors, a candidate end effector motion vector defining motion to move a grasping end effector of a robot from a current pose to an additional pose;
identifying, by one or more of the processors, a current image captured by a vision sensor associated with the robot, the current image capturing the grasping end effector and at least one object in an environment of the robot;
applying, by one or more of the processors, the current image and the candidate end effector motion vector as input to a trained convolutional neural network;
generating, over the trained convolutional neural network, a measure of successful grasp of the object with application of the motion, the measure being generated based on the application of the current image and the end effector motion vector to the trained convolutional neural network;
generating an end effector command based on the measure, the end effector command being a grasp command or an end effector motion command; and
providing the end effector command to one or more actuators of the robot.
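The claimed method can be sketched as a single decision step: compute the candidate motion vector, score it with the trained network, and emit either a grasp command or a motion command. The sketch below is a minimal illustration, not the patented model; `trained_cnn` is a hypothetical stub standing in for the trained convolutional neural network, and the scoring rule and threshold are assumptions for demonstration only.

```python
import numpy as np

# Hypothetical stub for the trained convolutional neural network. A real
# implementation would run a CNN over the image and motion vector; this
# placeholder just returns a grasp-success measure in [0, 1] for illustration.
def trained_cnn(current_image: np.ndarray, motion_vector: np.ndarray) -> float:
    score = float(np.tanh(np.abs(motion_vector).sum()))
    return 1.0 - 0.5 * score  # placeholder scoring, not the patented model

def grasp_step(current_image, current_pose, candidate_pose, threshold=0.9):
    # 1. Candidate end effector motion vector: current pose -> additional pose.
    motion_vector = candidate_pose - current_pose
    # 2-3. Apply the current image and the motion vector to the trained CNN
    #      to obtain a measure of successful grasp.
    measure = trained_cnn(current_image, motion_vector)
    # 4. Generate an end effector command based on the measure: a grasp
    #    command if the predicted success is high enough, else a motion command.
    if measure >= threshold:
        return ("grasp",)
    return ("move", motion_vector)

image = np.zeros((64, 64, 3))
cmd = grasp_step(image, np.zeros(3), np.array([0.05, 0.0, -0.02]))
```

In a real system the returned command would be provided to the robot's actuators, closing the loop described in the claim.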
Abstract
Deep machine learning methods and apparatus related to manipulation of an object by an end effector of a robot. Some implementations relate to training a deep neural network to predict a measure that candidate motion data for an end effector of a robot will result in a successful grasp of one or more objects by the end effector. Some implementations are directed to utilization of the trained deep neural network to servo a grasping end effector of a robot to achieve a successful grasp of an object by the grasping end effector. For example, the trained deep neural network may be utilized in the iterative updating of motion control commands for one or more actuators of a robot that control the pose of a grasping end effector of the robot, and to determine when to generate grasping control commands to effectuate an attempted grasp by the grasping end effector.
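The servoing behavior described in the abstract, iteratively updating motion commands and deciding when to issue a grasp command, could be sketched as sampling candidate motion vectors, scoring each with the trained network, and acting on the best-scoring candidate. Everything below is a hypothetical stand-in: `grasp_success_measure` is a stub for the trained network, and the sampling distribution and threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stub for the trained network's grasp-success measure (illustration only):
# scores candidates by closeness to a hypothetical "good" motion.
def grasp_success_measure(image, motion_vector):
    target = np.array([0.1, 0.0, -0.05])
    return float(np.exp(-np.sum((motion_vector - target) ** 2)))

def servo_step(image, num_candidates=64, grasp_threshold=0.95):
    """One iteration of the servoing loop: sample candidate end effector
    motion vectors, score each with the trained network, act on the best."""
    candidates = rng.normal(scale=0.1, size=(num_candidates, 3))
    scores = [grasp_success_measure(image, c) for c in candidates]
    best = int(np.argmax(scores))
    if scores[best] >= grasp_threshold:
        return "grasp", candidates[best]   # effectuate an attempted grasp
    return "move", candidates[best]        # keep servoing toward the object

command, motion = servo_step(np.zeros((64, 64, 3)))
```

Repeating `servo_step` after each executed motion gives the iterative updating of motion control commands that the abstract describes.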
21 Claims
1. A method, comprising:

generating, by one or more processors, a candidate end effector motion vector defining motion to move a grasping end effector of a robot from a current pose to an additional pose;

identifying, by one or more of the processors, a current image captured by a vision sensor associated with the robot, the current image capturing the grasping end effector and at least one object in an environment of the robot;

applying, by one or more of the processors, the current image and the candidate end effector motion vector as input to a trained convolutional neural network;

generating, over the trained convolutional neural network, a measure of successful grasp of the object with application of the motion, the measure being generated based on the application of the current image and the end effector motion vector to the trained convolutional neural network;

generating an end effector command based on the measure, the end effector command being a grasp command or an end effector motion command; and

providing the end effector command to one or more actuators of the robot.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
14. A system, comprising:

a vision sensor viewing an environment;

a trained convolutional neural network stored in one or more non-transitory computer readable media;

at least one processor configured to:

generate a candidate end effector motion vector defining motion to move a robotic end effector from a current pose to an additional pose;

apply the candidate end effector motion vector and an image captured by the vision sensor as input to the trained convolutional neural network, the image capturing the end effector and at least one object in an environment of the robot;

generate, over the trained convolutional neural network, a measure of successful grasp of the object with application of the motion, the measure being generated based on the application of the image and the end effector motion vector to the trained convolutional neural network;

generate an end effector command based on the measure, the end effector command being a grasp command or an end effector motion command; and

provide the end effector command to one or more actuators of the robot.
15. A method of training a convolutional neural network, comprising:

identifying, by one or more processors, a plurality of training examples generated based on sensor output from one or more robots during a plurality of grasp attempts by the robots, each of the training examples including training example input comprising:

an image for a corresponding instance of time of a corresponding grasp attempt of the grasp attempts, the image capturing a robotic end effector and one or more environmental objects at the corresponding instance of time, and

an end effector motion vector defining motion of the end effector to move from an instance of time pose of the end effector at the corresponding instance of time to a final pose of the end effector for the corresponding grasp attempt,

each of the training examples including training example output comprising:

a grasp success label indicative of success of the corresponding grasp attempt; and

training, by one or more of the processors, the convolutional neural network based on the training examples.

Dependent claims: 16, 17, 18, 19, 20, 21.
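The training-example structure in claim 15 can be illustrated with a small sketch: each example pairs the image at an instance of time and the motion vector from the pose at that instant to the final pose of the grasp attempt with a binary grasp-success label. The sketch below is a stand-in under stated assumptions: random feature vectors replace camera images, random labels replace real grasp outcomes, and a simple logistic model trained with binary cross-entropy replaces the convolutional neural network.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training-example construction: for each grasp attempt, every
# time step yields (image features at time t, motion vector from the pose at
# time t to the final pose) paired with the attempt's grasp success label.
def make_training_examples(num_attempts=200, steps_per_attempt=5):
    examples = []
    for _ in range(num_attempts):
        poses = rng.normal(size=(steps_per_attempt, 3))
        final_pose = poses[-1]
        label = float(rng.random() < 0.5)        # grasp success label (stub)
        for t in range(steps_per_attempt):
            image_feat = rng.normal(size=8)      # stand-in for the image at t
            motion_vec = final_pose - poses[t]   # pose at t -> final pose
            examples.append((np.concatenate([image_feat, motion_vec]), label))
    return examples

# Minimal stand-in for "training the convolutional neural network": a logistic
# model fit by gradient descent on the binary cross-entropy loss.
def train(examples, lr=0.1, epochs=5):
    X = np.stack([features for features, _ in examples])
    y = np.array([label for _, label in examples])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))         # predicted success measure
        w -= lr * X.T @ (p - y) / len(y)         # BCE gradient step
    return w

weights = train(make_training_examples())
```

A real implementation would substitute a CNN over raw images and real grasp outcomes, but the example/label shape is the same as in the claim.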
Specification