Recognition-based object segmentation of a 3-dimensional image

US 10,482,681 B2
Filed: 02/09/2016
Issued: 11/19/2019
Est. Priority Date: 02/09/2016
Status: Active Grant

Abstract

Techniques are provided for segmentation of objects in a 3D image of a scene. An example method may include receiving 3D image frames of a scene, where each frame is associated with a pose of the depth camera that generated it. The method may also include detecting an object in each of the frames based on object recognition; associating a label with the detected object; calculating a 2D bounding box around the object; and calculating a 3D location of the center of the bounding box. The method may further include matching the detected object to an existing object boundary set, created from a previously received image frame, based on the label and the location of the center of the bounding box, or, if the match fails, creating a new object boundary set associated with the detected object.
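
The match-or-create step described at the end of the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `BoundarySet` and `match_or_create`, the Euclidean distance test, and the `max_dist` threshold are all assumptions. The source only states that matching is based on the label and the 3D location of the bounding-box center.

```python
import math
from dataclasses import dataclass, field


@dataclass
class BoundarySet:
    """Hypothetical container for one object's accumulated boundary data."""
    label: str
    center: tuple  # 3D location of the bounding-box center (assumed world frame)
    points: list = field(default_factory=list)


def match_or_create(label, center, boundary_sets, max_dist=0.25):
    """Return (boundary_set, matched).

    Assumed matching rule: same label, and center within max_dist of an
    existing set's center. max_dist is an illustrative value in scene units.
    """
    best, best_d = None, max_dist
    for bs in boundary_sets:
        if bs.label != label:
            continue  # labels must agree before distance is considered
        d = math.dist(bs.center, center)
        if d <= best_d:
            best, best_d = bs, d
    if best is not None:
        return best, True
    # Matching failed: create a new boundary set for this detection.
    new_set = BoundarySet(label=label, center=center)
    boundary_sets.append(new_set)
    return new_set, False
```

Called once per detection per frame, this grows the list of boundary sets only when no existing set with the same label lies close enough to the new detection.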

25 Claims

1. A processor-implemented method for 3-Dimensional (3D) segmentation of objects, the method comprising:
- receiving, by a processor, a plurality of 3D image frames of a scene, wherein each of the 3D image frames is associated with a pose of a depth camera that generated the 3D image frames;
  
  detecting, by the processor, an object in each of the 3D image frames based on object recognition;
  
  associating, by the processor, a label with the detected object, the label generated from the object recognition;
  
  calculating, by the processor, a 2-Dimensional (2D) bounding box containing the detected object, and a 3D location of the center of the 2D bounding box;
  
  matching, by the processor, the detected object to an existing object boundary set created from a previously received 3D image frame, the matching based on the label and the 3D location of the center of the 2D bounding box; and
  
  in response to a failure of the matching, creating, by the processor, a new object boundary set associated with the detected object.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, wherein the object boundary set comprises 3D positions of pixels in the 2D bounding box corresponding to the boundary of the object, and further comprises vectors associated with the pixels, the vectors specifying a ray from the position of the depth camera associated with the corresponding pose, to each of the pixels.
  - 3. The method of claim 2, further comprising adjusting the object boundary set to remove duplicate pixels generated from different poses of the depth camera, the removal based on the distance of the pixels from the camera and further based on the direction of the associated vectors.
  - 4. The method of claim 3, further comprising adjusting the object boundary set to remove pixels associated with an occluding object.
  - 5. The method of claim 1, further comprising:
    - detecting surface planes in the scene;
      
      calculating an intersection of the detected surface plane with the object boundary set;
      
      calculating a ratio of pixels in the intersection to pixels in the detected surface plane; and
      
      if the ratio is less than a threshold value, removing the pixels in the detected surface plane from the object boundary set.
  - 6. The method of claim 5, wherein the detecting of surface planes further comprises:
    - calculating normal vectors as the cross product of the difference of neighboring depth pixels of the 3D image frame;
      
      clustering the normal vectors based on their value and spatial proximity; and
      
      fitting a plane to each cluster based on a least-squares fit.
  - 7. The method of claim 1, wherein each pose of the depth camera is estimated by one of:
    - using a transformation of the camera based on an Iterative Closest Point (ICP) matching operation performed on depth pixels of the 3D image frames;
      
      or using a Simultaneous Localization and Mapping (SLAM) operation performed on Red-Green-Blue (RGB) pixels of the 3D image frames;
      
      or based on data provided by inertial sensors in the depth camera.
  - 8. The method of claim 1, wherein the object recognition is based on at least one of template matching, classification using a bag-of-words vision model, and classification using a convolutional neural network.
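
The surface-plane test of claim 5 reduces to a set intersection and a ratio. The sketch below is illustrative only: representing pixels as a Python set of identifiers, the function name `remove_plane_pixels`, and the default `threshold=0.5` are assumptions, since the claim does not fix a data structure or a threshold value.

```python
def remove_plane_pixels(boundary_pixels, plane_pixels, threshold=0.5):
    """Apply the ratio test of claim 5 to one detected plane.

    boundary_pixels, plane_pixels: sets of hashable pixel identifiers
    (an assumed representation). Returns a new set.
    """
    if not plane_pixels:
        return set(boundary_pixels)
    # Pixels of the plane that also belong to the object boundary set.
    intersection = boundary_pixels & plane_pixels
    # Ratio of pixels in the intersection to pixels in the detected plane.
    ratio = len(intersection) / len(plane_pixels)
    if ratio < threshold:
        # The plane lies mostly outside the boundary set: remove its pixels.
        return boundary_pixels - plane_pixels
    return set(boundary_pixels)
```

Under this reading, a large tabletop whose pixels mostly fall outside an object's boundary set yields a small ratio and is stripped, while a plane that largely coincides with the boundary set is kept.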

9. A system for 3-Dimensional (3D) segmentation of objects, the system comprising:
- an object detection circuit to:
  
  detect an object in each of a plurality of 3D image frames of a scene based on object recognition, wherein the plurality of 3D image frames are captured by a depth camera, each of the 3D image frames being associated with a pose of the depth camera;
  
  associate a label with the detected object, the label generated by the object detection circuit based on the object recognition; and
  
  calculate a 2-Dimensional (2D) bounding box containing the detected object and a 3D location of the center of the 2D bounding box;
  
  an object boundary set matching circuit to match the detected object to an existing object boundary set created from a previously received 3D image frame, the matching based on the label and the 3D location of the center of the 2D bounding box; and
  
  an object boundary set creation circuit to create, in response to a failure of the matching, a new object boundary set associated with the detected object.
- Dependent claims (10, 11, 12, 13, 14, 15, 16, 17, 18):
  - 10. The system of claim 9, wherein the object boundary set comprises 3D positions of pixels in the 2D bounding box corresponding to the boundary of the object, and further comprises vectors associated with the pixels, the vectors specifying a ray from the position of the depth camera associated with the corresponding pose, to each of the pixels.
  - 11. The system of claim 10, further comprising a boundary adjustment circuit to adjust the object boundary set to remove duplicate pixels generated from different poses of the depth camera, the removal based on the distance of the pixels from the camera and further based on the direction of the associated vectors.
  - 12. The system of claim 11, wherein the boundary adjustment circuit is further to adjust the object boundary set to remove pixels associated with an occluding object.
  - 13. The system of claim 9, further comprising a surface plane removal circuit to:
    - detect surface planes in the scene;
      
      calculate an intersection of the detected surface plane with the object boundary set;
      
      calculate a ratio of pixels in the intersection to pixels in the detected surface plane; and
      
      if the ratio is less than a threshold value, remove the pixels in the detected surface plane from the object boundary set.
  - 14. The system of claim 13, wherein the surface plane removal circuit is further to:
    - calculate normal vectors as the cross product of the difference of neighboring depth pixels of the 3D image frame;
      
      cluster the normal vectors based on their value and spatial proximity; and
      
      fit a plane to each cluster, based on a least-squares fit, to detect the surface planes in the scene.
  - 15. The system of claim 9, wherein each pose of the depth camera is estimated by one of:
    - using a transformation of the camera based on an Iterative Closest Point (ICP) matching operation performed on depth pixels of the 3D image frames;
      
      or using a Simultaneous Localization and Mapping (SLAM) operation performed on Red-Green-Blue (RGB) pixels of the 3D image frames;
      
      or based on data provided by inertial sensors in the depth camera.
  - 16. The system of claim 9, wherein the object recognition is based on at least one of template matching, classification using a bag-of-words vision model, and classification using a convolutional neural network.
  - 17. The system of claim 9, further comprising the depth camera.
  - 18. A system-on-chip or chip set comprising the system of claim 9.
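
Claims 10 and 11 (mirroring claims 2 and 3) describe boundary points that carry a ray from the camera position, and a duplicate-removal step driven by camera distance and ray direction. The claims do not spell out the rule, so the following is one plausible reading rather than the patented method. It assumes two points are duplicates when their 3D positions lie within `pos_eps` and their rays point in similar directions (cosine at least `min_cos`), and that the observation taken from the shorter camera range is the one to keep. All names and thresholds are hypothetical.

```python
import math


def _unit(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def make_boundary_point(position, camera_position):
    """Store a 3D position plus the ray from the camera to that position."""
    ray = tuple(p - c for p, c in zip(position, camera_position))
    return {
        "position": position,
        "ray": _unit(ray),
        "range": math.dist(position, camera_position),
    }


def remove_duplicates(points, pos_eps=0.01, min_cos=0.9):
    """Drop points that duplicate an already kept, closer-range observation."""
    kept = []
    # Visit closest observations first so they win over farther duplicates.
    for pt in sorted(points, key=lambda p: p["range"]):
        is_dup = any(
            math.dist(pt["position"], k["position"]) <= pos_eps
            and sum(a * b for a, b in zip(pt["ray"], k["ray"])) >= min_cos
            for k in kept
        )
        if not is_dup:
            kept.append(pt)
    return kept
```

The direction check is what keeps two nearby points seen from opposite sides, such as the front and back of a thin object, from being merged under this assumed rule.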

19. At least one non-transitory computer readable storage medium having instructions encoded thereon that, when executed by one or more processors, result in the following operations for 3-Dimensional (3D) segmentation of objects, the operations comprising:
- receiving a plurality of 3D image frames of a scene, wherein each of the 3D image frames is associated with a pose of a depth camera that generated the 3D image frames;
  
  detecting an object in each of the 3D image frames based on object recognition;
  
  associating a label with the detected object, the label derived from the object recognition;
  
  calculating a 2-Dimensional (2D) bounding box containing the detected object, and a 3D location of the center of the 2D bounding box;
  
  matching the detected object to an existing object boundary set created from a previously received 3D image frame, the matching based on the label and the 3D location of the center of the 2D bounding box; and
  
  creating, in response to a failure of the matching, a new object boundary set associated with the detected object.
- Dependent claims (20, 21, 22, 25):
  - 20. The computer readable storage medium of claim 19, wherein the object boundary set comprises 3D positions of pixels in the 2D bounding box corresponding to the boundary of the object, and further comprises vectors associated with the pixels, the vectors specifying a ray from the position of the depth camera associated with the corresponding pose, to each of the pixels.
  - 21. The computer readable storage medium of claim 20, further comprising adjusting the object boundary set to remove duplicate pixels generated from different poses of the depth camera, the removal based on the distance of the pixels from the camera and further based on the direction of the associated vectors.
  - 22. The computer readable storage medium of claim 21, further comprising adjusting the object boundary set to remove pixels associated with an occluding object.
  - 25. The computer readable storage medium of claim 19, wherein each pose of the depth camera is estimated by one of:
    - using a transformation of the camera based on an Iterative Closest Point (ICP) matching operation performed on depth pixels of the 3D image frames;
      
      or using a Simultaneous Localization and Mapping (SLAM) operation performed on Red-Green-Blue (RGB) pixels of the 3D image frames;
      
      or based on data provided by inertial sensors in the depth camera, and wherein the object recognition is based on at least one of template matching, classification using a bag-of-words vision model, and classification using a convolutional neural network.
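
The independent claims call for a 3D location of the center of the 2D bounding box and associate each frame with a camera pose, but they do not say how the two are combined. A common approach, assumed here, is pinhole back-projection of the center pixel's depth followed by a rigid transform into world coordinates. The intrinsics tuple `(fx, fy, cx, cy)`, the camera-to-world `rotation` and `translation` convention, and the function name are all hypothetical.

```python
def bbox_center_3d(bbox, depth, intrinsics, rotation, translation):
    """Back-project the bounding-box center pixel into world coordinates.

    bbox: (u_min, v_min, u_max, v_max) in pixels.
    depth: 2D list indexed as depth[v][u], in scene units.
    intrinsics: (fx, fy, cx, cy) of an assumed pinhole camera model.
    rotation, translation: assumed camera-to-world pose (3x3 matrix, 3-vector).
    """
    u = (bbox[0] + bbox[2]) // 2
    v = (bbox[1] + bbox[3]) // 2
    z = depth[v][u]
    fx, fy, cx, cy = intrinsics
    # Pinhole back-projection into the camera frame.
    cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
    # Rigid transform into the world frame using the frame's pose.
    return tuple(
        sum(rotation[i][j] * cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
```

Expressing the center in a shared world frame is what would let detections from different camera poses be compared against the same existing object boundary set.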

23. The computer readable storage medium of claim 19, further comprising:
- detecting surface planes in the scene;
  
  calculating an intersection of the detected surface plane with the object boundary set;
  
  calculating a ratio of pixels in the intersection to pixels in the detected surface plane; and
  
  if the ratio is less than a threshold value, removing the pixels in the detected surface plane from the object boundary set.
- Dependent claims (24):
  - 24. The computer readable storage medium of claim 23, wherein the detecting of surface planes further comprises:
    - calculating normal vectors as the cross product of the difference of neighboring depth pixels of the 3D image frame;
      
      clustering the normal vectors based on their value and spatial proximity; and
      
      fitting a plane to each cluster based on a least-squares fit.
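
Two of the plane-detection steps in claim 24 (and in claims 6 and 14) can be illustrated in plain Python. This sketch assumes the depth pixels have already been back-projected to 3D points, takes the right-hand and lower neighbors as the "neighboring" pixels, and parameterizes the plane as z = a*x + b*y + c, which cannot represent planes parallel to the z axis. The clustering step is omitted entirely, and all function names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pixel_normal(points, row, col):
    """Normal at a pixel: cross product of the differences to its
    right-hand and lower neighbors. points[row][col] is an (x, y, z) tuple."""
    p = points[row][col]
    dx = tuple(q - s for q, s in zip(points[row][col + 1], p))
    dy = tuple(q - s for q, s in zip(points[row + 1][col], p))
    return cross(dx, dy)


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(cluster):
    """Least-squares fit of z = a*x + b*y + c to a list of (x, y, z) points,
    solved from the normal equations with Cramer's rule."""
    n = len(cluster)
    sx = sum(p[0] for p in cluster)
    sy = sum(p[1] for p in cluster)
    sz = sum(p[2] for p in cluster)
    sxx = sum(p[0] * p[0] for p in cluster)
    syy = sum(p[1] * p[1] for p in cluster)
    sxy = sum(p[0] * p[1] for p in cluster)
    sxz = sum(p[0] * p[2] for p in cluster)
    syz = sum(p[1] * p[2] for p in cluster)
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("degenerate cluster: points do not span a plane in x, y")
    coeffs = []
    for k in range(3):
        Ak = [r[:] for r in A]
        for i in range(3):
            Ak[i][k] = rhs[i]  # replace column k with the right-hand side
        coeffs.append(det3(Ak) / d)
    return tuple(coeffs)
```

For points sampled from z = a*x + b*y + c on a unit grid, `pixel_normal` returns a vector proportional to (-a, -b, 1), and `fit_plane` recovers the same a, b, and c from the cluster.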

Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: Kutliroff, Gershom
Primary Examiner(s): Couso, Yon J
Application Number: US15/019,011
Publication Number: US 20170228940A1
Time in Patent Office: 1,379 Days

CPC Class Codes:
- G06T 19/20: Editing of 3D images, e.g. ...
- G06T 2207/10028: Range image; Depth image; 3...
- G06T 2219/2008: Assembling, disassembling
- G06T 7/12: Edge-based segmentation
- G06V 20/00: Scenes; Scene-specific elem...
