3-DIMENSIONAL SCENE ANALYSIS FOR AUGMENTED REALITY OPERATIONS
Abstract
Techniques are provided for 3D analysis of a scene including detection, segmentation and registration of objects within the scene. The analysis results may be used to implement augmented reality operations including removal and insertion of objects and the generation of blueprints. An example method may include receiving 3D image frames of the scene, each frame associated with a pose of a depth camera, and creating a 3D reconstruction of the scene based on depth pixels that are projected and accumulated into a global coordinate system. The method may also include detecting objects, and associated locations within the scene, based on the 3D reconstruction, the camera pose and the image frames. The method may further include segmenting the detected objects into points of the 3D reconstruction corresponding to contours of the object and registering the segmented objects to 3D models of the objects to determine their alignment.
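The projection and accumulation steps summarized in the abstract can be sketched as follows. This is a minimal illustration, assuming a pinhole camera model with intrinsics (fx, fy, cx, cy) and a 4x4 camera-to-world pose matrix; neither the camera model nor the function name comes from the patent.

```python
import numpy as np

def project_depth_to_global(depth, fx, fy, cx, cy, pose):
    """Back-project a depth map into 3D points in the global coordinate system.

    depth : (h, w) array of depth values (e.g., meters); zeros mark invalid pixels.
    pose  : 4x4 camera-to-world transform associated with this frame.
    Assumes a pinhole camera; the claims do not specify a camera model.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))      # pixel coordinate grid
    x = (u - cx) * depth / fx                           # camera-frame X
    y = (v - cy) * depth / fy                           # camera-frame Y
    pts = np.stack([x, y, depth, np.ones_like(depth)], axis=-1).reshape(-1, 4)
    world = (pose @ pts.T).T[:, :3]                     # apply the camera pose
    return world[depth.reshape(-1) > 0]                 # drop invalid pixels
```

In practice the intrinsics and per-frame pose would come from the depth camera and a tracking/SLAM front end; here they are simply inputs.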
Claims (25)
1. A processor-implemented method for 3-Dimensional (3D) scene analysis, the method comprising:

receiving, by a processor, a plurality of 3D image frames of a scene, each frame comprising a red-green-blue (RGB) image frame comprising color pixels and a depth map frame comprising depth pixels, wherein each of the 3D image frames is associated with a pose of a depth camera that generated the 3D image frames;

projecting, by the processor, the depth pixels into points in a global coordinate system based on the camera pose;

accumulating, by the processor, the projected points into a 3D reconstruction of the scene;

detecting, by the processor, objects and associated locations in the scene, for each 3D image frame, based on the camera pose, the 3D reconstruction, the RGB image frame and the depth map frame;

segmenting, by the processor, each of the detected objects in the scene, the segmented objects comprising the points of the 3D reconstruction corresponding to contours of the associated detected object; and

registering, by the processor, the segmented objects to a 3D model of the associated detected object to determine an alignment of the detected object in the scene.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
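The accumulating step of claim 1 can be illustrated with a simple sparse voxel map that counts how many projected points land in each cell. This is only one possible form of the claimed 3D reconstruction; the voxel size and dictionary representation are assumptions, not taken from the patent.

```python
import numpy as np

def accumulate_points(points, voxel_size=0.05):
    """Accumulate world-space points into a sparse voxel map (voxel key -> count).

    points : (n, 3) array of points already projected into the global frame.
    A hit count per voxel is a simplified stand-in for the claimed
    3D reconstruction (which could equally be a TSDF or surfel cloud).
    """
    keys = np.floor(points / voxel_size).astype(np.int64)  # quantize to voxel indices
    recon = {}
    for k in map(tuple, keys):
        recon[k] = recon.get(k, 0) + 1                     # per-voxel hit count
    return recon
```

Repeated observations of the same surface accumulate in the same voxel, which is what lets the reconstruction grow more confident as frames arrive.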
10. A system for 3-Dimensional (3D) scene analysis, the system comprising:

a 3D reconstruction circuit to receive a plurality of 3D image frames of a scene, each frame comprising a red-green-blue (RGB) image frame comprising color pixels and a depth map frame comprising depth pixels, wherein each of the 3D image frames is associated with a pose of a depth camera that generated the 3D image frames, the 3D reconstruction circuit further to project the depth pixels into points in a global coordinate system based on the camera pose and accumulate the projected points into a 3D reconstruction of the scene;

an object detection circuit to detect objects and associated locations in the scene, for each 3D image frame, based on the camera pose, the 3D reconstruction, the RGB image frame and the depth map frame;

a 3D segmentation circuit to segment each of the detected objects in the scene, the segmented objects comprising the points of the 3D reconstruction corresponding to contours of the associated detected object; and

a 3D registration circuit to register the segmented objects to a 3D model of the associated detected object to determine an alignment of the detected object in the scene.

Dependent claims: 11, 12, 13, 14, 15, 16, 17
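The registration circuit's task of aligning segmented points to a 3D model can be sketched with the classical Kabsch/SVD rigid-alignment step. The claims do not prescribe a registration algorithm (in practice an ICP-style loop would supply the correspondences), so this single aligned-correspondence solve is an illustrative assumption.

```python
import numpy as np

def register_kabsch(src, dst):
    """Rigid transform (R, t) such that dst ~= R @ src + t, given
    corresponding point pairs (src from the segmented object, dst from
    the 3D model). Uses the Kabsch/SVD method -- one classical choice,
    not an algorithm named by the claims.
    """
    sc, dc = src.mean(axis=0), dst.mean(axis=0)           # centroids
    H = (src - sc).T @ (dst - dc)                         # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))                # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dc - R @ sc
    return R, t
```

The recovered (R, t) is the "alignment of the detected object in the scene" in the claim's terms: the object's pose relative to its reference model.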
18. At least one non-transitory computer readable storage medium having instructions encoded thereon that, when executed by one or more processors, result in the following operations for 3-Dimensional (3D) scene analysis, the operations comprising:

receiving a plurality of 3D image frames of a scene, each frame comprising a red-green-blue (RGB) image frame comprising color pixels and a depth map frame comprising depth pixels, wherein each of the 3D image frames is associated with a pose of a depth camera that generated the 3D image frames;

projecting the depth pixels into points in a global coordinate system based on the camera pose;

accumulating the projected points into a 3D reconstruction of the scene;

detecting objects and associated locations in the scene, for each 3D image frame, based on the camera pose, the 3D reconstruction, the RGB image frame and the depth map frame;

segmenting each of the detected objects in the scene, the segmented objects comprising the points of the 3D reconstruction corresponding to contours of the associated detected object; and

registering the segmented objects to a 3D model of the associated detected object to determine an alignment of the detected object in the scene.

Dependent claims: 19, 20, 21, 22, 23, 24, 25
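The segmenting operation, which extracts the reconstruction points belonging to a detected object, can be illustrated with a crude axis-aligned box filter. The claims describe segmentation by object contours, which is richer than a box; this is a deliberately simplified stand-in, and the box representation is an assumption.

```python
import numpy as np

def segment_by_box(points, box_min, box_max):
    """Keep reconstruction points inside an axis-aligned detection box.

    points : (n, 3) global-frame reconstruction points.
    A bounding-box filter is a simplified stand-in for the claimed
    contour-based 3D segmentation of each detected object.
    """
    inside = np.all((points >= box_min) & (points <= box_max), axis=1)
    return points[inside]
```

The output of this step would then feed the registration stage, which aligns the segmented points to the object's 3D model.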
Specification