Three-dimensional scene reconstruction based on contextual analysis
First Claim
1. A processor-implemented method for 3-Dimensional (3D) scene reconstruction, the method comprising:
receiving, by a processor, a plurality of 3D image frames of a scene, each frame comprising a red-green-blue (RGB) image frame comprising color pixels and a depth map frame, the depth map frame comprising depth pixels, wherein each of the 3D image frames is associated with a pose of a depth camera that generated the 3D image frames;
projecting, by the processor, the depth pixels into points in a global coordinate system based on the camera pose;
accumulating, by the processor, the projected points into a 3D reconstruction of the scene;
detecting, by the processor, objects for each 3D image frame, based on the camera pose, the 3D reconstruction, the RGB image frame and the depth map frame;
classifying, by the processor, the detected objects into one or more object classes;
grouping, by the processor, a plurality of instances of objects in one of the object classes based on a measure of similarity of features between the object instances, wherein the grouping comprises
detecting features associated with surfaces of each of first and second object instances,
applying feature descriptors to the detected features,
matching descriptors between the first and second object instances, and
pairing the first object instance with the second object instance, the first and second object instances associated with a greatest number of descriptor matches;
validating the paired object instances by calculating a distance between the matched descriptors of the paired object instances and comparing the distance to a threshold value; and
combining, by the processor, point clouds associated with each of the first and second object instances to generate a fused object.
Abstract
Techniques are provided for context-based 3D scene reconstruction employing fusion of multiple instances of an object within the scene. A methodology implementing the techniques according to an embodiment includes receiving 3D image frames of the scene, each frame associated with a pose of a depth camera, and creating a 3D reconstruction of the scene based on depth pixels that are projected and accumulated into a global coordinate system. The method may also include detecting objects, based on the 3D reconstruction, the camera pose and the image frames. The method may further include classifying the detected objects into one or more object classes; grouping two or more instances of objects in one of the object classes based on a measure of similarity of features between the object instances; and combining point clouds associated with each of the grouped object instances to generate a fused object.
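The projection and accumulation step described in the abstract — lifting depth pixels into points in a global coordinate system using the camera pose — can be sketched as below. This is an illustrative implementation, not the patent's: it assumes a pinhole camera model with intrinsics `fx, fy, cx, cy` and a 4x4 camera-to-world pose matrix, none of which the abstract specifies.

```python
import numpy as np

def project_depth_to_global(depth, fx, fy, cx, cy, pose):
    """Back-project a depth map into 3D points in the global frame.

    depth : (H, W) array of depth values; zeros are treated as invalid.
    fx, fy, cx, cy : pinhole intrinsics (assumed known from calibration).
    pose : (4, 4) camera-to-world rigid transform for this frame.
    Returns an (N, 3) array of points in the global coordinate system.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    # Pinhole back-projection into the camera frame.
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # homogeneous
    # Rigid transform into the global coordinate system.
    pts_world = (pose @ pts_cam.T).T[:, :3]
    return pts_world
```

Accumulating the reconstruction then amounts to concatenating the per-frame outputs of this function over all received frames.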
20 Claims
1. A processor-implemented method for 3-Dimensional (3D) scene reconstruction, the method comprising:
receiving, by a processor, a plurality of 3D image frames of a scene, each frame comprising a red-green-blue (RGB) image frame comprising color pixels and a depth map frame, the depth map frame comprising depth pixels, wherein each of the 3D image frames is associated with a pose of a depth camera that generated the 3D image frames;
projecting, by the processor, the depth pixels into points in a global coordinate system based on the camera pose;
accumulating, by the processor, the projected points into a 3D reconstruction of the scene;
detecting, by the processor, objects for each 3D image frame, based on the camera pose, the 3D reconstruction, the RGB image frame and the depth map frame;
classifying, by the processor, the detected objects into one or more object classes;
grouping, by the processor, a plurality of instances of objects in one of the object classes based on a measure of similarity of features between the object instances, wherein the grouping comprises detecting features associated with surfaces of each of first and second object instances, applying feature descriptors to the detected features, matching descriptors between the first and second object instances, and pairing the first object instance with the second object instance, the first and second object instances associated with a greatest number of descriptor matches;
validating the paired object instances by calculating a distance between the matched descriptors of the paired object instances and comparing the distance to a threshold value; and
combining, by the processor, point clouds associated with each of the first and second object instances to generate a fused object.
View Dependent Claims (2, 3, 4, 5, 6, 7)
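The grouping step of claim 1 — matching descriptors between instances and pairing the two instances with the greatest number of matches — can be sketched as below. The claim does not name a matching method; this sketch assumes nearest-neighbor matching over Euclidean descriptor distance with a Lowe-style ratio test and a hypothetical absolute distance cutoff (`max_dist`).

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8, max_dist=0.5):
    """Match descriptors of two object instances by nearest neighbor.

    desc_a : (Na, D) descriptors from the first object instance.
    desc_b : (Nb, D) descriptors from the second object instance.
    Returns a list of (i, j) index pairs of matched descriptors.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best = order[0]
        if dists[best] > max_dist:
            continue  # hypothetical absolute cutoff on match distance
        # Lowe-style ratio test when a second candidate exists.
        if len(order) > 1 and dists[best] >= ratio * dists[order[1]]:
            continue
        matches.append((i, int(best)))
    return matches

def pair_best_instances(instances):
    """Pair the two instances with the greatest number of descriptor matches.

    instances : list of (Ni, D) descriptor arrays, one per object instance.
    Returns (idx_a, idx_b, matches) for the winning pair.
    """
    best = (None, None, [])
    for a in range(len(instances)):
        for b in range(a + 1, len(instances)):
            m = match_descriptors(instances[a], instances[b])
            if len(m) > len(best[2]):
                best = (a, b, m)
    return best
```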
8. An electronic system for 3-Dimensional (3D) scene reconstruction, the system comprising:
a 3D reconstruction circuit to receive a plurality of 3D image frames of a scene, each frame comprising a red-green-blue (RGB) image frame comprising color pixels and a depth map frame, the depth map frame comprising depth pixels, wherein each of the 3D image frames is associated with a pose of a depth camera that generated the 3D image frames, the 3D reconstruction circuit further to project the depth pixels into points in a global coordinate system based on the camera pose and accumulate the projected points into a 3D reconstruction of the scene;
an object detection circuit to detect objects for each 3D image frame, based on the camera pose, the 3D reconstruction, the RGB image frame and the depth map frame;
an object recognition circuit to classify the detected objects into one or more object classes;
a feature detection circuit to detect features associated with surfaces of each of first and second object instances;
a feature descriptor application circuit to apply feature descriptors to the detected features;
a descriptor matching circuit to match descriptors between the first and second object instances;
an instance pairing circuit to pair the first object instance with the second object instance, the first and second object instances associated with the greatest number of descriptor matches;
an instance registration circuit to register the paired object instances by computing a rigid transformation to map shared regions between the paired object instances;
a registration confirmation circuit to validate the registration of the paired object instances by calculating a distance between the matched descriptors of the paired object instances and comparing the distance to a threshold value; and
a context based fusion circuit to group a plurality of instances of objects in one of the object classes based on a measure of similarity of features between the object instances and to combine point clouds associated with each of the first and second object instances to generate a fused object.
View Dependent Claims (9, 10, 11, 12, 13)
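The instance registration circuit of claim 8 computes a rigid transformation mapping shared regions between the paired instances. The claim does not prescribe an algorithm; one common way to realize this step over matched point correspondences is the Kabsch/Procrustes least-squares solution, sketched below as an illustration.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst.

    src, dst : (N, 3) arrays of corresponding points (e.g. the 3D
    locations of matched features on the two paired instances).
    Returns rotation R (3, 3) and translation t (3,) with dst ~ R @ src + t.
    """
    c_src = src.mean(axis=0)
    c_dst = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    # Guard against a reflection in the SVD solution.
    if np.linalg.det(R) < 0:
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t
```

Applying the returned transform to one instance's point cloud brings its shared region into alignment with the other instance before fusion.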
14. At least one non-transitory computer readable storage medium having instructions encoded thereon that, when executed by one or more processors, result in the following operations for 3-Dimensional (3D) scene reconstruction, the operations comprising:
receiving a plurality of 3D image frames of a scene, each frame comprising a red-green-blue (RGB) image frame comprising color pixels and a depth map frame, the depth map frame comprising depth pixels, wherein each of the 3D image frames is associated with a pose of a depth camera that generated the 3D image frames;
projecting the depth pixels into points in a global coordinate system based on the camera pose;
accumulating the projected points into a 3D reconstruction of the scene;
detecting objects for each 3D image frame, based on the camera pose, the 3D reconstruction, the RGB image frame and the depth map frame;
classifying the detected objects into one or more object classes;
grouping a plurality of instances of objects in one of the object classes based on a measure of similarity of features between the object instances, wherein the grouping comprises detecting features associated with surfaces of each of first and second object instances, applying feature descriptors to the detected features, matching descriptors between the first and second object instances, and pairing the first object instance with the second object instance, the first and second object instances associated with a greatest number of descriptor matches;
validating the paired object instances by calculating a distance between the matched descriptors of the paired object instances and comparing the distance to a threshold value; and
combining point clouds associated with each of the first and second object instances to generate a fused object.
View Dependent Claims (15, 16, 17, 18, 19, 20)
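The final validation and fusion steps recited in each independent claim — comparing a distance between matched descriptors to a threshold value, then combining the two instances' point clouds — can be sketched as below. The threshold value and the use of the mean match distance are hypothetical choices; the claims leave both unspecified.

```python
import numpy as np

def validate_and_fuse(cloud_a, cloud_b, desc_dists, threshold=0.25):
    """Validate a candidate instance pairing and fuse the two point clouds.

    cloud_a, cloud_b : (Na, 3), (Nb, 3) points already in a shared frame
    (e.g. after registration of the paired instances).
    desc_dists : distances between the matched descriptors of the pair.
    threshold : hypothetical acceptance bound on the mean match distance.
    Returns the fused (Na+Nb, 3) cloud, or None if validation fails.
    """
    if np.mean(desc_dists) >= threshold:
        return None  # pairing rejected; keep the instances separate
    return np.vstack([cloud_a, cloud_b])
```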
Specification