Recognition and reconstruction of objects with partial appearance
First Claim
1. A system operable to recognize and reconstruct objects in images, the system comprising:
a non-transitory memory storage comprising instructions; and
one or more processors in communication with the non-transitory memory storage, wherein the one or more processors execute the instructions to perform operations comprising:
acquiring a plurality of images, each image including an object within the image, the object having a different appearance in a number of images of the plurality of images;
decomposing, for each image, the object into components and generating label data for each image;
inputting the images and associated label data into a learning module to train the learning module to recognize the components of the object, the training being based on an overall objectness score of the object, an objectness score of each component of the object, a pose of the object, and a pose of each component of the object for each image input;
inputting an additional image into the trained learning module;
detecting that the additional image has one or more of the components of the object within the additional image in response to input into the trained learning module; and
identifying the object and estimating its pose information in the additional image and/or constructing a complete view of the object in the additional image or a complete sketch of the object in response to detection of the one or more of the components, wherein training the learning module includes minimizing a cost function generated as a sum of discrepancies between the label data and computed data for a set of items, the set of items including the overall objectness score, the objectness scores of the components, the pose of the object, and the poses of the components.
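The cost function recited at the end of the claim can be illustrated with a minimal sketch. The claim only says the cost is "a sum of discrepancies between the label data and computed data" over the four items; the squared-difference discrepancy, the `Annotation` record, and every field and function name below are assumptions made for illustration, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Annotation:
    """Label data or computed data for one image (hypothetical layout)."""
    object_score: float                          # overall objectness score
    component_scores: Sequence[float]            # objectness score of each component
    object_pose: Sequence[float]                 # pose of the object, e.g. box coordinates
    component_poses: Sequence[Sequence[float]]   # pose of each component


def sq_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed discrepancy measure: sum of squared element-wise differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cost(label: Annotation, computed: Annotation) -> float:
    """Sum the discrepancies over the four items named in the claim."""
    total = (label.object_score - computed.object_score) ** 2
    total += sq_diff(label.component_scores, computed.component_scores)
    total += sq_diff(label.object_pose, computed.object_pose)
    for label_pose, computed_pose in zip(label.component_poses,
                                         computed.component_poses):
        total += sq_diff(label_pose, computed_pose)
    return total
```

Training in this sketch would adjust the learning module so that `cost` decreases across the training images; a weighted sum or a different discrepancy measure would fit the claim language equally well.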
Abstract
Various embodiments include systems and methods structured to provide recognition of an object in an image using a learning module trained using decomposition of the object into components in a number of training images. The training can be based on an overall objectness score of the object, an objectness score of each component of the object, a pose of the object, and a pose of each component of the object for each training image input. Additional systems and methods can be implemented in a variety of applications.
18 Claims
1. A system operable to recognize and reconstruct objects in images, the system comprising:
a non-transitory memory storage comprising instructions; and
one or more processors in communication with the non-transitory memory storage, wherein the one or more processors execute the instructions to perform operations comprising:
acquiring a plurality of images, each image including an object within the image, the object having a different appearance in a number of images of the plurality of images;
decomposing, for each image, the object into components and generating label data for each image;
inputting the images and associated label data into a learning module to train the learning module to recognize the components of the object, the training being based on an overall objectness score of the object, an objectness score of each component of the object, a pose of the object, and a pose of each component of the object for each image input;
inputting an additional image into the trained learning module;
detecting that the additional image has one or more of the components of the object within the additional image in response to input into the trained learning module; and
identifying the object and estimating its pose information in the additional image and/or constructing a complete view of the object in the additional image or a complete sketch of the object in response to detection of the one or more of the components, wherein training the learning module includes minimizing a cost function generated as a sum of discrepancies between the label data and computed data for a set of items, the set of items including the overall objectness score, the objectness scores of the components, the pose of the object, and the poses of the components.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system operable to recognize and reconstruct objects in images, the system comprising:
a non-transitory memory storage comprising instructions; and
one or more processors in communication with the non-transitory memory storage, wherein the one or more processors execute the instructions to perform operations comprising:
acquiring a plurality of images, each image including an object within the image, the object having a different appearance in a number of images of the plurality of images;
decomposing, for each image, the object into components and generating label data for each image;
inputting the images and associated label data into a learning module to train the learning module to recognize the components of the object, the training being based on an overall objectness score of the object, an objectness score of each component of the object, a pose of the object, and a pose of each component of the object for each image input;
inputting an additional image into the trained learning module;
detecting that the additional image has one or more of the components of the object within the additional image in response to input into the trained learning module; and
identifying the object and estimating its pose information in the additional image and/or constructing a complete view of the object in the additional image or a complete sketch of the object in response to detection of the one or more of the components, wherein the learning module is a deep neural network and training the deep neural network includes:
inputting the images and associated label data to a first convolution layer of a plurality of convolution layers i, i = 1, . . . , n−1, n arranged in series such that input to each convolution layer includes an output of a previous convolution layer in the series beginning with convolution layer 2, convolution layer 1 being the first convolution layer;
generating, from the output of convolution layer n−1, region proposals for each component;
generating, from the output of convolution layer n, region proposals for the object in whole; and
performing region of interest pooling using the region proposals for each component, output of convolution layers n−1 and n, and the region proposals for the object in whole.
Dependent claims: 9
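The data flow recited in claim 8 can be sketched as follows. This shows only the wiring between the stages: the convolution layers, the two region proposal steps, and the region of interest pooling step are passed in as placeholder callables, and all names here (`forward`, `propose_components`, `propose_object`, `roi_pool`) are hypothetical, not taken from the patent or from any library.

```python
from typing import Any, Callable, Sequence


def forward(image: Any,
            layers: Sequence[Callable[[Any], Any]],
            propose_components: Callable[[Any], Any],
            propose_object: Callable[[Any], Any],
            roi_pool: Callable[[Any, Any, Any, Any], Any]) -> Any:
    """Wire convolution layers 1..n in series, then pool over both proposal sets."""
    if len(layers) < 2:
        raise ValueError("need at least layers n-1 and n")

    outputs = []
    x = image
    for layer in layers:      # layer 1 takes the input; layer i takes layer i-1's output
        x = layer(x)
        outputs.append(x)

    feat_n_minus_1, feat_n = outputs[-2], outputs[-1]

    # Component proposals come from layer n-1, whole-object proposals from layer n.
    component_rois = propose_components(feat_n_minus_1)
    object_rois = propose_object(feat_n)

    # Pooling uses both proposal sets and the outputs of layers n-1 and n.
    return roi_pool(component_rois, feat_n_minus_1, feat_n, object_rois)
```

One plausible reading of drawing component proposals from layer n−1 is that the earlier layer retains finer spatial detail, which suits smaller parts; the claim itself does not state a rationale.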
10. A computer-implemented method comprising:
acquiring a plurality of images, each image including an object within the image, the object having a different appearance in a number of images of the plurality of images;
decomposing, for each image, the object into components and generating label data for each image;
inputting the images and associated label data into a learning module to train the learning module to recognize the components of the object, the training being based on an overall objectness score of the object, an objectness score of each component of the object, a pose of the object, and a pose of each component of the object for each image input;
inputting an additional image into the trained learning module;
detecting that the additional image has one or more of the components of the object within the additional image in response to input into the trained learning module; and
identifying the object and estimating its pose information in the additional image and/or constructing a complete view of the object in the additional image or a complete sketch of the object in response to detection of the one or more of the components, wherein training the learning module includes minimizing a cost function generated as a sum of discrepancies between the label data and computed data for a set of items, the set of items including the overall objectness score, the objectness scores of the components, the pose of the object, and the poses of the components.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
Specification