Learning to reconstruct 3D shapes by rendering many 3D views
Abstract
Methods, systems, and apparatus for obtaining first image features derived from an image of an object, providing the first image features to a three-dimensional estimator neural network, and obtaining, from the three-dimensional estimator neural network, data specifying an estimated three-dimensional shape and texture based on the first image features. The estimated three-dimensional shape and texture are provided to a three-dimensional rendering engine, and a plurality of three-dimensional views of the object are generated by the three-dimensional rendering engine based on the estimated three-dimensional shape and texture. The plurality of three-dimensional views are provided to the object recognition engine, and second image features derived from the plurality of three-dimensional views are obtained from the object recognition engine. A loss is computed based at least on the first and second image features, and the three-dimensional estimator neural network is trained based at least on the computed loss.
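The abstract says a loss is computed from the first and second image features but does not say which loss. One plausible formulation is shown below, assumed here purely for illustration: compare the features of the input image with the features of each rendered view, average the squared distances over the K views, and take a gradient step on the estimator's weights. The symbols are notation introduced for this sketch, not taken from the patent: φ for the object recognition engine, g_θ for the estimator network with weights θ, R for the rendering engine, v_k for the k-th viewpoint, I for the input image, and η for the learning rate.

```latex
\begin{aligned}
(\hat{S}, \hat{T}) &= g_\theta\big(\phi(I)\big) \\
\mathcal{L}_1(\theta) &= \frac{1}{K} \sum_{k=1}^{K}
  \Big\lVert \phi(I) - \phi\big(R(\hat{S}, \hat{T}, v_k)\big) \Big\rVert_2^2 \\
\theta &\leftarrow \theta - \eta \, \nabla_\theta \mathcal{L}_1(\theta)
\end{aligned}
```

Because R must be differentiable for the gradient to reach θ through every rendered view, a formulation like this presumes a differentiable renderer.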
20 Claims
1. A computer-implemented method comprising:
- obtaining, from an object recognition engine, data specifying first image features derived from an image of an object;
- providing the first image features to a three-dimensional estimator neural network;
- obtaining, from the three-dimensional estimator neural network, data specifying (i) an estimated three-dimensional shape and (ii) an estimated texture that are each based on the first image features;
- providing the data specifying (i) the estimated three-dimensional shape and (ii) the estimated texture to a three-dimensional rendering engine;
- obtaining, from the three-dimensional rendering engine, data specifying a plurality of three-dimensional views of the object that are each generated based on the data specifying (i) the estimated three-dimensional shape and (ii) the estimated texture;
- providing the data specifying each of the plurality of three-dimensional views to the object recognition engine;
- obtaining, from the object recognition engine and for each of the plurality of three-dimensional views specified by the data, data specifying second image features derived from the data specifying the three-dimensional view;
- computing, based at least on the data specifying the first image features and the data specifying the second image features, a first loss based on a first loss function; and
- training the three-dimensional estimator neural network based at least on the computed first loss.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
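The steps of claim 1 can be traced end to end in a small toy program. This is a sketch under stated assumptions, not the patented implementation. The "object recognition engine" (`recognize`), "three-dimensional estimator" (`estimate`), and "rendering engine" (`render`) are hypothetical pure-Python stand-ins that work on a 1-D, 16-pixel image; the "shape" is a single half-width and the "texture" a single albedo value. The squared-feature-distance loss and the finite-difference gradient descent are also assumed choices, since the claim names neither.

```python
import math

NUM_PIXELS = 16
# Hypothetical camera angles (radians) at which the estimate is re-rendered.
VIEW_ANGLES = [0.0, math.pi / 6, math.pi / 3]


def recognize(image):
    """Hypothetical stand-in for the object recognition engine (held fixed).

    Returns two linear summary features of a 1-D image: the mean intensity
    and the mean of the left half minus the mean of the right half.
    """
    half = len(image) // 2
    mean = sum(image) / len(image)
    left_minus_right = sum(image[:half]) / half - sum(image[half:]) / (len(image) - half)
    return [mean, left_minus_right]


def render(shape, texture, view):
    """Hypothetical toy renderer: a soft 1-D silhouette of half-width `shape`
    and albedo `texture`, shaded by a cosine term that depends on `view`."""
    image = []
    for i in range(NUM_PIXELS):
        x = 2.0 * i / (NUM_PIXELS - 1) - 1.0  # pixel position in [-1, 1]
        z = max(-60.0, min(60.0, 10.0 * (shape - abs(x))))  # clamped to avoid exp overflow
        coverage = 1.0 / (1.0 + math.exp(-z))  # soft silhouette edge
        shading = 0.5 + 0.5 * math.cos(view + x)
        image.append(texture * shading * coverage)
    return image


def estimate(params, features):
    """Hypothetical linear 'estimator network': features -> (shape, texture)."""
    a0, a1, b0, c0, c1, b1 = params
    shape = a0 * features[0] + a1 * features[1] + b0
    texture = c0 * features[0] + c1 * features[1] + b1
    return shape, texture


def first_loss(params, image):
    """Assumed loss: squared distance between the input image's features and
    each rendered view's features, averaged over all views."""
    first = recognize(image)                  # first image features
    shape, texture = estimate(params, first)  # estimated shape and texture
    total = 0.0
    for view in VIEW_ANGLES:
        second = recognize(render(shape, texture, view))  # second image features
        total += sum((f - g) ** 2 for f, g in zip(first, second))
    return total / len(VIEW_ANGLES)


def train(params, image, steps=200, lr=0.1, eps=1e-5):
    """Gradient descent on the estimator parameters; the gradient is
    approximated with central finite differences."""
    params = list(params)
    for _ in range(steps):
        grad = []
        for j in range(len(params)):
            up = list(params)
            up[j] += eps
            down = list(params)
            down[j] -= eps
            grad.append((first_loss(up, image) - first_loss(down, image)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


if __name__ == "__main__":
    # Synthetic input: a "true" object (shape 0.6, texture 0.8) seen from view 0.
    photo = render(0.6, 0.8, 0.0)
    init = [0.0, 0.0, 0.3, 0.0, 0.0, 0.3]
    trained = train(init, photo)
    print(f"loss before training: {first_loss(init, photo):.5f}")
    print(f"loss after training:  {first_loss(trained, photo):.5f}")
```

In a practical system the estimator would be a deep network, the renderer a differentiable rasterizer, and gradients would come from automatic differentiation rather than finite differences. The recognition engine is held fixed in this sketch (an assumption), so only the estimator's parameters change during training.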
11. A system comprising:
- a processor configured to execute computer program instructions; and
- a computer storage medium encoded with computer programs that, when executed by the processor, cause the system to perform operations comprising:
  - obtaining, from an object recognition engine, data specifying first image features derived from an image of an object;
  - providing the first image features to a three-dimensional estimator neural network;
  - obtaining, from the three-dimensional estimator neural network, data specifying (i) an estimated three-dimensional shape and (ii) an estimated texture that are each based on the first image features;
  - providing the data specifying (i) the estimated three-dimensional shape and (ii) the estimated texture to a three-dimensional rendering engine;
  - obtaining, from the three-dimensional rendering engine, data specifying a plurality of three-dimensional views of the object that are each generated based on the data specifying (i) the estimated three-dimensional shape and (ii) the estimated texture;
  - providing the data specifying each of the plurality of three-dimensional views to the object recognition engine;
  - obtaining, from the object recognition engine and for each of the plurality of three-dimensional views specified by the data, data specifying second image features derived from the data specifying the three-dimensional view;
  - computing, based at least on the data specifying the first image features and the data specifying the second image features, a first loss based on a first loss function; and
  - training the three-dimensional estimator neural network based at least on the computed first loss.
Dependent claims: 12, 13, 14, 15.
16. A computer-readable device storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- obtaining, from an object recognition engine, data specifying first image features derived from an image of an object;
- providing the first image features to a three-dimensional estimator neural network;
- obtaining, from the three-dimensional estimator neural network, data specifying (i) an estimated three-dimensional shape and (ii) an estimated texture that are each based on the first image features;
- providing the data specifying (i) the estimated three-dimensional shape and (ii) the estimated texture to a three-dimensional rendering engine;
- obtaining, from the three-dimensional rendering engine, data specifying a plurality of three-dimensional views of the object that are each generated based on the data specifying (i) the estimated three-dimensional shape and (ii) the estimated texture;
- providing the data specifying each of the plurality of three-dimensional views to the object recognition engine;
- obtaining, from the object recognition engine and for each of the plurality of three-dimensional views specified by the data, data specifying second image features derived from the data specifying the three-dimensional view;
- computing, based at least on the data specifying the first image features and the data specifying the second image features, a first loss based on a first loss function; and
- training the three-dimensional estimator neural network based at least on the computed first loss.
Dependent claims: 17, 18, 19, 20.
Specification