Learning to reconstruct 3D shapes by rendering many 3D views

US 10,403,031 B2
Filed: 11/15/2017
Issued: 09/03/2019
Est. Priority Date: 11/15/2017
Status: Active Grant


1 Assignment

0 Petitions

Abstract

Methods, systems, and apparatus for obtaining first image features derived from an image of an object, providing the first image features to a three-dimensional estimator neural network, and obtaining, from the three-dimensional estimator neural network, data specifying an estimated three-dimensional shape and texture based on the first image features. The estimated three-dimensional shape and texture are provided to a three-dimensional rendering engine, and a plurality of three-dimensional views of the object are generated by the three-dimensional rendering engine based on the estimated three-dimensional shape and texture. The plurality of three-dimensional views are provided to the object recognition engine, and second image features derived from the plurality of three-dimensional views are obtained from the object recognition engine. A loss is computed based at least on the first and second image features, and the three-dimensional estimator neural network is trained based at least on the computed loss.
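
The training loop summarized in the abstract can be illustrated with a deliberately tiny, self-contained Python sketch. Every component is a hypothetical stand-in, not the patented implementation:

- a frozen linear projection plays the object recognition engine;
- a linear map plays the three-dimensional estimator;
- a linear mix of shape and texture under a scalar pose and lighting value plays the rendering engine.

Gradients are taken by central finite differences rather than by backpropagating through a differentiable renderer. The loss is assumed to be the mean squared distance between the first image features and the second image features of each rendered view.

```python
# Toy sketch of the training loop (all components are hypothetical stand-ins).
# Assumptions: frozen linear "recognizer", linear estimator, linear "renderer",
# finite-difference gradients instead of backpropagation.

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Hypothetical frozen object recognition engine: a fixed linear projection.
RECOGNIZER = [[1.0, 0.5], [-0.3, 1.0]]

def recognize(image):
    return matvec(RECOGNIZER, image)

def estimate(params, features):
    """Hypothetical 3D estimator: maps image features to (shape, texture)."""
    return matvec(params["shape"], features), matvec(params["texture"], features)

def render(shape, texture, pose, light):
    """Hypothetical renderer: each 'pixel' mixes texture scaled by lighting
    and shape scaled by a scalar pose parameter."""
    return [light * t + pose * s for s, t in zip(shape, texture)]

# Each view uses a distinct (pose, lighting) pair.
VIEWS = [(0.2, 1.0), (-0.4, 0.8), (0.6, 1.2)]

def first_loss(params, image):
    """Mean squared distance between first and second features over all views."""
    f1 = recognize(image)
    shape, texture = estimate(params, f1)
    total = 0.0
    for pose, light in VIEWS:
        f2 = recognize(render(shape, texture, pose, light))
        total += sum((a - b) ** 2 for a, b in zip(f1, f2))
    return total / len(VIEWS)

def train_step(params, image, lr=0.1, eps=1e-4):
    """One gradient-descent step on the estimator; the recognizer stays frozen."""
    grads = {}
    for key, mat in params.items():
        grads[key] = [[0.0] * len(row) for row in mat]
        for i, row in enumerate(mat):
            for j in range(len(row)):
                saved = row[j]
                row[j] = saved + eps
                up = first_loss(params, image)
                row[j] = saved - eps
                down = first_loss(params, image)
                row[j] = saved
                grads[key][i][j] = (up - down) / (2 * eps)
    for key, mat in params.items():
        for i, row in enumerate(mat):
            for j in range(len(row)):
                row[j] -= lr * grads[key][i][j]
    return first_loss(params, image)

image = [1.0, 0.5]
params = {"shape": [[0.0, 0.0], [0.0, 0.0]], "texture": [[0.0, 0.0], [0.0, 0.0]]}
initial = first_loss(params, image)
for _ in range(200):
    final = train_step(params, image)
print(round(initial, 4), final < 0.05 * initial)
```

Because the recognizer is frozen, only the estimator's weights change. The loss falls only as the rendered views come to be re-recognized with features close to those of the original image, which is the self-supervision idea the abstract describes.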

13 Citations

20 Claims

1. A computer-implemented method comprising:
- obtaining, from an object recognition engine, data specifying first image features derived from an image of an object;
  
  providing the first image features to a three-dimensional estimator neural network;
  
  obtaining, from the three-dimensional estimator neural network, data specifying (i) an estimated three-dimensional shape and (ii) an estimated texture that are each based on the first image features;
  
  providing the data specifying (i) the estimated three-dimensional shape and (ii) the estimated texture to a three-dimensional rendering engine;
  
  obtaining, from the three-dimensional rendering engine, data specifying a plurality of three-dimensional views of the object that are each generated based on the data specifying (i) the estimated three-dimensional shape and (ii) the estimated texture;
  
  providing the data specifying each of the plurality of three-dimensional views to the object recognition engine;
  
  obtaining, from the object recognition engine and for each of the plurality of three-dimensional views specified by the data, data specifying second image features derived from the data specifying the three-dimensional view;
  
  computing, based at least on the data specifying the first image features and the data specifying the second image features, a first loss based on a first loss function; and
  
  training the three-dimensional estimator neural network based at least on the computed first loss.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The computer-implemented method of claim 1, wherein the object is a face of a person.
  - 3. The computer-implemented method of claim 2, wherein the image of the face of the person is a synthetic image that is generated based on data specifying (i) a synthetic three-dimensional shape and (ii) a synthetic texture.
  - 4. The computer-implemented method of claim 3, comprising:
    - computing, based at least on (i) the data specifying the synthetic three-dimensional shape and the synthetic texture and (ii) the data specifying the estimated three-dimensional shape and the estimated texture, a second loss based on a second loss function; and
      
      training the three-dimensional estimator neural network based at least on the computed first loss and the computed second loss.
  - 5. The computer-implemented method of claim 3, wherein generating the synthetic image of the face of the person based on the data specifying (i) the synthetic three-dimensional shape and (ii) the synthetic texture comprises:
    - generating a rendering that is based on (i) a particular pose and a particular lighting of the synthetic three-dimensional shape and (ii) the synthetic texture.
  - 6. The computer-implemented method of claim 3, wherein the computed second loss indicates a substantiality of differences between (i) vertices that are determined based on the synthetic three-dimensional shape and synthetic texture and (ii) vertices that are determined based on the estimated three-dimensional shape and the estimated texture.
  - 7. The computer-implemented method of claim 1, wherein the computed first loss indicates a substantiality of differences between the first image features and the second image features.
  - 8. The computer-implemented method of claim 1, wherein the object recognition engine is a facial recognition neural network.
  - 9. The computer-implemented method of claim 1, wherein the three-dimensional rendering engine is one of a three-dimensional rendering neural network or a three-dimensional rasterization engine.
  - 10. The computer-implemented method of claim 1, wherein each of the plurality of three-dimensional views is generated based on a respective pose and a respective lighting that is distinct from a pose and a lighting of each of the other three-dimensional views.
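
Claims 4, 6 and 7 name two losses without fixing their form. The sketch below assumes squared Euclidean distances for both:

- the first loss averages the feature distance over the rendered views;
- the second loss averages a per-vertex distance, where each vertex is a hypothetical (x, y, z, r, g, b) tuple combining position and texture color;
- the two are summed using a hypothetical `vertex_weight`.

Other distances, e.g. cosine distance between feature vectors, would fit the claim language equally well.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def feature_loss(first_features, second_features_per_view):
    """First loss (assumed form): mean squared distance between the input
    image's features and the features of each rendered view."""
    distances = [squared_distance(first_features, f2) for f2 in second_features_per_view]
    return sum(distances) / len(distances)

def vertex_loss(synthetic_vertices, estimated_vertices):
    """Second loss (assumed form): mean squared distance between corresponding
    vertices; each vertex is a hypothetical (x, y, z, r, g, b) tuple."""
    distances = [squared_distance(s, e) for s, e in zip(synthetic_vertices, estimated_vertices)]
    return sum(distances) / len(distances)

def combined_loss(first_features, second_features_per_view,
                  synthetic_vertices, estimated_vertices, vertex_weight=1.0):
    """Total training loss: first loss plus a weighted second loss."""
    return (feature_loss(first_features, second_features_per_view)
            + vertex_weight * vertex_loss(synthetic_vertices, estimated_vertices))

first = [1.0, 0.0]
views = [[1.0, 0.0], [0.0, 0.0]]
synthetic = [(0, 0, 0, 1, 1, 1), (1, 0, 0, 0, 0, 0)]
estimated = [(0, 0, 0, 1, 1, 1), (1, 0, 1, 0, 0, 0)]
print(combined_loss(first, views, synthetic, estimated, vertex_weight=2.0))  # 1.5
```

The second loss presumably applies only to synthetic input images (claim 3), since only then are the true shape and texture known. For other images, the first loss alone would drive training.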

11. A system comprising:
- a processor configured to execute computer program instructions; and
  
  a computer storage medium encoded with computer programs that, when executed by the processor, cause the system to perform operations comprising:
  
  obtaining, from an object recognition engine, data specifying first image features derived from an image of an object;
  
  providing the first image features to a three-dimensional estimator neural network;
  
  obtaining, from the three-dimensional estimator neural network, data specifying (i) an estimated three-dimensional shape and (ii) an estimated texture that are each based on the first image features;
  
  providing the data specifying (i) the estimated three-dimensional shape and (ii) the estimated texture to a three-dimensional rendering engine;
  
  obtaining, from the three-dimensional rendering engine, data specifying a plurality of three-dimensional views of the object that are each generated based on the data specifying (i) the estimated three-dimensional shape and (ii) the estimated texture;
  
  providing the data specifying each of the plurality of three-dimensional views to the object recognition engine;
  
  obtaining, from the object recognition engine and for each of the plurality of three-dimensional views specified by the data, data specifying second image features derived from the data specifying the three-dimensional view;
  
  computing, based at least on the data specifying the first image features and the data specifying the second image features, a first loss based on a first loss function; and
  
  training the three-dimensional estimator neural network based at least on the computed first loss.
- Dependent Claims (12, 13, 14, 15)
  - 12. The system of claim 11, wherein the object is a face of a person.
  - 13. The system of claim 12, wherein the image of the face of the person is a synthetic image that is generated based on data specifying (i) a synthetic three-dimensional shape and (ii) a synthetic texture.
  - 14. The system of claim 13, wherein the operations comprise:
    - computing, based at least on (i) the data specifying the synthetic three-dimensional shape and the synthetic texture and (ii) the data specifying the estimated three-dimensional shape and the estimated texture, a second loss based on a second loss function; and
      
      training the three-dimensional estimator neural network based at least on the computed first loss and the computed second loss.
  - 15. The system of claim 13, wherein generating the synthetic image of the face of the person based on the data specifying (i) the synthetic three-dimensional shape and (ii) the synthetic texture comprises:
    - generating a rendering that is based on (i) a particular pose and a particular lighting of the synthetic three-dimensional shape and (ii) the synthetic texture.

16. A computer-readable device storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- obtaining, from an object recognition engine, data specifying first image features derived from an image of an object;
  
  providing the first image features to a three-dimensional estimator neural network;
  
  obtaining, from the three-dimensional estimator neural network, data specifying (i) an estimated three-dimensional shape and (ii) an estimated texture that are each based on the first image features;
  
  providing the data specifying (i) the estimated three-dimensional shape and (ii) the estimated texture to a three-dimensional rendering engine;
  
  obtaining, from the three-dimensional rendering engine, data specifying a plurality of three-dimensional views of the object that are each generated based on the data specifying (i) the estimated three-dimensional shape and (ii) the estimated texture;
  
  providing the data specifying each of the plurality of three-dimensional views to the object recognition engine;
  
  obtaining, from the object recognition engine and for each of the plurality of three-dimensional views specified by the data, data specifying second image features derived from the data specifying the three-dimensional view;
  
  computing, based at least on the data specifying the first image features and the data specifying the second image features, a first loss based on a first loss function; and
  
  training the three-dimensional estimator neural network based at least on the computed first loss.
- Dependent Claims (17, 18, 19, 20)
  - 17. The computer-readable device of claim 16, wherein the object is a face of a person.
  - 18. The computer-readable device of claim 17, wherein the image of the face of the person is a synthetic image that is generated based on data specifying (i) a synthetic three-dimensional shape and (ii) a synthetic texture.
  - 19. The computer-readable device of claim 18, wherein the operations comprise:
    - computing, based at least on (i) the data specifying the synthetic three-dimensional shape and the synthetic texture and (ii) the data specifying the estimated three-dimensional shape and the estimated texture, a second loss based on a second loss function; and
      
      training the three-dimensional estimator neural network based at least on the computed first loss and the computed second loss.
  - 20. The computer-readable device of claim 18, wherein generating the synthetic image of the face of the person based on the data specifying (i) the synthetic three-dimensional shape and (ii) the synthetic texture comprises:
    - generating a rendering that is based on (i) a particular pose and a particular lighting of the synthetic three-dimensional shape and (ii) the synthetic texture.

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google LLC (Alphabet Inc.)
Inventors
Cole, Forrester H.; Genova, Kyle
Primary Examiner(s)
Sherali, Ishrat I

Application Number

US15/813,338
Publication Number

US 20190147642A1
Time in Patent Office

657 Days
Field of Search

382/154, 382/155, 382/159
CPC Class Codes

G06F 18/24133   Distances to prototypes

G06T 15/205   Image-based rendering

G06T 17/00   Three dimensional [3D] mode...

G06V 10/454   Integrating the filters int...

G06V 10/764   using classification, e.g. ...

G06V 10/82   using neural networks

G06V 20/64   Three-dimensional objects

G06V 20/653   by matching three-dimension...

G06V 40/168   Feature extraction; Face re...

G06V 40/171   Local features and componen...
