Compact Language-Free Facial Expression Embedding and Novel Triplet Training Scheme
Abstract
The present disclosure provides systems and methods that include or otherwise leverage use of a facial expression model that is configured to provide a facial expression embedding. In particular, the facial expression model can receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image. As an example, the facial expression model can be or include a neural network such as a convolutional neural network. The present disclosure also provides a novel triplet training scheme that does not rely upon designation of a particular image as an anchor or reference image.
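For illustration only, the following sketch shows one way such a facial expression model could be structured as a convolutional neural network that maps a face image to a compact embedding (written in PyTorch); the layer sizes and the 16-dimensional embedding width are assumptions, not details taken from the claims or specification.

# Illustrative sketch only; the architecture and embedding size are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FacialExpressionModel(nn.Module):
    """Maps an input face image to a compact facial expression embedding."""

    def __init__(self, embedding_dim: int = 16):
        super().__init__()
        # A small convolutional backbone (hypothetical layer sizes).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, embedding_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, 3, H, W) face crop; returns (batch, embedding_dim).
        features = self.backbone(image).flatten(1)
        # L2-normalize so distances in the embedding space are comparable.
        return F.normalize(self.head(features), dim=1)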
Claims (20)
1. A computer system, the computer system comprising:
a facial expression model configured to receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image;
one or more processors; and
one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the computer system to:
obtain the input image that depicts the face;
input the input image into the facial expression model; and
receive the facial expression embedding that encodes information descriptive of the facial expression made by the face from the facial expression model.
(Dependent claims 2-13)
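A minimal usage sketch of the operations recited in claim 1 (obtain the input image, input it into the facial expression model, receive the facial expression embedding), using the hypothetical FacialExpressionModel sketched above; the image size and preprocessing are placeholders.

# Hypothetical usage of the FacialExpressionModel sketched above.
import torch

model = FacialExpressionModel(embedding_dim=16)
model.eval()

# Obtain the input image that depicts the face (placeholder: a random tensor
# standing in for a preprocessed 128x128 RGB face crop).
input_image = torch.rand(1, 3, 128, 128)

# Input the image into the facial expression model and receive the embedding.
with torch.no_grad():
    expression_embedding = model(input_image)  # shape: (1, 16)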
14. A computer-implemented method for training a facial expression model, the method comprising:
receiving training data comprising plural triplets of images that depict faces and similarity information corresponding to each triplet, wherein the similarity information for each triplet indicates which two images of such triplet have been determined to be the most similar pair of images in terms of facial expressions, wherein none of the images is indicated to be a reference image against which the other two images of the triplet have been compared; and
training the facial expression model using the training data such that a distance in an embedding space between the most similar pair of images is less than respective distances between each image of the pair and the third image of the triplet.
(Dependent claims 15-19)
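As a sketch of the training constraint recited in claim 14, the following loss treats the labeled most similar pair symmetrically and designates no anchor image; the use of Euclidean distance, a hinge-style margin, and PyTorch are assumptions rather than details from the patent.

# Illustrative anchor-free triplet objective; margin and distance metric are assumptions.
import torch
import torch.nn.functional as F

def anchor_free_triplet_loss(emb_a, emb_b, emb_c, margin: float = 0.2):
    """Loss for a triplet whose most similar pair is (a, b).

    No image is designated as an anchor: the pair distance d(a, b) is pushed
    below both d(a, c) and d(b, c), i.e. below the distance from each image
    of the pair to the third image of the triplet.
    """
    d_ab = F.pairwise_distance(emb_a, emb_b)
    d_ac = F.pairwise_distance(emb_a, emb_c)
    d_bc = F.pairwise_distance(emb_b, emb_c)
    loss = torch.clamp(d_ab - d_ac + margin, min=0) \
         + torch.clamp(d_ab - d_bc + margin, min=0)
    return loss.mean()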
20. One or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
training a facial expression model to provide language-free facial expression embeddings descriptive of facial expressions made by faces included in input images, wherein training the facial expression model comprises:
obtaining a training dataset that comprises a plurality of images organized into triplets of images, wherein each triplet of images comprises a label that indicates that a first image and a second image included in such triplet of images have been assessed to be a most similar pair of images within such triplet of images; and
training the facial expression model using an objective function that encodes both a first constraint and a second constraint;
wherein, for each triplet of images, the first constraint comprises a first requirement that a first distance between a first embedding provided for the first image by the facial expression model and a second embedding provided for the second image by the facial expression model is less than a second distance between the first embedding and a third embedding provided by the facial expression model for a third image included in such triplet of images; and
wherein, for each triplet of images, the second constraint comprises a second requirement that the first distance between the first embedding and the second embedding is less than a third distance between the second embedding provided for the second image by the facial expression model and the third embedding provided for the third image by the facial expression model.
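A sketch of a training loop applying an objective that encodes both constraints of claim 20, reusing the anchor_free_triplet_loss sketched above; the dataset layout (most similar pair ordered first in each triplet), the optimizer, and the hyperparameters are assumptions.

# Illustrative training loop; dataset format, optimizer, and hyperparameters are assumptions.
import torch

def train_facial_expression_model(model, triplet_loader, epochs: int = 10,
                                  margin: float = 0.2):
    # Each batch yields triplets already ordered so that the first two images
    # are the pair labeled as most similar within the triplet.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.train()
    for _ in range(epochs):
        for img_first, img_second, img_third in triplet_loader:
            emb_first = model(img_first)    # first embedding
            emb_second = model(img_second)  # second embedding
            emb_third = model(img_third)    # third embedding
            # Objective encoding both constraints:
            # d(first, second) < d(first, third) and
            # d(first, second) < d(second, third).
            loss = anchor_free_triplet_loss(emb_first, emb_second, emb_third,
                                            margin=margin)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()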
Specification