Compact language-free facial expression embedding and novel triplet training scheme
First Claim
1. A computer system, the computer system comprising:
a facial expression model configured to receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image;
one or more processors; and
one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the computer system to:
obtain the input image that depicts the face;
input the input image into the facial expression model; and
receive the facial expression embedding that encodes information descriptive of the facial expression made by the face from the facial expression model;
wherein the facial expression model has been trained on a training dataset that comprises a plurality of images organized into triplets of images;
wherein each triplet of images comprises a label that indicates which two of three images included in such triplet of images have been assessed to be a most similar pair of images within such triplet of images;
wherein each triplet of images comprises a first image, a second image, and a third image, wherein the first image and the second image have been assessed to be the most similar pair of images within the triplet of images;
wherein the facial expression model has been trained with an objective function that encodes both a first constraint and a second constraint;
wherein the first constraint comprises a first requirement that a first distance between a first embedding provided for the first image by the facial expression model and a second embedding provided for the second image by the facial expression model is less than a second distance between the first embedding and a third embedding provided for the third image by the facial expression model; and
wherein the second constraint comprises a second requirement that the first distance between the first embedding and the second embedding is less than a third distance between the second embedding provided for the second image by the facial expression model and the third embedding provided for the third image by the facial expression model.
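The two-constraint objective recited above can be sketched as a symmetric hinge loss. The hinge form, the margin value, and all names below are illustrative assumptions for exposition, not the patent's specified implementation:

```python
import math

def symmetric_triplet_loss(e1, e2, e3, margin=0.2):
    """Hinge loss encoding both claimed constraints for one labeled triplet.

    (e1, e2) is the pair assessed as most similar; e3 is the odd one out.
    Constraint 1: d(e1, e2) < d(e1, e3). Constraint 2: d(e1, e2) < d(e2, e3).
    Because both constraints are enforced, no single image needs to be
    designated as the anchor, unlike a standard anchor-based triplet loss.
    """
    d12 = math.dist(e1, e2)  # distance between the similar pair
    d13 = math.dist(e1, e3)  # distances to the odd-one-out
    d23 = math.dist(e2, e3)
    return max(0.0, d12 - d13 + margin) + max(0.0, d12 - d23 + margin)

# Loss is zero only when both constraints hold with the margin.
similar_a = [0.0, 0.0]
similar_b = [0.1, 0.0]
different = [1.0, 1.0]
print(symmetric_triplet_loss(similar_a, similar_b, different))  # 0.0
```

Enforcing both inequalities is what makes the scheme anchor-free: the similar pair is pulled together relative to the third image from both of its endpoints.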
Abstract
The present disclosure provides systems and methods that include or otherwise leverage use of a facial expression model that is configured to provide a facial expression embedding. In particular, the facial expression model can receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image. As an example, the facial expression model can be or include a neural network such as a convolutional neural network. The present disclosure also provides a novel and unique triplet training scheme which does not rely upon designation of a particular image as an anchor or reference image.
14 Claims
1. A computer system, as set forth in full under "First Claim" above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. One or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
training a facial expression model to provide language-free facial expression embeddings descriptive of facial expressions made by faces included in input images, wherein training the facial expression model comprises:
obtaining a training dataset that comprises a plurality of images organized into triplets of images, wherein each triplet of images comprises a label that indicates that a first image and a second image included in such triplet of images have been assessed to be a most similar pair of images within such triplet of images; and
training the facial expression model using an objective function that encodes both a first constraint and a second constraint;
wherein, for each triplet of images, the first constraint comprises a first requirement that a first distance between a first embedding provided for the first image by the facial expression model and a second embedding provided for the second image by the facial expression model is less than a second distance between the first embedding and a third embedding provided by the facial expression model for a third image included in such triplet of images; and
wherein, for each triplet of images, the second constraint comprises a second requirement that the first distance between the first embedding and the second embedding is less than a third distance between the second embedding provided for the second image by the facial expression model and the third embedding provided for the third image by the facial expression model.
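The label recited in claim 11 identifies the most-similar pair within each triplet. One convenient preprocessing step is to reorder each labeled triplet so the similar pair always comes first; the label encoding (index of the odd-one-out image) is an assumption for illustration:

```python
def canonicalize_triplet(images, odd_one_out):
    """Reorder a labeled triplet so the most-similar pair comes first.

    `odd_one_out` is the index (0-2) of the image assessed as least similar;
    the remaining two images are the most-similar pair and become the first
    and second images, matching the ordering used in the claims.
    """
    a, b = [img for i, img in enumerate(images) if i != odd_one_out]
    return a, b, images[odd_one_out]

print(canonicalize_triplet(["img_x", "img_y", "img_z"], 1))
# ('img_x', 'img_z', 'img_y')
```

After this step, the objective function can treat every triplet uniformly as (first, second, third) with the constraint pair applied to the first two.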
12. A computer system, the computer system comprising:
a facial expression model configured to receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image;
one or more processors; and
one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the computer system to:
obtain the input image that depicts the face;
input the input image into the facial expression model;
receive the facial expression embedding that encodes information descriptive of the facial expression made by the face from the facial expression model; and
identify, as a search result in response to a search query associated with the input image and based at least in part on the facial expression embedding, at least one additional image that depicts a same facial expression as the facial expression made by the face depicted by the input image, wherein to identify the at least one additional image as the search result the computer system compares the facial expression embedding associated with the input image to a plurality of additional facial expression embeddings respectively associated with a plurality of candidate images that are potential search results.
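The comparison step in claim 12 amounts to a nearest-neighbor lookup over precomputed candidate embeddings. A minimal sketch, assuming Euclidean distance and a dictionary of candidate embeddings (both illustrative choices, not specified by the claim):

```python
import math

def search_by_expression(query_embedding, candidate_embeddings):
    """Return the candidate image whose expression embedding is nearest
    to the query embedding.

    `candidate_embeddings` maps an image id to its precomputed facial
    expression embedding; the candidate at minimum Euclidean distance is
    returned as the top search result.
    """
    return min(
        candidate_embeddings,
        key=lambda img_id: math.dist(query_embedding, candidate_embeddings[img_id]),
    )

candidates = {
    "smile_1": [0.9, 0.1],
    "frown_1": [0.1, 0.8],
}
print(search_by_expression([0.85, 0.15], candidates))  # smile_1
```

Because the embedding is language-free, this search matches expressions directly by geometry rather than by mapping each image to a discrete emotion label first.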
13. A computer system, the computer system comprising:
a facial expression model configured to receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image;
one or more processors; and
one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the computer system to:
obtain the input image that depicts the face;
input the input image into the facial expression model;
receive the facial expression embedding that encodes information descriptive of the facial expression made by the face from the facial expression model;
determine a score for the input image based at least in part on the facial expression embedding, wherein the score for the input image is indicative of a desirability of the input image; and
perform at least one of the following:
select the input image as a best shot from a group of images based at least in part on the score determined for the input image relative to other scores determined for the group of images; and
determine, based at least in part on the score determined for the input image, whether to store a non-temporary copy of the input image or to discard a temporary copy of the input image without storing the non-temporary copy of the input image.
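The two alternatives in claim 13 can be sketched together: rank a burst of images by desirability score and decide which temporary copies to promote to non-temporary storage. The threshold value and all names are illustrative assumptions; the claim does not specify how scores map to keep/discard decisions:

```python
def select_best_shot(scored_images, keep_threshold=0.5):
    """Pick the best shot from a group and decide which copies to store.

    `scored_images` maps an image id to a desirability score derived from
    its facial expression embedding (the scoring model itself is outside
    this sketch). Returns the best shot plus the ids whose temporary
    copies are worth storing as non-temporary copies.
    """
    best = max(scored_images, key=scored_images.get)
    keep = [img for img, score in scored_images.items() if score >= keep_threshold]
    return best, keep

scores = {"frame_1": 0.2, "frame_2": 0.9, "frame_3": 0.6}
print(select_best_shot(scores))  # ('frame_2', ['frame_2', 'frame_3'])
```

This mirrors a common burst-capture pattern: frames are held temporarily, scored, and only the desirable ones are committed to storage.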
14. A computer system, the computer system comprising:
a facial expression model configured to receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image;
a generative neural network configured to receive the facial expression embedding and, in response, generate a synthesized image of a second face that has a same facial expression as the facial expression made by the face depicted by the input image;
one or more processors; and
one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the computer system to:
obtain the input image that depicts the face;
input the input image into the facial expression model;
receive the facial expression embedding that encodes information descriptive of the facial expression made by the face from the facial expression model;
input the facial expression embedding into the generative neural network; and
receive the synthesized image of the second face that has the same facial expression as an output of the generative neural network.
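The generator in claim 14 consumes only the embedding, so expression transfer reduces to decoding an embedding into pixels. The stand-in below uses a single linear layer purely to show the data flow; a real system would use a trained generative network (e.g., a GAN generator or decoder), and every name and size here is an illustrative assumption:

```python
import random

def toy_generator(expression_embedding, weights, image_shape=(2, 2)):
    """Stand-in for the generative neural network of claim 14.

    Maps a facial expression embedding to a synthesized "image" via one
    linear layer: each output pixel is a weighted sum of the embedding
    dimensions, reshaped to `image_shape`.
    """
    h, w = image_shape
    pixels = [
        sum(wi * ei for wi, ei in zip(row, expression_embedding))
        for row in weights
    ]
    return [pixels[r * w:(r + 1) * w] for r in range(h)]

random.seed(0)
embedding = [0.3, 0.7]
weights = [[random.uniform(-1, 1) for _ in embedding] for _ in range(4)]
image = toy_generator(embedding, weights)
print(len(image), len(image[0]))  # 2 2
```

The point of the sketch is the interface: because the embedding is the generator's only input, the synthesized second face inherits the expression but not the identity of the original face.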
Specification