Compact Language-Free Facial Expression Embedding and Novel Triplet Training Scheme
Abstract
The present disclosure provides systems and methods that include or otherwise leverage use of a facial expression model that is configured to provide a facial expression embedding. In particular, the facial expression model can receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image. As an example, the facial expression model can be or include a neural network such as a convolutional neural network. The present disclosure also provides a novel triplet training scheme that does not rely upon designation of a particular image as an anchor or reference image.
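For illustration only, the following sketch shows one way such a facial expression model could be structured as a convolutional neural network that maps a face image to a compact embedding (written in PyTorch); the layer sizes and the 16-dimensional embedding width are assumptions, not details taken from the claims or specification.

# Illustrative sketch only; the architecture and embedding size are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FacialExpressionModel(nn.Module):
    """Maps an input face image to a compact facial expression embedding."""

    def __init__(self, embedding_dim: int = 16):
        super().__init__()
        # A small convolutional backbone (hypothetical layer sizes).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, embedding_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, 3, H, W) face crop; returns (batch, embedding_dim).
        features = self.backbone(image).flatten(1)
        # L2-normalize so distances in the embedding space are comparable.
        return F.normalize(self.head(features), dim=1)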
Claims (20)
1. A computer system, the computer system comprising:
a facial expression model configured to receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image;
one or more processors; and
one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the computer system to:
obtain the input image that depicts the face;
input the input image into the facial expression model; and
receive the facial expression embedding that encodes information descriptive of the facial expression made by the face from the facial expression model.
(Dependent claims 2-13)
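A minimal usage sketch of the operations recited in claim 1 (obtain the input image, input it into the facial expression model, receive the facial expression embedding), using the hypothetical FacialExpressionModel sketched above; the image size and preprocessing are placeholders.

# Hypothetical usage of the FacialExpressionModel sketched above.
import torch

model = FacialExpressionModel(embedding_dim=16)
model.eval()

# Obtain the input image that depicts the face (placeholder: a random tensor
# standing in for a preprocessed 128x128 RGB face crop).
input_image = torch.rand(1, 3, 128, 128)

# Input the image into the facial expression model and receive the embedding.
with torch.no_grad():
    expression_embedding = model(input_image)  # shape: (1, 16)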
14. A computer-implemented method for training a facial expression model, the method comprising:
receiving training data comprising plural triplets of images that depict faces and similarity information corresponding to each triplet, wherein the similarity information for each triplet indicates which two images of such triplet have been determined to be the most similar pair of images in terms of facial expressions, wherein none of the images is indicated to be a reference image against which the other two images of the triplet have been compared; and
training the facial expression model using the training data such that a distance in an embedding space between the most similar pair of images is less than respective distances between each image of the pair and the third image of the triplet.
(Dependent claims 15-19)
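As a sketch of the training constraint recited in claim 14, the following loss treats the labeled most similar pair symmetrically and designates no anchor image; the use of Euclidean distance, a hinge-style margin, and PyTorch are assumptions rather than details from the patent.

# Illustrative anchor-free triplet objective; margin and distance metric are assumptions.
import torch
import torch.nn.functional as F

def anchor_free_triplet_loss(emb_a, emb_b, emb_c, margin: float = 0.2):
    """Loss for a triplet whose most similar pair is (a, b).

    No image is designated as an anchor: the pair distance d(a, b) is pushed
    below both d(a, c) and d(b, c), i.e. below the distance from each image
    of the pair to the third image of the triplet.
    """
    d_ab = F.pairwise_distance(emb_a, emb_b)
    d_ac = F.pairwise_distance(emb_a, emb_c)
    d_bc = F.pairwise_distance(emb_b, emb_c)
    loss = torch.clamp(d_ab - d_ac + margin, min=0) \
         + torch.clamp(d_ab - d_bc + margin, min=0)
    return loss.mean()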
20. One or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
training a facial expression model to provide language-free facial expression embeddings descriptive of facial expressions made by faces included in input images, wherein training the facial expression model comprises:
obtaining a training dataset that comprises a plurality of images organized into triplets of images, wherein each triplet of images comprises a label that indicates that a first image and a second image included in such triplet of images have been assessed to be a most similar pair of images within such triplet of images; and
training the facial expression model using an objective function that encodes both a first constraint and a second constraint;
wherein, for each triplet of images, the first constraint comprises a first requirement that a first distance between a first embedding provided for the first image by the facial expression model and a second embedding provided for the second image by the facial expression model is less than a second distance between the first embedding and a third embedding provided by the facial expression model for a third image included in such triplet of images; and
wherein, for each triplet of images, the second constraint comprises a second requirement that the first distance between the first embedding and the second embedding is less than a third distance between the second embedding provided for the second image by the facial expression model and the third embedding provided for the third image by the facial expression model.
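A sketch of a training loop applying an objective that encodes both constraints of claim 20, reusing the anchor_free_triplet_loss sketched above; the dataset layout (most similar pair ordered first in each triplet), the optimizer, and the hyperparameters are assumptions.

# Illustrative training loop; dataset format, optimizer, and hyperparameters are assumptions.
import torch

def train_facial_expression_model(model, triplet_loader, epochs: int = 10,
                                  margin: float = 0.2):
    # Each batch yields triplets already ordered so that the first two images
    # are the pair labeled as most similar within the triplet.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.train()
    for _ in range(epochs):
        for img_first, img_second, img_third in triplet_loader:
            emb_first = model(img_first)    # first embedding
            emb_second = model(img_second)  # second embedding
            emb_third = model(img_third)    # third embedding
            # Objective encoding both constraints:
            # d(first, second) < d(first, third) and
            # d(first, second) < d(second, third).
            loss = anchor_free_triplet_loss(emb_first, emb_second, emb_third,
                                            margin=margin)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()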
Specification