Compact language-free facial expression embedding and novel triplet training scheme
First Claim
1. A computer system, the computer system comprising:
a facial expression model configured to receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image;
one or more processors; and
one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the computer system to:
obtain the input image that depicts the face;
input the input image into the facial expression model; and
receive the facial expression embedding that encodes information descriptive of the facial expression made by the face from the facial expression model;
wherein the facial expression model has been trained on a training dataset that comprises a plurality of images organized into triplets of images;
wherein each triplet of images comprises a label that indicates which two of three images included in such triplet of images have been assessed to be a most similar pair of images within such triplet of images;
wherein each triplet of images comprises a first image, a second image, and a third image, wherein the first image and the second image have been assessed to be the most similar pair of images within the triplet of images;
wherein the facial expression model has been trained with an objective function that encodes both a first constraint and a second constraint;
wherein the first constraint comprises a first requirement that a first distance between a first embedding provided for the first image by the facial expression model and a second embedding provided for the second image by the facial expression model is less than a second distance between the first embedding and a third embedding provided for the third image by the facial expression model; and
wherein the second constraint comprises a second requirement that the first distance between the first embedding and the second embedding is less than a third distance between the second embedding provided for the second image by the facial expression model and the third embedding provided for the third image by the facial expression model.
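The two-constraint objective recited above can be sketched as a symmetric hinge loss. The hinge form, the margin value, and all names below are illustrative assumptions for exposition, not the patent's specified implementation:

```python
import math

def symmetric_triplet_loss(e1, e2, e3, margin=0.2):
    """Hinge loss encoding both claimed constraints for one labeled triplet.

    (e1, e2) is the pair assessed as most similar; e3 is the odd one out.
    Constraint 1: d(e1, e2) < d(e1, e3). Constraint 2: d(e1, e2) < d(e2, e3).
    Because both constraints are enforced, no single image needs to be
    designated as the anchor, unlike a standard anchor-based triplet loss.
    """
    d12 = math.dist(e1, e2)  # distance between the similar pair
    d13 = math.dist(e1, e3)  # distances to the odd-one-out
    d23 = math.dist(e2, e3)
    return max(0.0, d12 - d13 + margin) + max(0.0, d12 - d23 + margin)

# Loss is zero only when both constraints hold with the margin.
similar_a = [0.0, 0.0]
similar_b = [0.1, 0.0]
different = [1.0, 1.0]
print(symmetric_triplet_loss(similar_a, similar_b, different))  # 0.0
```

Enforcing both inequalities is what makes the scheme anchor-free: the similar pair is pulled together relative to the third image from both of its endpoints.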
Abstract
The present disclosure provides systems and methods that include or otherwise leverage use of a facial expression model that is configured to provide a facial expression embedding. In particular, the facial expression model can receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image. As an example, the facial expression model can be or include a neural network such as a convolutional neural network. The present disclosure also provides a novel and unique triplet training scheme which does not rely upon designation of a particular image as an anchor or reference image.
14 Claims
1. A computer system, as set forth in full under "First Claim" above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. One or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
training a facial expression model to provide language-free facial expression embeddings descriptive of facial expressions made by faces included in input images, wherein training the facial expression model comprises:
obtaining a training dataset that comprises a plurality of images organized into triplets of images, wherein each triplet of images comprises a label that indicates that a first image and a second image included in such triplet of images have been assessed to be a most similar pair of images within such triplet of images; and
training the facial expression model using an objective function that encodes both a first constraint and a second constraint;
wherein, for each triplet of images, the first constraint comprises a first requirement that a first distance between a first embedding provided for the first image by the facial expression model and a second embedding provided for the second image by the facial expression model is less than a second distance between the first embedding and a third embedding provided by the facial expression model for a third image included in such triplet of images; and
wherein, for each triplet of images, the second constraint comprises a second requirement that the first distance between the first embedding and the second embedding is less than a third distance between the second embedding provided for the second image by the facial expression model and the third embedding provided for the third image by the facial expression model.
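The label recited in claim 11 identifies the most-similar pair within each triplet. One convenient preprocessing step is to reorder each labeled triplet so the similar pair always comes first; the label encoding (index of the odd-one-out image) is an assumption for illustration:

```python
def canonicalize_triplet(images, odd_one_out):
    """Reorder a labeled triplet so the most-similar pair comes first.

    `odd_one_out` is the index (0-2) of the image assessed as least similar;
    the remaining two images are the most-similar pair and become the first
    and second images, matching the ordering used in the claims.
    """
    a, b = [img for i, img in enumerate(images) if i != odd_one_out]
    return a, b, images[odd_one_out]

print(canonicalize_triplet(["img_x", "img_y", "img_z"], 1))
# ('img_x', 'img_z', 'img_y')
```

After this step, the objective function can treat every triplet uniformly as (first, second, third) with the constraint pair applied to the first two.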
12. A computer system, the computer system comprising:
a facial expression model configured to receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image;
one or more processors; and
one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the computer system to:
obtain the input image that depicts the face;
input the input image into the facial expression model;
receive the facial expression embedding that encodes information descriptive of the facial expression made by the face from the facial expression model; and
identify, as a search result in response to a search query associated with the input image and based at least in part on the facial expression embedding, at least one additional image that depicts a same facial expression as the facial expression made by the face depicted by the input image, wherein to identify the at least one additional image as the search result the computer system compares the facial expression embedding associated with the input image to a plurality of additional facial expression embeddings respectively associated with a plurality of candidate images that are potential search results.
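The comparison step in claim 12 amounts to a nearest-neighbor lookup over precomputed candidate embeddings. A minimal sketch, assuming Euclidean distance and a dictionary of candidate embeddings (both illustrative choices, not specified by the claim):

```python
import math

def search_by_expression(query_embedding, candidate_embeddings):
    """Return the candidate image whose expression embedding is nearest
    to the query embedding.

    `candidate_embeddings` maps an image id to its precomputed facial
    expression embedding; the candidate at minimum Euclidean distance is
    returned as the top search result.
    """
    return min(
        candidate_embeddings,
        key=lambda img_id: math.dist(query_embedding, candidate_embeddings[img_id]),
    )

candidates = {
    "smile_1": [0.9, 0.1],
    "frown_1": [0.1, 0.8],
}
print(search_by_expression([0.85, 0.15], candidates))  # smile_1
```

Because the embedding is language-free, this search matches expressions directly by geometry rather than by mapping each image to a discrete emotion label first.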
13. A computer system, the computer system comprising:
a facial expression model configured to receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image;
one or more processors; and
one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the computer system to:
obtain the input image that depicts the face;
input the input image into the facial expression model;
receive the facial expression embedding that encodes information descriptive of the facial expression made by the face from the facial expression model;
determine a score for the input image based at least in part on the facial expression embedding, wherein the score for the input image is indicative of a desirability of the input image; and
perform at least one of the following:
select the input image as a best shot from a group of images based at least in part on the score determined for the input image relative to other scores determined for the group of images; and
determine, based at least in part on the score determined for the input image, whether to store a non-temporary copy of the input image or to discard a temporary copy of the input image without storing the non-temporary copy of the input image.
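The two alternatives in claim 13 can be sketched together: rank a burst of images by desirability score and decide which temporary copies to promote to non-temporary storage. The threshold value and all names are illustrative assumptions; the claim does not specify how scores map to keep/discard decisions:

```python
def select_best_shot(scored_images, keep_threshold=0.5):
    """Pick the best shot from a group and decide which copies to store.

    `scored_images` maps an image id to a desirability score derived from
    its facial expression embedding (the scoring model itself is outside
    this sketch). Returns the best shot plus the ids whose temporary
    copies are worth storing as non-temporary copies.
    """
    best = max(scored_images, key=scored_images.get)
    keep = [img for img, score in scored_images.items() if score >= keep_threshold]
    return best, keep

scores = {"frame_1": 0.2, "frame_2": 0.9, "frame_3": 0.6}
print(select_best_shot(scores))  # ('frame_2', ['frame_2', 'frame_3'])
```

This mirrors a common burst-capture pattern: frames are held temporarily, scored, and only the desirable ones are committed to storage.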
14. A computer system, the computer system comprising:
a facial expression model configured to receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image;
a generative neural network configured to receive the facial expression embedding and, in response, generate a synthesized image of a second face that has a same facial expression as the facial expression made by the face depicted by the input image;
one or more processors; and
one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the computer system to:
obtain the input image that depicts the face;
input the input image into the facial expression model;
receive the facial expression embedding that encodes information descriptive of the facial expression made by the face from the facial expression model;
input the facial expression embedding into the generative neural network; and
receive the synthesized image of the second face that has the same facial expression as an output of the generative neural network.
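The generator in claim 14 consumes only the embedding, so expression transfer reduces to decoding an embedding into pixels. The stand-in below uses a single linear layer purely to show the data flow; a real system would use a trained generative network (e.g., a GAN generator or decoder), and every name and size here is an illustrative assumption:

```python
import random

def toy_generator(expression_embedding, weights, image_shape=(2, 2)):
    """Stand-in for the generative neural network of claim 14.

    Maps a facial expression embedding to a synthesized "image" via one
    linear layer: each output pixel is a weighted sum of the embedding
    dimensions, reshaped to `image_shape`.
    """
    h, w = image_shape
    pixels = [
        sum(wi * ei for wi, ei in zip(row, expression_embedding))
        for row in weights
    ]
    return [pixels[r * w:(r + 1) * w] for r in range(h)]

random.seed(0)
embedding = [0.3, 0.7]
weights = [[random.uniform(-1, 1) for _ in embedding] for _ in range(4)]
image = toy_generator(embedding, weights)
print(len(image), len(image[0]))  # 2 2
```

The point of the sketch is the interface: because the embedding is the generator's only input, the synthesized second face inherits the expression but not the identity of the original face.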
Specification