Machine learning systems and methods for augmenting images
First Claim
1. A system comprising:
a data repository storing an image augmentation rule that specifies a model pose of a human body, wherein the image augmentation rule defines at least a threshold value used to determine that an augmented image is to be generated, and wherein the model pose of a human body is represented by a first set of joint vertices; and
one or more processors in communication with the data repository, the one or more processors programmed with executable instructions to at least:
receive image data for an image depicting a scene including a human;
identify a second set of joint vertices representing a pose of the human in the image using a pose detection model to analyze the image data;
determine that a difference between the pose of the human in the image and the model pose specified in the image augmentation rule does not satisfy the threshold value defined in the augmentation rule, wherein the difference is determined based at least in part on a comparison of the first set of joint vertices and the second set of joint vertices; and
in response to determining that the difference does not satisfy the threshold value defined by the augmentation rule:
identify a shape of the human in the image using a shape detection model to analyze the image data;
identify semantic content of the scene in the image using a scene analysis model to analyze the image data; and
generate an augmented image of the human in the scene in which the shape of the human body is moved into the model pose and the semantic content of the scene is maintained with respect to the shape of the human body moved into the model pose.
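The comparison of the first and second sets of joint vertices against the rule's threshold can be illustrated with a minimal sketch. The claim does not fix a distance metric or state what "satisfying" the threshold means, so the following are assumptions made only for illustration: joints are 2D points listed in the same order in both sets, the difference is the mean Euclidean distance after normalizing for position and scale, and the threshold is satisfied when the difference is at or below it. The names `normalize`, `pose_difference`, and `should_augment` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float]  # (x, y) image coordinates of one joint vertex


def normalize(joints: Sequence[Joint]) -> List[Joint]:
    """Center the joints on their centroid and scale them to unit RMS radius."""
    n = len(joints)
    cx = sum(x for x, _ in joints) / n
    cy = sum(y for _, y in joints) / n
    centered = [(x - cx, y - cy) for x, y in joints]
    # Fall back to 1.0 so a degenerate pose (all joints coincident) does not divide by zero.
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [(x / scale, y / scale) for x, y in centered]


def pose_difference(model: Sequence[Joint], detected: Sequence[Joint]) -> float:
    """Mean Euclidean distance between corresponding normalized joints."""
    if len(model) != len(detected):
        raise ValueError("joint sets must have the same length and ordering")
    a, b = normalize(model), normalize(detected)
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def should_augment(model: Sequence[Joint], detected: Sequence[Joint],
                   threshold: float) -> bool:
    """Assumption: the threshold is 'satisfied' when difference <= threshold,
    so an augmented image is generated only when the difference exceeds it."""
    return pose_difference(model, detected) > threshold
```

Under these assumptions, a detected pose that is merely shifted or resized relative to the model pose yields a difference near zero and triggers no augmentation, while a pose whose joints are arranged differently exceeds a small threshold and would proceed to the shape and scene analysis steps.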
Abstract
Disclosed is a method including receiving visual input comprising a human within a scene, detecting a pose associated with the human using a trained machine learning model that detects human poses to yield a first output, estimating a shape (and optionally a motion) associated with the human using a trained machine learning model that detects shape (and optionally motion) to yield a second output, recognizing the scene associated with the visual input using a trained convolutional neural network which determines information about the human and other objects in the scene to yield a third output, and augmenting reality within the scene by leveraging one or more of the first output, the second output, and the third output to place 2D and/or 3D graphics in the scene.
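The flow described in the abstract, three model outputs combined to decide where graphics are placed, can be sketched as follows. This is an illustrative outline only: the three models are passed in as plain callables, and the `Analysis` container, the `analyze` and `place_graphic` functions, and the data shapes (named joints, an outline polygon, a list of scene labels) are all hypothetical choices, not the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Analysis:
    pose: Dict[str, Point]  # first output: joint name -> (x, y)
    shape: List[Point]      # second output: body outline as a polygon
    scene: List[str]        # third output: labels of recognized scene objects


def analyze(image: Any,
            pose_model: Callable[[Any], Dict[str, Point]],
            shape_model: Callable[[Any], List[Point]],
            scene_model: Callable[[Any], List[str]]) -> Analysis:
    """Run the three (hypothetical) models on the same visual input."""
    return Analysis(pose=pose_model(image),
                    shape=shape_model(image),
                    scene=scene_model(image))


def place_graphic(analysis: Analysis, joint: str,
                  offset: Point = (0.0, 0.0)) -> Point:
    """Return the 2D position for an overlay anchored to a detected joint."""
    x, y = analysis.pose[joint]
    return (x + offset[0], y + offset[1])
```

With stub models standing in for trained ones, `place_graphic(analyze(img, pose_fn, shape_fn, scene_fn), "head", (0.0, -10.0))` returns a point ten pixels above the detected head joint. A fuller version would also consult the shape and scene outputs, for example to keep an overlay from covering recognized objects.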
83 Citations
20 Claims
9. A computer-implemented method comprising:
receiving image data for an image depicting a scene including a human;
accessing an image augmentation rule that specifies a model pose of a human body, wherein the image augmentation rule defines at least a threshold value used to determine that an augmented image is to be generated, and wherein the model pose of a human body is represented by a first set of joint vertices;
identifying a second set of joint vertices representing a pose of the human in the image using a pose detection model applied to the image data;
determining that a difference between the pose of the human in the image and the model pose specified in the image augmentation rule does not satisfy the threshold value defined in the augmentation rule, wherein the difference is determined based at least in part on a comparison of the first set of joint vertices and the second set of joint vertices; and
in response to determining that the difference does not satisfy the threshold value defined by the augmentation rule:
using a shape detection model to identify a shape of the human in the image;
using a scene analysis model to identify semantic content of the scene in the image; and
generating an augmented image of the human in the scene in which the shape of the human body is moved into the model pose and the semantic content of the scene is maintained with respect to the shape of the human body moved into the model pose.
14. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to at least:
receive image data for an image depicting a scene including a subject;
access an image augmentation rule that specifies a model pose of a subject body, wherein the image augmentation rule defines at least a threshold value used to determine that an augmented image is to be generated, and wherein the model pose of a subject body is represented by a first set of joint vertices;
identify a second set of joint vertices representing a pose of the subject in the image using a pose detection model to analyze the image data;
use a shape detection model to identify a shape of the subject in the image;
use a scene analysis model to identify semantic content of the scene in the image;
determine that a difference between the model pose of a subject body and the pose of the subject identified from the image data does not satisfy the threshold value defined in the augmentation rule, wherein the difference is determined based at least in part on a comparison of the first set of joint vertices and the second set of joint vertices; and
in response to determining that the difference does not satisfy the threshold value defined by the augmentation rule:
generate, based at least in part on the comparison, an augmented image of the subject in the scene, wherein generating the augmented image comprises:
morph the shape of the subject in the scene into a new shape reflecting the model pose of the subject body;
modify the semantic content of the scene in accordance with the new shape; and
store image data for the augmented image that includes the subject repositioned in the new shape and the semantic content modified in accordance with the new shape.
Specification