Machine learning systems and methods for augmenting images

US 10,529,137 B1
Filed: 11/29/2017
Issued: 01/07/2020
Est. Priority Date: 11/29/2016
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Disclosed is a method including: receiving visual input comprising a human within a scene; detecting a pose associated with the human using a trained machine learning model that detects human poses, to yield a first output; estimating a shape (and optionally a motion) associated with the human using a trained machine learning model that detects shape (and optionally motion), to yield a second output; recognizing the scene associated with the visual input using a trained convolutional neural network that determines information about the human and other objects in the scene, to yield a third output; and augmenting reality within the scene by leveraging one or more of the first, second, and third outputs to place 2D and/or 3D graphics in the scene.

83 Citations

20 Claims

1. A system comprising:
- a data repository storing an image augmentation rule that specifies a model pose of a human body, wherein the image augmentation rule defines at least a threshold value used to determine that an augmented image is to be generated, and wherein the model pose of a human body is represented by a first set of joint vertices; and
  
  one or more processors in communication with the data repository, the one or more processors programmed with executable instructions to at least:
  
  receive image data for an image depicting a scene including a human;
  
  identify a second set of joint vertices representing a pose of the human in the image using a pose detection model to analyze the image data;
  
  determine that a difference between the pose of the human in the image and the model pose specified in the image augmentation rule does not satisfy the threshold value defined in the augmentation rule, wherein the difference is determined based at least in part on a comparison of the first set of joint vertices and the second set of joint vertices; and
  
  in response to determining that the difference does not satisfy the threshold value defined by the augmentation rule:
  
  identify a shape of the human in the image using a shape detection model to analyze the image data;
  
  identify semantic content of the scene in the image using a scene analysis model to analyze the image data; and
  
  generate an augmented image of the human in the scene in which the shape of the human body is moved into the model pose and the semantic content of the scene is maintained with respect to the shape of the human body moved into the model pose.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The system of claim 1, wherein the data repository stores a set of body models representing different shapes of human bodies, and wherein the one or more processors are programmed to use the shape detection model to correlate one body model of the body models with the shape of the human in the image data.
  - 3. The system of claim 2, wherein the body model is represented by a set of skin vertices, and wherein the one or more processors are programmed to move the shape of the human into the model pose by applying blend weights to the set of skin vertices.
  - 4. The system of claim 2, wherein the one or more processors are programmed to at least:
    - generate a texture map representing skin, hair, and clothing depicted on the human in the image based at least in part on pixel values in the image; and
      
      apply the texture map to the body model to render the human in the model pose.
  - 5. The system of claim 2, wherein the one or more processors are programmed to at least:
    - use the shape detection model to fit the body model to the locations of the second set of joint vertices representing the pose of the human in the image.
  - 6. The system of claim 5, wherein the pose analysis model comprises a convolutional neural network, wherein the shape detection model employs linear regression, and wherein the scene detection model comprises a convolutional neural network.
  - 7. The system of claim 1, wherein, to maintain the semantic content of the scene with respect to the shape of the human body moved into the model pose, the one or more processors are programmed to at least fill in estimated values of background pixels in locations originally occupied by the human.
  - 8. The system of claim 1, wherein the scene analysis model is trained to recognize a foreground object and wherein, to maintain the semantic content of the scene with respect to the shape of the human body moved into the model pose, the one or more processors are programmed to at least maintain the foreground object in front of the shape of the human body moved into the model pose.
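
Claim 1 turns on comparing two sets of joint vertices against a threshold, but it does not say how the difference is measured or what "satisfy" means. The sketch below assumes one plausible reading: 2D joints keyed by hypothetical joint names, both poses centred on their centroids so that the person's position in the frame is ignored, the difference taken as the mean per-joint Euclidean distance, and "satisfy" meaning "difference at most threshold". None of these choices, nor the names `mean_joint_distance` and `should_augment`, come from the patent.

```python
import math
from typing import Dict, Iterable, Tuple

Joint = Tuple[float, float]  # (x, y) image coordinates; 2D joints are an assumption
Pose = Dict[str, Joint]      # hypothetical joint name -> position


def _centered(pose: Pose, joints: Iterable[str]) -> Pose:
    """Translate the given joints so that their centroid sits at the origin."""
    joints = list(joints)
    cx = sum(pose[j][0] for j in joints) / len(joints)
    cy = sum(pose[j][1] for j in joints) / len(joints)
    return {j: (pose[j][0] - cx, pose[j][1] - cy) for j in joints}


def mean_joint_distance(model_pose: Pose, detected_pose: Pose) -> float:
    """Mean Euclidean distance between corresponding joints after centring."""
    shared = sorted(model_pose.keys() & detected_pose.keys())
    if not shared:
        raise ValueError("the two poses share no joints")
    a = _centered(model_pose, shared)
    b = _centered(detected_pose, shared)
    return sum(math.dist(a[j], b[j]) for j in shared) / len(shared)


def should_augment(model_pose: Pose, detected_pose: Pose, threshold: float) -> bool:
    """Return True when the difference does NOT satisfy the threshold.

    Assumes 'satisfy' means difference <= threshold, so an augmented image
    is requested only when the detected pose is farther than `threshold`
    from the model pose.
    """
    return mean_joint_distance(model_pose, detected_pose) > threshold
```

Under this reading, a pose that is only translated within the frame yields a difference of zero and triggers no augmentation, while a pose with bent limbs can exceed the threshold.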

9. A computer-implemented method comprising:
- receiving image data for an image depicting a scene including a human;
  
  accessing an image augmentation rule that specifies a model pose of a human body, wherein the image augmentation rule defines at least a threshold value used to determine that an augmented image is to be generated, and wherein the model pose of a human body is represented by a first set of joint vertices;
  
  identifying a second set of joint vertices representing a pose of the human in the image using a pose detection model applied to the image data;
  
  determining that a difference between the pose of the human in the image and the model pose specified in the image augmentation rule does not satisfy the threshold value defined in the augmentation rule, wherein the difference is determined based at least in part on a comparison of the first set of joint vertices and the second set of joint vertices; and
  
  in response to determining that the difference does not satisfy the threshold value defined by the augmentation rule:
  
  using a shape detection model to identify a shape of the human in the image;
  
  using a scene analysis model to identify semantic content of the scene in the image; and
  
  generating an augmented image of the human in the scene in which the shape of the human body is moved into the model pose and the semantic content of the scene is maintained with respect to the shape of the human body moved into the model pose.
- Dependent claims (10, 11, 12, 13):
  - 10. The computer-implemented method of claim 9, further comprising:
    - accessing a data repository storing a set of body models representing different shapes of human bodies; and
      
      using the shape detection model to correlate one body model of the body models with the shape of the human in the image data.
  - 11. The computer-implemented method of claim 10, further comprising:
    - using a convolutional neural network as the pose detection model to identify the second set of joint vertices representing the pose of the human depicted in the image; and
      
      using linear regression as the shape detection model to fit the body model to the second set of joint vertices.
  - 12. The computer-implemented method of claim 10, wherein the body model is represented by a set of skin vertices, the computer-implemented method further comprising moving the shape of the human into the model pose by applying blend weights to the set of skin vertices.
  - 13. The computer-implemented method of claim 9, wherein, to maintain the semantic content of the scene with respect to the shape of the human body moved into the model pose, the computer-implemented method further comprises one or both of filling in estimated values of background pixels in locations originally occupied by the human and maintaining a foreground object in front of the shape of the human body moved into the model pose.
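
Claims 3 and 12 describe moving the body model into the model pose "by applying blend weights to the set of skin vertices" without naming the deformation method. A common technique that fits this wording is linear blend skinning, in which each posed vertex is the weighted sum of that vertex transformed by every bone: v' = Σ_k w_k (R_k v + t_k). The sketch below implements that formula in plain Python. Reading the claims as linear blend skinning, the 3×4 `[R | t]` bone-matrix layout, and the function names are all assumptions rather than details from the patent.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat34 = Sequence[Sequence[float]]  # three rows of [R | t]: rotation plus translation


def transform_point(m: Mat34, v: Vec3) -> List[float]:
    """Apply a 3x4 bone transform to a 3D point."""
    return [m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2] + m[r][3] for r in range(3)]


def linear_blend_skinning(
    vertices: Sequence[Vec3],
    weights: Sequence[Sequence[float]],
    bone_transforms: Sequence[Mat34],
) -> List[List[float]]:
    """Pose skin vertices as the weight-blended sum of per-bone transforms.

    weights[i][k] is the influence of bone k on vertex i; each row must sum to 1.
    """
    posed = []
    for v, w in zip(vertices, weights):
        if len(w) != len(bone_transforms):
            raise ValueError("one weight per bone is required")
        if abs(sum(w) - 1.0) > 1e-6:
            raise ValueError("blend weights for a vertex must sum to 1")
        acc = [0.0, 0.0, 0.0]
        for wk, m in zip(w, bone_transforms):
            if wk == 0.0:
                continue  # this bone has no influence on the vertex
            p = transform_point(m, v)
            for i in range(3):
                acc[i] += wk * p[i]
        posed.append(acc)
    return posed
```

Because each vertex mixes transformed copies of itself, vertices influenced by two bones move smoothly between them; plain linear blending is known to lose volume at sharply bent joints, which is one reason body models often add pose-dependent corrections.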

14. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to at least:
- receive image data for an image depicting a scene including a subject;
  
  access an image augmentation rule that specifies a model pose of a subject body, wherein the image augmentation rule defines at least a threshold value used to determine that an augmented image is to be generated, and wherein the model pose of a subject body is represented by a first set of joint vertices;
  
  identify a second set of joint vertices representing a pose of the subject in the image using a pose detection model to analyze the image data;
  
  use a shape detection model to identify a shape of the subject in the image;
  
  use a scene analysis model to identify semantic content of the scene in the image;
  
  determine that a difference between the model pose of a subject body and the pose of the subject identified from the image data does not satisfy the threshold value defined in the augmentation rule, wherein the difference is determined based at least in part on a comparison of the first set of joint vertices and the second set of joint vertices; and
  
  in response to determining that the difference does not satisfy the threshold value defined by the augmentation rule:
  
  generate, based at least in part on the comparison, an augmented image of the subject in the scene, wherein generating the augmented image comprises:
  
  morph the shape of the subject in the scene into a new shape reflecting the model pose of the subject body;
  
  modify the semantic content of the scene in accordance with the new shape; and
  
  store image data for the augmented image that includes the subject repositioned in the new shape and the semantic content modified in accordance with the new shape.
- Dependent claims (15, 16, 17, 18, 19, 20):
  - 15. The non-transitory computer-readable medium of claim 14 storing instructions that, when executed by the one or more processors, further cause the one or more processors to at least:
    - access a data repository storing a set of body models representing different shapes of subject bodies; and
      
      use the shape detection model to correlate a body model of the body models with the shape of the subject in the image data.
  - 16. The non-transitory computer-readable medium of claim 15 storing instructions that, when executed by the one or more processors, further cause the one or more processors to at least:
    - use a convolutional neural network as the pose detection model to identify the second set of joint vertices representing the pose of the subject depicted in the image; and
      
      use linear regression as the shape detection model to fit the body model to the second set of joint vertices.
  - 17. The non-transitory computer-readable medium of claim 15, wherein the body model is represented by a set of skin vertices, and wherein the non-transitory computer-readable medium stores instructions that, when executed by the one or more processors, further cause the one or more processors to morph the shape of the subject in the scene into the new shape by applying blend weights to the set of skin vertices of the body model.
  - 18. The non-transitory computer-readable medium of claim 17 storing instructions that, when executed by the one or more processors, further cause the one or more processors to at least:
    - use pixel values of the image to generate a texture map representing skin, hair, and clothing depicted on the subject in the scene; and
      
      apply the texture map to the body model to render the subject in the model pose.
  - 19. The non-transitory computer-readable medium of claim 17 storing instructions that, when executed by the one or more processors, further cause the one or more processors to at least:
    - use pixel values of the image to generate a displacement map representing contours of a body surface of the subject in the scene; and
      
      apply the displacement map to the body model to render the contours of the body surface in the model pose.
  - 20. The non-transitory computer-readable medium of claim 14, wherein to modify the semantic content of the scene in accordance with the new shape, the non-transitory computer-readable medium stores instructions that, when executed by the one or more processors, cause the one or more processors to conduct one or both of filling in estimated values of background pixels in locations originally occupied by the subject and maintaining a foreground object in front of the shape of the subject body moved into the model pose.
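
Claims 7, 8, 13 and 20 keep the scene consistent in two ways: filling the pixels the human vacated with estimated background values, and keeping foreground objects in front of the reposed body. The sketch below shows both steps on a grayscale image stored as nested lists. The background estimate (the mean of the non-human pixels in the same row) is a deliberately naive stand-in for a real inpainting method; the masks are assumed to come from the shape and scene analysis models; and `fill_background` and `composite` are hypothetical names.

```python
from typing import List

Image = List[List[float]]  # grayscale pixel values, indexed [row][column]
Mask = List[List[bool]]    # True where the mask covers a pixel


def fill_background(image: Image, human_mask: Mask) -> Image:
    """Replace pixels originally occupied by the human with estimated background.

    Naive estimate: the mean of the non-human pixels in the same row
    (0.0 if the whole row is covered by the human).
    """
    out = [row[:] for row in image]
    for y, (row, mask_row) in enumerate(zip(image, human_mask)):
        background = [v for v, m in zip(row, mask_row) if not m]
        estimate = sum(background) / len(background) if background else 0.0
        for x, covered in enumerate(mask_row):
            if covered:
                out[y][x] = estimate
    return out


def composite(image: Image, human_mask: Mask, reposed: Image,
              reposed_mask: Mask, foreground_mask: Mask) -> Image:
    """Draw the reposed human over the filled background, behind foreground objects.

    Assumes human_mask excludes pixels where a foreground object already
    occludes the human, so those pixels keep their original values.
    """
    out = fill_background(image, human_mask)
    for y, row in enumerate(out):
        for x in range(len(row)):
            if reposed_mask[y][x] and not foreground_mask[y][x]:
                row[x] = reposed[y][x]
    return out
```

Drawing the reposed body and then skipping foreground pixels is a simple painter's-order approach; a depth-aware renderer would resolve occlusion per pixel instead.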

Current Assignee
Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V.
Original Assignee
Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V.
Inventors
Black, Michael; Rachlin, Eric; Lee, Evan; Heron, Nicolas; Loper, Matthew; Weiss, Alexander; Smith, David
Primary Examiner(s)
Johnson, Motilewa Good

Application Number

US15/826,389
Time in Patent Office

769 Days
CPC Class Codes

G06N 20/00   Machine learning

G06N 3/045   Combinations of networks

G06N 3/08   Learning methods

G06T 13/40   of characters, e.g. humans,...

G06T 15/04   Texture mapping

G06T 19/006   Mixed reality object pose d...

G06T 19/20   Editing of 3D images, e.g. ...

G06T 2207/20081   Training; Learning

G06T 2207/20084   Artificial neural networks ...

G06T 2207/20221   Image fusion; Image merging

G06T 2207/30196   Human being; Person

G06T 2219/2021   Shape modification

G06T 7/246   using feature-based methods...

G06T 7/55   from multiple images

G06T 7/75   involving models

G06V 10/255   Detecting or recognising po...

G06V 10/454   Integrating the filters int...

G06V 10/54   relating to texture

G06V 10/7553   based on shape, e.g. active...

G06V 10/82   using neural networks

G06V 20/653   by matching three-dimension...

G06V 30/19147   Obtaining sets of training ...

G06V 30/19173   Classification techniques

G06V 30/274   Syntactic or semantic conte...

G06V 40/103   Static body considered as a...
