Utilizing deep learning for automatic digital image segmentation and stylization

US 9,773,196 B2
Filed: 01/25/2016
Issued: 09/26/2017
Est. Priority Date: 01/25/2016
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Systems and methods are disclosed for segregating target individuals represented in a probe digital image from background pixels in the probe digital image. In particular, in one or more embodiments, the disclosed systems and methods train a neural network based on two or more of training position channels, training shape input channels, training color channels, or training object data. Moreover, in one or more embodiments, the disclosed systems and methods utilize the trained neural network to select a target individual in a probe digital image. Specifically, in one or more embodiments, the disclosed systems and methods generate position channels, training shape input channels, and color channels corresponding to the probe digital image, and utilize the generated channels in conjunction with the trained neural network to select the target individual.
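
The abstract describes feeding several per-pixel channels to the network together. As a rough illustration only, the Python sketch below (standard library, no deep learning framework) splits an RGB image into three color channels and stacks them with position and shape channels into one feature vector per pixel. The helper names `color_channels` and `stack_channels`, the scaling of colors to [0, 1], and the channel order (red, green, blue, x-position, y-position, shape) are assumptions for illustration; the text above does not fix a layout.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # 8-bit (red, green, blue)
Channel = List[List[float]]    # height x width grid of values


def color_channels(image: List[List[Pixel]]) -> Tuple[Channel, Channel, Channel]:
    """Split an RGB image into red, green and blue channels scaled to [0, 1]."""
    red = [[p[0] / 255.0 for p in row] for row in image]
    green = [[p[1] / 255.0 for p in row] for row in image]
    blue = [[p[2] / 255.0 for p in row] for row in image]
    return red, green, blue


def stack_channels(channels: Sequence[Channel]) -> List[List[List[float]]]:
    """Combine C channels of size H x W into an H x W x C per-pixel feature array."""
    height, width = len(channels[0]), len(channels[0][0])
    for ch in channels:
        if len(ch) != height or any(len(row) != width for row in ch):
            raise ValueError("all channels must have the same height and width")
    return [[[ch[r][c] for ch in channels] for c in range(width)]
            for r in range(height)]


# Example: a 1 x 2 image with hypothetical position and shape values.
image = [[(255, 0, 0), (0, 255, 0)]]
red, green, blue = color_channels(image)
x_pos, y_pos, shape = [[-1.0, 1.0]], [[0.0, 0.0]], [[1.0, 0.0]]
features = stack_channels([red, green, blue, x_pos, y_pos, shape])
print(features[0][0])  # [1.0, 0.0, 0.0, -1.0, 0.0, 1.0]
```

A practical implementation would more likely hold the channels in a single numerical array or tensor; nested lists simply keep the sketch dependency-free.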

20 Claims

1. In a digital medium environment for editing digital visual media, a method of using deep learning to automatically select individuals portrayed in the digital visual media, the method comprising:
- training, by at least one processor, a neural network utilizing training input generated from a repository of digital training images;
  
  generating, by the at least one processor, with regard to a probe digital image portraying a target individual, a position channel that indicates positions of pixels in the probe digital image relative to the target individual portrayed in the probe digital image by determining a transform between one or more feature points of the target individual and a canonical pose; and
  
  identifying, by the at least one processor, a set of pixels representing the target individual in the probe digital image utilizing the trained neural network and the position channel.
- - 2. The method of claim 1, wherein:
    - the training input comprises a plurality of training position channels, a plurality of training shape input channels, and a plurality of training color channels corresponding to the digital training images, wherein each training color channel in the plurality of training color channels reflects colors of pixels in a corresponding digital training image;
      
      the method further comprises generating a color channel that reflects colors of pixels in the digital training image; and
      
      the method identifies the set of pixels representing the target individual by utilizing the color channel.
  - 3. The method of claim 2, further comprising generating the training input by:
    - identifying a training target individual portrayed in each digital training image;
      
      generating a training position channel for each digital training image that indicates positions of pixels in the digital training image relative to the identified training target individual portrayed in the training digital image; and
      
      generating a training shape input channel for each digital training image that comprises an estimated shape of the identified target individual portrayed in the digital training image.
  - 4. The method of claim 1, wherein generating the position channel further comprises:
    - generating an x-position channel that indicates horizontal positions of pixels in the probe digital image relative to a face of the target individual portrayed in the probe digital image; and
      
      generating a y-position channel that indicates vertical positions of pixels in the probe digital image relative to the face of the target individual portrayed in the probe digital image.
  - 5. The method of claim 1, wherein generating the position channel further comprises:
    - detecting one or more facial feature points corresponding to a face of the target individual portrayed in the probe digital image;
      
      estimating the transform between the detected one or more facial feature points and the canonical pose, wherein the canonical pose comprises template facial features; and
      
      applying the transform to the canonical pose to generate the position channel.
  - 6. The method of claim 5, wherein the position channel expresses the position of pixels in the canonical pose in a coordinate system that is centered on the face and scaled according to a size of the face.
  - 7. The method of claim 1, further comprising:
    - generating a shape input channel that comprises an estimated shape of the target individual by:
      
      generating a mean digital object mask based on target individuals in a plurality of digital images, the mean digital object mask comprising a shape corresponding to the target individuals in the plurality of digital images; and
      
      utilizing the mean digital object mask to generate the shape input channel; and
      
      identifying the set of pixels representing the target individual in the probe digital image utilizing the trained neural network, the position channel, and the shape input channel.
  - 8. The method of claim 7, wherein generating the mean digital object mask comprises:
    - identifying a set of pixels representing a target individual in a digital image from the plurality of digital images;
      
      identifying one or more facial feature points corresponding to the target individual in the digital image;
      
      estimating a first transform between the facial feature points and a canonical pose; and
      
      applying the first transform to the set of pixels representing the target individual in the digital image.
  - 9. The method of claim 8, wherein utilizing the mean digital object mask to generate the shape input channel comprises:
    - detecting one or more facial feature points corresponding to the target individual portrayed in the probe digital image;
      
      estimating a second transform based on the detected one or more facial feature points corresponding to the target individual portrayed in the probe digital image and the canonical pose; and
      
      applying the second transform to the mean digital object mask to generate the shape input channel.
  - 10. The method of claim 1, further comprising modifying the probe digital image based on the set of pixels representing the target individual in the probe digital image by applying one or more of:
    - a first image filter to the set of pixels representing the target individual in the probe digital image;
      
      or a second image filter to other pixels in the probe digital image.
  - 11. The method of claim 1, wherein identifying, by the at least one processor, the set of pixels representing the target individual in the probe digital image further comprises:
    - upon receiving a request to select the target individual in the probe digital image, automatically identifying the set of pixels representing the target individual in the probe digital image without additional user input.
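
Claim 10 covers modifying the probe image by filtering the selected pixels, the remaining pixels, or both, which is how a segmentation result becomes a stylization. A minimal sketch follows, assuming the selection is already available as a boolean mask with the same dimensions as the image. The names `stylize` and `grayscale` are hypothetical, and the grayscale background (Rec. 601 luma weights) is only one example of a "second image filter"; the claim does not name particular filters.

```python
from typing import Callable, List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]
PixelFilter = Callable[[Pixel], Pixel]


def grayscale(p: Pixel) -> Pixel:
    """Example per-pixel filter: Rec. 601 luma, replicated into all three channels."""
    y = round(0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2])
    return (y, y, y)


def stylize(image: Image, mask: Mask,
            fg_filter: Optional[PixelFilter] = None,
            bg_filter: Optional[PixelFilter] = None) -> Image:
    """Apply fg_filter to selected (mask True) pixels and bg_filter to the rest.

    Passing None for either filter leaves those pixels unchanged.
    """
    result: Image = []
    for row, mask_row in zip(image, mask):
        new_row = []
        for pixel, selected in zip(row, mask_row):
            f = fg_filter if selected else bg_filter
            new_row.append(f(pixel) if f is not None else pixel)
        result.append(new_row)
    return result


# Keep the selected individual in color and turn the background gray.
image = [[(255, 0, 0), (0, 0, 255)]]
mask = [[True, False]]
styled = stylize(image, mask, bg_filter=grayscale)
```

Only per-pixel filters fit this shape; a spatial filter such as a blur would need access to neighboring pixels and is outside this sketch.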

12. In a digital medium environment for editing digital visual media, a method of using deep learning to automatically select individuals portrayed in the digital visual media, the method comprising:
- accessing:
  
  a trained neural network generated from a repository of digital training images, wherein each of the digital training images portrays a training target individual, and a mean digital object mask reflecting a shape based on each of the training target individuals portrayed in the digital training images;
  
  generating, with regard to a probe digital image by at least one processor and utilizing the mean digital object mask, a shape input channel comprising an estimated shape of a target individual based on the mean digital object mask by estimating a transform between one or more facial feature points corresponding to the target individual portrayed in the probe digital image and a canonical pose; and
  
  identifying, by the at least one processor, a set of pixels representing the target individual in the probe digital image utilizing the trained neural network and the generated shape input channel.
- - 13. The method of claim 12, wherein utilizing the mean digital object mask to generate the shape input channel comprises:
    - detecting the one or more facial feature points corresponding to the target individual portrayed in the probe digital image;
      
      estimating the transform based on the detected one or more facial feature points corresponding to the target individual portrayed in the probe digital image and the canonical pose, wherein the canonical pose comprises template facial features; and
      
      applying the transform to the mean digital object mask to generate the shape input channel.
  - 14. The method of claim 12, further comprising generating a position channel, wherein the position channel indicates a position of pixels in the probe digital image relative to a face of the target individual portrayed in the probe digital image.
  - 15. The method of claim 14, wherein generating the position channel further comprises:
    - generating an x-position channel that indicates horizontal positions of pixels in the probe digital image relative to a face of the target individual portrayed in the probe digital image; and
      
      generating a y-position channel that indicates vertical positions of pixels in the probe digital image relative to the face of the target individual portrayed in the probe digital image.
  - 16. The method of claim 14, wherein generating the position channel further comprises:
    - detecting facial feature points corresponding to the target individual portrayed in the probe digital image;
      
      estimating a second transform between the detected facial feature points and a canonical pose, wherein the canonical pose comprises template facial features; and
      
      applying the second transform to the canonical pose to generate the position channel.
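
Claims 5, 6, 15 and 16 describe the position channel as the result of a transform between detected facial feature points and a canonical pose made of template facial features, expressed in a coordinate system centered on the face and scaled to its size. The sketch below is one possible reading, not the patented implementation. It assumes a 2D similarity transform (uniform scale, rotation, translation) fitted by least squares, represents points as Python complex numbers `x + 1j*y`, and maps every pixel into the canonical frame, which yields the same per-pixel canonical coordinates as warping the canonical coordinate grid onto the image. The three-point template (eye centers and mouth) and the function names are hypothetical, and feature-point detection itself is out of scope.

```python
from typing import List, Sequence, Tuple

Channel = List[List[float]]


def fit_similarity(src: Sequence[complex],
                   dst: Sequence[complex]) -> Tuple[complex, complex]:
    """Least-squares 2D similarity transform: returns (a, t) with dst ~= a * src + t.

    Multiplying by the complex number `a` applies a uniform scale and a rotation;
    adding `t` applies a translation.
    """
    n = len(src)
    src_mean, dst_mean = sum(src) / n, sum(dst) / n
    num = sum((s - src_mean).conjugate() * (d - dst_mean) for s, d in zip(src, dst))
    den = sum(abs(s - src_mean) ** 2 for s in src)
    a = num / den
    return a, dst_mean - a * src_mean


def position_channels(width: int, height: int,
                      face_points: Sequence[complex],
                      template: Sequence[complex]) -> Tuple[Channel, Channel]:
    """Return x- and y-position channels in the face-centered canonical frame."""
    a, t = fit_similarity(face_points, template)  # image -> canonical
    x_channel = [[0.0] * width for _ in range(height)]
    y_channel = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            q = a * complex(c, r) + t  # pixel at column c, row r
            x_channel[r][c] = q.real
            y_channel[r][c] = q.imag
    return x_channel, y_channel


# Hypothetical template: eye centers at (-1, 0) and (1, 0), mouth at (0, 1),
# so the canonical origin sits between the eyes and the inter-eye distance is 2.
template = [complex(-1, 0), complex(1, 0), complex(0, 1)]
# Hypothetical detections in a 40 x 50 probe image (x = column, y = row).
face_points = [complex(10, 30), complex(30, 30), complex(20, 40)]
x_pos, y_pos = position_channels(40, 50, face_points, template)
# Pixel (x=30, y=30) lands on the right-eye template point (1, 0).
```

Because the fit includes scale, a face twice as large in the image spans twice as many pixels per canonical unit, which gives the face-size normalization that claim 6 describes.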

17. A system for identifying target objects within digital visual media, comprising:
- at least one processor; and
  
  at least one non-transitory computer readable storage medium storing instructions thereon, that, when executed by the at least one processor, cause the system to:
  
  generate a plurality of training color channels and a plurality of training shape input channels with regard to a plurality of digital training images, wherein each digital training image portrays a target individual;
  
  train a neural network utilizing the plurality of training color channels and the plurality of training shape input channels;
  
  generate a color channel and a shape input channel with regard to a probe digital image, wherein the probe digital image portrays a target individual, the color channel reflects colors of pixels in the digital training image and the shape input channel comprises an estimated shape of the target individual based on a transform between one or more features of the target individual and a canonical pose; and
  
  identify a set of pixels representing the target individual in the probe digital image utilizing the trained neural network, the generated color channel, and the generated shape input channel.
- - 18. The system of claim 17, further comprising instructions that, when executed by the at least one processor, cause the system to:
    - generate a plurality of training position channels;
      
      train the neural network utilizing the plurality of training position channels;
      
      generate a position channel with regard to the probe digital image, wherein the position channel indicates positions of pixels in the probe digital image relative to the target individual portrayed in the probe digital image, and identify the set of pixels representing the target individual in the probe digital image utilizing the generated position channel.
  - 19. The system of claim 18, wherein the instructions, when executed by the at least one processor, cause the system to generate the plurality of training position channels and the plurality of training shape input channels by performing steps comprising:
    - generating a training position channel for each digital training image in the plurality of digital training images, wherein the training position channel for each digital training image indicates positions of pixels in the digital training image relative to the training target individual portrayed in the training digital image; and
      
      generating a training shape input channel for each digital training image in the plurality of digital training images, wherein the training shape input channel for each digital training image comprises an estimated shape of the identified target individual portrayed in the training digital image.
  - 20. The system of claim 17, wherein the instructions, when executed by the at least one processor, cause the system to generate the shape input channel by performing steps comprising:
    - generating a mean digital object mask from a plurality of digital images, wherein each of the plurality of digital images portrays a target individual, and the mean digital object mask comprises a shape corresponding to the target individuals portrayed in the plurality of digital images; and
      
      utilizing the mean digital object mask to generate the shape input channel.
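
Claims 8, 9, 13 and 20 build a mean digital object mask by aligning the segmented training individuals to the canonical pose and averaging them, then transform that mean mask onto the probe image to form its shape input channel. The sketch below illustrates this under several assumptions: a least-squares similarity transform stands in for the unspecified transform, the canonical frame is a hypothetical 33 x 33 grid spanning [-4, 4] on both axes, and nearest-neighbor inverse warping (each output cell samples its source location) replaces literally applying the transform to the pixel set. All names and constants are illustrative.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]

# Hypothetical canonical frame: coordinates from -4 to 4 on a 33 x 33 grid.
CANON_MIN, CANON_MAX, CANON_SIZE = -4.0, 4.0, 33
STEP = (CANON_MAX - CANON_MIN) / (CANON_SIZE - 1)


def fit_similarity(src: Sequence[complex],
                   dst: Sequence[complex]) -> Tuple[complex, complex]:
    """Least-squares 2D similarity transform: returns (a, t) with dst ~= a * src + t."""
    n = len(src)
    src_mean, dst_mean = sum(src) / n, sum(dst) / n
    num = sum((s - src_mean).conjugate() * (d - dst_mean) for s, d in zip(src, dst))
    den = sum(abs(s - src_mean) ** 2 for s in src)
    a = num / den
    return a, dst_mean - a * src_mean


def mean_object_mask(masks: Sequence[List[List[int]]],
                     feature_points: Sequence[Sequence[complex]],
                     template: Sequence[complex]) -> Grid:
    """Align each binary training mask to the canonical pose and average them."""
    if not masks:
        raise ValueError("need at least one training mask")
    total = [[0.0] * CANON_SIZE for _ in range(CANON_SIZE)]
    for mask, points in zip(masks, feature_points):
        # Canonical -> image transform, so each canonical cell samples the mask.
        a, t = fit_similarity(template, points)
        height, width = len(mask), len(mask[0])
        for i in range(CANON_SIZE):
            for j in range(CANON_SIZE):
                p = a * complex(CANON_MIN + j * STEP, CANON_MIN + i * STEP) + t
                r, c = round(p.imag), round(p.real)
                if 0 <= r < height and 0 <= c < width:
                    total[i][j] += mask[r][c]
    return [[v / len(masks) for v in row] for row in total]


def shape_input_channel(width: int, height: int,
                        probe_points: Sequence[complex],
                        template: Sequence[complex],
                        mean_mask: Grid) -> Grid:
    """Warp the mean mask onto the probe image as an estimated shape channel."""
    # Image -> canonical transform for the probe's detected feature points.
    a, t = fit_similarity(probe_points, template)
    channel = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            q = a * complex(c, r) + t
            i = round((q.imag - CANON_MIN) / STEP)
            j = round((q.real - CANON_MIN) / STEP)
            if 0 <= i < CANON_SIZE and 0 <= j < CANON_SIZE:
                channel[r][c] = mean_mask[i][j]
    return channel


# Example with one synthetic training mask (a "body" region below the face).
template = [complex(-1, 0), complex(1, 0), complex(0, 1)]  # eyes, mouth
points = [complex(10, 30), complex(30, 30), complex(20, 40)]
W, H = 40, 50
mask = [[1 if 10 <= c <= 30 and r >= 30 else 0 for c in range(W)] for r in range(H)]
mean = mean_object_mask([mask], [points], template)
shape = shape_input_channel(W, H, points, template, mean)
```

Inverse warping is used in both directions so that every canonical cell and every probe pixel receives a value; a production version would more likely use bilinear sampling and a library warping routine.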

Current Assignee
Adobe Inc.
Original Assignee
Adobe Systems Incorporated (Adobe Inc.)
Inventors
Sachs, Ian, Shen, Xiaoyong, Paris, Sylvain, Hertzmann, Aaron, Shechtman, Elya, Price, Brian
Primary Examiner(s)
Tran, Phuoc

Application Number
US15/005,855
Publication Number
US 20170213112A1
Time in Patent Office
610 Days
CPC Class Codes

G06N 3/045   Combinations of networks

G06N 3/08   Learning methods

G06T 2207/20084   Artificial neural networks ...

G06T 2207/30201   Face

G06T 7/11   Region-based segmentation

G06T 7/73   using feature-based methods

G06T 7/90   Determination of colour cha...

G06V 40/161   Detection; Localisation; No...

G06V 40/169   Holistic features and repre...
