POSE-ALIGNED NETWORKS FOR DEEP ATTRIBUTE MODELING
Abstract
Technology is disclosed for inferring human attributes from images of people. The attributes can include, for example, gender, age, hair, and/or clothing. The technology uses part-based models, e.g., Poselets, to locate multiple normalized part patches from an image. The normalized part patches are provided to trained convolutional neural networks to generate feature data. Each convolutional neural network applies multiple stages of convolution operations to one part patch to generate a set of fully connected feature data. The feature data for all part patches are concatenated and then provided to multiple trained classifiers (e.g., linear support vector machines) to predict attributes of the image.
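The pipeline the abstract describes (locate normalized part patches, run each through its own network, concatenate the per-part features, classify with a linear model) can be sketched as follows. The fixed-crop detector, the random-projection stand-in for a trained CNN, the 56x56 patch size, and the zero-weight classifier are all illustrative assumptions, not the disclosed implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def locate_part_patches(image, num_parts=3, patch_size=56):
    """Stand-in for a poselet-style part detector: here we just crop
    fixed windows; a real detector would return pose-normalized patches."""
    h, w, _ = image.shape
    patches = []
    for i in range(num_parts):
        y = (i * (h - patch_size)) // max(num_parts - 1, 1)
        patches.append(image[y:y + patch_size, 0:patch_size])
    return patches

def part_cnn_features(patch, dim=64):
    """Stand-in for a trained per-part CNN: projects the flattened patch
    to a fixed-length feature vector with a random (untrained) matrix."""
    flat = patch.reshape(-1).astype(np.float64)
    proj = rng.standard_normal((dim, flat.size)) / np.sqrt(flat.size)
    return proj @ flat

def predict_attribute(image, weights=None, bias=0.0):
    """Concatenate per-part features and score them with a linear model."""
    feats = [part_cnn_features(p) for p in locate_part_patches(image)]
    concat = np.concatenate(feats)           # one vector for the whole person
    if weights is None:                      # untrained placeholder classifier
        weights = np.zeros_like(concat)
    score = float(weights @ concat + bias)   # linear-SVM-style decision value
    return score, score > 0.0
```

With trained components, `locate_part_patches` would be a poselet detector, `part_cnn_features` a learned network, and `weights`/`bias` a linear SVM fit on labeled attribute data.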
33 Claims
1. A computing device for recognizing human attributes from digital images, comprising:
one or more processing units;
an input interface configured to receive a digital image;
a storage component configured to store multiple body feature databases associated with multiple recognized body features, trained data of multiple deep learning networks associated with the recognized body features, and trained data of a classification engine for an image attribute, wherein the image attribute relates to a content of the digital image;
a body feature module configured to locate multiple part patches from the digital image based on the body feature databases;
an artificial neural network module configured to feed the part patches into the deep learning networks to generate multiple sets of feature data; and
a classification module configured to concatenate the sets of feature data and feed the concatenated feature data into the classification engine to determine whether the digital image has the image attribute.
2. The computing device of claim 1, further comprising:
an output interface configured to present a signal indicating whether the digital image has the image attribute.
3. The computing device of claim 1, wherein at least one of the deep learning networks is a convolutional neural network that comprises multiple down-sampling operations and multiple convolution operations using multiple convolution filters to detect local spatial correlations in the part patches.
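Claim 3's combination of convolution filters (which respond to local spatial correlations) with down-sampling can be illustrated in a minimal sketch; the 8x8 patch, the 2x2 edge filter, and 2x2 max-pooling are illustrative choices, not taken from the disclosure:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: each output value depends only on a local
    neighborhood, which is how convolution filters detect spatially
    local correlations in a part patch."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """2x2 max-pooling, one simple form of the down-sampling operation
    the claim describes."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# One "stage" of a claim-3 style network: convolve, then down-sample.
patch = np.arange(64, dtype=float).reshape(8, 8)
edge_filter = np.array([[1.0, -1.0], [1.0, -1.0]])  # responds to vertical edges
stage_out = max_pool(conv2d(patch, edge_filter))
```

A real network would stack several such stages, learn the filter weights, and end in fully connected layers that emit the feature vector.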
4. The computing device of claim 1, wherein one individual part patch among the part patches is located using a body feature database associated with a specific body feature among the recognized body features, and the individual part patch is provided into a deep learning network associated with the specific body feature.
5. The computing device of claim 1, further comprising:
a whole body module configured to:
locate a whole-body portion from the digital image, wherein the whole-body portion covers an entire human body depicted in the digital image;
feed the whole-body portion into a deep neural network to generate a set of whole-body feature data; and
incorporate the set of whole-body feature data into the concatenated feature data.
6. The computing device of claim 1, wherein the body feature module is configured to:
scan the digital image using multiple windows having various sizes; and
compare scanned portions of the digital image confined by the windows with multiple training patches from the body feature databases;
wherein the training patches are annotated with keypoints of body parts and the body feature databases contain the training patches that form a cluster in a three-dimensional (3D) configuration space corresponding to a recognized human body portion or pose.
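The multi-window scan of claim 6 might look like the following sketch; the window sizes, stride, and mean-intensity comparison are placeholder assumptions (a real system would compare pose-normalized appearance against the poselet training patches):

```python
import numpy as np

def scan_windows(image, window_sizes=(32, 48), stride=16):
    """Slide square windows of several sizes over the image, yielding
    each (y, x, size) location together with the cropped window."""
    h, w = image.shape[:2]
    for size in window_sizes:
        for y in range(0, h - size + 1, stride):
            for x in range(0, w - size + 1, stride):
                yield (y, x, size), image[y:y + size, x:x + size]

def matches_training_patch(window, training_patches, threshold=0.9):
    """Toy comparison against stored training patches; comparing mean
    intensity is a placeholder for a real appearance/keypoint match."""
    return any(abs(window.mean() - p.mean()) < threshold
               for p in training_patches)
```

In practice the comparison would be a learned detector score, and windows passing it become the part patches fed to the networks.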
7. The computing device of claim 3, wherein at least one of the convolution operations uses multiple filters having dimensions of more than one.
8. The computing device of claim 3, wherein the convolution filters detect spatially local correlations present in the part patches.
9. The computing device of claim 3, wherein the convolutional neural network applies a normalization operation to the part patch after one of the multiple convolution operations has been applied to the part patch.
10. The computing device of claim 3, wherein at least two convolutional neural networks apply a max-pooling operation to the part patch after one of the convolution operations has been applied to the part patch.
11. The computing device of claim 1, wherein the body feature module is configured to resize the part patches to a common resolution, where the common resolution is a required resolution for inputs of the deep learning networks.
12. The computing device of claim 1, wherein the body feature module is configured to break down the part patches into three layers based on the red, green and blue channels of the part patches.
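The preprocessing of claims 11 and 12 (resizing part patches to the networks' required input resolution, then separating the red, green, and blue layers) can be sketched as follows; the 56x56 target resolution and nearest-neighbor interpolation are illustrative assumptions:

```python
import numpy as np

def resize_nearest(patch, out_hw=(56, 56)):
    """Nearest-neighbor resize to a fixed input resolution (56x56 is an
    illustrative choice, not taken from the claims)."""
    h, w = patch.shape[:2]
    ys = (np.arange(out_hw[0]) * h) // out_hw[0]
    xs = (np.arange(out_hw[1]) * w) // out_hw[1]
    return patch[ys][:, xs]

def split_rgb(patch):
    """Break an H x W x 3 patch into its red, green, and blue layers,
    the three-layer input form claim 12 describes."""
    return patch[..., 0], patch[..., 1], patch[..., 2]
```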
13. The computing device of claim 1, wherein the body feature module is configured to:
locate a whole-body portion from the digital image, wherein the whole-body portion covers an entire human body depicted in the digital image;
feed the whole-body portion into a deep neural network to generate a set of whole-body feature data; and
incorporate the set of whole-body feature data into the set of concatenated feature data.
14. The computing device of claim 1, wherein the classification module is configured to generate a prediction score indicating the likelihood of the human attribute existing in the digital image.
15. The computing device of claim 1, wherein the human attribute comprises gender, age, race, hair or clothing.
16. The computing device of claim 1, wherein the classification module comprises a linear support vector machine that is trained using training data associated with the human attribute.
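Claims 14 and 16 describe scoring the concatenated features with a trained linear support vector machine; a minimal sketch of the decision value, plus an optional squashing of it into a [0, 1] likelihood (which the claims do not require), might be:

```python
import numpy as np

def svm_score(concat_features, weights, bias):
    """Decision value of a trained linear SVM: the sign gives the
    attribute prediction and the magnitude serves as the claim-14
    prediction score."""
    return float(weights @ concat_features + bias)

def sigmoid(score):
    """Optional squashing of the raw score into a [0, 1] likelihood."""
    return 1.0 / (1.0 + np.exp(-score))
```

The `weights` and `bias` here stand for the trained classifier data that claim 1's storage component holds; they would be fit separately per attribute (gender, age, and so on).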
17. A processor-executable storage medium storing instructions, comprising:
instructions for locating multiple image patches from a digital image, wherein the image patches comprise portions of the digital image corresponding to recognized human body features;
instructions for feeding the image patches into multiple neural networks corresponding to the recognized human body features to generate feature data;
instructions for concatenating the feature data from the neural networks and feeding the feature data into an image attribute classifier associated with an image attribute; and
instructions for determining whether the digital image has the image attribute based on a result generated from the feature data by the image attribute classifier.
18. The processor-executable storage medium of claim 17, wherein each of the recognized human body features corresponds to a human body portion or pose from a specific viewpoint.
19. The processor-executable storage medium of claim 17, wherein the instructions for feeding the image patches comprise:
instructions for feeding one of the image patches associated with one specific human body feature among the recognized human body features, into one of the neural networks associated with the specific human body feature.
20. The processor-executable storage medium of claim 17, further comprising:
instructions for resizing the image patches to a common resolution and dithering the image patches.
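Claim 20 adds dithering of the resized patches; the claims do not specify an algorithm, so the randomized quantization below is purely an illustrative assumption:

```python
import numpy as np

def dither(patch, levels=8, rng=None):
    """Quantize the patch to `levels` gray levels after adding uniform
    noise, a simple randomized dither (an illustrative choice; the
    claim does not name a dithering algorithm)."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.uniform(-0.5, 0.5, size=patch.shape)
    step = 256 / levels
    q = np.clip(np.round(patch / step + noise), 0, levels - 1)
    return (q * step).astype(np.uint8)
```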
21. A method, performed by a computing device having one or more processing units, for recognizing human attributes from digital images, comprising:
locating, by the one or more processing units, at least two part patches from a digital image, wherein each of the two part patches comprises at least a portion of the digital image corresponding to a recognized human body portion or pose;
providing each of the part patches as an input to one of multiple convolutional neural networks;
for at least two selected convolutional neural networks among the multiple convolutional neural networks, applying multiple stages of convolution operations to a part patch associated with the selected convolutional neural networks to generate a set of feature data as an output of the selected convolutional neural networks;
concatenating the sets of feature data from the at least two convolutional neural networks to generate a set of concatenated feature data;
feeding the set of concatenated feature data into a classification engine for predicting a human attribute; and
determining, based on a result provided by the classification engine, whether a human attribute exists in the digital image.
22. The method of claim 21, wherein said locating comprises:
scanning the digital image using multiple windows having various sizes; and
comparing scanned portions of the digital image confined by the windows with multiple training patches from a database;
wherein the training patches are annotated with keypoints of body parts and the database contains the training patches that form a cluster in a 3D configuration space corresponding to a recognized human body portion or pose.
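The 3D configuration space of claim 22 (and claim 6) groups training patches by the geometry of their annotated keypoints; a minimal sketch of assigning a keypoint configuration to the nearest cluster center, ignoring translation, might be (Poselets additionally normalize scale and handle keypoint visibility, which this sketch omits):

```python
import numpy as np

def configuration_distance(kp_a, kp_b):
    """Distance between two keypoint configurations (N x 3 arrays of
    3-D keypoint coordinates), after subtracting each configuration's
    centroid so that translation is ignored."""
    a = kp_a - kp_a.mean(axis=0)
    b = kp_b - kp_b.mean(axis=0)
    return float(np.sqrt(((a - b) ** 2).sum()))

def nearest_cluster(keypoints, cluster_centers):
    """Index of the closest cluster center in configuration space."""
    dists = [configuration_distance(keypoints, c) for c in cluster_centers]
    return int(np.argmin(dists))
```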
23. The method of claim 21, wherein one of the convolution operations uses multiple filters having dimensions of more than one.
24. The method of claim 23, wherein the filters are capable of detecting spatially local correlations present in the part patches.
25. The method of claim 21, further comprising:
for the at least two selected convolutional neural networks among the multiple convolutional neural networks, applying a normalization operation to the part patch after one of the multiple stages of convolution operations has been applied to the part patch.
26. The method of claim 21, further comprising:
for the at least two selected convolutional neural networks among the multiple convolutional neural networks, applying a max-pooling operation to the part patch after one of the multiple stages of convolution operations has been applied to the part patch.
27. The method of claim 21, further comprising:
resizing the part patches to a common resolution, where the common resolution is a required resolution for inputs of the convolutional neural networks.
28. The method of claim 21, further comprising:
breaking down the part patches into three layers based on the red, green and blue channels of the part patches.
29. The method of claim 21, further comprising:
presenting, through an output interface of the computing device, a signal indicating whether the human attribute exists in the digital image.
30. The method of claim 21, further comprising:
locating a whole-body portion from the digital image, wherein the whole-body portion covers an entire human body depicted in the digital image;
feeding the whole-body portion into a deep neural network to generate a set of whole-body feature data; and
incorporating the set of whole-body feature data into the set of concatenated feature data.
31. The method of claim 21, wherein the result provided by the classification engine comprises a prediction score indicating the likelihood of the human attribute existing in the digital image.
32. The method of claim 21, wherein the human attribute comprises gender, age, race, hair or clothing.
33. The method of claim 21, wherein the classification engine comprises a linear support vector machine that is trained using training data associated with the human attribute.
Specification