Classification of images as advertisement images or non-advertisement images of web pages
Abstract
An advertisement image classification system trains a binary classifier to classify images as advertisement images or non-advertisement images and then uses the binary classifier to classify images of web pages as advertisement images or non-advertisement images. During a training phase, the classification system generates training data of feature vectors representing the images and labels indicating whether an image is an advertisement image or a non-advertisement image. The classification system trains a binary classifier to classify images using the training data. During a classification phase, the classification system receives as input a web page containing an image and generates a feature vector for the image. The classification system then applies the trained binary classifier to the feature vector to generate a score indicating whether the image is an advertisement image or a non-advertisement image.
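The two phases in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the abstract does not name a learning algorithm, so logistic regression trained by stochastic gradient descent stands in for the binary classifier, and the function names, hyperparameters, and the two toy features are hypothetical. The score returned in the classification phase is the logistic output in (0, 1), thresholded at an assumed 0.5.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear_score(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_binary_classifier(vectors, labels, epochs=200, lr=0.1, seed=0):
    """Training phase (illustrative): fit logistic-regression weights by SGD.
    labels: 1 = advertisement image, 0 = non-advertisement image."""
    rng = random.Random(seed)
    w = [0.0] * len(vectors[0])
    b = 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            err = sigmoid(linear_score(w, b, x)) - y  # d(log-loss)/d(score)
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
            b -= lr * err
    return w, b

def score_image(model, x):
    """Classification phase: score in (0, 1); higher means more ad-like."""
    w, b = model
    return sigmoid(linear_score(w, b, x))

def classify_image(model, x, threshold=0.5):
    return "advertisement" if score_image(model, x) >= threshold else "non-advertisement"

# Toy data with two hypothetical features per image:
# [horizontal position as a fraction of page width, 1 if the image links off-site else 0]
train_x = [[1.0, 1.0], [0.9, 1.0], [0.1, 0.0], [0.2, 0.0]]
train_y = [1, 1, 0, 0]
model = train_binary_classifier(train_x, train_y)
print(classify_image(model, [0.95, 1.0]))
print(classify_image(model, [0.15, 0.0]))
```

Any binary classifier that produces a score (a support vector machine, for example) could fill the same role; logistic regression is used here only because it fits in a short, dependency-free sketch.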
20 Claims
1. A method in a computing device for identifying advertisement images of web pages, the method comprising:
providing training images of web pages, each provided training image being referenced by a web page;
labeling the images as advertisement images or non-advertisement images;
generating a feature vector for each of the training images, the feature vector including visual layout features derived from the web page of the image and content features derived from content of the image, the content features being selected from the group consisting of number of different colors in the content of the image, an indication of whether the image has high contrast, and an indication of whether the image is a photograph;
training a binary classifier using the feature vectors and labels of the images; and
classifying an image as an advertisement image or non-advertisement image by generating a feature vector for the image and applying the trained binary classifier to the generated feature vector of the image.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
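Claim 1 names three content features computed from the image itself. A minimal sketch, assuming the decoded image is available as rows of (r, g, b) tuples with 0-255 channels; the contrast threshold and the "many distinct colors means photograph" heuristic are illustrative assumptions, not rules stated in the claim.

```python
def luminance(rgb):
    # ITU-R BT.601 luma weights; inputs are 0-255 channel values.
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def count_colors(pixels):
    """Number of distinct (r, g, b) values in the image."""
    return len({p for row in pixels for p in row})

def is_high_contrast(pixels, min_spread=128.0):
    """Hypothetical rule: high contrast if the luminance range spans at least min_spread."""
    lums = [luminance(p) for row in pixels for p in row]
    return max(lums) - min(lums) >= min_spread

def is_photograph(pixels, min_colors=256):
    """Hypothetical heuristic: photographs use many distinct colors,
    while logos, buttons, and banners usually use a small palette."""
    return count_colors(pixels) >= min_colors

def content_features(pixels):
    return [
        float(count_colors(pixels)),
        1.0 if is_high_contrast(pixels) else 0.0,
        1.0 if is_photograph(pixels) else 0.0,
    ]

banner = [[(0, 0, 0), (255, 255, 255)], [(255, 255, 255), (0, 0, 0)]]
gradient = [[(16 * x, 16 * y, 0) for x in range(16)] for y in range(16)]
print(content_features(banner))
print(content_features(gradient))
```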
11. A computer-readable medium for generating a binary classifier for classifying images of web pages as advertisement images or non-advertisement images, by a method comprising:
providing training web pages;
identifying the images of the training web pages;
receiving labels for the images indicating whether an image is an advertisement image or non-advertisement image;
generating a feature vector for each of the identified images, the feature vector including visual layout features of the image on the web page and content features derived from content of the image by accessing the content of the image, the content features being selected from the group consisting of aspect ratio of the image, image format, an indication of whether the image is a photograph or a graphic, size of the image, number of different colors of the image, percentage of gray area of the image, and an indication of whether the image has high contrast; and
training a binary classifier using the feature vectors and labels of the images wherein the training identifies weights of the features for use in classifying images.
Dependent claims: 12, 13, 14, 15, 16.
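Claim 11 combines visual layout features with content features such as aspect ratio, image format, size, and percentage of gray area. The sketch below assembles such a vector under stated assumptions: the PageImage record, the three layout features (relative position and share of page area), the gray tolerance, and the fixed format vocabulary used for one-hot encoding are all hypothetical choices for illustration.

```python
from dataclasses import dataclass

FORMATS = ("gif", "jpeg", "png")  # hypothetical vocabulary for one-hot format encoding

@dataclass
class PageImage:
    """Hypothetical record for one image on a rendered web page."""
    x: int            # left edge of the image on the page, in pixels
    y: int            # top edge of the image on the page, in pixels
    page_width: int
    page_height: int
    fmt: str          # image format, e.g. "gif"
    pixels: list      # decoded image: rows of (r, g, b) tuples, 0-255

def gray_percentage(pixels, tolerance=8):
    """Percent of pixels whose R, G, and B values differ by at most `tolerance`."""
    flat = [p for row in pixels for p in row]
    gray = sum(1 for r, g, b in flat if max(r, g, b) - min(r, g, b) <= tolerance)
    return 100.0 * gray / len(flat)

def feature_vector(img):
    height, width = len(img.pixels), len(img.pixels[0])
    layout = [
        img.x / img.page_width,    # horizontal position, 0.0 = left edge
        img.y / img.page_height,   # vertical position, 0.0 = top of page
        (width * height) / (img.page_width * img.page_height),  # share of page area
    ]
    content = [
        width / height,            # aspect ratio
        float(width * height),     # size in pixels
        gray_percentage(img.pixels),
    ]
    content += [1.0 if img.fmt.lower() == f else 0.0 for f in FORMATS]
    return layout + content

banner = PageImage(
    x=800, y=0, page_width=1000, page_height=2000, fmt="GIF",
    pixels=[[(200, 200, 200)] * 6, [(255, 0, 0)] * 6],
)
print(feature_vector(banner))
```

The resulting vectors, paired with the received labels, are what the training step of the claim consumes to learn one weight per feature.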
17. A computing device for identifying features of images of web pages for use in classifying images as advertisement images or non-advertisement images, comprising:
a component that retrieves images of web pages and stores in a training data store feature vectors for the retrieved images of the web pages, the retrieved images being labeled as advertisement images or non-advertisement images, the feature vectors including candidate features that include visual layout features derived from the web page of the image and content features derived from content of the retrieved images by accessing the content of the retrieved images, the content features being selected from the group consisting of aspect ratio of the image, image format, an indication of whether the image is a photograph or a graphic, size of the image, number of different colors of the image, percentage of gray area of the image, and an indication of whether the image has high contrast;
a component that trains a binary classifier using the feature vectors with candidate features and labels of the training data store; and
a component that selects as features for use in classifying images those candidate features whose weights indicate they are effective at distinguishing advertisement images from non-advertisement images.
Dependent claims: 18, 19, 20.
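Claim 17 keeps only those candidate features whose learned weights show they are effective at separating the two classes. One way to sketch that step, assuming a linear classifier whose weights become comparable only after each feature is standardized to zero mean and unit variance; the feature names, weight values, and cutoff below are hypothetical.

```python
def standardize(vectors):
    """Rescale each feature column to zero mean and unit variance so that
    learned weights can be compared across features. Constant columns are
    centered at zero (their standard deviation is replaced by 1)."""
    n = len(vectors)
    columns = list(zip(*vectors))
    means = [sum(col) / n for col in columns]
    stds = [
        (sum((v - m) ** 2 for v in col) / n) ** 0.5 or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in vectors]

def select_features(names, weights, min_abs_weight=0.5):
    """Return names of candidate features whose |weight| >= min_abs_weight,
    most influential first."""
    ranked = sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)
    return [name for name, weight in ranked if abs(weight) >= min_abs_weight]

# Hypothetical weights from a classifier trained on standardized candidate features.
candidate_names = ["x_position", "y_position", "aspect_ratio", "num_colors", "is_photo"]
learned_weights = [1.8, -0.1, 0.9, -0.05, -1.2]
print(select_features(candidate_names, learned_weights))
```

Ranking by absolute weight treats a large negative weight (evidence against an advertisement) as just as informative as a large positive one.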
Specification