Automatic classification of display ads using ad images and landing pages
First Claim
1. A method for classifying display ads automatically into a taxonomy of categories, the method comprising:
using a processor device, causing a computer to perform steps of:
extracting first text features from an ad image of a display ad using optical character recognition (OCR) techniques;
identifying objects of interest from the ad image using object detection and recognition techniques in computer vision;
extracting second text features from a title, keywords, and content of a landing page (a web-page of an advertiser that a user is redirected to when clicking an ad) associated with the display ad;
generating bag-of-words ad features using the extracted first and second text features, as well as attributes of the advertiser;
using the generated bag-of-words ad features to categorize the display ad;
training statistical models using the generated bag-of-words ad features on a historical dataset of ads labeled by human editors; and
determining relevant categories of unlabeled ads using the trained statistical models to classify the display ads.
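The feature-generation step of the claim can be illustrated with a minimal sketch. It assumes the OCR text and the landing-page text are already available as plain strings; the function names, the `img:`/`lp:`/`adv:` prefixes, and the sample advertiser attribute are hypothetical choices made for illustration and do not come from the patent.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def bag_of_words(ocr_text, landing_text, advertiser_attrs):
    """Build one bag-of-words Counter for an ad.

    The source prefixes (img:, lp:, adv:) are an illustrative assumption:
    they keep the same word from different sources as distinct features.
    """
    features = Counter()
    features.update("img:" + t for t in tokenize(ocr_text))      # first text features
    features.update("lp:" + t for t in tokenize(landing_text))   # second text features
    for key, value in advertiser_attrs.items():                  # advertiser attributes
        features["adv:%s=%s" % (key, value)] += 1
    return features


features = bag_of_words(
    ocr_text="Low fares! Book flights today",
    landing_text="Cheap flights and hotel deals. Book flights online.",
    advertiser_attrs={"industry": "travel"},
)
```

Here `features` maps each prefixed token to its count, so a word that appears twice on the landing page contributes a count of two under its `lp:` key.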
Abstract
A system and method for automatically classifying ads into a taxonomy of categories, the method including: extracting text features from ad images using OCR (optical character recognition) techniques; identifying objects of interest from ad images using object detection and recognition techniques in computer vision; extracting text features from the web-page of the advertiser to which the user is re-directed when clicking the ad; training statistical models using the extracted features mentioned above as well as advertiser attributes from a historical dataset of ads labeled by human editors; and determining the relevant categories of unlabeled ads using the trained models.
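The abstract does not name a specific statistical model. As one possible reading, the training and categorization steps could be sketched with a multinomial naive Bayes classifier over bag-of-words counts. The `NaiveBayes` class, the toy labeled ads, and the category names below are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over bag-of-words counts."""

    def fit(self, bags, labels):
        self.class_counts = Counter(labels)      # number of labeled ads per category
        self.word_counts = defaultdict(Counter)  # per-category token counts
        self.vocab = set()
        for bag, label in zip(bags, labels):
            self.word_counts[label].update(bag)
            self.vocab.update(bag)
        return self

    def predict(self, bag):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word, n in bag.items():
                if word in self.vocab:  # skip words never seen in training
                    score += n * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stand-in for a historical dataset of ads labeled by human editors.
labeled = [
    (Counter("cheap flights hotel deals".split()), "travel"),
    (Counter("book flights vacation".split()), "travel"),
    (Counter("car loan low rates".split()), "finance"),
    (Counter("credit card rates".split()), "finance"),
]
model = NaiveBayes().fit([b for b, _ in labeled], [c for _, c in labeled])

# Categorize an unlabeled ad from its bag-of-words features.
predicted = model.predict(Counter("flights deals today".split()))
```

This sketch returns a single best category. A taxonomy with several relevant categories per ad would need a per-category threshold or one binary model per category, which is left out here.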
3 Claims
3. A system of display ad categorization, the system comprising:
a processor device;
a storage device operably coupled with the processor device, said storage device comprising instructions that are executed by said processor device;
wherein the instructions cause a computer to perform a method comprising steps of:
reading an ad image for a display ad and a landing page associated with said display ad from the storage device;
extracting first text features from the ad image using optical character recognition (OCR) techniques;
executing object detection and recognition to identify objects of interest from the ad image;
parsing the landing page to extract second text features from a title, keywords, and content of said landing page;
generating bag-of-words ad features using the extracted first and second text features, as well as attributes of the advertiser;
using the generated bag-of-words ad features to categorize the display ad;
storing the extracted first and second text features from the ad image and the landing page in the storage device;
training statistical models using the generated bag-of-words ad features on a historical dataset of ads labeled by human editors; and
determining relevant categories of unlabeled ads using the trained statistical models to classify the display ads.
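The landing-page parsing step can be sketched with Python's standard `html.parser` module. The claim does not say how the page is parsed, so the `LandingPageParser` class, the use of a `<meta name="keywords">` tag as the keyword source, and the sample page are assumptions for illustration only.

```python
from html.parser import HTMLParser


class LandingPageParser(HTMLParser):
    """Collects the title, meta keywords, and visible text of an HTML page."""

    SKIPPED = {"script", "style"}  # elements whose text is not page content

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self.content = []
        self._open_tags = []  # currently open <title>/<script>/<style> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            raw = attrs.get("content") or ""
            self.keywords = [k.strip() for k in raw.split(",") if k.strip()]
        elif tag == "title" or tag in self.SKIPPED:
            self._open_tags.append(tag)

    def handle_endtag(self, tag):
        if self._open_tags and self._open_tags[-1] == tag:
            self._open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        current = self._open_tags[-1] if self._open_tags else None
        if current == "title":
            self.title += text
        elif current not in self.SKIPPED:
            self.content.append(text)


# Hypothetical landing page used only to exercise the parser.
page = """<html><head><title>Acme Travel</title>
<meta name="keywords" content="flights, hotels, deals">
<style>body { color: red; }</style></head>
<body><h1>Book cheap flights</h1><p>Save on hotels.</p></body></html>"""

parser = LandingPageParser()
parser.feed(page)
parser.close()
```

After parsing, `parser.title`, `parser.keywords`, and `parser.content` hold the three text sources the claim names, ready to be tokenized into the second text features.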
Specification