
Optimized web domains classification based on progressive crawling with clustering

  • US 8,972,376 B1
  • Filed: 01/02/2013
  • Issued: 03/03/2015
  • Est. Priority Date: 01/02/2013
  • Status: Active Grant
First Claim

1. A system for optimized web domains classification based on progressive crawling with clustering, comprising:

  • a processor configured to:

    crawl a domain to collect data for a subset of pages of a corpus of content associated with the domain;

    classify each of the crawled pages into one or more category clusters, wherein the category clusters represent a content categorization of the corpus of content associated with the domain, and wherein the classifying of the each of the crawled pages into the one or more category clusters comprises:

    determine a category for the each of the crawled pages in the domain;

    group more than one page having the same category into a first cluster;

    determine whether a number of the more than one page of the first cluster exceeds a first threshold; and

    in the event that the number of the more than one page of the first cluster does not exceed the first threshold, select a new page within the domain to crawl and classify; and

    determine which of the one or more category clusters to publish for the domain; and

    a memory coupled to the processor and configured to provide the processor with instructions.
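The claimed loop (crawl a page, categorize it, group same-category pages into a cluster, and keep selecting new pages until a cluster exceeds a threshold, then decide which clusters to publish) can be illustrated with a small sketch. The claim does not specify how pages are categorized, how the next page is chosen, or which clusters get published, so everything below is an assumption for illustration only: `classify_domain`, `SITE`, and `keyword_category` are hypothetical names, the "domain" is a toy in-memory dict instead of a real crawler, the categorizer is a naive keyword match, new pages are selected breadth-first from discovered links, a `max_pages` budget bounds the crawl, and the publishing rule (publish clusters larger than the threshold) is invented for the example.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy "domain": url -> (page text, links to other pages in the domain).
Site = Dict[str, Tuple[str, List[str]]]


def classify_domain(
    site: Site,
    seeds: List[str],
    categorize: Callable[[str], str],
    threshold: int,
    max_pages: int = 100,
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Progressively crawl `site`, clustering pages by category (illustrative sketch).

    Stops once some cluster holds more than `threshold` pages, or when the
    frontier or the (assumed) `max_pages` budget is exhausted.
    Returns (published categories, clusters).
    """
    clusters: Dict[str, List[str]] = {}
    frontier = deque(dict.fromkeys(seeds))  # de-duplicated seed pages, in order
    seen: Set[str] = set(frontier)
    crawled = 0

    while frontier and crawled < max_pages:
        url = frontier.popleft()
        text, links = site[url]              # "crawl" one page of the domain
        crawled += 1
        category = categorize(text)          # determine the page's category
        clusters.setdefault(category, []).append(url)  # group same-category pages
        for link in links:                   # assumed: breadth-first page selection
            if link in site and link not in seen:
                seen.add(link)
                frontier.append(link)
        # If no cluster exceeds the threshold yet, loop to crawl and classify a new page.
        if max(len(pages) for pages in clusters.values()) > threshold:
            break

    # Assumed publishing rule: publish only clusters larger than the threshold.
    published = sorted(c for c, pages in clusters.items() if len(pages) > threshold)
    return published, clusters


# Hypothetical example domain and keyword categorizer.
SITE: Site = {
    "/": ("welcome to our sports news", ["/a", "/b", "/c"]),
    "/a": ("football scores", ["/d"]),
    "/b": ("tennis results", []),
    "/c": ("cooking recipes", []),
    "/d": ("basketball highlights", []),
}


def keyword_category(text: str) -> str:
    words = ("sports", "football", "tennis", "basketball")
    return "sports" if any(w in text for w in words) else "other"


if __name__ == "__main__":
    published, clusters = classify_domain(SITE, ["/"], keyword_category, threshold=2)
    print(published)  # → ['sports']
```

With `threshold=2`, the sketch stops after crawling three of the five pages, because the "sports" cluster already exceeds the threshold; the `/c` and `/d` pages are never fetched. That early stop is the point of progressive crawling: the domain is categorized from a sample rather than from its full corpus.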

  • 1 Assignment