
SYSTEM AND METHOD FOR PRIORITIZING WEBSITES DURING A WEBCRAWLING PROCESS

  • US 20080256046A1
  • Filed: 06/23/2008
  • Published: 10/16/2008
  • Est. Priority Date: 03/29/2006
  • Status: Active Grant
First Claim

1. A prioritization method, comprising:

  • extracting, by a web crawler in a computing system, a set of candidate web pages to be crawled, wherein said computing system comprises a memory unit, and wherein said memory unit comprises said web crawler, said set of candidate web pages, an online analysis software application, an offline analysis software application, and a website score database;

    associating, by said online analysis software application, each web page in said set of candidate web pages with a website in a computer network;

    determining online, by said online analysis software application, if a first website score for said website is in said website score database;

    associating, by said online analysis software application, said first website score for said website with associated web pages in said set of candidate web pages, if said first website score exists in said website score database;

    prioritizing said set of candidate web pages with respect to an associated website score for each web page in said candidate set of web pages;

    retrieving, by said web crawler, content from said set of candidate web pages using said prioritizing;

    extracting, by said online analysis software application, hyperlinks from said content;

    storing said hyperlinks in said memory unit.
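
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions the claim does not state: a web page is associated with its website by URL hostname, a higher website score means higher crawl priority, pages whose website has no stored score receive a hypothetical default of 0.0, and the website score database is modeled as a plain dictionary. The names `website_of`, `prioritize`, `crawl`, and the injected `fetch` callable are illustrative only.

```python
import re
from urllib.parse import urlparse

# Assumption: the claim does not say how unscored websites are treated.
DEFAULT_SCORE = 0.0

# Simplified hyperlink pattern for illustration; a real crawler would use an HTML parser.
HREF_RE = re.compile(r'href="([^"]+)"')


def website_of(url):
    """Associate a web page with a website (assumed here: the URL's hostname)."""
    return urlparse(url).netloc.lower()


def prioritize(candidates, score_db, default=DEFAULT_SCORE):
    """Order candidate pages by the score of their website, highest first.

    score_db stands in for the website score database (hostname -> score).
    sorted() is stable, so pages with equal scores keep their original order.
    """
    def page_score(url):
        site = website_of(url)
        # Use the stored website score if one exists, else the assumed default.
        return score_db.get(site, default)

    return sorted(candidates, key=page_score, reverse=True)


def crawl(candidates, score_db, fetch):
    """Retrieve content in priority order, extract hyperlinks, and store them.

    fetch is a hypothetical callable (url -> HTML string) supplied by the caller.
    """
    stored_links = []  # stands in for storage in the memory unit
    for url in prioritize(candidates, score_db):
        content = fetch(url)
        stored_links.extend(HREF_RE.findall(content))
    return stored_links


if __name__ == "__main__":
    pages = {
        "http://a.example/1": '<a href="http://d.example/">d</a>',
        "http://b.example/1": '<a href="http://e.example/">e</a>',
    }
    scores = {"a.example": 0.2, "b.example": 0.9}
    print(crawl(list(pages), scores, lambda u: pages.get(u, "")))
```

Injecting `fetch` keeps the sketch runnable without network access; in practice it would be replaced by an HTTP client call.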
