SYSTEM AND METHOD FOR FOCUSED RE-CRAWLING OF WEB SITES

  • US 20080168041A1
  • Filed: 03/25/2008
  • Published: 07/10/2008
  • Est. Priority Date: 12/21/2005
  • Status: Active Grant
First Claim

1. A method of crawling the Web, said method comprising:

  • crawling Web pages on the Web starting from a given set of seed Universal Resource Locators (URLs);

  • partitioning crawled Web pages into sets of relevant and irrelevant pages;

  • discovering from said sets of relevant and irrelevant pages a set of exclusion and inclusion patterns; and

  • restricting subsequent crawling of the Web through said set of exclusion and inclusion patterns.
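
The four steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the in-memory SITE graph, the relevance test (anything outside /ads/), and the choice of "scheme, host, and first path segment" as the pattern form are all assumptions made for illustration, and every function name is hypothetical. The claim itself does not say how relevance is judged or what form the patterns take.

```python
from collections import deque
from urllib.parse import urlparse

# Toy link graph standing in for the Web (assumed data, not from the patent).
SITE = {
    "http://example.com/": ["http://example.com/docs/a", "http://example.com/ads/x"],
    "http://example.com/docs/a": ["http://example.com/docs/b"],
    "http://example.com/docs/b": [],
    "http://example.com/ads/x": ["http://example.com/ads/y"],
    "http://example.com/ads/y": [],
}

def crawl(seeds, get_links, allowed=lambda url: True):
    """Step 1: breadth-first crawl from the seed URLs, skipping disallowed URLs."""
    seen, queue = set(), deque(seeds)
    while queue:
        url = queue.popleft()
        if url in seen or not allowed(url):
            continue
        seen.add(url)
        queue.extend(get_links(url))
    return seen

def partition(urls, is_relevant):
    """Step 2: split crawled pages into relevant and irrelevant sets."""
    relevant = {u for u in urls if is_relevant(u)}
    return relevant, set(urls) - relevant

def prefix(url):
    """Assumed pattern form: scheme, host, and first path segment."""
    p = urlparse(url)
    first = p.path.strip("/").split("/")[0]
    return f"{p.scheme}://{p.netloc}/{first}"

def discover_patterns(relevant, irrelevant):
    """Step 3: prefixes seen on relevant pages are inclusion patterns;
    prefixes seen only on irrelevant pages are exclusion patterns."""
    inclusion = {prefix(u) for u in relevant}
    exclusion = {prefix(u) for u in irrelevant} - inclusion
    return inclusion, exclusion

def make_filter(inclusion, exclusion):
    """Step 4: build a URL filter that restricts the next crawl."""
    def allowed(url):
        pat = prefix(url)
        if pat in inclusion:
            return True
        return pat not in exclusion  # unknown prefixes are still explored
    return allowed

def get_links(url):
    return SITE.get(url, [])

seeds = ["http://example.com/"]
crawled = crawl(seeds, get_links)
relevant, irrelevant = partition(crawled, lambda u: "/ads/" not in u)
inclusion, exclusion = discover_patterns(relevant, irrelevant)
recrawled = crawl(seeds, get_links, make_filter(inclusion, exclusion))
```

In this toy run the first crawl visits all five pages, while the restricted re-crawl skips everything under the excluded /ads prefix. Letting unknown prefixes through is one possible policy; a stricter crawler could require an inclusion match instead.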
