Uniform resource locator scoring for targeted web crawling

  • US 20080104113A1
  • Filed: 10/26/2006
  • Published: 05/01/2008
  • Est. Priority Date: 10/26/2006
  • Status: Active Grant
First Claim

1. A targeted web crawling method comprising:

    analyzing a downloaded web page that links to an outlinked web page having a URL and belonging to a sub-group of web pages;

    generating a domain density score in response to the sub-group, where the domain density score indicates relevance of the URL to a desired web page type;

    generating an anchor text score in response to anchor text of the URL, where the anchor text score indicates probability that the outlinked web page is of the desired web page type;

    generating a URL string score in response to characters of the URL, where the URL string score indicates probability that the outlinked web page is of the desired web page type;

    generating a category need score in response to characteristics of the downloaded web page, where the category need score is influenced by a current distribution of a plurality of categories associated with the desired web page type; and

    calculating a downloading priority for the URL in response to the domain density score, the anchor text score, the URL string score, and the category need score.
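
The claim does not say how the four scores are combined or how the category need score is derived, so the following is only a minimal sketch under stated assumptions: a weighted sum with equal weights, a category need score computed as the shortfall between a target share and the current share of a category, and a heap-based crawl frontier. The names `UrlScores`, `WEIGHTS`, `category_need_score`, `download_priority`, and `Frontier` are hypothetical and do not come from the patent.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class UrlScores:
    """The four per-URL scores named in claim 1, assumed to lie in [0, 1]."""
    domain_density: float
    anchor_text: float
    url_string: float
    category_need: float


# Assumption: equal weights. The claim only says the priority is calculated
# "in response to" the four scores, not how they are weighted.
WEIGHTS = {
    "domain_density": 0.25,
    "anchor_text": 0.25,
    "url_string": 0.25,
    "category_need": 0.25,
}


def category_need_score(category, counts, target_shares):
    """Assumed heuristic: how far a category is below its target share.

    counts maps category -> number of pages collected so far;
    target_shares maps category -> desired fraction of the collection.
    """
    total = sum(counts.values())
    current_share = counts.get(category, 0) / total if total else 0.0
    return max(0.0, target_shares.get(category, 0.0) - current_share)


def download_priority(scores, weights=WEIGHTS):
    """Combine the four scores into one priority (assumed weighted sum)."""
    return (
        weights["domain_density"] * scores.domain_density
        + weights["anchor_text"] * scores.anchor_text
        + weights["url_string"] * scores.url_string
        + weights["category_need"] * scores.category_need
    )


class Frontier:
    """Crawl frontier that returns the highest-priority URL first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so equal priorities pop in insertion order

    def push(self, url, scores):
        priority = download_priority(scores)
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1

    def pop(self):
        neg_priority, _, url = heapq.heappop(self._heap)
        return url, -neg_priority
```

In this sketch a crawler would score each outlinked URL found on a downloaded page, push it onto the `Frontier`, and download whichever URL `pop()` returns next. Any other monotone combination of the four scores would fit the claim language equally well.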

  • 2 Assignments