System and method for focussed web crawling

  • US 6,418,433 B1
  • Filed: 01/28/1999
  • Issued: 07/09/2002
  • Est. Priority Date: 01/28/1999
  • Status: Expired due to Fees
First Claim

1. A general purpose computer including a data storage device including a computer usable medium having computer readable code means for focussed crawling of a wide area computer network such as the World Wide Web, comprising:

  • computer readable code means for receiving a seed set of Web pages in a crawl database, the seed set being selected based at least on relevance to at least one topic;

  • computer readable code means for identifying outlink Web pages from one or more Web pages in the crawl database;

  • computer readable code means for evaluating one or more outlink Web pages for relevance to the topic; and

  • computer readable code means for causing outlinks only of Web pages evaluated as being relevant to the topic to be stored in the crawl database.
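
The four elements of the claim can be read as a simple crawl loop. The following is a minimal sketch of one possible reading, not the patented implementation: the names `focused_crawl`, `get_outlinks`, `is_relevant`, and `max_pages` are hypothetical, link extraction and the relevance test are passed in as callables rather than specified, and the page budget is an added assumption to keep the loop bounded.

```python
from collections import deque
from typing import Callable, Iterable, Set


def focused_crawl(
    seeds: Iterable[str],
    get_outlinks: Callable[[str], Iterable[str]],  # hypothetical link extractor
    is_relevant: Callable[[str], bool],            # hypothetical topic classifier
    max_pages: int = 100,                          # assumed budget, not in the claim
) -> Set[str]:
    """Return the crawl database after a focused crawl from the seed set."""
    seed_list = list(seeds)
    crawl_db: Set[str] = set(seed_list)   # the seed set is received into the crawl database
    frontier = deque(seed_list)           # pages waiting to be evaluated
    evaluated: Set[str] = set()

    while frontier and len(evaluated) < max_pages:
        page = frontier.popleft()
        if page in evaluated:
            continue
        evaluated.add(page)

        # Evaluate the page for relevance to the topic.
        if not is_relevant(page):
            continue  # outlinks of an irrelevant page are never stored

        # Only outlinks of a relevant page enter the crawl database.
        for link in get_outlinks(page):
            if link not in crawl_db:
                crawl_db.add(link)
                frontier.append(link)

    return crawl_db
```

In this sketch an irrelevant page can still appear in the crawl database, because it was stored as an outlink of a relevant page, but the crawl does not expand past it. With a toy link graph `a -> b, c`, `b -> d`, `c -> e` in which only `a`, `b`, and `d` are relevant, starting from the seed `a` stores `a`, `b`, `c`, and `d` and never stores `e`.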
