
System and method for efficient filtering of data set addresses in a web crawler

  • US 6,952,730 B1
  • Filed: 06/30/2000
  • Issued: 10/04/2005
  • Est. Priority Date: 06/30/2000
  • Status: Expired due to Term
First Claim

1. A method of downloading data sets from among a plurality of host computers, comprising the steps of:

    (a) storing representations of data set addresses in a set of data structures, including a buffer and a first disk file, wherein the representations of data set addresses stored in the first disk file are ordered;

    (b) downloading at least one data set that includes addresses of one or more referred data sets;

    (c) identifying the addresses of the one or more referred data sets;

    (d) for each identified address:

        (d1) generating a representation of the identified address;

        (d2) determining whether the representation is stored in the buffer without determining whether the representation is stored in the first disk file, and when this determination is negative, storing the representation in the buffer; and

    (e) when the buffer reaches a predefined full condition:

        (e1) ordering the contents of the buffer according to the representations;

        (e2) performing an ordered merge of the contents of the buffer into the contents of the first disk file; and

        (e3) preventing duplication of any of the representations of data set addresses stored in the first disk file after the ordered merge.
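
As an informal illustration only, and not the patented implementation, steps (a), (d) and (e) of the claim can be sketched in Python. Everything named here is an assumption made for the example: the `AddressFilter` class, the `fingerprint` helper, the use of an 8-byte BLAKE2b digest as the "representation", the one-hex-string-per-line file layout, and a simple entry count as the "predefined full condition". Downloading and link extraction (steps (b) and (c)) are left out.

```python
import hashlib
import heapq
import os
import tempfile


def fingerprint(address: str) -> str:
    """Hypothetical representation: fixed-width hex, so string order is stable."""
    return hashlib.blake2b(address.encode("utf-8"), digest_size=8).hexdigest()


class AddressFilter:
    """Sketch of a buffer plus an ordered disk file of address representations."""

    def __init__(self, disk_path: str, buffer_limit: int = 1000):
        self.disk_path = disk_path
        self.buffer_limit = buffer_limit  # assumed "full condition": entry count
        self.buffer = set()               # (a) in-memory buffer
        open(disk_path, "a").close()      # (a) make sure the disk file exists

    def add(self, address: str) -> None:
        rep = fingerprint(address)        # (d1) generate the representation
        # (d2) consult only the buffer; the disk file is not read here
        if rep not in self.buffer:
            self.buffer.add(rep)
        if len(self.buffer) >= self.buffer_limit:
            self.flush()                  # (e) buffer is full

    def flush(self) -> None:
        pending = sorted(self.buffer)     # (e1) order the buffer contents
        directory = os.path.dirname(os.path.abspath(self.disk_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with open(self.disk_path) as old, os.fdopen(fd, "w") as out:
            on_disk = (line.rstrip("\n") for line in old)
            last = None
            # (e2) ordered merge of two sorted streams
            for rep in heapq.merge(on_disk, pending):
                if rep != last:           # (e3) drop duplicates while merging
                    out.write(rep + "\n")
                    last = rep
        os.replace(tmp_path, self.disk_path)
        self.buffer.clear()
```

In this sketch a crawler would call `add()` for each extracted address. An address already recorded on disk may sit in the buffer a second time, but the merge in `flush()` collapses it, so the disk file stays sorted and free of duplicates while per-address checks never touch the disk.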
