Methods and apparatus for intelligent crawling on the world wide web

  • US 8,060,816 B1
  • Filed: 10/31/2000
  • Issued: 11/15/2011
  • Est. Priority Date: 10/31/2000
  • Status: Expired due to Fees
First Claim

1. A computer-based method of performing document retrieval in accordance with an information network, the method comprising the steps of:

  • initially retrieving one or more documents from the information network that satisfy a user-defined predicate, wherein the initial document retrieval operation is performed without assuming a specific model of a linkage structure such that the initial document retrieval operation retrieves the one or more documents without assuming that a relationship exists between a feature of a first one of the one or more documents and a feature of at least another one of the one or more documents that links to the first one;

  • collecting at least a set of aggregate statistical information and a set of predicate-specific statistical information about the one or more retrieved documents as the one or more retrieved documents are analyzed; and

  • using the collected statistical information to automatically determine further document retrieval operations to be performed in accordance with the information network, wherein the statistical information using step further comprises learning a linkage structure from at least a portion of the collected statistical information with each successive document retrieval operation such that the learned linkage structure is available for use in performing subsequent document retrieval operations requested by a user.
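
The three steps of the claim can be illustrated with a small sketch. It is not the patented implementation: the in-memory `TOY_WEB` dictionary, the `crawl` function, the use of words on linking pages as features, and the ratio-based scoring rule are all assumptions chosen for illustration. The crawler starts with every candidate scored neutrally (no linkage model is assumed), keeps aggregate counts and predicate-specific counts as pages are analyzed, and uses their ratio to decide which page to retrieve next.

```python
from collections import Counter

def crawl(web, seeds, predicate, limit):
    """Toy crawler. `web` maps url -> (set of words, list of outlinks)."""
    seen_total = 0            # aggregate: pages analyzed so far
    hit_total = 0             # predicate-specific: pages satisfying the predicate
    word_seen = Counter()     # aggregate: pages whose linking pages contain a word
    word_hit = Counter()      # predicate-specific: same count, hits only
    inlink_words = {}         # candidate url -> words seen on pages linking to it
    frontier = set(seeds)
    done = set()
    order, hits = [], []

    def interest(word):
        # Assumed scoring rule: P(hit | word) / P(hit); 1.0 means "no evidence yet".
        if hit_total == 0 or word_seen[word] == 0:
            return 1.0
        return (word_hit[word] / word_seen[word]) / (hit_total / seen_total)

    def priority(url):
        words = inlink_words.get(url, set())
        if not words:
            return 1.0
        return sum(interest(w) for w in words) / len(words)

    while frontier and len(order) < limit:
        # Ties are broken alphabetically so the run is deterministic.
        url = max(sorted(frontier), key=priority)
        frontier.remove(url)
        done.add(url)
        order.append(url)

        words, links = web[url]
        is_hit = predicate(words)
        if is_hit:
            hits.append(url)

        # Update both sets of statistics from this retrieval.
        seen_total += 1
        hit_total += int(is_hit)
        for w in inlink_words.get(url, set()):
            word_seen[w] += 1
            if is_hit:
                word_hit[w] += 1

        # Record what the outlinks' parent looked like for later scoring.
        for link in links:
            if link in web and link not in done:
                frontier.add(link)
                inlink_words.setdefault(link, set()).update(words)

    return order, hits

# Hypothetical seven-page network used only for this example.
TOY_WEB = {
    "seed":     ({"portal"}, ["b_food", "c_sport"]),
    "b_food":   ({"cooking", "recipe"}, ["z_cake", "z_soup"]),
    "c_sport":  ({"sports"}, ["a_golf", "a_tennis"]),
    "a_golf":   ({"golf"}, []),
    "a_tennis": ({"tennis"}, []),
    "z_cake":   ({"recipe", "cake"}, []),
    "z_soup":   ({"recipe", "soup"}, []),
}

visited, hits = crawl(TOY_WEB, ["seed"], lambda words: "recipe" in words, limit=6)
```

In this toy run the crawler has no preference at first. Once it has seen that a page linked from a "sports" page failed the predicate, it passes over `a_tennis` in favor of the pages linked from the cooking page, even though alphabetical tie-breaking alone would have picked `a_tennis` first.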
