Method and apparatus for focused crawling

  • US 7,080,073 B1
  • Filed: 06/24/2002
  • Issued: 07/18/2006
  • Est. Priority Date: 08/18/2000
  • Status: Expired due to Term
First Claim

1. A method of focused crawling, comprising:

  • accessing a query input;

    crawling a plurality of documents continually, the documents including links to each other, and the crawling at least partly guided by a crawl metric, wherein the crawl metric quantifies priority for crawling links emanating from a certain document within the crawling, the crawl metric at least partly determined by a first mechanism, the first mechanism including a first combination, the first combination including a first plurality of one or more procedures, the first plurality of one or more procedures including evaluating relevance of documents using a link structure of the crawled documents, wherein the evaluating relevance of documents using a link structure of the crawled documents is performed repeatedly and continually, and wherein the evaluating relevance of documents using a link structure of the crawled documents includes:

      accessing a first plurality of documents from a database of a plurality of received documents, the plurality of received documents including crawled documents, the first plurality of documents to be ranked,

      generating a graph of the first plurality of documents,

      assigning weights to a plurality of nodes of the graph, wherein nodes of the graph represent the documents and edges represent links between the documents,

      finding an assignment of weights to one or more nodes of the graph, by propagating weights through the graph, the assignment of weight to a node based at least in part on calculating a weighted sum of weights propagated from neighboring nodes, and

      generating a ranked list of at least the first plurality of documents, the ranked list at least partly generated from the graph; and

    returning target documents, the target documents being relevant to the query input, the target documents found from the plurality of crawled documents, the target documents returned at least partly based on a search metric, the search metric quantifying relevance or importance of a document to the query input, the search metric at least partly determined by a second mechanism, the second mechanism including a second combination, the second combination being different from the first combination, the second combination including a second plurality of one or more procedures, the second plurality of procedures including evaluating relevance of documents using a template, the template including a plurality of one or more template portions, at least one of the template portions including a second plurality of one or more hierarchical levels.
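
The link-structure steps recited in the claim (build a graph, assign weights, propagate them as a weighted sum from neighboring nodes, produce a ranked list, and use the result to prioritize links for crawling) can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names, the `damping` factor, the division of a node's weight by its out-degree, and the fixed iteration count are all assumptions chosen for illustration; the claim itself only requires a weighted sum of weights propagated from neighboring nodes.

```python
import heapq
from collections import defaultdict


def propagate_weights(links, initial, damping=0.5, iterations=20):
    """Propagate weights through a document link graph (illustrative only).

    links:   dict mapping a document to the documents it links to.
    initial: dict mapping a document to its starting weight.
    damping and the out-degree normalization are assumptions, not claim terms.
    """
    nodes = set(links) | {dst for targets in links.values() for dst in targets}
    incoming = defaultdict(list)
    for src, targets in links.items():
        for dst in targets:
            incoming[dst].append(src)
    out_degree = {n: len(links.get(n, [])) for n in nodes}

    weights = {n: initial.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Weighted sum of the weights propagated from in-linking neighbors.
            propagated = sum(weights[m] / out_degree[m] for m in incoming[n])
            updated[n] = (1 - damping) * initial.get(n, 0.0) + damping * propagated
        weights = updated
    return weights


def ranked_list(weights):
    """Return documents ordered by descending weight (ties broken by name)."""
    return sorted(weights, key=lambda n: (-weights[n], n))


def crawl_order(weights, frontier_links):
    """Order uncrawled URLs by the weight of the document they emanate from.

    frontier_links: dict mapping a crawled document to its uncrawled out-links.
    This stands in for the claim's crawl metric; the real metric may differ.
    """
    heap = []
    for src, urls in frontier_links.items():
        for url in urls:
            # heapq is a min-heap, so negate the weight for highest-first order.
            heapq.heappush(heap, (-weights.get(src, 0.0), url))
    order, seen = [], set()
    while heap:
        _, url = heapq.heappop(heap)
        if url not in seen:
            seen.add(url)
            order.append(url)
    return order
```

In this sketch, re-running `propagate_weights` as new documents arrive in the database would correspond to the claim's requirement that the evaluation be performed "repeatedly and continually", and the priority queue in `crawl_order` is one simple way to let the resulting weights guide which links are followed next.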
