Method and apparatus for focused crawling
First Claim
1. A method of focused crawling, comprising:
accessing a query input;
crawling a plurality of documents continually, the documents including links to each other, and the crawling at least partly guided by a crawl metric, wherein the crawl metric quantifies priority for crawling links emanating from a certain document within the crawling, the crawl metric at least partly determined by a first mechanism, the first mechanism including a first combination, the first combination including a first plurality of one or more procedures, the first plurality of one or more procedures including evaluating relevance of documents using a link structure of the crawled documents, wherein the evaluating relevance of documents using a link structure of the crawled documents is performed repeatedly and continually, and wherein the evaluating relevance of documents using a link structure of the crawled documents includes:
accessing a first plurality of documents from a database of a plurality of received documents, the plurality of received documents including crawled documents, the first plurality of documents to be ranked,
generating a graph of the first plurality of documents,
assigning weights to a plurality of nodes of the graph, wherein nodes of the graph represent the documents and edges represent links between the documents,
finding an assignment of weights to one or more nodes of the graph, by propagating weights through the graph, the assignment of weight to a node based at least in part on calculating a weighted sum of weights propagated from neighboring nodes, and
generating a ranked list of at least the first plurality of documents, the ranked list at least partly generated from the graph; and
returning target documents, the target documents being relevant to the query input, the target documents found from the plurality of crawled documents, the target documents returned at least partly based on a search metric, the search metric quantifying relevance or importance of a document to the query input, the search metric at least partly determined by a second mechanism, the second mechanism including a second combination, the second combination being different from the first combination, the second combination including a second plurality of one or more procedures, the second plurality of procedures including evaluating relevance of documents using a template, the template including a plurality of one or more template portions, at least one of the template portions including a second plurality of one or more hierarchical levels.
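The link-structure evaluation recited in claim 1 (build a graph, assign weights to nodes, propagate them as weighted sums from neighboring nodes, then produce a ranked list) can be illustrated with a minimal Python sketch. The claim does not fix a particular formula, so everything specific here is an assumption made for illustration: the function names `propagate_weights` and `ranked_list`, the damping factor, the even split of weight across out-links, and the fixed iteration count.

```python
from collections import defaultdict

def propagate_weights(links, initial, damping=0.85, iterations=50):
    """Propagate node weights through a link graph.

    links:   dict mapping a document to the documents it links to (edges).
    initial: dict mapping a document to its starting weight.
    The damping factor and even edge weights are illustrative assumptions.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets} | set(initial)
    incoming = defaultdict(list)
    for src, targets in links.items():
        for dst in targets:
            # Assumed edge weight: a node splits its weight evenly over its out-links.
            incoming[dst].append((src, 1.0 / len(targets)))
    base = {n: initial.get(n, 0.0) for n in nodes}
    weights = dict(base)
    for _ in range(iterations):
        # New weight = share of the node's own initial weight
        # plus a weighted sum of the weights of neighbors linking to it.
        weights = {
            n: (1 - damping) * base[n]
            + damping * sum(weights[src] * w for src, w in incoming[n])
            for n in nodes
        }
    return weights

def ranked_list(weights):
    """Return documents ordered by descending weight (ties broken by name)."""
    return sorted(weights, key=lambda n: (-weights[n], n))

# Hypothetical three-document graph: "a" links to "b" and "c", "b" links to "c".
example_links = {"a": ["b", "c"], "b": ["c"], "c": []}
example_weights = propagate_weights(example_links, {"a": 1.0})
example_ranking = ranked_list(example_weights)
```

In this toy graph the seed document "a" keeps the largest weight, and "c" outranks "b" because it receives weight from both of the other documents.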
10 Assignments
0 Petitions
Abstract
The present invention pertains to the field of computer software. More specifically, the present invention relates to dynamic discovery of documents or information through a focused crawler or search engine.
463 Citations
19 Claims
1. A method of focused crawling, comprising:
accessing a query input;
crawling a plurality of documents continually, the documents including links to each other, and the crawling at least partly guided by a crawl metric, wherein the crawl metric quantifies priority for crawling links emanating from a certain document within the crawling, the crawl metric at least partly determined by a first mechanism, the first mechanism including a first combination, the first combination including a first plurality of one or more procedures, the first plurality of one or more procedures including evaluating relevance of documents using a link structure of the crawled documents, wherein the evaluating relevance of documents using a link structure of the crawled documents is performed repeatedly and continually, and wherein the evaluating relevance of documents using a link structure of the crawled documents includes:
accessing a first plurality of documents from a database of a plurality of received documents, the plurality of received documents including crawled documents, the first plurality of documents to be ranked,
generating a graph of the first plurality of documents,
assigning weights to a plurality of nodes of the graph, wherein nodes of the graph represent the documents and edges represent links between the documents,
finding an assignment of weights to one or more nodes of the graph, by propagating weights through the graph, the assignment of weight to a node based at least in part on calculating a weighted sum of weights propagated from neighboring nodes, and
generating a ranked list of at least the first plurality of documents, the ranked list at least partly generated from the graph; and
returning target documents, the target documents being relevant to the query input, the target documents found from the plurality of crawled documents, the target documents returned at least partly based on a search metric, the search metric quantifying relevance or importance of a document to the query input, the search metric at least partly determined by a second mechanism, the second mechanism including a second combination, the second combination being different from the first combination, the second combination including a second plurality of one or more procedures, the second plurality of procedures including evaluating relevance of documents using a template, the template including a plurality of one or more template portions, at least one of the template portions including a second plurality of one or more hierarchical levels.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
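The crawl metric in claim 1 "quantifies priority for crawling links emanating from a certain document". One way to picture this is a priority-queue frontier in which every out-link inherits the score of the page it was found on. The sketch below is only an illustration under stated assumptions: `focused_crawl`, `fetch`, and `crawl_metric` are hypothetical names, the "web" is an in-memory dictionary rather than real HTTP fetching, and the keyword-count metric merely stands in for the link-structure evaluation the claim recites.

```python
import heapq

def focused_crawl(seeds, fetch, crawl_metric, max_pages=100):
    """Crawl best-first: links found on higher-scoring pages are fetched earlier.

    fetch(url) -> (text, out_links) and crawl_metric(url, text) -> float
    are caller-supplied, hypothetical callables.
    """
    # Seeds get the best possible priority so they are always fetched first.
    frontier = [(float("-inf"), i, url) for i, url in enumerate(seeds)]
    heapq.heapify(frontier)
    counter = len(seeds)          # tie-breaker keeps insertion order stable
    seen = set(seeds)
    crawled = {}                  # url -> text, in crawl order
    while frontier and len(crawled) < max_pages:
        _, _, url = heapq.heappop(frontier)
        text, out_links = fetch(url)
        crawled[url] = text
        priority = crawl_metric(url, text)
        for link in out_links:
            if link not in seen:
                seen.add(link)
                # heapq is a min-heap, so negate to pop the highest priority first.
                heapq.heappush(frontier, (-priority, counter, link))
                counter += 1
    return crawled

# Hypothetical in-memory "web" used in place of real fetching.
example_web = {
    "seed": ("crawler index", ["x", "y"]),
    "x": ("cooking recipes", ["x1"]),
    "y": ("focused crawler ranking", ["y1"]),
    "x1": ("more recipes", []),
    "y1": ("crawler frontier", []),
}
example_order = list(
    focused_crawl(
        ["seed"],
        fetch=lambda url: example_web[url],
        crawl_metric=lambda url, text: text.count("crawler"),
        max_pages=4,
    )
)
```

With a budget of four pages, the link found on the on-topic page "y" is fetched ahead of the link found on the off-topic page "x", which is the behavior a focused crawl is after.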
16. A method, comprising:
performing a plurality of focused crawls, wherein each of the plurality of focused crawls comprises:
accessing a query input;
crawling a plurality of documents, the documents including links to each other, and the crawling at least partly guided by a crawl metric, the crawl metric at least partly determined by a first mechanism, the first mechanism including a first combination, the first combination including evaluating relevance of documents using a link structure of the crawled documents wherein the evaluating relevance of documents using a link structure of the crawled documents is performed repeatedly and continually, and wherein the evaluating relevance of documents using a link structure of the crawled documents includes:
accessing a first plurality of documents from a database of a plurality of received documents, the plurality of received documents including crawled documents, the first plurality of documents to be ranked,
generating a graph of the first plurality of documents,
assigning weights to a plurality of nodes of the graph wherein nodes of the graph represent the documents and edges represent links between the documents,
finding an assignment of weights to one or more nodes of the graph, by propagating weights through the graph, the assignment of weight to a node based at least in part on calculating a weighted sum of weights propagated from neighboring nodes, and
generating a ranked list of at least the first plurality of documents, the ranked list at least partly generated from the graph; and
returning target documents, the target documents being relevant to the query input, the target documents found from the plurality of crawled documents, the target documents returned at least partly based on a search metric, the search metric quantifying relevance or importance of a document to the query input, the search metric at least partly determined by a second mechanism, the second mechanism including a second combination, the second combination being different from the first combination, the second combination including one or more of
1) evaluating relevance of documents using logical expressions of keywords and phrases,
2) evaluating relevance of documents using a template including a plurality of one or more template portions, at least one of the template portions including a plurality of one or more hierarchical levels,
3) evaluating relevance of documents using a link structure of the crawled documents, and
4) evaluating relevance based on freshness of documents,
wherein the method is performed on at least one of
1) a first processor and
2) one or more of a first plurality of one or more processors.
View Dependent Claims (17, 18, 19)
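Two of the options listed for the second combination in claim 16, logical expressions of keywords and phrases (option 1) and freshness of documents (option 4), lend themselves to a short Python sketch of a search metric. The claim does not say how such procedures are scored or combined, so the nested-tuple expression format, the exponential half-life decay, the `freshness_share` blend, and the names `matches`, `freshness`, and `search_metric` are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

def matches(expr, text):
    """Evaluate a logical keyword expression against a document's text.

    expr is a keyword/phrase string, or a tuple ("and" | "or" | "not", ...).
    This tuple format is an assumed representation, not taken from the claims.
    """
    if isinstance(expr, str):
        return expr.lower() in text.lower()
    op, *args = expr
    if op == "and":
        return all(matches(a, text) for a in args)
    if op == "or":
        return any(matches(a, text) for a in args)
    if op == "not":
        return not matches(args[0], text)
    raise ValueError(f"unknown operator: {op!r}")

def freshness(fetched_at, now, half_life_days=30.0):
    """Assumed freshness score: 1.0 for a new document, halving every half-life."""
    age_days = max((now - fetched_at).total_seconds(), 0.0) / 86400.0
    return 0.5 ** (age_days / half_life_days)

def search_metric(expr, text, fetched_at, now, freshness_share=0.3):
    """Blend keyword match and freshness; non-matching documents score 0."""
    if not matches(expr, text):
        return 0.0
    return (1 - freshness_share) + freshness_share * freshness(fetched_at, now)

# Hypothetical query and documents.
example_query = ("and", "focused crawler", ("or", "ranking", "link structure"), ("not", "recipe"))
example_now = datetime(2024, 1, 31, tzinfo=timezone.utc)
score_new = search_metric(example_query, "A focused crawler with ranking", example_now, example_now)
score_old = search_metric(
    example_query,
    "Focused crawler using link structure",
    example_now - timedelta(days=30),
    example_now,
)
score_miss = search_metric(example_query, "A recipe for a focused crawler ranking", example_now, example_now)
```

Here a matching document fetched at query time scores highest, an equally relevant document fetched one half-life earlier scores lower, and a document excluded by the "not" term scores zero.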
Specification