Method for ranking documents in a hyperlinked environment using connectivity and selective content analysis
First Claim
1. A computerized method for ranking a set of documents, the set of documents including links connecting the documents to each other, comprising the step of:
- constructing a graph in a memory, the graph including nodes and directed edges, each node representing one of the documents, and the directed edges representing the links connecting the documents;
selecting a first subset of the documents from the set of documents to form a topic;
assigning a relevance weight to each node, the relevance weight of each node based on the similarity of the document represented by the node to the topic;
selecting a second subset of the documents from the set of documents to test for relevance to the topic;
pruning a particular node from the second subset if the associated relevance weight of the particular node is less than a predetermined threshold to form a pruned graph representing a third subset of the documents; and
ranking the subset of documents represented by the nodes of the pruned graph according to a connectivity based ranking scheme.
10 Assignments
0 Petitions
Accused Products
Abstract
In a computerized method, a set of documents is ranked according to their content and their connectivity by using topic distillation. The documents include links that connect the documents to each other, either directly, or indirectly. A graph is constructed in a memory of a computer system. In the graph, nodes represent the documents, and directed edges represent the links. Based on the number of links connecting the various nodes, a subset of documents is selected to form a topic. A second subset of the documents is chosen based on the number of directed edges connecting the nodes. Nodes in the second subset are compared with the topic to determine similarity to the topic, and a relevance weight is correspondingly assigned to each node. Nodes in the second subset having a relevance weight less than a predetermined threshold are pruned from the graph. The documents represented by the remaining nodes in the graph are ranked by connectivity based ranking scheme.
-
Citations
9 Claims
-
1. A computerized method for ranking a set of documents, the set of documents including links connecting the documents to each other, comprising the step of:
-
constructing a graph in a memory, the graph including nodes and directed edges, each node representing one of the documents, and the directed edges representing the links connecting the documents; selecting a first subset of the documents from the set of documents to form a topic; assigning a relevance weight to each node, the relevance weight of each node based on the similarity of the document represented by the node to the topic; selecting a second subset of the documents from the set of documents to test for relevance to the topic; pruning a particular node from the second subset if the associated relevance weight of the particular node is less than a predetermined threshold to form a pruned graph representing a third subset of the documents; and ranking the subset of documents represented by the nodes of the pruned graph according to a connectivity based ranking scheme. - View Dependent Claims (2, 3, 4, 7, 9)
-
-
5. A computerized method for ranking a set of documents, the set of documents including links connecting the documents to each other, comprising the step of:
-
constructing a graph in a memory, the graph including the nodes and directed edges, each node representing one of the documents, and the directed edges representing the links connecting the documents, each document being identified by a string and the query includes a set of terms, and selecting the first subset of documents according to the number of directed edges connecting a particular node, and the number of terms that appear as unique sub-strings in the string identifying the document represented by the particular node; selecting a first subset of the documents from the set of documents to form a topic; assigning a relevance weight to each node, the relevance weight of each node based on the similarity of the document represented by the node to the topic; selecting a second subset of the documents from the set of documents to test for relevance to the topic; pruning a particular node from the second subset if the associated relevance weight of the particular node is less than a predetermined threshold to form a pruned graph representing a third subset of the documents; ranking the subset of documents represented by the nodes of the pruned graph according to a connectivity based ranking scheme; and
,scoring the particular node according to the number of directed edges to the particular node plus two times the number of terms that appears as unique sub-strings plus one if the particular node includes a directed edge to another node. - View Dependent Claims (6)
-
-
8. A computerized method for ranking a set of documents, the set of documents including links connecting the documents to each other, comprising the step of:
-
constructing a graph in a memory, the graph including the nodes and directed edges, each node representing one of the documents, and the directed edges representing the links connecting the documents; selecting a first subset of the documents from the set of documents to form a topic; assigning a relevance weight to each node, the relevance weight of each node based on the similarity of the document represented by the node to the topic; selecting a second subset of the documents from the set of documents to test for relevance to the topic; pruning a particular node from the second subset if the associated relevance weight of the particular node is less than a predetermined threshold to form a pruned graph representing a third subset of the documents; ranking the subset of documents represented by the nodes of the pruned graph according to a connectivity based ranking scheme testing a sub-set of the nodes with a high edge-score for relevance, where the edge-score is based on the number of directed edges connected to a particular node; and setting the edge-score of a particular node to four times the number of directed edges to the particular node plus the number of directed edges to other nodes.
-
Specification