Link based clustering of hyperlinked documents
First Claim
1. A computer-implemented method, comprising:
- identifying, by a device, a set of documents;
- expanding, by the device, the set of documents to generate an expanded set of documents, where the expanded set of documents includes all of the documents in the set of documents and one or more additional documents, and where each additional document, of the one or more additional documents, links to a document in the set of documents or is linked to by a document in the set of documents;
- determining, by the device, a similarity measure for each pair of documents in the expanded set of documents, where, for a pair of documents, in the expanded set of documents, consisting of a first document and a second document, the similarity measure is determined based on:
  - a quantity of documents in the expanded set of documents that contain both a forward link to the first document and a forward link to the second document,
  - whether the first document contains a forward link to the second document, and
  - whether the second document contains a forward link to the first document; and
- clustering, by the device, the documents in the expanded set of documents into a plurality of clusters based on the similarity measures.
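The expansion and similarity steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. The link graph is assumed to be a dictionary mapping each document to the set of documents it links to, and the function names `expand`, `similarity`, and `pairwise_similarity` are hypothetical. The claim only says the measure is "based on" a co-citation count and two direct-link indicators, so the weighted sum with weights `w_cocite` and `w_direct` is an assumption made for illustration.

```python
from itertools import combinations


def expand(seed, links):
    """Return the seed set plus every document one link away from it.

    `links` maps a document to the set of documents it links to
    (its forward links). A document is added if it links to a seed
    document or is linked to by a seed document.
    """
    expanded = set(seed)
    for src, targets in links.items():
        for dst in targets:
            if src in seed or dst in seed:
                expanded.update((src, dst))
    return expanded


def similarity(a, b, docs, links, w_cocite=1.0, w_direct=1.0):
    """Score a pair from co-citations and direct links.

    The weighted sum is an illustrative assumption; the claim does
    not state how the three signals are combined.
    """
    # Quantity of documents in `docs` that link to both a and b.
    cocite = sum(
        1 for d in docs
        if a in links.get(d, ()) and b in links.get(d, ())
    )
    # One point for a -> b, one point for b -> a.
    direct = (b in links.get(a, ())) + (a in links.get(b, ()))
    return w_cocite * cocite + w_direct * direct


def pairwise_similarity(docs, links):
    """Compute the similarity measure for every unordered pair."""
    return {
        (a, b): similarity(a, b, docs, links)
        for a, b in combinations(sorted(docs), 2)
    }


# Toy link graph (made-up document names).
links = {
    "hub1": {"a", "b"},
    "hub2": {"a", "b"},
    "a": {"b"},
    "b": set(),
    "c": {"x"},
    "far": {"hub1"},  # two links away from the seed set, so not added
}
seed = {"a", "b", "c"}
expanded = expand(seed, links)
scores = pairwise_similarity(expanded, links)
```

In this toy graph, `hub1` and `hub2` both link to `a` and `b`, and `a` links directly to `b`, so the pair (`a`, `b`) receives the highest score, while `far` is excluded because it neither links to nor is linked from a seed document.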
1 Assignment
0 Petitions
Abstract
Techniques for grouping hyperlinked documents are provided. Links near or in the neighborhood of the hyperlinked documents are analyzed in order to group the hyperlinked documents by topic. For example, links that are search results can be grouped by identifying other hyperlinked documents that have multiple forward links to the search results. The search results can then be grouped according to the forward links of the other hyperlinked documents.
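The search-result example in the abstract can be sketched roughly as follows. This is a hedged illustration only: the names `find_hubs` and `group_results`, the `min_links` parameter, and the rule of merging groups that share a result are all assumptions, since the abstract does not say how the forward links of the other documents are turned into groups.

```python
def find_hubs(results, links, min_links=2):
    """Find non-result documents with multiple forward links to results.

    Returns a dict mapping each such document to the set of search
    results it links to.
    """
    results = set(results)
    hubs = {}
    for doc, targets in links.items():
        hit = results & set(targets)
        if doc not in results and len(hit) >= min_links:
            hubs[doc] = hit
    return hubs


def group_results(results, links, min_links=2):
    """Group search results that are linked to by a common hub.

    Groups that share a result are merged (an assumption); results
    that no hub links to end up in singleton groups.
    """
    groups = []
    for hit in find_hubs(results, links, min_links).values():
        merged = set(hit)
        rest = []
        for group in groups:
            if group & merged:
                merged |= group   # absorb any overlapping group
            else:
                rest.append(group)
        groups = rest + [merged]
    grouped = set().union(*groups)
    groups.extend({r} for r in sorted(set(results) - grouped))
    return groups


# Made-up search results for an ambiguous query.
results = ["jaguar-car", "jaguar-dealer", "jaguar-cat", "jaguar-zoo", "jaguar-os"]
links = {
    "auto-portal": ["jaguar-car", "jaguar-dealer"],
    "wildlife-dir": ["jaguar-cat", "jaguar-zoo"],
    "random-page": ["jaguar-os"],  # only one link to a result, so not a hub
}
groups = group_results(results, links)
```

Here the two hypothetical hub pages separate the car-related results from the animal-related ones, and the remaining result stays in a group of its own.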
37 Citations
20 Claims
1. A computer-implemented method, comprising:
- identifying, by a device, a set of documents;
- expanding, by the device, the set of documents to generate an expanded set of documents, where the expanded set of documents includes all of the documents in the set of documents and one or more additional documents, and where each additional document, of the one or more additional documents, links to a document in the set of documents or is linked to by a document in the set of documents;
- determining, by the device, a similarity measure for each pair of documents in the expanded set of documents, where, for a pair of documents, in the expanded set of documents, consisting of a first document and a second document, the similarity measure is determined based on:
  - a quantity of documents in the expanded set of documents that contain both a forward link to the first document and a forward link to the second document,
  - whether the first document contains a forward link to the second document, and
  - whether the second document contains a forward link to the first document; and
- clustering, by the device, the documents in the expanded set of documents into a plurality of clusters based on the similarity measures.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system, comprising:
- a memory to store instructions; and
- a processor to execute the instructions to:
  - identify a plurality of documents;
  - expand the identified plurality of documents to generate an expanded plurality of documents, where the expanded plurality of documents includes an entirety of documents in the identified plurality of documents and one or more additional documents, and where each additional document, of the one or more additional documents, links to a document in the identified plurality of documents or is linked to by a document in the identified plurality of documents;
  - determine a similarity measure for each pair of documents in the expanded plurality of documents, where, for a pair of documents, in the expanded plurality of documents, that includes a first document and a second document, the similarity measure is determined based on a quantity of documents, in the expanded plurality of documents, that contain both a forward link to the first document and a forward link to the second document, whether the first document contains a forward link to the second document, and whether the second document contains a forward link to the first document; and
  - group, based on the similarity measures, documents, included in the expanded plurality of documents, into clusters to form clustered documents.
Dependent claims: 12, 13, 14, 15
16. A non-transitory computer storage medium storing instructions, the instructions comprising:
- one or more instructions which, when executed by one or more processors, cause the one or more processors to identify a plurality of documents;
- one or more instructions which, when executed by the one or more processors, cause the one or more processors to expand the identified plurality of documents to generate an expanded plurality of documents, where the expanded plurality of documents includes an entirety of documents in the identified plurality of documents and one or more additional documents, and where each additional document, of the one or more additional documents, links to a document in the identified plurality of documents or is linked to by a document in the identified plurality of documents;
- one or more instructions which, when executed by the one or more processors, cause the one or more processors to determine a similarity measure for each pair of documents in the expanded plurality of documents, where, for a pair of documents, in the expanded plurality of documents, that includes a first document and a second document, the similarity measure is determined based on:
  - whether the first document contains a forward link to the second document,
  - whether the second document contains a forward link to the first document, and
  - a quantity of documents, in the expanded plurality of documents, that contain both a forward link to the first document and a forward link to the second document; and
- one or more instructions which, when executed by the one or more processors, cause the one or more processors to group, based on the similarity measures, documents, included in the expanded plurality of documents, into clusters to form clustered documents.
Dependent claims: 17, 18, 19, 20
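The independent claims end with a clustering (or grouping) step driven by the pairwise similarity measures, without naming a particular algorithm. As one possible reading, the sketch below joins any two documents whose similarity meets a threshold and returns the resulting connected components, using a small union-find structure. The function name `cluster`, the `threshold` parameter, and the choice of threshold-based single-link grouping are all assumptions made for illustration.

```python
def cluster(docs, sims, threshold=1.0):
    """Group documents whose pairwise similarity meets a threshold.

    `sims` maps (doc_a, doc_b) pairs to a similarity score. Documents
    joined directly or transitively by a qualifying pair share a
    cluster. This is only one of many possible clustering choices.
    """
    parent = {d: d for d in docs}

    def find(d):
        # Follow parent pointers to the root, compressing the path.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for (a, b), score in sims.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for d in docs:
        clusters.setdefault(find(d), set()).add(d)
    return list(clusters.values())


# Made-up similarity scores for four documents.
docs = ["a", "b", "c", "d"]
sims = {
    ("a", "b"): 3.0,
    ("a", "c"): 0.0,
    ("a", "d"): 0.0,
    ("b", "c"): 0.0,
    ("b", "d"): 0.0,
    ("c", "d"): 2.0,
}
clusters = cluster(docs, sims, threshold=1.0)
```

With these scores, `a` and `b` fall into one cluster and `c` and `d` into another. Raising the threshold above 3.0 would leave every document in its own cluster.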
Specification