Link based clustering of hyperlinked documents

  • US 8,516,357 B1
  • Filed: 12/02/2010
  • Issued: 08/20/2013
  • Est. Priority Date: 08/12/1999
  • Status: Active Grant
First Claim

1. A computer-implemented method, comprising:

  • identifying, by a device, a set of documents;

    expanding, by the device, the set of documents to generate an expanded set of documents, where the expanded set of documents includes all of the documents in the set of documents and one or more additional documents, and where each additional document, of the one or more additional documents, links to a document in the set of documents or is linked to by a document in the set of documents;

    determining, by the device, a similarity measure for each pair of documents in the expanded set of documents, where, for a pair of documents, in the expanded set of documents, consisting of a first document and a second document, the similarity measure is determined based on:

    a quantity of documents in the expanded set of documents that contain both a forward link to the first document and a forward link to the second document, whether the first document contains a forward link to the second document, and whether the second document contains a forward link to the first document; and

    clustering, by the device, the documents in the expanded set of documents into a plurality of clusters based on the similarity measures.
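
The steps recited in the claim can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation. The claim does not say how the three similarity factors are combined or which clustering algorithm is used, so the equal-weight sum in `similarity`, the `threshold` parameter, and the threshold-plus-connected-components grouping in `cluster` are all hypothetical choices. The function names and the `links` dictionary (document to the set of documents it links to) are likewise invented for this example. The claim requires only "one or more" additional documents, whereas `expand` adds every neighbor.

```python
from itertools import combinations


def expand(seed, links):
    """Add every document that links to, or is linked to by, a seed document.

    `links` maps a document to the set of documents it forward-links to.
    """
    expanded = set(seed)
    for src, targets in links.items():
        for dst in targets:
            if src in seed or dst in seed:
                expanded.update((src, dst))
    return expanded


def similarity(a, b, expanded, links):
    """Combine the three factors named in the claim.

    Assumption: the factors are summed with equal weight; the claim only
    says the measure is "based on" them.
    """
    # Documents in the expanded set that link to both a and b.
    co_linking = sum(1 for d in expanded if {a, b} <= links.get(d, set()))
    a_links_b = b in links.get(a, set())
    b_links_a = a in links.get(b, set())
    return co_linking + int(a_links_b) + int(b_links_a)


def cluster(expanded, links, threshold=2):
    """Group documents whose pairwise similarity reaches `threshold`.

    Assumption: connected components over thresholded pairs (union-find);
    the claim does not name a clustering algorithm.
    """
    parent = {d: d for d in expanded}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    for a, b in combinations(sorted(expanded), 2):
        if similarity(a, b, expanded, links) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for d in expanded:
        groups.setdefault(find(d), set()).add(d)
    return list(groups.values())


# Hypothetical link graph: h1 and h2 both link to a and b; h3 links to c and d.
links = {
    "a": {"b"},
    "h1": {"a", "b"},
    "h2": {"a", "b"},
    "c": {"d"},
    "h3": {"c", "d"},
    "x": {"y"},
}
expanded = expand({"a", "c"}, links)
clusters = cluster(expanded, links, threshold=2)
```

In this toy graph, `a` and `b` end up together because two other documents link to both of them and `a` links to `b` directly; `x` and `y` never enter the expanded set because neither touches a seed document.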
