Clustering documents using citation patterns

  • US 8,612,411 B1
  • Filed: 12/31/2003
  • Issued: 12/17/2013
  • Est. Priority Date: 12/31/2003
  • Status: Active Grant
First Claim

1. A method for clustering documents using one or more processors, the method comprising:

  • in each document in a collection of documents, locating citations, the citations including references to other documents;

    for each pair of one or more pairs of documents in the collection of documents, performing actions comprising:

    comparing the pair of documents based on overlapping citations at a first document structure level to generate a first citation overlap level, each of the overlapping citations at the first citation overlap level including common references to a same other document, the common references occurring at the first document structure level in each document of the pair;

    comparing the pair of documents based on overlapping citations at a second document structure level to generate a second citation overlap level, each of the overlapping citations at the second citation overlap level including common references to a same other document, the common references occurring at the second document structure level in each document of the pair, where the second document structure level is more specific than the first document structure level; and

    combining the first citation overlap level and the second citation overlap level to generate a citation overlap score for the pair of documents;

    determining a plurality of clusters of related documents in the collection of documents according to the citation overlap scores;

    ranking the plurality of clusters including:

    generating a weighted cluster citation overlap score for each pair of documents in each cluster of the plurality of clusters, and

    penalizing the weighted cluster citation overlap score for one or more clusters based on whether each cluster contains documents that contain only overlapping citations with other documents at the first document structure level; and

    providing a listing of the plurality of clusters of related documents based on the ranking.
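
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything the claim leaves open is an assumption made here for illustration: a document is modeled as a list of paragraphs, each paragraph being the set of IDs it cites; the first structure level is taken to be the whole document and the second, more specific level the paragraph; a paragraph-level overlap is assumed to require at least two shared citations (`min_shared`); the two levels are combined by a weighted sum; clusters are connected components of a thresholded score graph; and the penalty is a fixed multiplier. All function names, weights, and thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical data model: a document is a list of paragraphs, and each
# paragraph is the set of IDs of the documents it cites.

def overlap_levels(a, b, min_shared=2):
    """Return (document-level, paragraph-level) citation overlap for a pair."""
    # First level: cited documents common to both documents as a whole.
    doc_level = len(set().union(*a) & set().union(*b))
    # Second level (assumed rule): paragraph pairs sharing >= min_shared citations.
    para_level = sum(1 for p in a for q in b if len(p & q) >= min_shared)
    return doc_level, para_level

def overlap_score(levels, w_doc=1.0, w_para=3.0):
    """Combine both levels; the weighted sum and weights are assumptions."""
    doc_level, para_level = levels
    return w_doc * doc_level + w_para * para_level

def cluster_documents(docs, threshold=2.0):
    """Cluster as connected components of pairs scoring >= threshold."""
    levels = {(x, y): overlap_levels(docs[x], docs[y])
              for x, y in combinations(sorted(docs), 2)}
    parent = {d: d for d in docs}

    def find(d):
        # Union-find lookup with path halving.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for (x, y), lv in levels.items():
        if overlap_score(lv) >= threshold:
            parent[find(x)] = find(y)
    groups = {}
    for d in docs:
        groups.setdefault(find(d), []).append(d)
    # Singletons are dropped: they have no pair to score.
    return [sorted(g) for g in groups.values() if len(g) > 1], levels

def rank_clusters(clusters, levels, penalty=0.5):
    """Rank clusters by mean pairwise score, penalizing clusters that contain
    a document overlapping with the others only at the document level."""
    ranked = []
    for cluster in clusters:
        pairs = list(combinations(cluster, 2))
        score = sum(overlap_score(levels[p]) for p in pairs) / len(pairs)
        for doc in cluster:
            if all(levels[p][1] == 0 for p in pairs if doc in p):
                score *= penalty
                break
        ranked.append((score, cluster))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked

# Made-up example data.
docs = {
    "A": [{"x", "y"}, {"z"}],
    "B": [{"x", "y"}, {"w"}],
    "C": [{"m"}, {"n"}],
    "D": [{"m"}, {"n"}, {"q"}],
    "E": [{"k"}],
}
clusters, levels = cluster_documents(docs)
print(rank_clusters(clusters, levels))
# prints [(5.0, ['A', 'B']), (1.0, ['C', 'D'])]
```

In this made-up data, A and B cite the same two documents within a single paragraph, so they overlap at both levels and rank first. C and D share citations only at the document level, so their cluster score is halved, mirroring the penalizing step of the claim.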
