
System And Methods For Clustering Large Database of Documents

  • US 20090043797A1
  • Filed: 07/28/2008
  • Published: 02/12/2009
  • Est. Priority Date: 07/27/2007
  • Status: Abandoned Application
First Claim

1. A method of organizing a plurality of documents for later access and retrieval within a computerized system, wherein the plurality of documents are contained within a dataset and wherein a class of documents contained in the dataset include one or more citations to one or more other documents, comprising the steps of:

    creating a set of fingerprints for each respective document in the class, wherein each fingerprint comprises one or more citations contained in the respective document;

    creating a plurality of clusters for the dataset based on the sets of fingerprints for the documents in the class;

    assigning each respective document in the class to zero or more of the clusters based on the set of fingerprints for said respective document and wherein each respective cluster has documents assigned thereto based on a statistical similarity between the sets of fingerprints of said assigned documents;

    for each remaining document in the dataset that has not yet been assigned to at least one cluster, assigning each said remaining document to one or more of the clusters based on a natural language processing comparison of each said remaining document with documents already assigned to each respective cluster;

    creating a descriptive label for each respective cluster based on key terms contained in the documents assigned to the respective cluster; and

    presenting one or more of the labeled clusters to a user of the computerized system.
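
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the applicants' implementation, and everything specific in it is an assumption made for illustration: each fingerprint is taken to be a single citation, "statistical similarity" is approximated by Jaccard overlap against the first member of a cluster, the "natural language processing comparison" is reduced to word-set overlap, and labels are the two most frequent terms. The names `cluster_documents`, `tokens`, `jaccard`, the `threshold` parameter, and the sample data are all hypothetical. Unlike the claim, which allows a document to fall in zero or more clusters, the sketch places every document in exactly one.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the claim).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with"}


def tokens(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs, threshold=0.5):
    """docs maps a document id to {"text": str, "citations": list of str}."""
    # Create a set of fingerprints per document (here: one citation each).
    fingerprints = {d: frozenset(v["citations"]) for d, v in docs.items()}

    # Create clusters and assign citing documents by fingerprint similarity.
    clusters = []  # each cluster is a list of document ids
    for doc_id in sorted(docs):
        fp = fingerprints[doc_id]
        if not fp:
            continue  # no citations: handled by the text comparison below
        for members in clusters:
            if jaccard(fp, fingerprints[members[0]]) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append([doc_id])

    # Assign remaining documents by comparing their text with the text of
    # documents already placed in each cluster.
    assigned = {d for members in clusters for d in members}
    cluster_terms = [
        {t for d in members for t in tokens(docs[d]["text"])} for members in clusters
    ]
    for doc_id in sorted(docs):
        if doc_id in assigned or not clusters:
            continue
        doc_terms = set(tokens(docs[doc_id]["text"]))
        best = max(
            range(len(clusters)), key=lambda i: jaccard(doc_terms, cluster_terms[i])
        )
        clusters[best].append(doc_id)

    # Create a descriptive label from the most frequent terms in each cluster.
    labeled = []
    for members in clusters:
        counts = Counter(t for d in members for t in tokens(docs[d]["text"]))
        label = " ".join(term for term, _ in counts.most_common(2))
        labeled.append((label, members))
    return labeled


# Hypothetical sample data: three citing documents and one without citations.
sample = {
    "A": {"text": "Document clustering with citation graphs", "citations": ["X1", "X2"]},
    "B": {"text": "Citation based clustering of patents", "citations": ["X1", "X2", "X3"]},
    "C": {"text": "Engine cooling system design", "citations": ["Y1"]},
    "D": {"text": "Clustering patents by citation", "citations": []},
}

# Present the labeled clusters.
for label, members in cluster_documents(sample):
    print(label, members)
```

In the sample, documents A and B share most of their citations and land in one cluster, C cites something unrelated and forms its own, and D, which has no citations, is placed by text overlap alone. A production system would likely replace the greedy pass and the word-set comparison with more robust statistical and language-processing methods.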
