METHOD FOR ADAPTING A K-MEANS TEXT CLUSTERING TO EMERGING DATA

  • US 20080215314A1
  • Filed: 04/07/2008
  • Published: 09/04/2008
  • Est. Priority Date: 09/26/2000
  • Status: Active Grant
First Claim

1. A system for clustering documents in datasets comprising:

  • a storage having a first dataset and a second dataset;

    a cluster generator operative to cluster first documents in said first dataset and produce first document classes;

    a centroid seed generator operative to generate centroid seeds based on said first document classes;

    a dictionary generator adapted to generate a first dictionary of most common words in said first dataset; and

    a vector space model generator adapted to generate a first vector space model by counting, for each word in said first dictionary, a number of said first documents in which said word occurs,

    wherein said cluster generator clusters said documents in said first dataset based on said first vector space model,

    wherein said cluster generator clusters second documents in said second dataset using said centroid seeds, such that said second dataset has a similar, based on said centroid seeds, clustering to that of said first dataset, and

    wherein said second dataset comprises a new, but related, based on said centroid seeds, dataset different than said first dataset.
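The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. All function names, the toy documents, the dictionary size, the whitespace tokenizer, the squared Euclidean distance, and the choice of initial seeds for the first clustering are assumptions made for illustration. The sketch also assumes that each document is represented as a vector of word counts over the first dictionary, and that the second dataset is vectorized with that same dictionary so that the centroid seeds remain comparable.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real text processing.
    return text.lower().split()


def build_dictionary(docs, size):
    """Pick the `size` words that occur in the most documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    # Sort by descending document frequency, then alphabetically for determinism.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:size]]


def vectorize(doc, dictionary):
    """Represent a document as word counts over the dictionary (assumed encoding)."""
    counts = Counter(tokenize(doc))
    return [float(counts[word]) for word in dictionary]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids):
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))


def kmeans(vectors, seeds, max_iter=100):
    """Plain k-means started from the given seeds; returns (labels, centroids)."""
    centroids = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy datasets.
first = [
    "printer jam paper",
    "printer paper tray",
    "password reset login",
    "login password expired",
]
second = [
    "paper jam in printer tray",
    "cannot login password wrong",
    "printer paper feed",
]

# First dataset: dictionary, vector space model, clustering.
dictionary = build_dictionary(first, size=6)
first_vecs = [vectorize(d, dictionary) for d in first]
# Assumption: two documents are picked by hand as the initial seeds.
first_labels, centroid_seeds = kmeans(first_vecs, [first_vecs[0], first_vecs[2]])

# Second dataset: clustered starting from the centroids of the first clustering.
second_vecs = [vectorize(d, dictionary) for d in second]
second_labels, _ = kmeans(second_vecs, centroid_seeds)

print(first_labels)
print(second_labels)
```

Because the second run starts from the centroids of the first, a cluster index in `second_labels` refers to the same class as that index in `first_labels`, which is one way to read the claim's requirement that the second dataset has a clustering similar to that of the first.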
