Document clustering method and system
Abstract
A document clustering method and system utilizing both a log-based clustering method and a content-based clustering method are disclosed. The method includes the steps of generating log-based document clusters and combining vectors from the log-based document clusters with individual document vectors for content-based clustering analysis. The log-based document clusters are generated by accessing the retrieval session log, clustering the retrieval sessions into session clusters, and combining the documents opened during the sessions of each session cluster.
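The log-based half of the abstract can be illustrated with a minimal sketch. The abstract does not say how retrieval sessions are compared or grouped, so the Jaccard overlap measure, the 0.3 threshold, the greedy grouping, and all names below (`jaccard`, `cluster_sessions`, `log_based_document_clusters`, `session_log`) are assumptions chosen only for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two sets of opened document ids (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sessions(sessions: Dict[str, Set[str]], threshold: float = 0.3) -> List[List[str]]:
    """Greedy grouping: a session joins the first cluster holding a similar enough session."""
    clusters: List[List[str]] = []
    for sid, docs in sessions.items():
        for cluster in clusters:
            if any(jaccard(docs, sessions[other]) >= threshold for other in cluster):
                cluster.append(sid)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([sid])
    return clusters


def log_based_document_clusters(sessions: Dict[str, Set[str]],
                                session_clusters: List[List[str]]) -> List[Set[str]]:
    """Combine the documents opened during the sessions of each session cluster."""
    return [set().union(*(sessions[sid] for sid in cluster)) for cluster in session_clusters]


# Hypothetical retrieval session log: session id -> ids of documents opened.
session_log = {
    "s1": {"d1", "d2"},
    "s2": {"d2", "d3"},
    "s3": {"d7", "d8"},
}
session_clusters = cluster_sessions(session_log)
doc_clusters = log_based_document_clusters(session_log, session_clusters)
```

Here sessions `s1` and `s2` share `d2` and fall into one session cluster, so their opened documents are combined into a single log-based document cluster, while `s3` stays on its own.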
35 Claims
1-16. (canceled)
17. A method for clustering documents, including generating clusters with user perspective comprising:
receiving session logs;
performing log-based clustering on the session logs to generate session clusters;
representing each session cluster as a log-based document suitable for content based clustering;
receiving a plurality of documents that includes a first document that was accessed in one session and a second document that was not accessed in the sessions;
replacing the first document with a log-based document associated with the session cluster that includes the first document; and
performing content based clustering on at least the first document and the second document to generate clusters with user perspective.
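The replacement and content-based steps of claim 17 could be sketched as follows. This is a sketch under stated assumptions, not the claimed implementation: representing a log-based document as the concatenated text of its member documents, using term-count cosine similarity, the 0.3 threshold, and the names `build_items` and `content_cluster` are all hypothetical choices.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Set


def build_items(documents: Dict[str, str], cluster_docs: Dict[str, Set[str]]) -> Dict[str, str]:
    """Replace accessed documents with one log-based document per session cluster."""
    items: Dict[str, str] = {}
    accessed: Set[str] = set()
    for cid, doc_ids in cluster_docs.items():
        # Assumption: a log-based document is the concatenated text of its members.
        items[cid] = " ".join(documents[d] for d in sorted(doc_ids))
        accessed |= doc_ids
    for doc_id, text in documents.items():
        if doc_id not in accessed:  # documents not accessed in any session stay as-is
            items[doc_id] = text
    return items


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def content_cluster(items: Dict[str, str], threshold: float = 0.3) -> List[List[str]]:
    """Greedy content-based clustering: compare each item with the first member of each cluster."""
    vectors = {key: Counter(text.split()) for key, text in items.items()}
    clusters: List[List[str]] = []
    for key in items:
        for cluster in clusters:
            if cosine(vectors[key], vectors[cluster[0]]) >= threshold:
                cluster.append(key)
                break
        else:
            clusters.append([key])
    return clusters


# Hypothetical data: d1 and d2 were opened in sessions of cluster c1; d3 and d4 were not.
documents = {
    "d1": "car engine repair",
    "d2": "engine oil change",
    "d3": "engine repair manual",
    "d4": "pasta sauce recipe",
}
items = build_items(documents, {"c1": {"d1", "d2"}})
clusters = content_cluster(items)
```

In this toy run the log-based document `c1` and the unaccessed document `d3` share the terms "engine" and "repair" and end up in the same cluster, while `d4` forms its own.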
23. A method for clustering documents comprising:
generating a hybrid matrix of vectors comprising a first vector representing a first document and a second vector representing a log-based document cluster; and
clustering the documents using the hybrid matrix.
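One way to picture the hybrid matrix of claim 23 is as a table whose rows mix log-based document cluster vectors with individual document vectors over a shared vocabulary. The sketch below assumes plain term counts as the vector weights and sums the counts of a cluster's member documents. Neither choice is stated in the claim, and `build_hybrid_matrix` and its parameters are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_hybrid_matrix(
    individual_docs: Dict[str, str],
    log_based_clusters: Dict[str, List[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (row labels, vocabulary, matrix) mixing cluster rows and document rows."""
    counts: Dict[str, Counter] = {}
    for cid, member_texts in log_based_clusters.items():
        # Assumption: a cluster vector is the sum of its members' term counts.
        total: Counter = Counter()
        for text in member_texts:
            total.update(text.split())
        counts[cid] = total
    for doc_id, text in individual_docs.items():
        counts[doc_id] = Counter(text.split())

    vocab = sorted({term for c in counts.values() for term in c})
    labels = list(counts)
    matrix = [[counts[label][term] for term in vocab] for label in labels]
    return labels, vocab, matrix


# Hypothetical inputs: one log-based document cluster and one individual document.
labels, vocab, matrix = build_hybrid_matrix(
    individual_docs={"d9": "pasta sauce"},
    log_based_clusters={"c1": ["engine oil", "engine repair"]},
)
```

Any vector-based clustering routine could then take the rows of `matrix` as input. The point of the sketch is only that both kinds of vector share one column space.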
28. A system for clustering documents, the system comprising:
a storage for storing retrieval session logs; and
a processor connected to the storage, configured to cluster the retrieval sessions into session clusters, generate, for each session cluster, a log-based document cluster, generate a log-based document cluster vector for each of the log-based document clusters, generate an individual document vector for each document not opened during any retrieval session, and cluster the documents using the log-based document cluster vectors and individual document vectors.
31. A data processing system having session logs and documents, the system comprising:
a processor for executing program instructions; and
a media readable by the processor having a document clustering module having a plurality of instructions, that when executed by the processor, performs log-based clustering on the session logs to generate session clusters, converts the session clusters into a form suitable for content-based clusters, performs content-based clustering on the documents and session clusters in a form suitable for content-based clustering to generate document clusters with users' perspective.
35. A machine readable memory device encoded with a data structure for clustering documents, the data structure having entries for a log-based document cluster vector generated from a log-based document cluster, and an individual document vector corresponding to a vector generated from a first document, the first document not belonging to any log based document cluster.
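The data structure of claim 35 might be laid out roughly as below. This is a hedged sketch: the class name `ClusteringEntries`, its fields, and the check that an individual document does not belong to any log-based document cluster are illustrative assumptions about how such entries could be organized, not a layout taken from the claim.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ClusteringEntries:
    """Entries for log-based document cluster vectors and individual document vectors."""
    cluster_members: Dict[str, Set[str]] = field(default_factory=dict)
    cluster_vectors: Dict[str, List[float]] = field(default_factory=dict)
    document_vectors: Dict[str, List[float]] = field(default_factory=dict)

    def add_cluster(self, cluster_id: str, members: Set[str], vector: List[float]) -> None:
        """Store a vector generated from a log-based document cluster."""
        self.cluster_members[cluster_id] = set(members)
        self.cluster_vectors[cluster_id] = list(vector)

    def add_document(self, doc_id: str, vector: List[float]) -> None:
        """Store an individual vector only for a document outside every cluster."""
        if any(doc_id in members for members in self.cluster_members.values()):
            raise ValueError(f"{doc_id} already belongs to a log-based document cluster")
        self.document_vectors[doc_id] = list(vector)


# Hypothetical usage with made-up ids and vectors.
entries = ClusteringEntries()
entries.add_cluster("c1", {"d1", "d2"}, [2.0, 1.0, 0.0])
entries.add_document("d9", [0.0, 0.0, 1.0])
```

Calling `entries.add_document("d1", ...)` after the lines above would raise `ValueError`, because `d1` is already a member of cluster `c1`.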
Specification