Document clustering method and system
Abstract
A document clustering method and system that use both log-based clustering and content-based clustering are disclosed. The method includes generating log-based document clusters and combining vectors from the log-based document clusters with individual document clusters for content-based clustering analysis. The log-based document clusters are generated by accessing the retrieval session log, clustering the retrieval sessions, and combining the documents opened during the sessions of each session cluster.
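The log-based stage described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how retrieval sessions are clustered, so the sketch groups sessions whose sets of opened documents overlap (Jaccard similarity at or above a hypothetical `min_overlap` threshold). The function names and sample session data are likewise invented for the example.

```python
from itertools import combinations

def cluster_sessions(sessions, min_overlap=0.5):
    """Group sessions whose opened-document sets overlap enough.

    Assumption: Jaccard overlap plus union-find stands in for whatever
    log-based clustering method is actually used.
    """
    parent = list(range(len(sessions)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(range(len(sessions)), 2):
        union = sessions[a] | sessions[b]
        if union and len(sessions[a] & sessions[b]) / len(union) >= min_overlap:
            parent[find(a)] = find(b)

    groups = {}
    for i in range(len(sessions)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def log_based_document_clusters(sessions, session_clusters):
    """Combine the documents opened during the sessions of each session cluster."""
    return [set().union(*(sessions[i] for i in cluster)) for cluster in session_clusters]

# Hypothetical retrieval session log: each session is the set of documents opened.
sessions = [{"d1", "d2"}, {"d1", "d2", "d3"}, {"d7", "d8"}]
session_clusters = cluster_sessions(sessions)
doc_clusters = log_based_document_clusters(sessions, session_clusters)
```

With the sample log, the first two sessions share most of their documents and fall into one session cluster, so their documents are combined into a single log-based document cluster; the third session forms its own.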
15 Claims
1. A method for clustering documents, including generating clusters with user perspective comprising:
- receiving retrieval session logs;
- performing log-based clustering on the session logs to generate session clusters;
- representing each of the session clusters as a log-based document suitable for content based clustering;
- receiving a plurality of documents that includes a first document that was accessed in one session and a second document that was not accessed in any of the sessions;
- replacing the first document with one of the log-based documents, wherein said one of the log-based documents is associated with the session cluster that includes the first document; and
- performing content based clustering on at least one of the log-based document and the second document to generate clusters with user perspective.
Dependent claims: 2, 3, 4, 5, 6
7. A method for clustering documents comprising:
- generating a hybrid matrix of vectors comprising a first vector representing a first document and a second vector representing a log-based document cluster document; and
- clustering the documents using the hybrid matrix, wherein the hybrid matrix comprises:
- accessing retrieval session logs;
- clustering retrieval sessions into session clusters;
- generating a log-based document cluster for each session cluster by combining all documents opened during any retrieval session of the session cluster;
- generating a log-based document cluster vector for each of the log-based document clusters;
- replacing each document in the log-based document cluster with the log-based document cluster vector;
- generating an individual document vector for each document not opened during any retrieval session; and
- combining the log-based document cluster vector and the individual document cluster vector.
Dependent claims: 8
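As a rough illustration of the hybrid matrix in claim 7, the sketch below builds one row per log-based document cluster and one row per document that was not opened in any session. The claim does not specify how the vectors are computed, so plain term counts over whitespace-split text are an assumption, as are the function name `build_hybrid_matrix`, the `cluster-N` row labels, and the sample documents.

```python
from collections import Counter

def build_hybrid_matrix(documents, doc_clusters):
    """Return (row labels, vocabulary, matrix) for a hybrid matrix.

    documents:    dict mapping document id -> text
    doc_clusters: list of sets of document ids (log-based document clusters)
    Assumption: a cluster vector is the sum of its members' term counts.
    """
    clustered = set().union(*doc_clusters)
    rows = {}

    # One row per log-based document cluster; it stands in for every member.
    for k, members in enumerate(doc_clusters):
        counts = Counter()
        for doc_id in sorted(members):
            counts.update(documents[doc_id].lower().split())
        rows[f"cluster-{k}"] = counts

    # One row per document that was not opened during any retrieval session.
    for doc_id, text in documents.items():
        if doc_id not in clustered:
            rows[doc_id] = Counter(text.lower().split())

    vocab = sorted(set().union(*rows.values()))
    labels = list(rows)
    matrix = [[rows[label][term] for term in vocab] for label in labels]
    return labels, vocab, matrix

# Hypothetical data: d1 and d2 were opened in clustered sessions, d3 never was.
documents = {"d1": "solar panel cost", "d2": "solar inverter", "d3": "wind turbine"}
labels, vocab, matrix = build_hybrid_matrix(documents, [{"d1", "d2"}])
```

Here `d1` and `d2` no longer have rows of their own; both are represented by the single `cluster-0` row, while `d3` keeps an individual document vector.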
9. A system for clustering documents, the system comprising:
- a storage for storing retrieval session logs; and
- a processor connected to the storage, configured to cluster the retrieval sessions into session clusters, generate, for each session cluster, a log-based document cluster, generate a log-based document cluster vector for each of the log-based document clusters, generate an individual document vector for each document not opened during any retrieval session, and cluster the documents using the log-based document cluster vectors and individual document vectors.
Dependent claims: 10, 11
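The final step in claim 9, clustering the documents using both kinds of vectors, might look like the sketch below. The claim does not name a content-based clustering algorithm, so a greedy single-pass scheme with a cosine-similarity threshold is used purely as a stand-in. The `threshold` value, the row labels, and the sample vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_rows(labels, matrix, threshold=0.3):
    """Greedy single-pass clustering over the rows.

    Each row joins the first cluster whose seed row is similar enough,
    otherwise it starts a new cluster. Assumption: this simple scheme
    stands in for any content-based clustering method.
    """
    clusters = []  # list of (seed_vector, member_labels)
    for label, row in zip(labels, matrix):
        for seed, members in clusters:
            if cosine(seed, row) >= threshold:
                members.append(label)
                break
        else:
            clusters.append((row, [label]))
    return [members for _, members in clusters]

labels = ["cluster-0", "d3", "d4"]
matrix = [
    [1, 1, 1, 2, 0, 0],  # log-based document cluster vector
    [0, 0, 0, 0, 1, 1],  # individual document vector
    [0, 0, 1, 1, 0, 0],  # individual document vector
]
result = cluster_rows(labels, matrix)
```

In this toy input, `d4` shares terms with the log-based cluster row and joins it, while `d3` has no terms in common and ends up alone. Every document represented by `cluster-0` would inherit that row's final cluster assignment.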
12. A data processing system having session logs and documents, the system comprising:
- a processor for executing program instructions; and
- a media readable by the processor having a document clustering module having a plurality of instructions, that when executed by the processor, performs log-based clustering on the session logs to generate session clusters, converts the session clusters into a form suitable for content-based clusters, performs content-based clustering on the documents and session clusters in a form suitable for content-based clustering to generate document clusters with users' perspective.
Dependent claims: 13, 14, 15
Specification