DATA CLUSTERING
Abstract
A system, method and computer program product perform data analysis and clustering. A plurality of data objects are received, each represented by a vector of features and associated with a point in time. The plurality of data objects is divided into first time slices to form a plurality of consecutive sets of data objects. Each set of data objects is sub-divided into one or more second time slices so as to form one or more subsets of data objects. The data objects in each set and subset of data objects are processed to derive clusters of data objects according to similarity of features. The clusters of data objects from different sets and subsets of data objects are used to detect changes in the relevance of cluster features over time.
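The two-level time slicing described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `slice_by_interval` and `two_level_slices`, the `(timestamp, word_vector)` tuple layout, and the use of one fixed second interval for every set (the claims allow "respective" second intervals per set) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def slice_by_interval(docs, start, interval):
    """Group (timestamp, payload) pairs into slices of width `interval`.

    Returns a dict mapping slice index -> list of docs, where slice 0
    begins at `start`. (Hypothetical helper, not from the patent.)
    """
    slices = defaultdict(list)
    for ts, payload in docs:
        index = (ts - start) // interval  # timedelta // timedelta gives an int
        slices[index].append((ts, payload))
    return dict(slices)


def two_level_slices(docs, start, first_interval, second_interval):
    """Form consecutive sets (first slices) and subsets (second slices).

    Assumption: one second interval is shared by all sets; a per-set
    interval could be passed in instead.
    """
    result = {}
    first = slice_by_interval(docs, start, first_interval)
    for idx, members in first.items():
        slice_start = start + idx * first_interval
        result[idx] = {
            "set": members,
            "subsets": slice_by_interval(members, slice_start, second_interval),
        }
    return result


# Illustrative data: each document is a timestamp plus a word-count vector.
docs = [
    (datetime(2020, 1, 1, 9), {"storm": 2}),
    (datetime(2020, 1, 3, 12), {"storm": 1, "flood": 1}),
    (datetime(2020, 1, 9, 8), {"flood": 3}),
]
result = two_level_slices(
    docs,
    start=datetime(2020, 1, 1),
    first_interval=timedelta(days=7),
    second_interval=timedelta(days=1),
)
```

Here the first two documents fall into the first weekly set and the third into the second; within each set the documents are further grouped by day. Clustering would then be run on every set and every subset separately.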
24 Claims
1. A computer implemented method, comprising:
receiving a plurality of documents, each of the plurality of documents represented by a vector of words and associated with a point in time;
dividing the received plurality of documents into first time slices using a first time interval to form a plurality of consecutive sets of documents;
sub-dividing each of the plurality of consecutive sets of documents into second time slices using respective second time intervals to form one or more subsets of documents;
identifying a plurality of topics in each of the plurality of consecutive sets of documents and the one or more subsets of documents, each of the plurality of topics represented by a set of most relevant topic keywords;
clustering each of the plurality of consecutive sets of documents and the one or more subsets of documents in accordance with each of the identified plurality of topics;
comparing each of the identified plurality of topics with respect to each of the plurality of consecutive sets of documents and the one or more subsets of documents to detect patterns of changes in the set of most relevant topic keywords over time; and
redefining each of the clustered plurality of consecutive sets of documents and the one or more subsets of documents to form homogenous clusters based on the detected patterns.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer system, comprising:
one or more computer processors, one or more computer-readable storage media, and program instructions stored on one or more of the computer-readable storage media for execution by at least one of the one or more processors, the program instructions, when executed by the at least one of the one or more processors, causing the computer system to perform a method comprising:
receiving a plurality of documents, each of the plurality of documents represented by a vector of words and associated with a point in time;
dividing the received plurality of documents into first time slices using a first time interval to form a plurality of consecutive sets of documents;
sub-dividing each of the plurality of consecutive sets of documents into second time slices using respective second time intervals to form one or more subsets of documents;
identifying a plurality of topics in each of the plurality of consecutive sets of documents and the one or more subsets of documents, each of the plurality of topics represented by a set of most relevant topic keywords;
clustering each of the plurality of consecutive sets of documents and the one or more subsets of documents in accordance with each of the identified plurality of topics;
comparing each of the identified plurality of topics with respect to each of the plurality of consecutive sets of documents and the one or more subsets of documents to detect patterns of changes in the set of most relevant topic keywords over time; and
redefining each of the clustered plurality of consecutive sets of documents and the one or more subsets of documents to form homogenous clusters based on the detected patterns.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
22. A computer program product for controlling access to a secure resource, the computer program product comprising:
one or more computer-readable storage devices and program instructions stored on at least one of the one or more tangible storage devices, the program instructions comprising:
program instructions to receive a plurality of documents, each of the plurality of documents represented by a vector of words and associated with a point in time;
program instructions to divide the received plurality of documents into first time slices using a first time interval to form a plurality of consecutive sets of documents;
program instructions to sub-divide each of the plurality of consecutive sets of documents into second time slices using respective second time intervals to form one or more subsets of documents;
program instructions to identify a plurality of topics in each of the plurality of consecutive sets of documents and the one or more subsets of documents, each of the plurality of topics represented by a set of most relevant topic keywords;
program instructions to cluster each of the plurality of consecutive sets of documents and the one or more subsets of documents in accordance with each of the identified plurality of topics;
program instructions to compare each of the identified plurality of topics with respect to each of the plurality of consecutive sets of documents and the one or more subsets of documents to detect patterns of changes in the set of most relevant topic keywords over time; and
program instructions to redefine each of the clustered plurality of consecutive sets of documents and the one or more subsets of documents to form homogenous clusters based on the detected patterns.
23. A computer implemented method, comprising:
receiving a plurality of data objects, each of the plurality of data objects represented by a vector of words and associated with a point in time;
dividing the received plurality of data objects into first time slices using a first time interval to form a plurality of consecutive sets of data objects;
sub-dividing each of the plurality of consecutive sets of data objects into second time slices using respective second time intervals to form one or more subsets of data objects;
processing the plurality of data objects in each of the plurality of consecutive sets of data objects and the one or more subsets of data objects to derive clusters of the data objects according to similarity of features, wherein each of the derived clusters is represented by a most relevant set of cluster features;
identifying a plurality of cluster features in each of the plurality of consecutive sets of data objects and the one or more subsets of cluster features, each of the plurality of topics represented by a set of most relevant cluster features; and
redefining the derived clusters of data objects to form homogenous clusters based on the analysis.
Dependent claims: 24
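The comparing step recited in the independent claims (detecting changes in the set of most relevant topic keywords over time) can be sketched as follows. The claims do not say how keywords are ranked or how sets are compared, so everything specific here is an assumption for illustration: ranking by summed word weight, Jaccard similarity as the comparison measure, the `threshold` value, and the function names `top_keywords`, `jaccard` and `keyword_changes`.

```python
from collections import Counter


def top_keywords(word_vectors, k=3):
    """Return the k highest-weight words across a slice's word vectors.

    Assumption: relevance is approximated by summed word weight; the
    patent claims do not prescribe a ranking method.
    """
    totals = Counter()
    for vec in word_vectors:
        totals.update(vec)  # adds the weights of each word
    # Sort by descending weight, then alphabetically so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:k]}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def keyword_changes(slices, k=3, threshold=0.5):
    """Flag consecutive slices whose top-keyword sets overlap less than `threshold`.

    Returns a list of (slice_index, similarity_to_previous_slice).
    """
    keywords = [top_keywords(s, k) for s in slices]
    changes = []
    for i in range(1, len(keywords)):
        sim = jaccard(keywords[i - 1], keywords[i])
        if sim < threshold:
            changes.append((i, sim))
    return changes


# Illustrative data: three consecutive time slices of word-count vectors.
slices = [
    [{"storm": 2, "rain": 1}, {"storm": 1, "wind": 1}],
    [{"storm": 2, "rain": 2}],
    [{"flood": 3, "rescue": 2, "storm": 1}],
]
changes = keyword_changes(slices, k=2)
```

In this example the first two slices share the same top keywords, while the third shares none with the second, so only the third slice is flagged. A detected shift of this kind is the sort of pattern that could then drive the redefinition of clusters.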
Specification