System and method for detecting clusters of information
First Claim
1. A method of analyzing information that represents a plurality of objects, said method comprising the steps of:
- identifying a set of features that characterize each of the plurality of objects, said set of features including a plurality of subsets of features;
storing a plurality of data values in a data base, each of the plurality of data values representing an instance of the set of features;
detecting a set of clusters of information in the database, each cluster in the set of clusters being detected only by a subset of data values and a corresponding subset of features, each subset of features being specific to a respective cluster.
1 Assignment
0 Petitions
Accused Products
Abstract
A system and method are provided to analyze information stored in a computer data base by detecting clusters of related or correlated data values. Data values stored in the data base represent a set of objects. A data value is stored in the data base as an instance of a set of features that characterize the objects. The features are the dimensions of the feature space of the data base. Each cluster includes not only a subset of related data values stored in the data base but also a subset of features. The data values in a cluster are data values that are a short distance apart, in the sense of a metric, when projected onto a subspace that corresponds to the subset of features of the cluster. A set of k clusters may be detected such that the average number of features of the subsets of features of the clusters is l.
73 Citations
25 Claims
-
1. A method of analyzing information that represents a plurality of objects, said method comprising the steps of:
-
identifying a set of features that characterize each of the plurality of objects, said set of features including a plurality of subsets of features;
storing a plurality of data values in a data base, each of the plurality of data values representing an instance of the set of features;
detecting a set of clusters of information in the database, each cluster in the set of clusters being detected only by a subset of data values and a corresponding subset of features, each subset of features being specific to a respective cluster. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 25)
(a) generating a set of medoids from ones of the plurality of data values;
(b) associating with each medoid of the set of medoids ones of the set of features, wherein each of said medoid is a data value chosen from the database; and
(c) forming the set of clusters detected in the database by assigning ones of the plurality of data values to each medoid of the set of medoids.
-
-
11. The method according to claim 10, wherein detecting the set of clusters of information in the database further comprises the steps of:
-
generating a set of new medoids by removing one medoid of the set of medoids from the set of medoids and adding one of the plurality of data values to the set of medoids; and
forming a set of new clusters of information detected in the database by repeating said steps (a)-(c) replacing therein the set of medoids with the set of new medoids.
-
-
12. The method according to claim 10, wherein the set of medoids is generated by choosing ones of the plurality of data values at random.
-
13. The method according to claim 10, wherein each medoid of the set of medoids is associated with ones of the set of features by calculating a plurality of distances from each medoid of the set of medoids to a set of data values in a neighborhood of each medoid of the set of medoids with respect to ones of the set of features.
-
14. The method according to claim 13, wherein each medoid of the set of medoids is associated with ones of the set of features corresponding to a distance of the plurality of distances from each medoid of the set of medoide to a set of data values in a neighborhood of each medoid of the set of medoids with respect to ones of the set of features.
-
15. The method according to claim 10, wherein detecting the set of clusters of information in the database further comprises the steps of:
-
generating a set of new medoids by removing one medoid of the set of medoids from the set of medoids and adding one of the plurality of data values to the set of medoids; and
repeating said steps (a) and (b) replacing therein the set of medoids with the set of new medoids, and repeating said step (c) to form a set of new clusters of information detected in the database only if an objective function evaluated with respect to the set of new medoids is less than the objective function evaluated with respect to the set of medoids.
-
-
16. The method according to claim 15, wherein the objective function is evaluated with respect to the set of medoids based on a plurality of distances, evaluated with respect to ones of the set of features associated with each medoid of the set of medoids, between each medoid of the set of medoids and ones of the plurality of data values and a plurality of dimensionality values, each of the plurality of dimensionality values corresponding to ones of the set of features associated with each medoid of the set of medoids, and the objective function is evaluated with respect to the set of new medoids based on a plurality of new distances, evaluated with respect to ones of the set of features associated with each new medoid of the set of new medoids, between each new medoid of the set of new medoids and ons of the plurality of data values and a plurality of new dimensionality values, each of the plurality of new dimensionality values corresponding to ones of the set of features associated with each new medoid of the set of new medoids.
-
17. The method according to claim 10, wherein the set of clusters detected in the database is formed by assigning ones of the plurality of data values to each medoid of the set of medoids based on a distance measure, with respect to ones of the set of features, between ones of the plurality of data values and each medoid of the set of medoids.
-
18. The method according to claim 1, wherein detecting the set of clusters of information in the database comprises the steps of:
-
(a) generating a set of medoids from ones of the plurality of data values;
(b) associating with each medoid of the set of medoids ones of the set of features;
(c) forming the set of clusters detected in the database by assigning ones of the plurality of data values to each medoid of the set of medoids;
(d) generating a set of new medoids by removing one medoid of the set of medoids from the set of medoids and adding one of the plurality of data values to the set of medoids;
(e) repeating said steps (a) and (b) replacing therein the set of medoids with the set of new medoids, and repeating said step (c) to form a set of new clusters of information detecting in the database if an objective function evaluated with respect to the set of new medoids is less than the objective function evaluated with respect to the set of new medoids; and
(f) repeating said steps (d) and (e) a plurality of times until the objective function evaluated with respect to the set of new medoids is greater than or equal to the objective function evaluated with respect to the set of medoids for a predetermined number of times of the plurality of times.
-
-
19. The method according to claim 1, wherein detecting the set of clusters of information in the database further comprises the steps of:
-
generating a set of medoids from ones of the plurality of data values; and
deleting from the database ones of the plurality of data values having a segmental distance from each of the set of medoids greater than a predetermined maximum segmental value.
-
-
25. The method according to claim 1, wherein said detecting step, for each cluster in the set of clusters, comprises the steps of:
-
(a) generating a set of medoids from ones of the plurality of data values, each medoid of the set of medoids being a data value chosen from the database;
(b) associating with each medoid of the set of medoids ones of a given feature subset of the set of features; and
(c) forming the set of clusters detected in the database by assigning ones of the plurality of data values to each medoid of the set of medoids, wherein each medoid of the set of medoids is an anchor data value.
-
-
20. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for analyzing information that represents a plurality of objects, said method comprising the steps of;
-
identifying a set of features that characterize each of the plurality of objects, said set of features including a plurality of subsets of features;
storing a plurality of data values in a data base, each of the plurality of data values representing an instance of the set of features;
detecting a set of clusters of information in the database, each cluster in the set of clusters being detected only by a subset of data values and a corresponding subset of features, each subset of features being specific to a respective cluster. - View Dependent Claims (21, 22, 23, 24)
(a) generating a set of medoids from ones of the plurality of data values, wherein each of said medoid is a data value chosen from the database;
(b) associating with each medoid of the set of medoids ones of the set of features; and
(c) forming the set of clusters detected in the database by assigning ones of the plurality of data values to each medoid of the set of medoids.
-
-
24. The program storage device according to claim 20, wherein detecting the set of clusters of information in the database comprises the steps of:
-
(a) generating a set of medoids from ones of the plurality of data values;
(b) associating with each medoid of the set of medoids ones of the set of features;
(c) forming the set of clusters detected in the database by assigning ones of the plurality of data values to each medoid of the set of medoids;
(d) generating a set of new medoids by removing one medoid of the set of medoids from the set of medoids and adding one of the plurality of data values to the set of medoids;
(e) repeating said steps (a) and (b) replacing therein the set of medoids with the set of new medoids, and repeating said step (c) to form a set of new clusters of information detecting in the database if an objective function evaluated with respect to the set of new medoids is less than the objective function evaluated with respect to the set of new medoids is less than the objective function evaluated with respect to the set of medoids; and
(f) repeating said steps (d) and (e) a plurality of times until the objective function evaluated with respect to the set of new medoids is greater than or equal to the objective function evaluated with respect to the set of medoids for a predetermined number of times of the plurality of times.
-
Specification