System and method for detecting clusters of information with application to e-commerce
Abstract
A method of analyzing information in the form of a plurality of data values. The plurality of data values represent a plurality of objects. The plurality of data values are distributed in a data space. A set of features which characterize each of the plurality of objects is identified. The plurality of data values are stored in a database. Each of the plurality of data values corresponds to at least one of the plurality of objects based on the set of features. Ones of the plurality of data values stored in the database are partitioned into a plurality of clusters. A respective orientation associated with a position in data space of data values which are contained in each respective cluster of the plurality of clusters is calculated based on the set of features. If desired, information may be analyzed for finding peer groups in e-commerce applications.
81 Citations
27 Claims
-
1. A method of analyzing information in the form of a plurality of data values which represent a plurality of objects, the plurality of data values distributed in a data space, said method comprising the steps of:
-
(a) identifying a set of features which characterize each of the plurality of objects;
(b) storing the plurality of data values in a database, each of the plurality of data values corresponding to at least one of the plurality of objects based on the set of features;
(c) partitioning ones of the plurality of data values stored in the database into a plurality of clusters;
(d) calculating a respective orientation associated with a position in data space of data values which are contained in each respective cluster of the plurality of clusters based on the set of features;
(h) merging one cluster with an other cluster of the plurality of clusters based on a radius of a union of data values which are contained in the one cluster and the data values which are contained in the other cluster; and
(i) generating a new plurality of clusters based on the result of step (h).
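Steps (h) and (i) can be illustrated with a minimal sketch. The claim does not define the radius, so the code assumes it is the root-mean-square Euclidean distance of the union's data values from their centroid; the function names `radius` and `merge_closest_pair` are hypothetical and are not taken from the patent.

```python
import math
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def radius(points):
    # Assumed definition: RMS Euclidean distance from the centroid.
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))

def merge_closest_pair(clusters):
    """Merge the two clusters whose union has the smallest radius (step h)
    and return the resulting new list of clusters (step i)."""
    i, j = min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: radius(clusters[ij[0]] + clusters[ij[1]]),
    )
    merged = clusters[i] + clusters[j]
    return [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

clusters = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(1.0, 0.0), (1.0, 1.0)],
    [(10.0, 10.0)],
]
new_clusters = merge_closest_pair(clusters)
```

In this example the first two clusters are merged, because their union is far more compact than any union involving the distant third cluster.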
(e) choosing a plurality of seeds from the plurality of data values;
(f) calculating a distance between any one of the plurality of data values and each of the plurality of seeds; and
(g) assigning the ones of the plurality of data values to the plurality of seeds based on the distance.
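Steps (e) through (g) can be sketched as follows. The sketch assumes seeds are drawn uniformly at random from the data values and that the distance is plain Euclidean distance; neither choice is stated in the claim, and the function names are hypothetical.

```python
import math
import random

def choose_seeds(points, k, rng):
    # Assumption: seeds are sampled uniformly at random from the data values.
    return rng.sample(points, k)

def assign_to_seeds(points, seeds):
    """Return one list of points per seed; each point goes to its nearest seed."""
    clusters = [[] for _ in seeds]
    for p in points:
        nearest = min(range(len(seeds)), key=lambda s: math.dist(p, seeds[s]))
        clusters[nearest].append(p)
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (9.0, 9.0), (10.0, 10.0)]
random_seeds = choose_seeds(points, 2, random.Random(0))

# Fixed seeds are used here so that the resulting partition is reproducible.
seeds = [(0.0, 0.0), (10.0, 10.0)]
clusters = assign_to_seeds(points, seeds)
```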
-
-
5. A method of analyzing information according to claim 1, further comprising the steps of:
-
(j) calculating a new respective orientation associated with a position in data space of data values which are contained in each respective new cluster of the plurality of new clusters based on the set of features; and
(k) generating a plurality of seeds, each respective seed of the plurality of seeds corresponding to each respective new cluster of the plurality of clusters.
-
-
6. A method of analyzing information according to claim 5, further comprising the steps of:
-
(l) calculating a distance between any one of the plurality of data values and each of the plurality of seeds based on the new respective orientation; and
(m) repeating step (c) by assigning the ones of the plurality of data values to the plurality of seeds based on the distance.
-
-
7. A method of analyzing information according to claim 5, wherein the plurality of seeds are generated by calculating a respective centroid associated with each respective new cluster.
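As a minimal illustration of claim 7, each new seed can be taken as the coordinate-wise mean of the data values in its cluster. The function names are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def seeds_from_clusters(clusters):
    """One seed per cluster, placed at that cluster's centroid."""
    return [centroid(c) for c in clusters]

new_clusters = [[(0.0, 0.0), (2.0, 4.0)], [(10.0, 10.0)]]
seeds = seeds_from_clusters(new_clusters)
```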
-
8. A method of analyzing information according to claim 1, wherein the respective orientation is calculated by choosing ones of a plurality of eigenvalues of a covariance matrix of data values which are contained in each respective cluster.
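One possible reading of claim 8 is sketched below for two-dimensional data only, using the closed-form eigen-decomposition of a symmetric 2x2 matrix. The claim does not say which eigenvalues are chosen; the sketch assumes the direction with the smallest eigenvalue (the direction of least spread) is kept, and it uses the population covariance. Both choices, and all function names, are assumptions.

```python
import math

def covariance_2d(points):
    """Entries (a, b, c) of the population covariance matrix [[a, b], [b, c]]."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = sum((x - mx) ** 2 for x, _ in points) / n
    b = sum((x - mx) * (y - my) for x, y in points) / n
    c = sum((y - my) ** 2 for _, y in points) / n
    return a, b, c

def eigenpairs_2d(a, b, c):
    """Eigenvalues (ascending) with unit eigenvectors of [[a, b], [b, c]]."""
    if abs(b) < 1e-12:
        # Already diagonal: the coordinate axes are the eigenvectors.
        return sorted([(a, (1.0, 0.0)), (c, (0.0, 1.0))])
    mid = (a + c) / 2
    half = math.hypot((a - c) / 2, b)
    pairs = []
    for lam in (mid - half, mid + half):
        v = (b, lam - a)  # satisfies (A - lam*I) v = 0 when b != 0
        norm = math.hypot(*v)
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs

def least_spread_direction(points):
    # Assumption: the orientation keeps the smallest-eigenvalue direction.
    a, b, c = covariance_2d(points)
    return eigenpairs_2d(a, b, c)[0][1]

points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
direction = least_spread_direction(points)
```

For points lying on the line y = x, the returned direction is perpendicular to that line, since the data has no spread across it.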
-
9. The method of analyzing information according to claim 1, wherein the plurality of data values represent a plurality of transactions associated with a plurality of customers and a plurality of items for sale.
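To make claim 9 concrete, transaction records might be turned into data values as sketched below, with one vector per customer and one coordinate per item for sale. The record layout (customer, item, quantity) and the sample names are hypothetical; the claim does not prescribe an encoding.

```python
def customer_vectors(transactions, items):
    """Map each customer to a vector of purchase quantities, one slot per item."""
    index = {item: i for i, item in enumerate(items)}
    vectors = {}
    for customer, item, quantity in transactions:
        vec = vectors.setdefault(customer, [0] * len(items))
        vec[index[item]] += quantity
    return vectors

items = ["book", "lamp", "pen"]
transactions = [
    ("alice", "book", 2),
    ("bob", "pen", 5),
    ("alice", "pen", 1),
    ("alice", "book", 1),
]
vectors = customer_vectors(transactions, items)
```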
-
10. The method of analyzing information according to claim 1, further comprising the steps of:
-
providing a target value;
calculating a respective distance between the target value and each respective cluster based on each respective orientation; and
selecting r clusters of the plurality of clusters based on the respective distance.
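The steps of claim 10 can be sketched as follows. The sketch assumes each cluster is summarized by a centroid and a list of orthonormal orientation vectors, and that the distance "based on each respective orientation" is the length of the target-to-centroid difference after projection onto those vectors. This projected distance and the tuple layout are assumptions, not definitions given in the claim.

```python
import math

def projected_distance(target, centroid, basis):
    """Length of (target - centroid) projected onto the orientation vectors."""
    diff = [t - c for t, c in zip(target, centroid)]
    return math.sqrt(
        sum(sum(d * e for d, e in zip(diff, vec)) ** 2 for vec in basis)
    )

def select_r_clusters(target, clusters, r):
    """clusters: list of (name, centroid, basis). Returns the r nearest names."""
    ranked = sorted(clusters, key=lambda c: projected_distance(target, c[1], c[2]))
    return [name for name, _, _ in ranked[:r]]

clusters = [
    ("A", (0.0, 0.0), [(1.0, 0.0)]),
    ("B", (5.0, 6.0), [(0.0, 1.0)]),
    ("C", (1.0, 9.0), [(1.0, 0.0)]),
]
peers = select_r_clusters((1.0, 4.0), clusters, 2)
```

Cluster C ranks first even though its centroid is far away in the second coordinate, because only the component along its orientation vector counts toward the distance.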
-
-
11. The method of analyzing information according to claim 10, wherein the r clusters comprise a peer group of the target value which is recommended to a user.
-
12. The method of analyzing information according to claim 10, further comprising the step of
determining an intersection of a set of data values contained by the r clusters selected with a promotions list.
13. The method of analyzing information according to claim 12, wherein the intersection of data values contained by the r clusters with the promotions list comprises a peer group of the target value which is recommended to a user.
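The intersection described in claims 12 and 13 amounts to keeping only those data values in the selected clusters that also appear on the promotions list. A minimal sketch, with hypothetical item names and a hypothetical function name:

```python
def promoted_peer_items(selected_clusters, promotions):
    """Items found in the selected clusters that are also on the promotions list.

    Order of first appearance is preserved and duplicates are dropped.
    """
    promo = set(promotions)
    result = []
    for cluster in selected_clusters:
        for item in cluster:
            if item in promo and item not in result:
                result.append(item)
    return result

selected = [["book", "lamp"], ["pen", "book"]]
promotions = ["pen", "book", "desk"]
recommended = promoted_peer_items(selected, promotions)
```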
-
14. An article of manufacture comprising a computer useable medium having computer readable program code means embodied therein for analyzing information in the form of a plurality of data values which represent a plurality of objects, the plurality of data values distributed in a data space, the computer readable program code means in said article of manufacture comprising computer readable program code means for causing a computer to effect:
-
(a) identifying a set of features which characterize each of the plurality of objects;
(b) storing the plurality of data values in a database, each of the plurality of data values corresponding to at least one of the plurality of objects based on the set of features;
(c) partitioning ones of the plurality of data values stored in the database into a plurality of clusters;
(d) calculating a respective orientation associated with a position in data space of data values which are contained in each respective cluster of the plurality of clusters based on the set of features;
(h) merging one cluster with an other cluster of the plurality of clusters based on a radius of a union of data values which are contained in the one cluster and the data values which are contained in the other cluster; and
(i) generating a new plurality of clusters based on the result of step (h).
providing a target value;
calculating a respective distance between the target value and each respective cluster based on each respective orientation; and
selecting r clusters of the plurality of clusters based on the respective distance.
-
-
17. An article of manufacture as recited in claim 14, wherein the respective orientation associated with each respective cluster includes an orthonormal basis of a subspace of the data space.
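For reference, a set of vectors is an orthonormal basis of a subspace when every vector has unit length and every pair of distinct vectors has a zero dot product. The check below is a small sketch of that property; the function name and tolerance are hypothetical.

```python
def is_orthonormal(basis, tol=1e-9):
    """True if all vectors have unit length and are mutually perpendicular."""
    for i, u in enumerate(basis):
        for j, v in enumerate(basis):
            dot = sum(a * b for a, b in zip(u, v))
            expected = 1.0 if i == j else 0.0
            if abs(dot - expected) > tol:
                return False
    return True

s = 2 ** -0.5
# Two vectors spanning a 2-dimensional subspace of a 3-dimensional data space.
plane_basis = [(s, s, 0.0), (s, -s, 0.0)]
skewed = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
```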
-
18. An article of manufacture as recited in claim 16, wherein the r clusters comprise a peer group of the target value which is recommended to a user.
-
19. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for analyzing information in the form of a plurality of data values which represent a plurality of objects, the plurality of data values distributed in a data space, said method comprising the steps of:
-
(a) identifying a set of features which characterize each of the plurality of objects;
(b) storing the plurality of data values in a database, each of the plurality of data values corresponding to at least one of the plurality of objects based on the set of features;
(c) partitioning ones of the plurality of data values stored in the database into a plurality of clusters;
(d) calculating a respective orientation associated with a position in data space of data values which are contained in each respective cluster of the plurality of clusters based on the set of features;
(h) merging one cluster with an other cluster of the plurality of clusters based on a radius of a union of data values which are contained in the one cluster and the data values which are contained in the other cluster; and
(i) generating a new plurality of clusters based on the result of step (h).
providing a target value;
calculating a respective distance between the target value and each respective cluster based on each respective orientation; and
selecting r clusters of the plurality of clusters based on the respective distance.
-
-
23. A program storage device as recited in claim 22, wherein the r clusters comprise a peer group of the target value which is recommended to a user.
-
24. A computer program product comprising a computer useable medium having computer readable program code means embodied therein for causing analysis of information in the form of a plurality of data values which represent a plurality of objects, the plurality of data values distributed in a data space, the computer readable program code means in said computer program product comprising computer readable program means for causing a computer to effect:
-
(a) identifying a set of features which characterize each of the plurality of objects;
(b) storing the plurality of data values in a database, each of the plurality of data values corresponding to at least one of the plurality of objects based on the set of features;
(c) partitioning ones of the plurality of data values stored in the database into a plurality of clusters;
(d) calculating a respective orientation associated with a position in data space of data values which are contained in each respective cluster of the plurality of clusters based on the set of features;
providing a target value;
calculating a respective distance between the target value and each respective cluster based on each respective orientation; and
selecting r clusters of the plurality of clusters based on the respective distance.
-
Specification