System and method for similarity searching in high-dimensional data space
First Claim
1. A method of analyzing information in the form of a plurality of data values that represent a plurality of objects, said method comprising the steps of:
- (a) identifying a set of features that characterize each object of the plurality of objects;
(b) storing the plurality of data values in a database, each of the plurality of data values corresponding to at least one of the plurality of objects based on the set of features;
(c) partitioning ones of the plurality of data values stored in the database into a plurality of clusters;
(d) assigning each cluster of the plurality of clusters one respective node of a plurality of nodes arranged in a tree hierarchy;
(e) traversing ones of the plurality of nodes of the tree hierarchy.
1 Assignment
0 Petitions
Accused Products
Abstract
Information is analyzed in the form of a plurality of data values that represent a plurality of objects. A set of features that characterize each object of the plurality of objects is identified. The plurality of data values are stored in a database. Each data value corresponds to at least one of the plurality of objects based on the set of features. Ones of the plurality of data values stored in the database are partitioned into a plurality of clusters. Each cluster of the plurality of clusters is assigned to one respective node of a plurality of nodes arranged in a tree hierarchy. Ones of the plurality of nodes of the tree hierarchy are traversed. If desired, information may be analyzed for finding peer groups in e-commerce applications.
-
Citations
30 Claims
-
1. A method of analyzing information in the form of a plurality of data values that represent a plurality of objects, said method comprising the steps of:
-
(a) identifying a set of features that characterize each object of the plurality of objects;
(b) storing the plurality of data values in a database, each of the plurality of data values corresponding to at least one of the plurality of objects based on the set of features;
(c) partitioning ones of the plurality of data values stored in the database into a plurality of clusters;
(d) assigning each cluster of the plurality of clusters one respective node of a plurality of nodes arranged in a tree hierarchy;
(e) traversing ones of the plurality of nodes of the tree hierarchy. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
(f) calculating a distance between each cluster of the plurality of clusters assigned to one of the plurality of nodes traversed and at least one of a plurality of target values defined by a user; and
(g) detecting at least one of the plurality of data values that is comparable to at least one of the plurality of target values based on the distance.
-
-
6. The method according to claim 5, wherein a predetermined number of ones of the plurality of data values that are comparable to one of the plurality of target values are detected.
-
7. The method according to claim 5, wherein a first predetermined number of ones of the plurality of data values that are comparable to a second predetermined number of ones of the plurality of target values are detected.
-
8. The method according to claim 5, wherein one of the plurality of data values that is comparable to a predetermined number of the plurality of target values is detected.
-
9. The method according to claim 5, wherein the step of detecting at least one of the plurality of data values that is comparable to at least one of the plurality of target values, further comprises the step of computing the distance between each one of the plurality of data values of one cluster of the plurality of clusters that is assigned to a node of the plurality of nodes of a lowest level of the tree hierarchy.
-
10. The method according to claim 5, further comprising the step of pruning at least one of the plurality of nodes of the tree hierarchy if the distance is greater than a predetermined upper bound.
-
11. The method according to claim 10, wherein k data values of the plurality of data values that are comparable to at least one of the plurality of target values are determined based on the distance, and to produce a pruned tree hierarchy.
-
12. The method according to claim 11, wherein steps (d)-(g) are repeated, replacing therein the tree hierarchy with the pruned tree hierarchy, to determine p′
- <
p data values of the plurality of data values that are comparable to at least one of the plurality of target values based on the distance.
- <
-
13. The method according to claim 11, wherein steps (d)-(g) are repeated, replacing therein the predetermined upper bound with a new upper bound calculated based on the p data values, to determine p′
- >
p data values of the plurality of data values that are comparable to at least one of the plurality of target values based on the distance.
- >
-
14. The method according to claim 5, further comprising the step of pruning at least one of the plurality of nodes of the tree hierarchy if the distance is less than a predetermined lower bound, and to produce a pruned tree hierarchy.
-
15. The method according to claim 14, wherein steps (d)-(g) are repeated, replacing therein the tree hierarchy with the pruned tree hierarchy, to determine p′
- <
p data values of the plurality of data values that are comparable to at least one of the plurality of target values based on the distance.
- <
-
16. The method according to claim 14, wherein steps (d)-(g) are repeated, replacing therein the predetermined lower bound with a new lower bound calculated based on the p data values, to determine p′
- >
p data values of the plurality of data values that are comparable to at least one of the plurality of target values based on the distance.
- >
-
17. A method of analyzing information in the form of a plurality of data values that represent a plurality of objects, said method comprising the steps of:
-
(a) identifying a set of features that characterize each object of the plurality of objects;
(b) storing the plurality of data values in a database, each of the plurality of data values corresponding to at least one of the plurality of objects based on the set of features;
(c) partitioning ones of the plurality of data values stored in the database into a plurality of clusters;
(d) assigning each cluster of the plurality of clusters one respective node of a plurality of nodes arranged in a tree hierarchy;
(e) bounding each cluster of the plurality of clusters with a capsule;
(f) traversing ones of the plurality of nodes of the tree hierarchy;
(g) calculating a distance between the capsule bounding each cluster of the plurality of clusters assigned to one of the plurality of nodes traversed and at least one of a plurality of user defined target values;
(h) detecting at least one of the plurality of data values that is comparable to at least one of the plurality of user defined target values based on the distance. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24)
(1) transforming ones of the plurality of target values to a new coordinate system; and
(2) calculating the distance between a surface of the capsule bounding each cluster of the plurality of clusters assigned to one of the plurality of nodes traversed and at least one of the plurality of target values in the new coordinate system.
-
-
20. The method according to claim 17, wherein an interactively defined number L of the plurality of data values that are comparable to q user defined target values of the plurality of user defined target values are detected.
-
21. The method according to claim 20, wherein the plurality of data values correspond to a plurality of transaction records, the plurality of transaction records are associated with a plurality of items for sale, and the L data values detected correspond to L transaction records of the plurality of transaction records, the L transaction records comprising a peer group of the q user defined target values.
-
22. The method according to claim 21, wherein ones of the plurality of items for sale are provided to a customer based on an intersection between a promotion list and ones of the plurality of items for sale associated with ones of the L transaction records of the peer group.
-
23. The method according to claim 22, further comprising the steps of:
-
(i) comparing a total number of the ones of the plurality of items for sale provided to the customer with a given value r;
(j) repeating steps (f)-(h) to detect L′
<
L transaction records comprising the peer group of the q user defined target values, if the total number of the ones of the plurality of items for sale provided to the customer is greater than r; and
(k) repeating steps (f)-(h) to detect L′
>
L transaction records comprising the peer group of the q user defined target values, if the total number of the ones of the plurality of items for sale provided to the customer is smaller than r.
-
-
24. The method according to claim 21, wherein the steps of the method are used for e-commerce.
-
25. A method of analyzing information in the form of a plurality of data values that represent a plurality of objects, said method comprising the steps of:
-
(a) identifying a set of features that characterize each object of the plurality of objects;
(b) storing the plurality of data values in a database, each of the plurality of data values corresponding to at least one of the plurality of objects based on the set of features;
(c) partitioning ones of the plurality of data values stored in the database into a plurality of clusters;
(d) assigning each cluster of the plurality of clusters one respective node of a plurality of nodes arranged in a tree hierarchy;
(e) bounding each cluster of the plurality of clusters with a capsule;
(f) traversing ones of the plurality of nodes of the tree hierarchy;
(g) transforming ones of a plurality of target values defined by a user to a new coordinate system;
(h) calculating a distance between a surface of the capsule bounding each cluster of the plurality of clusters assigned to one of the plurality of nodes traversed and at least one of the plurality of target values in the new coordinate system;
(i) pruning at least one of the plurality of nodes of the tree hierarchy if the distance is greater than a predetermined upper bound; and
(j) detecting at least one of the plurality of data values that is comparable to at least one of the plurality of target values based on the distance.
-
-
26. A method of analyzing information in the form of a plurality of data values that represent a plurality of objects, said method comprising the steps of:
-
(a) identifying a set of features that characterize each object of the plurality of objects;
(b) storing the plurality of data values in a database, each of the plurality of data values corresponding to at least one of the plurality of objects based on the set of features;
(c) partitioning ones of the plurality of data values stored in the database into a plurality of clusters;
(d) assigning each cluster of the plurality of clusters one node of a plurality of nodes arranged in a tree hierarchy;
(e) bounding each cluster of the plurality of clusters with a capsule;
(f) traversing ones of the plurality of nodes of the tree hierarchy;
(g) transforming ones of a plurality of target values defined by a user to a new coordinate system;
(h) calculating a distance between a surface of the capsule bounding each cluster of the plurality of clusters assigned to one of the plurality of nodes traversed and at least one of the plurality of target values in the new coordinate system;
(i) pruning at least one of the plurality of nodes of the tree hierarchy if the distance is less than a predetermined lower bound; and
(j) detecting at least one of the plurality of data values that is comparable to at least one of the plurality of target values based on the distance.
-
-
27. An article of manufacture comprising a computer usable medium having computer readable program code means embodied therein for analyzing information in the form of a plurality of data values that represent a plurality of objects, the computer readable program code means in said article of manufacture comprising computer readable program code means for causing a computer to effect:
-
(a) identifying a set of features that characterize each object of the plurality of objects;
(b) storing the plurality of data values in a database, each of the plurality of data values corresponding to at least one of the plurality of objects based on the set of features;
(c) partitioning ones of the plurality of data values stored in the database into a plurality of clusters;
(d) assigning each cluster of the plurality of clusters one respective node of a plurality of nodes arranged in a tree hierarchy;
(e) bounding each cluster of the plurality of clusters with a capsule;
(f) traversing ones of the plurality of nodes of the tree hierarchy;
(g) calculating a distance between the capsule bounding each cluster of the plurality of clusters assigned to one of the plurality of nodes traversed and at least one of a plurality of user defined target values;
(h) detecting at least one of the plurality of data values that is comparable to at least one of the plurality of user defined target values based on the distance.
-
-
28. An article of manufacture comprising a computer usable medium having computer readable program code means embodied therein for analyzing information in the form of a plurality of data values that represent a plurality of objects, the computer readable program code means in said article of manufacture comprising computer readable program code means for causing a computer to effect:
-
(a) identifying a set of features that characterize each object of the plurality of objects;
(b) storing the plurality of data values in a database, each of the plurality of data values corresponding to at least one of the plurality of objects based on the set of features;
(c) partitioning ones of the plurality of data values stored in the database into a plurality of clusters;
(d) assigning each cluster of the plurality of clusters one respective node of a plurality of nodes arranged in a tree hierarchy;
(e) traversing ones of the plurality of nodes of the tree hierarchy.
-
-
29. A computer program product comprising a computer usable medium having computer readable program code means embodied therein for causing an analysis of information in the form of a plurality of data values that represent a plurality of objects, the computer readable program code means in said computer program product comprising computer readable program code means for causing a computer to effect:
-
(a) identifying a set of features that characterize each object of the plurality of objects;
(b) storing the plurality of data values in a database, each of the plurality of data values corresponding to at least one of the plurality of objects based on the set of features;
(c) partitioning ones of the plurality of data values stored in the database into a plurality of clusters;
(d) assigning each cluster of the plurality of clusters one respective node of a plurality of nodes arranged in a tree hierarchy;
(e) traversing ones of the plurality of nodes of the tree hierarchy.
-
-
30. A computer program product comprising a computer usable medium having computer readable program code means embodied therein for causing an analysis of information in the form of a plurality of data values that represent a plurality of objects, the computer readable program code means in said computer program product comprising computer readable program code means for causing a computer to effect:
-
(a) identifying a set of features that characterize each object of the plurality of objects;
(b) storing the plurality of data values in a database, each of the plurality of data values corresponding to at least one of the plurality of objects based on the set of features;
(c) partitioning ones of the plurality of data values stored in the database into a plurality of clusters;
(d) assigning each cluster of the plurality of clusters one respective node of a plurality of nodes arranged in a tree hierarchy;
(e) bounding each cluster of the plurality of clusters with a capsule;
(f) traversing ones of the plurality of nodes of the tree hierarchy;
(g) calculating a distance between the capsule bounding each cluster of the plurality of clusters assigned to one of the plurality of nodes traversed and at least one of a plurality of user defined target values;
(h) detecting at least one of the plurality of data values that is comparable to at least one of the plurality of user defined target values based on the distance.
-
Specification