Efficient SQL based multi-attribute clustering
Abstract
Efficient SQL based multi-attribute clustering of data attributes may be used to identify the combination of data attributes most relevant to an outcome. A global outcome value may be calculated to represent an average of the outcome. A subset outcome value for each subset of data attributes of a plurality of attributes may be calculated to represent the average of the outcome for the subset. For each subset of data attributes, a number of members associated with the subset may be compared to a threshold, and the subsets with fewer members than the threshold may be removed. The subset outcome value for each subset of data attributes may be compared to the global outcome value, and a report may be generated that identifies each subset for which the corresponding subset outcome value is greater than or less than the global outcome value.
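The steps summarized in the abstract can be illustrated with a minimal sketch using Python's standard `sqlite3` module. Everything named here is an assumption made for illustration and does not come from the patent: the tables `outcomes` and `member_attr`, the helper functions, the demo data, and the restriction to subsets of exactly two attributes. The sketch computes the global average, computes the average outcome and member count for each attribute pair with a self-join and `GROUP BY`, drops pairs below the size threshold with `HAVING`, and keeps pairs whose average differs from the global average.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical demo data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE outcomes (member_id INTEGER PRIMARY KEY, outcome REAL)"
    )
    conn.execute(
        "CREATE TABLE member_attr ("
        " member_id INTEGER, attribute TEXT,"
        " PRIMARY KEY (member_id, attribute))"
    )
    conn.executemany(
        "INSERT INTO outcomes VALUES (?, ?)",
        [(1, 10), (2, 8), (3, 4), (4, 2), (5, 3), (6, 9)],
    )
    conn.executemany(
        "INSERT INTO member_attr VALUES (?, ?)",
        [
            (1, "smoker"), (1, "urban"),
            (2, "smoker"), (2, "urban"),
            (3, "smoker"), (3, "rural"),
            (4, "urban"),
            (5, "rural"),
            (6, "smoker"), (6, "urban"),
        ],
    )
    return conn


def flag_attribute_pairs(conn, min_size):
    """Return the global average and the attribute pairs that deviate from it."""
    # Global outcome value: average outcome over all members.
    global_avg = conn.execute("SELECT AVG(outcome) FROM outcomes").fetchone()[0]
    # Self-join pairs up attributes held by the same member; the "<" condition
    # keeps each unordered pair once. HAVING removes small subsets and subsets
    # whose average equals the global average.
    rows = conn.execute(
        """
        SELECT a.attribute, b.attribute, COUNT(*) AS n, AVG(o.outcome) AS subset_avg
        FROM member_attr AS a
        JOIN member_attr AS b
          ON a.member_id = b.member_id AND a.attribute < b.attribute
        JOIN outcomes AS o ON o.member_id = a.member_id
        GROUP BY a.attribute, b.attribute
        HAVING COUNT(*) >= ? AND AVG(o.outcome) <> ?
        ORDER BY a.attribute, b.attribute
        """,
        (min_size, global_avg),
    ).fetchall()
    return global_avg, rows


if __name__ == "__main__":
    demo = build_demo_db()
    print(flag_attribute_pairs(demo, min_size=2))
```

Pushing the grouping, size filter, and comparison into one SQL statement lets the database engine do the aggregation; whether this matches the approach described in the specification is not established by the abstract alone.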
21 Claims
1. A method, comprising:
receiving a global outcome value, wherein the global outcome value is a global threshold for identifying a plurality of subsets of data attributes;
calculating, in parallel, a plurality of subset outcome values for the plurality of subsets of data attributes, respectively, of a plurality of data attributes, wherein each subset outcome value of a subset of data attributes represents an average value of a data attribute outcome for members that are associated with each of the data attributes of the subset of data attributes;
comparing, for each subset of data attributes, a number of members associated with the subset of data attributes with a size threshold;
removing each subset of data attributes for which the number of members associated with the subset is below the size threshold;
comparing, for each subset of data attributes, the subset outcome value for the subset with the global outcome value as the global threshold for identifying a plurality of subsets of data attributes; and
generating a report that identifies each subset of data attributes for which the corresponding subset outcome value is greater by a first threshold than or less by a second threshold than the global outcome value.
Dependent claims: 2, 3, 4, 5, 6, 7
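The steps recited in claim 1 can be sketched in Python as follows. This is a hedged illustration, not the claimed implementation: the function names, the record layout (a set of attributes plus an outcome per member), the `max_subset_size` limit, and the use of a thread pool as a stand-in for "in parallel" are all assumptions. The sketch also assumes that "greater by a first threshold" means the subset average exceeds the global value by at least `first_threshold`, and that "less by a second threshold" means it falls short by at least `second_threshold`.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def subset_outcome(records, subset):
    """Count and average the outcome over members having every attribute in `subset`."""
    wanted = set(subset)
    outcomes = [outcome for attrs, outcome in records if wanted <= attrs]
    average = sum(outcomes) / len(outcomes) if outcomes else None
    return subset, len(outcomes), average


def report_subsets(records, attributes, global_outcome, size_threshold,
                   first_threshold, second_threshold, max_subset_size=2):
    """Return (subset, member_count, average) for each subset that is flagged."""
    # Enumerate candidate subsets of 1..max_subset_size attributes (assumed limit).
    subsets = [combo
               for k in range(1, max_subset_size + 1)
               for combo in combinations(sorted(attributes), k)]
    # Evaluate the subsets concurrently; map() preserves the input order.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: subset_outcome(records, s), subsets))
    # Remove subsets whose member count is below the size threshold.
    kept = [r for r in results if r[1] >= size_threshold and r[2] is not None]
    # Keep subsets far enough above or below the global outcome value.
    return [(subset, n, avg) for subset, n, avg in kept
            if avg - global_outcome >= first_threshold
            or global_outcome - avg >= second_threshold]


if __name__ == "__main__":
    # Hypothetical demo data: (attributes held by the member, outcome).
    demo_records = [
        ({"smoker", "urban"}, 10), ({"smoker", "urban"}, 8),
        ({"smoker", "rural"}, 4), ({"urban"}, 2),
        ({"rural"}, 3), ({"smoker", "urban"}, 9),
    ]
    for row in report_subsets(demo_records, {"smoker", "urban", "rural"},
                              global_outcome=6.0, size_threshold=2,
                              first_threshold=2.0, second_threshold=2.0):
        print(row)
```

Separate first and second thresholds allow the report to be asymmetric, for example flagging only large positive deviations while still catching small negative ones.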
8. A system, comprising:
a data storage device configured to store a plurality of data records for a plurality of members, wherein each data record comprises a plurality of data attributes;
a processor in data communication with the data storage device and configured to:
receive a global outcome value, wherein the global outcome value is a global threshold for identifying a plurality of subsets of data attributes;
calculate, in parallel, a subset outcome value for the plurality of subsets of data attributes of a plurality of data attributes, wherein each subset outcome value of a subset of data attributes represents an average value of a data attribute outcome for members that are associated with each of the data attributes of the subset of data attributes;
compare, for each subset of data attributes, a number of members associated with the subset of data attributes with a size threshold;
remove each subset of data attributes for which the number of members associated with the subset is below the size threshold;
compare, for each subset of data attributes, the subset outcome value for the subset with the global outcome value as the global threshold for identifying a plurality of subsets of data attributes; and
generate a report that identifies each subset of data attributes for which the corresponding subset outcome value is greater by a first threshold than or less by a second threshold than the global outcome value.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product, comprising a non-transitory computer readable medium having computer executable instructions to perform operations comprising:
receiving a global outcome value, wherein the global outcome value is a global threshold for identifying a plurality of subsets of data attributes;
calculating, in parallel, a subset outcome value for the plurality of subsets of data attributes of a plurality of data attributes, wherein each subset outcome value of a subset of data attributes represents an average value of a data attribute outcome for members that are associated with each of the data attributes of the subset of data attributes;
comparing, for each subset of data attributes, a number of members associated with the subset of data attributes with a size threshold;
removing each subset of data attributes for which the number of members associated with the subset is below the size threshold;
comparing, for each subset of data attributes, the subset outcome value for the subset with the global outcome value as the global threshold for identifying a plurality of subsets of data attributes; and
generating a report that identifies each subset of data attributes for which the corresponding subset outcome value is greater by a first threshold than or less by a second threshold than the global outcome value.
Dependent claims: 16, 17, 18, 19, 20, 21
Specification