Generating partitioned hierarchical groups based on data sets for business intelligence data models
Abstract
Techniques are described for generating a hierarchical group based on a set of data. In one example, a method includes classifying two or more data items from a set of data with respect to a library of ontological concepts. The method further includes classifying the two or more data items with respect to lexical correlations between the two or more data items. The method further includes generating a hierarchical group in which the two or more data items are partitioned into one or more hierarchical partitions based at least in part on the classifying with respect to the library of ontological concepts and the classifying with respect to the lexical correlations, wherein each of the one or more hierarchical partitions comprises the two or more data items.
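The two classification steps in the abstract can be illustrated with a minimal sketch. It is not the patented method: the concept library `CONCEPTS`, the functions `classify_ontological`, `header_tokens`, and `lexical_groups`, and the choice of "shares a header token" as the lexical correlation are all assumptions made for illustration. The ontological step here matches a column to a concept only when every value fits the concept's data type and value range.

```python
import re
from collections import defaultdict

# Hypothetical concept library: each concept declares an expected
# data type and an inclusive range of values.
CONCEPTS = {
    "year": {"type": int, "min": 1900, "max": 2100},
    "month": {"type": int, "min": 1, "max": 12},
}


def classify_ontological(values):
    """Return the names of concepts whose type and range fit every value."""
    matches = []
    for name, spec in CONCEPTS.items():
        if all(
            isinstance(v, spec["type"]) and spec["min"] <= v <= spec["max"]
            for v in values
        ):
            matches.append(name)
    return matches


def header_tokens(header):
    """Split a header such as 'Product_Line' or 'productType' into lowercase tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", header)
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t}


def lexical_groups(headers):
    """Group headers that share at least one token (assumed lexical correlation)."""
    groups = defaultdict(set)
    for h in headers:
        for tok in header_tokens(h):
            groups[tok].add(h)
    # Keep only tokens that actually link two or more headers.
    return {tok: hs for tok, hs in groups.items() if len(hs) > 1}
```

Under these assumptions, a column holding 2019 and 2020 would match only the hypothetical `year` concept, and the headers `Product_Line` and `productType` would be grouped under the shared token `product`.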
168 Citations
20 Claims
1. A method for generating a hierarchical group based on a set of data, the method comprising:
classifying two or more data items from a set of data with respect to a library of ontological concepts based at least in part on properties of the two or more data items, including detecting correlations between the properties of the two or more data items and one or more ontological concepts from the library of ontological concepts, wherein the properties of the two or more data items include data types defined for the two or more data items and ranges of data values in data fields of the two or more data items;
classifying the two or more data items with respect to lexical correlations between the two or more data items, including determining correlations between one or more elements of headers of the two or more data items;
analyzing the two or more data items based on one or more factors to determine whether the one or more factors contribute to defining a hierarchical relationship, wherein the analysis utilizes the one or more factors that comprise a set of heuristic rules and relative cardinality, wherein the set of heuristic rules discounts or disqualifies quantifiers or metrics associated with the two or more data items, and wherein the relative cardinality minimizes quantifiers or metrics through merging;
generating a hierarchical group in which the two or more data items are partitioned into one or more hierarchical partitions based at least in part on the classifying with respect to the library of ontological concepts, the classifying with respect to the lexical correlations, and the analysis of the two or more data items based on the one or more factors, wherein each of the one or more hierarchical partitions comprises the two or more data items; and
verifying a sampling of data in the one or more hierarchical partitions, including measuring correlations between data in the two or more data items in a particular hierarchical partition from the one or more hierarchical partitions to determine whether the particular hierarchical partition has a first data item at a leaf level of the particular hierarchical partition in a one-to-many relationship with a second data item at a base level of the particular hierarchical partition.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
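The analyzing step of claim 1 names two factors, heuristic rules that discount or disqualify metrics and relative cardinality. One possible reading is sketched below. The keyword set `METRIC_HINTS`, the functions `looks_like_metric` and `order_levels`, and the rule "fewer distinct values means a higher level" are assumptions for illustration only. The claim's "merging" is not modeled here.

```python
# Hypothetical header keywords that suggest a metric rather than a hierarchy level.
METRIC_HINTS = {"amount", "total", "count", "qty", "price", "revenue"}


def looks_like_metric(header, values):
    """Heuristic (assumed): a column is a metric if its header contains a
    metric keyword or any of its values is a float."""
    tokens = set(header.lower().replace("_", " ").split())
    has_hint = bool(tokens & METRIC_HINTS)
    is_fractional = any(isinstance(v, float) for v in values)
    return has_hint or is_fractional


def order_levels(table):
    """table maps header -> list of column values.

    Disqualifies metric-like columns, then returns the remaining headers
    ordered from lowest to highest number of distinct values."""
    candidates = {
        h: vals for h, vals in table.items() if not looks_like_metric(h, vals)
    }
    return sorted(candidates, key=lambda h: len(set(candidates[h])))
```

For a table with `Country`, `City`, and `Sales_Amount` columns, this sketch would drop `Sales_Amount` as a metric and place `Country` above `City` whenever `Country` has fewer distinct values.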
17. A computer program product for generating a hierarchical group based on a set of data, the computer program product comprising a non-transitory computer-readable storage medium having program code embodied therewith, the program code executable by a computing device to:
classify two or more data items from a set of data with respect to a library of ontological concepts based at least in part on properties of the two or more data items, including detecting correlations between the properties of the two or more data items and one or more ontological concepts from the library of ontological concepts, wherein the properties of the two or more data items include data types defined for the two or more data items and ranges of data values in data fields of the two or more data items;
classify the two or more data items with respect to lexical correlations between the two or more data items, including determining correlations between one or more elements of headers of the two or more data items;
analyze the two or more data items based on one or more factors to determine whether the one or more factors contribute to defining a hierarchical relationship, wherein the analysis utilizes the one or more factors that comprise a set of heuristic rules and relative cardinality, wherein the set of heuristic rules discounts or disqualifies quantifiers or metrics associated with the two or more data items, and wherein the relative cardinality minimizes quantifiers or metrics through merging;
generate a hierarchical group in which the two or more data items are partitioned into one or more hierarchical partitions based at least in part on the classifying with respect to the library of ontological concepts, the classifying with respect to the lexical correlations, and the analysis of the two or more data items based on the one or more factors, wherein each of the one or more hierarchical partitions comprises the two or more data items; and
verify a sampling of data in the one or more hierarchical partitions, including measuring correlations between data in the two or more data items in a particular hierarchical partition from the one or more hierarchical partitions to determine whether the particular hierarchical partition has a first data item at a leaf level of the particular hierarchical partition in a one-to-many relationship with a second data item at a base level of the particular hierarchical partition.
Dependent claims: 18
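The verifying step checks a sample of data for a one-to-many relationship between two levels of a partition. A minimal sketch of such a check follows. The function `is_one_to_many`, its parameters, and the sampling strategy are hypothetical. Which of the claim's levels plays the "one" role and which the "many" role is left to the caller, since the sketch only tests the relationship between two named columns.

```python
import random
from collections import defaultdict


def is_one_to_many(rows, one_col, many_col, sample_size=1000, seed=0):
    """Check, on a random sample of rows (dicts), that each value of
    many_col co-occurs with exactly one value of one_col, and that at
    least one value of one_col covers several values of many_col."""
    rng = random.Random(seed)
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)

    parents_of = defaultdict(set)   # many_col value -> one_col values seen with it
    children_of = defaultdict(set)  # one_col value -> many_col values seen with it
    for row in sample:
        parents_of[row[many_col]].add(row[one_col])
        children_of[row[one_col]].add(row[many_col])

    single_parent = all(len(p) == 1 for p in parents_of.values())
    fans_out = any(len(c) > 1 for c in children_of.values())
    return single_parent and fans_out
```

Because the check runs on a sample, a positive result is evidence for the relationship rather than proof of it; a violation that appears only in unsampled rows would go undetected.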
19. A computer system for generating a hierarchical group based on a set of data, the computer system comprising:
one or more processors, one or more computer-readable memories, and one or more computer-readable, tangible storage devices;
program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to classify two or more data items from a set of data with respect to a library of ontological concepts based at least in part on properties of the two or more data items, including determining correlations between the properties of the two or more data items and one or more ontological concepts from the library of ontological concepts, wherein the properties of the two or more data items include data types defined for the two or more data items and ranges of data values in data fields of the two or more data items;
program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to classify the two or more data items with respect to lexical correlations between the two or more data items;
program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to analyze the two or more data items based on one or more factors to determine whether the one or more factors contribute to defining a hierarchical relationship, wherein the analysis utilizes the one or more factors that comprise a set of heuristic rules and relative cardinality, wherein the set of heuristic rules discounts or disqualifies quantifiers or metrics associated with the two or more data items, and wherein the relative cardinality minimizes quantifiers or metrics through merging;
program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to generate a hierarchical group in which the two or more data items are partitioned into one or more hierarchical partitions based at least in part on the classifying with respect to the library of ontological concepts, the classifying with respect to the lexical correlations, and the analysis of the two or more data items based on the one or more factors, wherein each of the one or more hierarchical partitions comprises the two or more data items; and
program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to verify a sampling of data in the one or more hierarchical partitions, including measuring correlations between data in the two or more data items in a particular hierarchical partition from the one or more hierarchical partitions to determine whether the particular hierarchical partition has a first data item at a leaf level of the particular hierarchical partition in a one-to-many relationship with a second data item at a base level of the particular hierarchical partition.
Dependent claims: 20
Specification