Automated identification and classification of critical data elements
Abstract
A data governance method comprises the following steps. Data elements from data assets associated with an enterprise are obtained. One or more of the data elements are identified as one or more critical data elements based on a level of criticality computed for each of the one or more data elements. In illustrative embodiments, the level of criticality is based on one or more of: a cardinality computed for the one or more data elements; a business relevance criterion computed for the one or more data elements; and an indirect cross-data lake correlation criterion computed for the one or more data elements.
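The abstract names three inputs to the level of criticality but does not say how they are combined. The sketch below is a minimal illustration that assumes a weighted sum of per-element scores, each already normalized to [0, 1], compared against a threshold. The names `ElementScores`, `level_of_criticality` and `identify_critical`, the weights, and the threshold are hypothetical and do not come from the patent.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class ElementScores:
    # Hypothetical per-element criteria, each assumed pre-normalized to [0, 1].
    cardinality: float
    business_relevance: float
    indirect_correlation: float


def level_of_criticality(s: ElementScores, weights=(0.5, 0.3, 0.2)) -> float:
    # Assumed aggregation: a weighted sum of the three criteria.
    w_card, w_bus, w_corr = weights
    return (w_card * s.cardinality
            + w_bus * s.business_relevance
            + w_corr * s.indirect_correlation)


def identify_critical(elements: dict[str, ElementScores],
                      threshold: float = 0.6) -> list[str]:
    # An element is flagged as critical when its score reaches the (hypothetical) threshold.
    return sorted(name for name, s in elements.items()
                  if level_of_criticality(s) >= threshold)


# Illustrative inputs only; not data from the patent.
elements = {
    "customer_id": ElementScores(1.0, 0.9, 0.4),
    "temp_flag": ElementScores(0.1, 0.2, 0.0),
}
critical = identify_critical(elements)
```

A weighted sum keeps each criterion's contribution easy to inspect. A real deployment could equally rank elements or use a learned model.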
16 Claims
1. A method comprising:
obtaining data elements from data assets associated with an enterprise;
identifying one or more of the data elements as one or more critical data elements based on a level of criticality computed for each of the one or more data elements;
wherein the level of criticality is based on a cardinality computed for the one or more data elements, and wherein the cardinality of a data element is computed as the total number of nodes in a data lineage map that have consumed the data element or a derivation of the data element;
wherein the level of criticality is based on a correlation criterion computed for the one or more data elements;
wherein the correlation criterion is based on indirect correlation identified between a given data element and portions of the enterprise for which the data element is potentially critical, the indirect correlation being calculated through application of a correlation algorithm against the given data element and another data element that is not connected to the given data element in the data lineage map; and
wherein the obtaining and identifying are implemented by one or more processing devices each comprising a processor coupled to a memory.
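The cardinality in claim 1 counts the nodes in a data lineage map that consumed a data element or anything derived from it. That count is the number of nodes reachable downstream of the element. The sketch below assumes the lineage map is a directed adjacency dict where an edge `u -> v` means "v consumes u or a derivation of u". The dict layout, the function name `cardinality`, and the node names are hypothetical.

```python
from __future__ import annotations
from collections import deque


def cardinality(lineage: dict[str, list[str]], element: str) -> int:
    """Count distinct nodes that consume `element` directly or consume a derivation of it.

    Assumes an edge u -> v in `lineage` means v consumes u (or something derived from u).
    """
    seen: set[str] = set()
    queue = deque(lineage.get(element, []))
    while queue:
        node = queue.popleft()
        # Skip nodes already counted, and the element itself if the map has a cycle.
        if node in seen or node == element:
            continue
        seen.add(node)
        queue.extend(lineage.get(node, []))
    return len(seen)


# Hypothetical lineage map for illustration.
lineage = {
    "orders.customer_id": ["etl.join_customers", "report.churn"],
    "etl.join_customers": ["mart.customer_360"],
    "mart.customer_360": ["report.churn", "ml.ltv_model"],
    "report.churn": [],
}
```

Here `orders.customer_id` has a cardinality of 4. `report.churn` is reached along two paths but is counted once, because the claim counts nodes rather than paths.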
10. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device to:
obtain data elements from data assets associated with an enterprise; and identify one or more of the data elements as one or more critical data elements based on a level of criticality computed for each of the one or more data elements, the level of criticality being based on a cardinality computed for the one or more data elements, the cardinality of a data element being computed as the total number of nodes in a data lineage map that have consumed the data element or a derivation of the data element;
wherein the level of criticality is based on a correlation criterion of an indirect correlation identified between a given data element and portions of the enterprise for which the data element is potentially critical, the indirect correlation being calculated through application of a correlation algorithm against the given data element and another data element that is not connected to the given data element in the data lineage map.
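The claims require a correlation algorithm applied to a given data element and another element that is not connected to it in the data lineage map, but they name no specific algorithm. The sketch below makes two assumptions. First, "not connected" means there is no path between the two nodes when edge direction is ignored. Second, the correlation algorithm is a Pearson coefficient over aligned value samples, which is only one plausible choice. The names `connected`, `pearson`, `indirect_correlation`, and the sample data are hypothetical.

```python
from __future__ import annotations
from collections import deque
from math import sqrt


def connected(lineage: dict[str, list[str]], a: str, b: str) -> bool:
    # Assumed reading of "connected": any path between a and b, ignoring edge direction.
    adj: dict[str, set[str]] = {}
    for src, dsts in lineage.items():
        for dst in dsts:
            adj.setdefault(src, set()).add(dst)
            adj.setdefault(dst, set()).add(src)
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def pearson(xs: list[float], ys: list[float]) -> float:
    # Pearson correlation, swapped in here as an example correlation algorithm.
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def indirect_correlation(lineage, samples, given, other):
    # Only pairs with no lineage connection qualify as *indirect* correlations.
    if connected(lineage, given, other):
        return None
    return pearson(samples[given], samples[other])


# Hypothetical lineage and samples for illustration.
lineage = {
    "crm.account_age_days": ["report.sales"],
    "billing.invoice_total": ["mart.revenue"],
}
samples = {
    "crm.account_age_days": [10, 20, 30, 40],
    "billing.invoice_total": [100, 210, 290, 410],
}
r = indirect_correlation(lineage, samples, "crm.account_age_days", "billing.invoice_total")
```

The two example elements share no lineage path, yet their samples correlate strongly (r is about 0.997). This is the kind of hidden dependency the indirect correlation criterion is meant to surface.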
12. An apparatus comprising:
at least one processing platform accessible to a plurality of user devices over at least one network;
wherein the processing platform implements a critical data element manager for data assets of an enterprise, and wherein the critical data element manager is configured to:
obtain data elements from the data assets associated with the enterprise; and
identify one or more of the data elements as one or more critical data elements based on a level of criticality computed for each of the one or more data elements, the level of criticality being based on a cardinality computed for the one or more data elements, the cardinality of a data element being computed as the total number of nodes in a data lineage map that have consumed the data element or a derivation of the data element;
wherein the processing platform is implemented by one or more processing devices each comprising a processor coupled to a memory; and
wherein the level of criticality is based on a correlation criterion of an indirect correlation identified between a given data element and portions of the enterprise for which the data element is potentially critical, the indirect correlation being calculated through application of a correlation algorithm against the given data element and another data element that is not connected to the given data element in the data lineage map.
Specification