METHOD AND SYSTEM FOR ACCELERATED DATA QUALITY ENHANCEMENT
First Claim
1. A computer-implemented method for producing data quality rules for a data set, comprising:
generating a set of candidate conditional functional dependencies based on a set of candidate seeds by using an ontology of said data set, said candidate seeds being comprised of a subset of attributes drawn from a set of the attributes of said data set that have a predetermined degree of separation in said ontology;
applying said candidate conditional functional dependencies individually to said data set to obtain a set of corresponding result values for said candidate conditional functional dependencies;
refining said candidate conditional functional dependencies individually and repeating said applying if said set of corresponding result values does not have a result signature that meets a predetermined expectation;
terminating said applying and refining of said candidate conditional functional dependencies individually when said candidate conditional functional dependencies individually reach a quiescent state; and
selecting a relevant set of said candidate conditional functional dependencies to be used as said data quality rules for said data set.
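The generating step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the ontology is an undirected graph held as an adjacency dictionary, that "degree of separation" means the shortest hop count between two attribute nodes, and that a seed is a pair of attributes. The claim fixes none of these choices, and the names `separation`, `candidate_seeds`, `ONTOLOGY`, and `ATTRIBUTES` are hypothetical.

```python
from collections import deque
from itertools import combinations


def separation(ontology, start, goal):
    """Shortest hop count between two ontology nodes, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in ontology.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def candidate_seeds(ontology, attributes, max_separation):
    """Pairs of attributes whose ontology distance is within max_separation."""
    seeds = []
    for a, b in combinations(sorted(attributes), 2):
        dist = separation(ontology, a, b)
        if dist is not None and dist <= max_separation:
            seeds.append((a, b))
    return seeds


# Hypothetical toy ontology (assumption): concepts link to their attributes.
ONTOLOGY = {
    "Customer": ["Address", "name"],
    "Address": ["Customer", "city", "zip", "country"],
    "name": ["Customer"],
    "city": ["Address"],
    "zip": ["Address"],
    "country": ["Address"],
}
ATTRIBUTES = ["city", "zip", "country", "name"]

seeds = candidate_seeds(ONTOLOGY, ATTRIBUTES, max_separation=2)
print(seeds)
```

In this toy ontology the three address attributes sit two hops apart, while `name` is three hops from each of them, so a limit of 2 keeps only the address pairs. Restricting seeds this way shrinks the space of candidate dependencies compared with trying every attribute combination.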
Abstract
Embodiments of the present invention solve the technical problem of identifying, collecting, and managing rules that improve poor-quality data on enterprise initiatives ranging from data governance to business intelligence. In a specific embodiment of the present invention, a method is provided for producing data quality rules for a data set. A set of candidate conditional functional dependencies is generated from candidate seeds of attributes that are within a certain degree of relatedness in the ontology of the data set. The candidate conditional functional dependencies are then applied to the data and refined until they reach a quiescent state, in which they have not been refined even though the data they have been applied to has been stable. The resulting refined candidate conditional functional dependencies are the data enhancement rules for the data set and other related data sets. In another specific embodiment of the present invention, a computer system for the development of data quality rules is provided having a rule repository, a data quality rules discovery engine, and a user interface.
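The apply-and-refine cycle described in the abstract might look like the following sketch. Everything concrete here is an assumption made for illustration: the `CFD` structure, the use of a confidence ratio as the result value, the threshold as the expectation, refinement by adding one condition at a time from a supplied list, and "quiescent" read as meeting the expectation on a fixed number of consecutive batches without being refined. The source specifies none of these details.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CFD:
    """Hypothetical rule: for rows matching `condition`, `lhs` determines `rhs`."""
    condition: tuple  # ((attribute, value), ...)
    lhs: str
    rhs: str


def confidence(cfd, rows):
    """Fraction of matching rows that agree with the majority rhs per lhs value."""
    groups = defaultdict(Counter)
    for row in rows:
        if all(row.get(attr) == val for attr, val in cfd.condition):
            groups[row[cfd.lhs]][row[cfd.rhs]] += 1
    total = sum(sum(counts.values()) for counts in groups.values())
    if total == 0:
        return 0.0
    agreeing = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return agreeing / total


def refine_until_quiescent(cfd, batches, refinements, threshold=0.9, quiet_rounds=2):
    """Apply the rule batch by batch, refining it until it stays unchanged."""
    pending = list(refinements)
    quiet = 0
    for rows in batches:
        if confidence(cfd, rows) >= threshold:
            quiet += 1  # expectation met, no refinement needed this round
            if quiet >= quiet_rounds:
                return cfd, True
        elif pending:
            # Assumed refinement strategy: narrow the rule with one more condition.
            cfd = CFD(cfd.condition + (pending.pop(0),), cfd.lhs, cfd.rhs)
            quiet = 0
        else:
            return cfd, False
    return cfd, False


# Hypothetical data: zip determines city for US rows only.
ROWS = [
    {"country": "US", "zip": "10001", "city": "New York"},
    {"country": "US", "zip": "10001", "city": "New York"},
    {"country": "US", "zip": "60601", "city": "Chicago"},
    {"country": "XX", "zip": "10001", "city": "Alpha"},
    {"country": "XX", "zip": "10001", "city": "Beta"},
]

final, quiescent = refine_until_quiescent(
    CFD((), "zip", "city"), [ROWS, ROWS, ROWS], [("country", "US")]
)
print(final, quiescent)
```

The unconditioned rule fails on the first batch, is narrowed to US rows, and then passes two batches in a row without further change, which this sketch treats as the quiescent state.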
44 Citations
20 Claims
1. A computer-implemented method for producing data quality rules for a data set, comprising:
generating a set of candidate conditional functional dependencies based on a set of candidate seeds by using an ontology of said data set, said candidate seeds being comprised of a subset of attributes drawn from a set of the attributes of said data set that have a predetermined degree of separation in said ontology;
applying said candidate conditional functional dependencies individually to said data set to obtain a set of corresponding result values for said candidate conditional functional dependencies;
refining said candidate conditional functional dependencies individually and repeating said applying if said set of corresponding result values does not have a result signature that meets a predetermined expectation;
terminating said applying and refining of said candidate conditional functional dependencies individually when said candidate conditional functional dependencies individually reach a quiescent state; and
selecting a relevant set of said candidate conditional functional dependencies to be used as said data quality rules for said data set.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer-implemented method for enhancing data quality, comprising the steps of:
generating a set of candidate conditional functional dependencies based on a set of candidate seeds by using an ontology of a data set, each of said candidate seeds being comprised of a subset of attributes drawn from a set of all the attributes of said data set that have a predetermined degree of separation in said ontology;
applying said candidate conditional functional dependencies individually to said data set to obtain a set of corresponding result values for each of said candidate conditional functional dependencies;
refining said candidate conditional functional dependencies individually and repeating said applying if said set of corresponding result values does not have a result signature that meets a pre-determined expectation;
terminating said applying and refining of said candidate conditional functional dependencies individually when said candidate conditional functional dependencies individually reach a quiescent state;
selecting a relevant set of said candidate conditional functional dependencies to be used as said data quality rules for said data set; and
enhancing the data quality of said data set by checking the data of said data set against said relevant set and screening said data if said data does not follow a rule contained in said relevant set.
Dependent claims: 13, 14, 15
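The final enhancing step of claim 12, checking data against the selected rules and screening what does not follow them, can be sketched as below. This assumes a deliberately simple rule shape (a condition on some attributes plus one expected value) and assumes that "screening" means setting the offending rows aside. The claim does not say what form the rules take or what happens to screened data, and the names `violates`, `screen`, `RULES`, and `RECORDS` are hypothetical.

```python
def violates(row, rule):
    """True if the row matches the rule's condition but not its expected value."""
    condition, (attribute, expected) = rule
    applies = all(row.get(attr) == val for attr, val in condition.items())
    return applies and row.get(attribute) != expected


def screen(rows, rules):
    """Split rows into those that follow every rule and those that break one."""
    clean, flagged = [], []
    for row in rows:
        if any(violates(row, rule) for rule in rules):
            flagged.append(row)
        else:
            clean.append(row)
    return clean, flagged


# Hypothetical rule (assumption): US zip 10001 implies city "New York".
RULES = [({"country": "US", "zip": "10001"}, ("city", "New York"))]

RECORDS = [
    {"country": "US", "zip": "10001", "city": "New York"},
    {"country": "US", "zip": "10001", "city": "Newark"},
    {"country": "CA", "zip": "10001", "city": "Elsewhere"},
]

clean, flagged = screen(RECORDS, RULES)
print(len(clean), "clean;", len(flagged), "flagged")
```

Rows that fall outside a rule's condition are left alone, which is what makes the dependency conditional: the third record has the same zip but a different country, so the rule does not apply to it.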
16. A computer system for the development of data quality rules, comprising:
a rule repository for storing said data quality rules;
a user interface capable of receiving a data set, an ontology, and a set of rule generation parameters, and capable of outputting a set of data quality rules;
a data quality rules discovery engine capable of receiving said data set, said ontology, and said set of rule generation parameters from said user interface, generating said set of data quality rules, and sending said set of data quality rules to said rule repository;
wherein said data quality rules discovery engine formulates a set of candidate conditional functional dependencies based on a set of candidate seeds by using said ontology, said candidate seeds being comprised of a subset of attributes that have a predetermined degree of separation in said ontology drawn from a set of all the attributes of said data set; and
wherein said data quality rules discovery engine refines said set of candidate conditional functional dependencies iteratively if they do not meet a predetermined expectation when applied to said data set, and terminates said refining when said set of conditional functional dependencies reach a quiescent state and become said data quality rules.
Dependent claims: 17, 18, 19, 20
Specification