Methods and systems for assessing data quality
First Claim
1. A method, comprising:
selecting, by a microprocessor, a group of proposed critical data elements from a plurality of proposed critical data elements consisting at least in part of type of account, original balance, origination date, number of deposits, and number of loans based at least in part on ranking each of the plurality of proposed critical data elements according to weighted criteria consisting at least in part of ease of access to each proposed critical data element, regulatory risk associated with each proposed critical data element, financial risk associated with each proposed critical data element, and reputation risk associated with each proposed critical data element;
collecting, by the microprocessor, samples of data for each of the proposed critical data elements in said group of proposed critical data elements from a database storing a population of data elements representing attributes of each of a plurality of different financial transactions;
identifying, by the microprocessor, a portion of said group of proposed critical data elements based at least in part on a ranking of respective degrees of correlation between said data samples for each of the proposed critical data elements in said group of proposed critical data elements;
generating, by the microprocessor, a plurality of different, overlapping sets of data quality rules at least in part in terms of data completeness and data validity for each of the proposed critical data elements in said portion of said group of proposed critical data elements, each set of data quality rules comprising a different number of data quality rules for the same proposed critical data elements in said portion of said group of proposed critical data elements;
identifying, by the microprocessor, one of the plurality of different, overlapping sets of data quality rules for monitoring a quality of data in said database based at least in part on a difference between a value for each of said sets of data quality rules as a function of accuracy or completeness of data in the database and a sum of a cost of creating each set of data quality rules as a function of number, complexity, and interdependency of rules in each of said sets of data quality rules;
monitoring, by the microprocessor, the quality of data within said database using said identified one of the plurality of different, overlapping sets of data quality rules;
identifying, by the microprocessor, critical data elements that produce a pre-defined high number of outliers in said data within said database based on said monitoring the quality of data in said database indicative of a likelihood that a process is out of control; and
identifying, by the microprocessor, causes for the pre-defined high number of outliers produced by said critical data elements in said data within said database.
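The selection step recited above ranks proposed critical data elements (CDEs) against weighted criteria. A minimal sketch of one such weighted ranking follows; the weights, the 1-5 scores, and the group size of three are purely illustrative assumptions, since the claim specifies none of them:

```python
# Hypothetical weighted ranking of proposed critical data elements (CDEs).
# Criteria names come from the claim; weights and scores are illustrative.
criteria_weights = {
    "ease_of_access": 0.2,
    "regulatory_risk": 0.3,
    "financial_risk": 0.3,
    "reputation_risk": 0.2,
}

# Illustrative 1-5 scores for each proposed CDE against each criterion.
proposed_cdes = {
    "type_of_account":    {"ease_of_access": 5, "regulatory_risk": 2, "financial_risk": 2, "reputation_risk": 1},
    "original_balance":   {"ease_of_access": 4, "regulatory_risk": 4, "financial_risk": 5, "reputation_risk": 3},
    "origination_date":   {"ease_of_access": 5, "regulatory_risk": 3, "financial_risk": 2, "reputation_risk": 2},
    "number_of_deposits": {"ease_of_access": 3, "regulatory_risk": 2, "financial_risk": 3, "reputation_risk": 2},
    "number_of_loans":    {"ease_of_access": 3, "regulatory_risk": 3, "financial_risk": 4, "reputation_risk": 2},
}

def weighted_score(scores):
    """Weighted sum of a CDE's criterion scores."""
    return sum(criteria_weights[c] * s for c, s in scores.items())

# Rank the proposed CDEs by score and select the top group.
ranked = sorted(proposed_cdes, key=lambda cde: weighted_score(proposed_cdes[cde]), reverse=True)
selected_group = ranked[:3]
```

With these illustrative numbers, `original_balance` scores highest (0.8 + 1.2 + 1.5 + 0.6 = 4.1) and the selected group is the top three elements by weighted score.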
1 Assignment
0 Petitions
Abstract
Methods and systems for assessing data quality involve collecting samples of data elements from a database storing a population of data elements representing attributes of each of numerous different financial transactions. Critical data elements are determined from the collected samples. Data quality rules are built and data quality dimensions are calculated for the critical data elements. A quality of data within the critical data elements for different data quality dimensions is monitored. Critical data elements that produce a high number of outliers are identified, and causes for the outliers are identified. Thereafter, a corrective action plan to address a solution for the causes of the outliers may be developed and executed.
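The claims narrow the group of candidate elements by ranking degrees of correlation between their data samples: highly correlated elements are redundant, so only one of each such pair need be monitored. A hedged sketch of that pruning step follows; the sample values, the 0.9 cutoff, and the use of a Pearson coefficient are illustrative assumptions, not details from the patent:

```python
# Hypothetical samples for three candidate CDEs (values are illustrative).
samples = {
    "original_balance":   [100.0, 250.0, 150.0, 300.0, 200.0],
    "number_of_deposits": [2.0, 5.0, 3.0, 6.0, 4.0],
    "number_of_loans":    [1.0, 1.0, 2.0, 1.0, 2.0],
}

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Rank CDE pairs by absolute correlation; for each highly correlated pair,
# keep only the first member and mark the second as redundant.
names = list(samples)
pairs = sorted(
    ((a, b, abs(pearson(samples[a], samples[b])))
     for i, a in enumerate(names) for b in names[i + 1:]),
    key=lambda t: t[2], reverse=True,
)
THRESHOLD = 0.9  # illustrative cutoff for "too correlated to keep both"
redundant = {b for a, b, r in pairs if r >= THRESHOLD}
retained = [n for n in names if n not in redundant]
```

Here `original_balance` and `number_of_deposits` are perfectly correlated in the toy data, so only one of the two is retained for rule building.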
24 Citations
22 Claims
1. A method, comprising the steps set forth in the First Claim above. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
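The rule-set selection step picks, from the overlapping candidate sets, the one whose value (as a function of accuracy or completeness) most exceeds its creation cost (as a function of number, complexity, and interdependency of rules). A minimal net-benefit sketch follows; the candidate sets, their figures, and the additive cost model are all illustrative assumptions:

```python
# Hypothetical candidate rule sets: "value" stands for gains in accuracy or
# completeness of data in the database; cost factors follow the claim.
# All figures are illustrative.
rule_sets = [
    {"name": "minimal",  "value": 40.0, "n_rules": 3,  "complexity": 1.0, "interdependency": 0.5},
    {"name": "standard", "value": 70.0, "n_rules": 8,  "complexity": 2.0, "interdependency": 1.5},
    {"name": "full",     "value": 85.0, "n_rules": 20, "complexity": 4.0, "interdependency": 3.5},
]

def cost(rs):
    # Creation cost modeled as a simple sum over the claimed factors
    # (number, complexity, and interdependency of rules).
    return rs["n_rules"] + rs["complexity"] + rs["interdependency"]

def net_benefit(rs):
    # The claim selects the set maximizing value minus creation cost.
    return rs["value"] - cost(rs)

best = max(rule_sets, key=net_benefit)
```

Under these toy figures the "standard" set wins: the "full" set adds value, but its extra rules cost more than they contribute.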
21. A system, comprising:
a microprocessor programmed to:
select a group of proposed critical data elements from a plurality of proposed critical data elements consisting at least in part of type of account, original balance, origination date, number of deposits, and number of loans based at least in part on ranking each of the plurality of proposed critical data elements according to weighted criteria consisting at least in part of ease of access to each proposed critical data element, regulatory risk associated with each proposed critical data element, financial risk associated with each proposed critical data element, and reputation risk associated with each proposed critical data element;
collect samples of data for each of the proposed critical data elements in said group of proposed critical data elements from a database storing a population of data elements representing attributes of each of a plurality of different financial transactions;
identify a portion of said group of proposed critical data elements based at least in part on a ranking of respective degrees of correlation between said data samples for each of the proposed critical data elements in said group of proposed critical data elements;
generate a plurality of different, overlapping sets of data quality rules at least in part in terms of data completeness and data validity for each of the proposed critical data elements in said portion of said group of proposed critical data elements, each set of data quality rules comprising a different number of data quality rules for the same proposed critical data elements in said portion of said group of proposed critical data elements;
identify one of the plurality of different, overlapping sets of data quality rules for monitoring a quality of data in said database based at least in part on a difference between a value for each of said sets of data quality rules as a function of accuracy or completeness of data in the database and a sum of a cost of creating each set of data quality rules as a function of number, complexity, and interdependency of rules in each of said sets of data quality rules;
monitor the quality of data within said database using said identified one of the plurality of different, overlapping sets of data quality rules;
identify critical data elements that produce a pre-defined high number of outliers in said data within said database based on said monitoring the quality of data in said database indicative of a likelihood that a process is out of control; and
identify causes for the pre-defined high number of outliers produced by said critical data elements in said data within said database.
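The claims describe overlapping rule sets of different sizes over the same critical data elements, with rules expressed in terms of data completeness and data validity. A hedged sketch of what such rules might look like as predicates follows; the field names, record shape, and the particular checks are illustrative, not drawn from the patent:

```python
import datetime

# Hypothetical completeness and validity rules for two CDEs, expressed as
# predicates over a record. Field names and checks are illustrative.
def complete(record, field):
    """Completeness: the field is present and non-empty."""
    return record.get(field) not in (None, "")

def valid_balance(record):
    """Validity: original balance is a non-negative number."""
    v = record.get("original_balance")
    return isinstance(v, (int, float)) and v >= 0

def valid_date(record):
    """Validity: origination date parses as an ISO date."""
    try:
        datetime.date.fromisoformat(record.get("origination_date", ""))
        return True
    except (TypeError, ValueError):
        return False

# Overlapping rule sets of different sizes over the same CDEs, as claimed:
# the "large" set contains every rule in the "small" set plus more.
rule_sets = {
    "small": [lambda r: complete(r, "original_balance")],
    "large": [lambda r: complete(r, "original_balance"),
              valid_balance,
              lambda r: complete(r, "origination_date"),
              valid_date],
}

record = {"original_balance": 1250.0, "origination_date": "2011-06-30"}
passed = {name: all(rule(record) for rule in rules)
          for name, rules in rule_sets.items()}
```

A record failing any rule in a set would count against that set's completeness or validity measure for the corresponding element.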
22. A non-transitory computer-readable storage medium with an executable program for assessing data quality stored thereon, wherein the program instructs a microprocessor to perform the steps of:
selecting a group of proposed critical data elements from a plurality of proposed critical data elements consisting at least in part of type of account, original balance, origination date, number of deposits, and number of loans based at least in part on ranking each of the plurality of proposed critical data elements according to weighted criteria consisting at least in part of ease of access to each proposed critical data element, regulatory risk associated with each proposed critical data element, financial risk associated with each proposed critical data element, and reputation risk associated with each proposed critical data element;
collecting samples of data for each of the proposed critical data elements in said group of proposed critical data elements from a database storing a population of data elements representing attributes of each of a plurality of different financial transactions;
identifying a portion of said group of proposed critical data elements based at least in part on a ranking of respective degrees of correlation between said data samples for each of the proposed critical data elements in said group of proposed critical data elements;
generating a plurality of different, overlapping sets of data quality rules at least in part in terms of data completeness and data validity for each of the proposed critical data elements in said portion of said group of proposed critical data elements, each set of data quality rules comprising a different number of data quality rules for the same proposed critical data elements in said portion of said group of proposed critical data elements;
identifying one of the plurality of different, overlapping sets of data quality rules for monitoring a quality of data in said database based at least in part on a difference between a value for each of said sets of data quality rules as a function of accuracy or completeness of data in the database and a sum of a cost of creating each set of data quality rules as a function of number, complexity, and interdependency of rules in each of said sets of data quality rules;
monitoring the quality of data within said database using said identified one of the plurality of different, overlapping sets of data quality rules;
identifying critical data elements that produce a pre-defined high number of outliers in said data within said database based on said monitoring the quality of data in said database indicative of a likelihood that a process is out of control; and
identifying causes for the pre-defined high number of outliers produced by said critical data elements in said data within said database.
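The monitoring step flags critical data elements whose measurements produce a pre-defined high number of outliers, indicating a process that is likely out of control. A hedged sketch in the spirit of statistical process control follows; the baseline period, the 3-sigma limits, the completeness percentages, and the threshold of two outliers are all illustrative assumptions:

```python
import statistics

# Illustrative daily completeness percentages per monitored CDE, plus a
# baseline period known to be in control (all figures hypothetical).
baseline = {
    "original_balance": [99.0, 99.2, 99.1, 98.9, 99.1, 99.0],
    "origination_date": [98.9, 98.8, 99.0, 98.9, 98.8, 99.0],
}
current = {
    "original_balance": [99.1, 99.3, 72.0, 99.0, 68.5, 99.2],
    "origination_date": [98.8, 98.9, 99.0, 98.7, 98.9, 98.8],
}

OUTLIER_THRESHOLD = 2  # the "pre-defined high number" of outliers; illustrative

def count_outliers(values, ref, k=3.0):
    """Count points outside control limits (mean +/- k*sd) derived from the
    in-control baseline period."""
    mean = statistics.fmean(ref)
    sd = statistics.stdev(ref)
    return sum(1 for v in values if abs(v - mean) > k * sd)

# CDEs whose outlier count suggests the underlying process is out of control;
# these are the elements whose causes would be investigated next.
flagged = [
    cde for cde in current
    if count_outliers(current[cde], baseline[cde]) >= OUTLIER_THRESHOLD
]
```

In the toy data, `original_balance` twice drops far below its baseline control limits and is flagged for root-cause analysis, while `origination_date` stays in control.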
Specification