USING A DATA MINING ALGORITHM TO GENERATE FORMAT RULES USED TO VALIDATE DATA SETS
Abstract
Provided are a method, system, and article of manufacture for using a data mining algorithm to generate format rules used to validate data sets. A data set has a plurality of columns and records providing data for each of the columns. A selection is received of at least one format column for which format rules are to be generated, and a selection is received of at least one predictor column. A format mask column is generated for each selected format column. For records in the data set, a value in the at least one format column is converted to a format mask representing the format of that value, and the format mask is stored in the format mask column of the record for which it was generated. The at least one predictor column and the at least one format mask column are then processed to generate at least one format rule. Each format rule specifies a format mask associated with at least one condition in the at least one predictor column.
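The abstract does not say which symbols a format mask uses. The sketch below assumes one common convention (an ASCII uppercase letter becomes `A`, a lowercase letter `a`, a digit `9`, and every other character is kept as-is) and a hypothetical `<column>_mask` name for the generated mask column; neither detail comes from the patent.

```python
from typing import Dict, List

Record = Dict[str, str]  # one record: column name -> value


def to_format_mask(value: str) -> str:
    """Convert a value to a format mask (assumed convention, ASCII only):
    'A'-'Z' -> 'A', 'a'-'z' -> 'a', '0'-'9' -> '9', other characters unchanged."""
    mask = []
    for ch in value:
        if "0" <= ch <= "9":
            mask.append("9")
        elif "A" <= ch <= "Z":
            mask.append("A")
        elif "a" <= ch <= "z":
            mask.append("a")
        else:
            mask.append(ch)
    return "".join(mask)


def add_format_mask_columns(records: List[Record], format_columns: List[str]) -> None:
    """Add a mask column for each selected format column, filling it in place so
    each mask is stored in the same record it was generated from.
    The '<column>_mask' naming scheme is a hypothetical choice."""
    for record in records:
        for column in format_columns:
            record[f"{column}_mask"] = to_format_mask(record.get(column, ""))


rows = [{"country": "US", "zip": "95120"}, {"country": "CA", "zip": "K1A 0B1"}]
add_format_mask_columns(rows, ["zip"])
print([row["zip_mask"] for row in rows])  # ['99999', 'A9A 9A9']
```

Masking every character keeps value lengths and separators visible, so a five-digit ZIP code and a Canadian postal code produce clearly different masks.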
30 Claims
1. An article of manufacture having code for causing operations to be performed, the operations comprising:
processing a data set having a plurality of columns and records providing data for each of the columns;
receiving selection of at least one format column for which format rules are to be generated;
receiving selection of at least one predictor column;
generating a format mask column for each selected format column;
for records in the data set, converting a value in the at least one format column to a format mask representing a format of the value in the format column and storing the format mask in the format mask column in the record for which the format mask was generated; and
processing the at least one predictor column and the at least one format mask column to generate at least one format rule, wherein each format rule specifies a format mask associated with at least one condition in the at least one predictor column.
(Dependent claims 2-12.)
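The claims leave the data mining algorithm open. As a stand-in, the sketch below uses a simple association-rule style count: for each value of a predictor column, it measures how often each format mask occurs and keeps a rule when that mask reaches assumed `min_support` and `min_confidence` thresholds. The `FormatRule` type, the threshold defaults, the `zip_mask` column name, and the single-column equality conditions are hypothetical simplifications; an actual implementation might instead use full association rule mining or a decision tree over several predictor columns.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List

Record = Dict[str, str]


@dataclass(frozen=True)
class FormatRule:
    predictor_column: str  # condition: predictor_column == predictor_value
    predictor_value: str
    mask_column: str       # format mask column the rule constrains
    mask: str              # format mask associated with the condition
    support: int           # records matching both the condition and the mask
    confidence: float      # share of condition-matching records with this mask


def generate_format_rules(
    records: List[Record],
    predictor_columns: List[str],
    mask_columns: List[str],
    min_support: int = 2,
    min_confidence: float = 0.8,
) -> List[FormatRule]:
    rules: List[FormatRule] = []
    for predictor in predictor_columns:
        for mask_column in mask_columns:
            # For each distinct predictor value, count the masks seen with it.
            counts: DefaultDict[str, Counter] = defaultdict(Counter)
            for record in records:
                counts[record[predictor]][record[mask_column]] += 1
            # Keep (condition -> mask) pairs that pass both thresholds.
            for value, mask_counts in counts.items():
                total = sum(mask_counts.values())
                for mask, n in mask_counts.items():
                    confidence = n / total
                    if n >= min_support and confidence >= min_confidence:
                        rules.append(
                            FormatRule(predictor, value, mask_column, mask, n, confidence)
                        )
    return rules


rows = [
    {"country": "US", "zip_mask": "99999"},
    {"country": "US", "zip_mask": "99999"},
    {"country": "US", "zip_mask": "99999"},
    {"country": "CA", "zip_mask": "A9A 9A9"},
    {"country": "CA", "zip_mask": "A9A 9A9"},
]
for rule in generate_format_rules(rows, ["country"], ["zip_mask"]):
    print(f"IF {rule.predictor_column} = {rule.predictor_value} "
          f"THEN {rule.mask_column} = '{rule.mask}' "
          f"(support {rule.support}, confidence {rule.confidence:.2f})")
```

With `min_confidence` above 0.5, at most one mask can qualify for a given condition and mask column, so the resulting rules never contradict each other.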

13. A system in communication with a data source including a data set having a plurality of columns and records providing data for each of the columns, comprising:
a processor;
a computer readable medium having a rule engine executed by the processor to perform operations, the operations comprising:
processing a data set having a plurality of columns and records providing data for each of the columns;
receiving selection of at least one format column for which format rules are to be generated;
receiving selection of at least one predictor column;
generating a format mask column for each selected format column;
for records in the data set, converting a value in the at least one format column to a format mask representing a format of the value in the format column and storing the format mask in the format mask column in the record for which the format mask was generated; and
processing the at least one predictor column and the at least one format mask column to generate at least one format rule, wherein each format rule specifies a format mask associated with at least one condition in the at least one predictor column.
(Dependent claims 14-21.)

22. A method, comprising:
processing a data set having a plurality of columns and records providing data for each of the columns;
receiving selection of at least one format column for which format rules are to be generated;
receiving selection of at least one predictor column;
generating a format mask column for each selected format column;
for records in the data set, converting a value in the at least one format column to a format mask representing a format of the value in the format column and storing the format mask in the format mask column in the record for which the format mask was generated; and
processing the at least one predictor column and the at least one format mask column to generate at least one format rule, wherein each format rule specifies a format mask associated with at least one condition in the at least one predictor column.
(Dependent claims 23-30.)
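The title says the generated rules are used to validate data sets, while the independent claims stop at rule generation. One way such validation could work, sketched under stated assumptions: every record that satisfies a rule's condition has its value re-masked and compared with the rule's mask, and mismatches are reported. The `FormatRule` shape, the `find_violations` helper, and the mask convention (ASCII letters to `A`/`a`, digits to `9`, other characters kept) are hypothetical and repeated here so the block runs on its own.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Record = Dict[str, str]


@dataclass(frozen=True)
class FormatRule:
    predictor_column: str  # condition: predictor_column == predictor_value
    predictor_value: str
    format_column: str     # column whose format the rule constrains
    mask: str              # expected format mask


def to_format_mask(value: str) -> str:
    # Assumed ASCII convention: digit -> '9', upper -> 'A', lower -> 'a', others kept.
    return "".join(
        "9" if "0" <= ch <= "9"
        else "A" if "A" <= ch <= "Z"
        else "a" if "a" <= ch <= "z"
        else ch
        for ch in value
    )


def find_violations(
    records: List[Record], rules: List[FormatRule]
) -> List[Tuple[int, FormatRule, str]]:
    """Return (record index, violated rule, actual mask) for each mismatch."""
    violations = []
    for index, record in enumerate(records):
        for rule in rules:
            if record.get(rule.predictor_column) != rule.predictor_value:
                continue  # condition not met, so this rule does not apply
            actual = to_format_mask(record.get(rule.format_column, ""))
            if actual != rule.mask:
                violations.append((index, rule, actual))
    return violations


rules = [
    FormatRule("country", "US", "zip", "99999"),
    FormatRule("country", "CA", "zip", "A9A 9A9"),
]
rows = [
    {"country": "US", "zip": "95120"},
    {"country": "US", "zip": "9512"},
    {"country": "CA", "zip": "K1A 0B1"},
    {"country": "MX", "zip": "01000"},
]
for index, rule, actual in find_violations(rows, rules):
    print(f"record {index}: {rule.format_column} has mask '{actual}', expected '{rule.mask}'")
```

Records whose predictor values match no rule (here, `MX`) are left unchecked rather than flagged, which is one possible design choice; a stricter validator could report them as uncovered.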
Specification