Adaptive data cleaning
Abstract
A data cleaning process includes the steps of: validating data loaded from at least two source systems; appending the validated data to a normalized data cleaning repository; selecting the priority of the source systems; creating a clean database; loading the consistent, normalized, and cleansed data from the clean database into a format required by data systems and software tools using the data; creating reports; and updating the clean database by a user without updating the source systems. The data cleaning process standardizes the collection and analysis of data from disparate sources for optimization models, enabling consistent analysis. The data cleaning process further provides complete auditability of the inputs and outputs of data systems and software tools that use a dynamic data set. The data cleaning process is suitable for, but not limited to, applications in the aircraft industry, both military and commercial, for example in supply chain management.
24 Claims
1. A data cleaning process, comprising the steps of:
validating data loaded from at least two source systems using data formatting utilities and data cleaning utilities;
appending said validated data to a normalized data cleaning repository;
selecting the priority of said source systems;
creating a clean database containing unique data identifiers for each data element from said at least two source systems;
creating and maintaining a cross-reference between said unique data identifiers;
loading consistent, normalized, and cleansed data from said clean database into a format required by data systems and software tools using said data;
creating standardized data cleaning and management reports using said consistent, normalized, and cleansed data; and
updating said consistent, normalized, and cleansed data by a user without updating said source systems.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
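For illustration only, the validating, appending, prioritizing, and updating steps of claim 1 can be sketched in Python. The claim does not prescribe an implementation: the record layout, the source names `erp` and `maintenance`, and every class and function name below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CleanDatabase:
    # Hypothetical container: one best value per (part, field), plus user overrides.
    values: dict = field(default_factory=dict)
    overrides: dict = field(default_factory=dict)

    def get(self, key):
        # A user override wins over any source-derived value.
        return self.overrides.get(key, self.values.get(key))


def validate(record):
    """Format and clean one raw record; return None if it is unusable."""
    part = str(record.get("part", "")).strip().upper()
    if not part or record.get("value") is None:
        return None
    # Return a copy so the source record is never modified.
    return {**record, "part": part}


def build_clean_database(repository, priority):
    """Pick, per (part, field), the value from the highest-priority source."""
    rank = {source: i for i, source in enumerate(priority)}
    best = {}
    for rec in repository:
        key = (rec["part"], rec["field"])
        if key not in best or rank[rec["source"]] < rank[best[key]["source"]]:
            best[key] = rec
    return CleanDatabase(values={key: rec["value"] for key, rec in best.items()})


raw = [
    {"source": "erp", "part": " ab-100 ", "field": "unit_price", "value": 12.5},
    {"source": "maintenance", "part": "AB-100", "field": "unit_price", "value": 13.0},
    {"source": "erp", "part": "", "field": "unit_price", "value": 9.0},
]

# Validate, then append the surviving records to the repository.
repository = [rec for rec in (validate(r) for r in raw) if rec is not None]

# The user selects which source takes precedence when values conflict.
clean = build_clean_database(repository, priority=["maintenance", "erp"])

# A user correction is stored in the clean database only; `raw` is untouched.
clean.overrides[("AB-100", "unit_price")] = 12.75
```

Keeping overrides separate from source-derived values is one possible way to let a user update the cleansed data without writing back to the source systems; the patent text quoted here does not say how this is done.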
9. A data cleaning process for a supply chain, comprising the steps of:
loading data from multiple source systems to a master table of data elements and sources;
selecting precedence of said source systems;
cleaning logistics data contained in said master table of data elements and sources based on high driver and error reports;
approving consistent, normalized, and cleansed data of said master table of data elements and sources and providing said cleansed data to data systems and software tools using said data;
initiating inventory optimization of stock level and reorder points using a strategic inventory optimization model using said cleansed data;
providing a spares analysis including stock level and reorder point recommendations;
archiving supporting data for customer audit trail;
creating reports; and
purchasing spares to cover shortfalls according to said reports.
Dependent claims: 10, 11, 12, 13
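The last steps of claim 9 (stock level and reorder point recommendations, reports, and purchases to cover shortfalls) can be sketched as follows. This is a minimal sketch under stated assumptions: the claim does not define the strategic inventory optimization model, so a simple order-up-to rule stands in for it, and the field names `stock_level`, `reorder_point`, `on_hand`, and `on_order` are hypothetical.

```python
def shortfall_report(recommendations, inventory):
    """Return {part: quantity to buy} for parts at or below their reorder point.

    Assumed rule (not from the claim): when on-hand plus on-order stock is at
    or below the reorder point, buy enough to reach the recommended stock level.
    """
    report = {}
    for part, rec in recommendations.items():
        stock = inventory.get(part, {})
        position = stock.get("on_hand", 0) + stock.get("on_order", 0)
        if position <= rec["reorder_point"]:
            report[part] = rec["stock_level"] - position
    return report


# Hypothetical recommendations produced from the cleansed data.
recommendations = {
    "AB-100": {"stock_level": 10, "reorder_point": 4},
    "CD-200": {"stock_level": 6, "reorder_point": 2},
}
inventory = {
    "AB-100": {"on_hand": 2, "on_order": 1},
    "CD-200": {"on_hand": 5, "on_order": 0},
}

report = shortfall_report(recommendations, inventory)
print(report)  # {'AB-100': 7}
```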
14. A data cleaning system, comprising:
data formatting utilities, wherein said data formatting utilities are used to validate data downloaded from at least two source systems;
data cleaning utilities, wherein said data cleaning utilities are used to clean said data;
a normalized data cleaning repository, wherein said normalized data cleaning repository receives said formatted and cleansed data;
source prioritization utilities, wherein said source prioritization utilities are used to select the priority of said at least two source systems;
a clean database, wherein said clean database combines said cleansed and prioritized data, and wherein said clean database is a single source of item data containing the best value and unique data identifiers for each data element;
cross-reference utilities, wherein said cross-reference utilities are used to create and maintain a cross-reference between said unique data identifiers; and
a data cleaning user interface, wherein said data cleaning user interface enables a user to update said clean database.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
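The cross-reference utilities of claim 14 could, for example, be a two-way mapping between each source system's local identifier and the unique identifier held in the clean database. The sketch below is an assumption for illustration: the class name, the `ITEM-` identifier format, and the `same_as` parameter do not come from the patent.

```python
import itertools


class CrossReference:
    """Two-way map between (source, local_id) pairs and unique clean identifiers."""

    def __init__(self):
        self._to_clean = {}    # (source, local_id) -> clean identifier
        self._to_source = {}   # clean identifier -> set of (source, local_id)
        self._counter = itertools.count(1)

    def register(self, source, local_id, same_as=None):
        """Link a source identifier to a clean identifier, creating one if needed."""
        key = (source, local_id)
        if key in self._to_clean:
            return self._to_clean[key]
        if same_as is not None:
            clean_id = same_as
        else:
            clean_id = f"ITEM-{next(self._counter):06d}"
        self._to_clean[key] = clean_id
        self._to_source.setdefault(clean_id, set()).add(key)
        return clean_id

    def clean_id(self, source, local_id):
        return self._to_clean[(source, local_id)]

    def source_ids(self, clean_id):
        return sorted(self._to_source[clean_id])


xref = CrossReference()
a = xref.register("erp", "AB-100")
# The same physical item is known under a different key in a second source.
b = xref.register("maintenance", "0042", same_as=a)
print(xref.source_ids(a))  # [('erp', 'AB-100'), ('maintenance', '0042')]
```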
Specification