Method and apparatus for loading data files into a data-warehouse system
First Claim
1. A method for deciding when to overwrite an existing record in a data-warehouse system for which there is to be data loaded or restored for use, comprising the steps of:
defining a relative-ordering index field for each type of record to be written to said data-warehouse system;
defining comparison rules between the relative-ordering index fields of corresponding records to be compared to define which of the records being compared should be favored over the other in order to be written to said data-warehouse system;
selecting at least one data source for loading into said data-warehouse system;
querying said at least one data source to stage a record for potential loading in said data-warehouse system, wherein said staged record has data to extract for use in comparing relative-ordering indexes with any existing corresponding record in said data-warehouse system;
if a corresponding record to said staged record does not already exist in said data-warehouse system, then writing said staged record to said data-warehouse system;
if a corresponding record to said staged record already exists in said data-warehouse system, then comparing the relative-ordering index field for said corresponding record to the relative-ordering index field for said staged record;
if said comparison rules favor said staged record over said corresponding record, then writing said staged record to said data-warehouse system;
if said comparison rules favor said corresponding record over said staged record, then not overwriting said corresponding record.
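The decision steps recited in claim 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory dict standing in for the data warehouse, the `Record` type, the `key` and `roi` field names, and the "larger index wins" comparison rule are all hypothetical choices made for the example. The claim leaves the comparison rules to be defined per record type.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Record:
    key: str      # hypothetical: identifies "corresponding" records
    roi: int      # hypothetical relative-ordering index, e.g. a source sequence number
    payload: str


def favors_staged(staged: Record, existing: Record) -> bool:
    """Assumed comparison rule: the record with the larger ROI is favored.

    Ties keep the existing record; the claim does not specify tie handling.
    """
    return staged.roi > existing.roi


def load_record(
    warehouse: Dict[str, Record],
    staged: Record,
    rule: Callable[[Record, Record], bool] = favors_staged,
) -> bool:
    """Write the staged record if the rule allows it; return True if written."""
    existing = warehouse.get(staged.key)
    if existing is None:
        # No corresponding record exists: write the staged record.
        warehouse[staged.key] = staged
        return True
    if rule(staged, existing):
        # The comparison rules favor the staged record: overwrite.
        warehouse[staged.key] = staged
        return True
    # The comparison rules favor the existing record: do not overwrite.
    return False


# Records arrive out of order: a newer one, then an older one, then the newest.
warehouse: Dict[str, Record] = {}
results = [
    load_record(warehouse, record)
    for record in (
        Record("cust-1", 20, "newer"),
        Record("cust-1", 10, "older"),
        Record("cust-1", 30, "newest"),
    )
]
```

In this sketch the second record is rejected because its index is lower than the one already stored, so the final state does not depend on the order in which the source files were loaded.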
Abstract
Data-warehouse systems are populated using an enhanced Extraction-Transformation-Load (ETL) process and system that employs three ideas: out-of-order-fill ETL, a relative-ordering index (ROI), and dependent queries. Out-of-order-fill ETL allows a data warehouse to accept data files in any order and does not require previous backup data files to be loaded first, on the view that giving end users some functionality or data access is better than none at all. Dependent queries are processes that use defined data structures to construct, extract, and validate each record to be written to the data-warehouse system, ensuring that referential integrity is maintained and that no orphaned data is pushed into the data warehouse. Finally, ROI is a process in which a value is determined, based on the constraints of the source data, that indicates the relative newness of the data.
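The referential-integrity goal of dependent queries can be illustrated with a short Python sketch. The abstract does not describe the mechanism, so everything here is an assumption: the customer/order relationship, the `customer_id` field, and the `stage_with_dependencies` helper are hypothetical names invented for the example. The only idea taken from the abstract is that a record whose parent is missing should not be pushed into the warehouse.

```python
from typing import Dict, List, Tuple

Order = Dict[str, str]


def stage_with_dependencies(
    customers: Dict[str, dict],
    orders: List[Order],
) -> Tuple[List[Order], List[Order]]:
    """Split staged orders into loadable and rejected lists.

    An order is loadable only if the customer it references already exists,
    so no orphaned order reaches the warehouse (assumed validation rule).
    """
    loadable: List[Order] = []
    rejected: List[Order] = []
    for order in orders:
        if order["customer_id"] in customers:
            loadable.append(order)
        else:
            rejected.append(order)
    return loadable, rejected


# Hypothetical data: one order has a known parent, the other would be orphaned.
customers = {"c1": {"name": "Acme"}}
orders = [
    {"order_id": "o1", "customer_id": "c1"},
    {"order_id": "o2", "customer_id": "c9"},
]
loadable, rejected = stage_with_dependencies(customers, orders)
```

A rejected order could be retried after a later file supplies its parent, which would fit the out-of-order-fill idea, though the abstract does not say how such cases are handled.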
22 Citations
4 Claims
Specification