Failure recovery and error correction techniques for data loading in information warehouses
First Claim
1. A method comprising:
maintaining a source file version table including:
source data file information for data loading of a plurality of source data files, wherein the source data files are .xml files;
current version information for each of the source data files; and
maximum version information for each of the source data files with respect to maximum version numbers used in the past;
maintaining a plurality of individual data base (DB) tables each of which includes creation version information for each record of each source data file;
maintaining a system table containing system information of the DB tables, including pending status and creation time of the DB tables;
performing update, undo, and redo operations of data loading using the source file version table and the DB tables, wherein the data loading includes splitting each of the source data files into multiple blocks, and loading the multiple blocks in a bulk loading operation;
tracking a tuple count and matching the tuple count between the DB tables;
matching a plurality of keys between the DB tables;
aborting the data loading after a partial completion of the data loading, and resuming the data loading without restarting to a beginning of the data loading;
tracking incomplete records in the information warehouse using a state transition diagram which diagrams load progress states of the multiple blocks in the information warehouse;
removing all the tracked incomplete records in the information warehouse;
determining whether a modification has been made to one of the source data files; and
deleting a non-current version of one of the DB tables after the undo operation in response to determining that a modification was made to said one of the source data files.
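The version bookkeeping recited in the claim can be illustrated with a small in-memory sketch. It is not the patented implementation. The names `FileVersion`, `Warehouse`, `live`, and `visible` are hypothetical, a Python list stands in for the DB tables, and the undo/redo policy (move to the nearest stored version, discard undone versions when a modified file is loaded) is an assumption chosen to match the claim language.

```python
from dataclasses import dataclass, field


@dataclass
class FileVersion:
    """One row of a (hypothetical) source file version table."""
    current: int = 0   # version visible to queries (0 means nothing loaded)
    maximum: int = 0   # highest version number ever issued for this file
    live: list = field(default_factory=list)  # versions still stored, ascending


@dataclass
class Warehouse:
    versions: dict = field(default_factory=dict)  # file name -> FileVersion
    records: list = field(default_factory=list)   # (file, creation_version, payload)

    def update(self, name, payloads):
        """Load a new or modified source file as a fresh version."""
        fv = self.versions.setdefault(name, FileVersion())
        # Assumption: versions that were undone are deleted once the file changes.
        stale = {v for v in fv.live if v > fv.current}
        self.records = [r for r in self.records
                        if not (r[0] == name and r[1] in stale)]
        fv.live = [v for v in fv.live if v <= fv.current]
        # The maximum only grows, so a past version number is never reused.
        fv.maximum += 1
        fv.current = fv.maximum
        fv.live.append(fv.current)
        self.records.extend((name, fv.current, p) for p in payloads)

    def undo(self, name):
        """Step back to the nearest older stored version, if any."""
        fv = self.versions[name]
        older = [v for v in fv.live if v < fv.current]
        fv.current = older[-1] if older else 0

    def redo(self, name):
        """Step forward to the nearest newer stored version, if any."""
        fv = self.versions[name]
        newer = [v for v in fv.live if v > fv.current]
        if newer:
            fv.current = newer[0]

    def visible(self, name):
        """Records whose creation version equals the file's current version."""
        fv = self.versions[name]
        return [p for f, v, p in self.records if f == name and v == fv.current]
```

Tagging each record with its creation version lets undo and redo change only the version table rather than move data. Records are physically deleted only when a modified file makes an undone version obsolete.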
Abstract
A method of data loading for large information warehouses includes performing checkpointing concurrently with data loading into an information warehouse, the checkpointing ensuring consistency among multiple tables, and recovering from a failure in the data loading using the checkpointing. A method is also disclosed for performing versioning concurrently with data loading into an information warehouse. The versioning method enables undo and redo operations of the data loading between a later version and a previous version. Data load failure recovery does not restart the data load from the beginning; it resumes from the latest checkpoint at the information warehouse level, using a checkpoint process characterized by a state transition diagram having a multiplicity of states, with transitions among the states tracked in a system state table.
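The checkpoint-and-resume behavior described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed process: the three states `PENDING`, `LOADING`, and `DONE`, the class `CheckpointedLoader`, and the `fail_at` parameter (used only to simulate a failure) are all hypothetical, and a Python dict stands in for the system state table.

```python
from enum import Enum


class BlockState(Enum):
    PENDING = "pending"   # block not yet loaded
    LOADING = "loading"   # block partially loaded; its rows are incomplete
    DONE = "done"         # block fully loaded; acts as a checkpoint


def split_into_blocks(rows, block_size):
    """Split the rows of a source file into fixed-size blocks."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


class CheckpointedLoader:
    def __init__(self, rows, block_size):
        self.blocks = split_into_blocks(rows, block_size)
        # Stand-in for a system state table: block index -> state.
        self.state = {i: BlockState.PENDING for i in range(len(self.blocks))}
        self.table = []  # stand-in for a warehouse table: (block_index, row)

    def load(self, fail_at=None):
        """Load every block that is not DONE; fail_at=(block, row) simulates a crash."""
        for i, block in enumerate(self.blocks):
            if self.state[i] is BlockState.DONE:
                continue  # resume: skip blocks completed before the failure
            self.state[i] = BlockState.LOADING
            for n, row in enumerate(block):
                if fail_at == (i, n):
                    raise RuntimeError(f"simulated failure in block {i}")
                self.table.append((i, row))
            self.state[i] = BlockState.DONE

    def recover(self):
        """Remove rows of incomplete blocks and mark those blocks PENDING again."""
        incomplete = {i for i, s in self.state.items() if s is BlockState.LOADING}
        self.table = [(i, r) for i, r in self.table if i not in incomplete]
        for i in incomplete:
            self.state[i] = BlockState.PENDING
```

A typical sequence is `load()` until a failure, then `recover()`, then `load()` again. Only blocks that never reached `DONE` are reloaded, so the work finished before the failure is kept.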
17 Claims
1. A method comprising the steps recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5
6. An information warehouse system comprising:
a computer processor;
a source file version table stored on the computer processor including:
source data file information for data loading of a plurality of source data files, wherein the source data files are .xml files;
current version information for each of the source data files; and
maximum version information for each of the source data files with respect to maximum version numbers used in the past;
a plurality of individual data base (DB) tables each of which includes creation version information for each record of each source data file;
a system table containing system information of the DB tables, including pending status and creation time of the DB tables;
wherein the computer processor is configured to:
perform update, undo, and redo operations of data loading using the source file version table and the DB tables, wherein the data loading includes splitting each of the source data files into multiple blocks, and loading the multiple blocks in a bulk loading operation;
track a tuple count and match the tuple count between the DB tables;
match a plurality of keys between the DB tables;
abort the data loading after a partial completion of the data loading, and resume the data loading without restarting to a beginning of the data loading;
track incomplete records in the information warehouse using a state transition diagram which diagrams load progress states of the multiple blocks in the information warehouse;
remove all the tracked incomplete records in the information warehouse;
determine whether a modification has been made to one of the source data files; and
delete a non-current version of one of the DB tables after the undo operation in response to determining that a modification was made to said one of the source data files.
Dependent claims: 7, 8, 9, 10
11. A computer program product stored on a non-transitory computer-readable medium for use with an information warehouse, wherein the computer program product when executed on a computer causes the computer to:
maintain a source file version table including:
source data file information for data loading of a plurality of source data files, wherein the source data files are .xml files;
current version information for each of the source data files; and
maximum version information for each of the source data files with respect to maximum version numbers used in the past;
maintain a plurality of individual data base (DB) tables each of which includes creation version information for each record of each source data file;
maintain a system table containing system information of the DB tables, including pending status and creation time of the DB tables;
perform update, undo, and redo operations of data loading using the source file version table and the DB tables, wherein the data loading includes splitting each of the source data files into multiple blocks, and loading the multiple blocks in a bulk loading operation;
track a tuple count and match the tuple count between the DB tables;
match a plurality of keys between the DB tables;
abort the data loading after a partial completion of the data loading, and resume the data loading without restarting to a beginning of the data loading;
track incomplete records in the information warehouse using a state transition diagram which diagrams load progress states of the multiple blocks in the information warehouse;
remove all the tracked incomplete records in the information warehouse;
determine whether a modification has been made to one of the source data files; and
delete a non-current version of one of the DB tables after the undo operation in response to determining that a modification was made to said one of the source data files.
Dependent claims: 12, 13, 14, 15, 16, 17
Specification