Method and apparatus for data integration
Abstract
Aspects of the present invention provide integration of geographically distributed data. The data can be integrated in a single database. An illustrative embodiment of the invention comprises a tight combination of conventional ETL (extraction, transformation, and loading) tools and conventional remote copy functionality used for data backup and recovery.
14 Claims
1. A computer-implemented method for collecting data from among a plurality of data sites to be stored at a central site, each data site having an associated data store, the method comprising:
- providing each data site with a corresponding extraction routine;
- for each data site, processing data contained in its associated data store in accordance with its corresponding extraction routine to produce first data, the corresponding extraction routine configured to store the first data in a storage location at the data site if the processing produces first data;
- collecting second data which is based on first data obtained from those data sites for which their corresponding extraction routines produced said first data; and
- loading all of the second data into a database,
wherein the step of collecting includes communicating the first data to a central site in accordance with a remote copy operation, receiving at the central site the first data as mirrored data, and transforming the mirrored data to produce the second data, the second data being stored at the central site,
wherein the step of providing a corresponding extraction routine comprises, for each data site:
- receiving a specification information which is descriptive of data stored in a data store associated with the data site;
- producing the extraction routine based on the specification information; and
- communicating the extraction routine to the data site.
Dependent claims: 2, 3, 4, 5, 6, 7
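The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `DataSite` class, the column-list form of the specification information, the `remote_copy` stand-in (a plain deep copy rather than a storage-level remote copy), and the `transform` step that tags each row with its site name are all assumptions made for illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Routine = Callable[[List[Row]], List[Row]]


@dataclass
class DataSite:
    """Hypothetical data site with its data store and a local storage location."""
    name: str
    data_store: List[Row]
    extraction_routine: Optional[Routine] = None
    storage_location: List[Row] = field(default_factory=list)  # holds first data


def produce_extraction_routine(columns: List[str]) -> Routine:
    """Build a routine from specification information (assumed: a list of column names)."""
    def routine(store: List[Row]) -> List[Row]:
        # Keep only rows that have a value for every specified column.
        return [
            {c: row[c] for c in columns}
            for row in store
            if all(row.get(c) is not None for c in columns)
        ]
    return routine


def run_extraction(site: DataSite) -> bool:
    """Run the site's routine; store first data only if any was produced."""
    first_data = site.extraction_routine(site.data_store)
    if first_data:
        site.storage_location = first_data
    return bool(first_data)


def remote_copy(site: DataSite) -> List[Row]:
    """Stand-in for a remote copy operation: the central site receives a mirror."""
    return copy.deepcopy(site.storage_location)


def transform(mirrored: List[Row], site_name: str) -> List[Row]:
    """Assumed transformation: tag each mirrored row with its originating site."""
    return [{**row, "site": site_name} for row in mirrored]


def collect_and_load(sites: List[DataSite], specs: Dict[str, List[str]]) -> List[Row]:
    database: List[Row] = []
    for site in sites:
        # Produce the routine centrally and "communicate" it to the site.
        site.extraction_routine = produce_extraction_routine(specs[site.name])
        if run_extraction(site):           # only sites that produced first data
            mirrored = remote_copy(site)   # first data arrives as mirrored data
            database.extend(transform(mirrored, site.name))  # second data
    return database
```

In this sketch a site whose routine produces nothing is skipped entirely, which mirrors the claim's wording that second data is collected only from sites whose routines produced first data.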
8. A computer-implemented method for collecting data from among a plurality of data sites to be stored at a central site, each data site having an associated data store, the method comprising:
- providing each data site with a corresponding extraction routine;
- for each data site, processing data contained in its associated data store in accordance with its corresponding extraction routine to produce first data, the corresponding extraction routine configured to store the first data in a storage location at the data site if the processing produces first data;
- collecting second data which is based on first data obtained from those data sites for which their corresponding extraction routines produced said first data, the second data being based on the first data; and
- loading all of the second data into a database,
wherein the step of collecting includes communicating the first data to a central site in accordance with a remote copy operation, receiving at the central site the first data as mirrored data, and transforming the mirrored data to produce the second data, the second data being stored at the central site,
wherein the step of providing a corresponding extraction routine comprises, for each data site:
- receiving a first specification information which is descriptive of data stored in its associated data store;
- producing a second specification information based on the first specification information which is descriptive of the extraction routine;
- communicating the second specification information to the data site; and
- producing, at the data site, the extraction routine based on the second specification information.
Dependent claims: 9
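Claim 8 differs from claim 1 in where the extraction routine is built: the central site sends a description of the routine, and the data site produces the routine locally. The following Python sketch illustrates that split under stated assumptions. The column-to-type mapping used as first specification information, the JSON wire format, the rule that skips `"blob"` columns, and the function names are all hypothetical.

```python
import json
from typing import Callable, Dict, List

Row = Dict[str, object]


def make_second_spec(first_spec: Dict[str, str]) -> str:
    """Central site: derive a description of the extraction routine.

    first_spec is assumed to map column names to type names.
    """
    wanted = [col for col, typ in first_spec.items() if typ != "blob"]
    return json.dumps({"select": wanted})


def build_routine_at_site(second_spec_json: str) -> Callable[[List[Row]], List[Row]]:
    """Data site: produce the extraction routine from the received description."""
    select = json.loads(second_spec_json)["select"]

    def routine(store: List[Row]) -> List[Row]:
        # Project each row onto the selected columns.
        return [{c: row.get(c) for c in select} for row in store]

    return routine


# Usage: only the JSON description crosses the network, not executable code.
first_spec = {"id": "int", "name": "text", "photo": "blob"}
wire = make_second_spec(first_spec)
routine = build_routine_at_site(wire)
extracted = routine([{"id": 1, "name": "a", "photo": b"\x00"}])
```

One possible motivation for this arrangement is that a declarative description is easier to ship and audit than generated code, though the claim itself does not state a rationale.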
10. A data collection system comprising:
- a central data site comprising at least one host processor;
- a host storage system operatively coupled to the at least one host processor;
- at least one remote data site comprising at least one remote processor; and
- a remote storage system operatively coupled to the at least one remote processor,
the at least one host processor having program generating code configured to:
- obtain storage related parameters from the at least one remote data site;
- generate first interim volume managing code;
- generate second interim volume managing code;
- generate a data extraction routine, including receiving a specification information based on the storage related parameters and producing the extraction routine based on the specification information;
- generate remote copy control code based on the storage related parameters; and
- transfer the second interim volume managing code, the data extraction routine, and the remote copy control code to the at least one remote data site,
the at least one host processor further having program manager code configured to initiate processing of the second interim volume managing code, the data extraction routine, and the remote copy control code,
the data extraction routine configured to produce extracted data,
the remote copy control code configured to perform a data duplication operation of the extracted data, wherein the central data site serves as the duplication site for the extracted data,
the first interim volume managing code configured to allocate a host interim volume in the host storage system for storing at least some of the extracted data,
the second interim volume managing code configured to allocate a remote interim volume in the remote storage system for storing the extracted data.
Dependent claims: 11, 12, 13, 14
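Claim 10 recites four generated pieces of code, of which three are transferred to the remote data site while the first interim volume managing code stays at the host. The short Python sketch below only illustrates that partition. The generated "code" is represented as descriptive strings, and the `GeneratedPrograms` class, the `remote_volume` parameter, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GeneratedPrograms:
    first_interim_volume_code: str    # allocates the host interim volume
    second_interim_volume_code: str   # allocates the remote interim volume
    extraction_routine: str           # produces extracted data
    remote_copy_control_code: str     # duplicates extracted data to the central site


def generate_programs(storage_params: Dict[str, str]) -> GeneratedPrograms:
    """Host side: generate all four pieces from storage related parameters."""
    vol = storage_params["remote_volume"]  # assumed parameter name
    return GeneratedPrograms(
        first_interim_volume_code="allocate host interim volume",
        second_interim_volume_code=f"allocate remote interim volume for {vol}",
        extraction_routine=f"extract from {vol} into remote interim volume",
        remote_copy_control_code="duplicate remote interim volume to host interim volume",
    )


def transfer_to_remote(programs: GeneratedPrograms) -> List[str]:
    """Only three of the four pieces are sent to the remote data site."""
    return [
        programs.second_interim_volume_code,
        programs.extraction_routine,
        programs.remote_copy_control_code,
    ]
```

The order of the returned list matches the order in which the claim says the program manager code initiates processing; whether that order is required is not something the claim text settles.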
Specification