INTEGRATING OBJECT-BASED DATA INTEGRATION TOOL WITH A VERSION CONTROL SYSTEM IN CENTRALIZED AND DECENTRALIZED ENVIRONMENTS
Abstract
The present disclosure relates generally to a data integration system that integrates an object-based data integration tool, such as a GUI-based data integration tool, with version control systems in centralized or distributed environments, using a relational database repository for persistence. Examples of distributed version control systems include Git, Mercurial, and Bazaar, and examples of centralized version control systems include Subversion and CVS.
8 Claims
1. A system, comprising:
a distributed data integration tool executing across a plurality of distributed development environments, wherein each of the development environments is accessible to a different set of client devices, wherein each set of client devices is configured to define one or more data integration processes, wherein a data integration process defines one or more transforms to be performed on one or more data sources and defines one or more target data stores to which the transformed data is loaded;
a distributed version control system, wherein each client device is in communication with a local instance of the distributed version control system, and wherein each instance of the distributed version control system maintains one or more objects identified by a client device in the data integration process for version control, wherein the one or more objects define the one or more transforms;
wherein, when the data integration process is saved, the one or more objects are serialized to a file system in one of the distributed development environments and uploaded from that distributed development environment to the local instance of the distributed version control system in communication with that distributed development environment; and
wherein, when the data integration process is complete, the one or more objects stored in each instance of the distributed version control system are merged, wherein merging comprises:
generating a merge table, wherein the merge table maintains metadata information for the merge operation;
adding information for each of the one or more objects to the merge table;
identifying one or more merge conflicts associated with the one or more objects;
resolving the one or more merge conflicts;
updating a status of the merge table as each merge conflict is resolved; and
updating the status of the merge table after all merge conflicts are resolved.
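The merge steps recited in claim 1 can be illustrated with a minimal Python sketch. It assumes each version control instance can be read as a mapping from object identifiers to serialized object text, treats an object as conflicting when its serialized content differs between instances, and leaves the resolution policy to a caller-supplied function. The names `MergeTable`, `MergeEntry`, and `merge_instances` are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MergeEntry:
    # Hypothetical row of the merge table: one per version-controlled object.
    object_id: str
    versions: Dict[str, str] = field(default_factory=dict)  # instance name -> serialized object
    status: str = "PENDING"  # becomes CLEAN, CONFLICT, or RESOLVED
    result: Optional[str] = None


@dataclass
class MergeTable:
    # Metadata for one merge operation; history records every status update.
    entries: Dict[str, MergeEntry] = field(default_factory=dict)
    status: str = "NEW"
    history: List[str] = field(default_factory=list)

    def set_status(self, status: str) -> None:
        self.status = status
        self.history.append(status)


Resolver = Callable[[str, Dict[str, str]], str]


def merge_instances(instances: Dict[str, Dict[str, str]], resolve: Resolver) -> MergeTable:
    table = MergeTable()
    table.set_status("CREATED")  # generate the merge table
    # Add information for each object found in any instance.
    for instance, objects in instances.items():
        for object_id, serialized in objects.items():
            entry = table.entries.setdefault(object_id, MergeEntry(object_id))
            entry.versions[instance] = serialized
    # Identify conflicts: differing serialized content across instances (an assumption).
    for entry in table.entries.values():
        if len(set(entry.versions.values())) > 1:
            entry.status = "CONFLICT"
        else:
            entry.status = "CLEAN"
            entry.result = next(iter(entry.versions.values()))
    conflicts = [e for e in table.entries.values() if e.status == "CONFLICT"]
    # Resolve each conflict and update the table status after each one.
    for done, entry in enumerate(conflicts, start=1):
        entry.result = resolve(entry.object_id, entry.versions)
        entry.status = "RESOLVED"
        table.set_status(f"RESOLVED {done}/{len(conflicts)}")
    table.set_status("COMPLETE")  # final update once all conflicts are resolved
    return table


instances = {
    "dev-a": {"map_orders": "v1", "map_customers": "c1"},
    "dev-b": {"map_orders": "v2", "map_customers": "c1"},
}
# Example policy (hypothetical): prefer the instance whose name sorts last.
table = merge_instances(instances, lambda object_id, versions: versions[max(versions)])
```

Recording each status change in `history` is one way to make the per-conflict and final status updates observable; a tool with a relational repository might persist these rows as database records instead.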

2. A method comprising:
defining a data integration process using a graphical data integration tool executing across a plurality of distributed development environments, wherein the data integration process defines one or more transforms to be performed on one or more data sources and defines one or more target data stores to which the transformed data is loaded;
identifying one or more objects in the data integration process for version control, wherein the one or more objects define the one or more transforms;
saving the data integration process, wherein saving includes serializing the one or more objects to a local file system and uploading the one or more serialized objects to a version control system.
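The save step of claim 2 can be sketched as follows, assuming JSON as the serialization format and a hypothetical `VersionControlClient` interface with a single `upload` method. `InMemoryVcs` is a stand-in used only for illustration and is not a binding to any real version control system.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Protocol


class VersionControlClient(Protocol):
    # Hypothetical upload interface; a real tool would wrap Git, Subversion, etc.
    def upload(self, path: Path) -> None: ...


class InMemoryVcs:
    """Illustrative stand-in that records uploaded file contents by file name."""

    def __init__(self) -> None:
        self.uploaded: Dict[str, str] = {}

    def upload(self, path: Path) -> None:
        self.uploaded[path.name] = path.read_text(encoding="utf-8")


def save_process(process: dict, versioned_ids: List[str], workdir: Path,
                 vcs: VersionControlClient) -> List[Path]:
    workdir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for obj in process["objects"]:
        if obj["id"] not in versioned_ids:
            continue  # only objects identified for version control are serialized
        path = workdir / f"{obj['id']}.json"
        path.write_text(json.dumps(obj, sort_keys=True, indent=2), encoding="utf-8")
        written.append(path)
    for path in written:
        vcs.upload(path)  # upload each serialized object to the version control system
    return written


vcs = InMemoryVcs()
process = {"objects": [{"id": "map_orders", "type": "mapping"}, {"id": "scratch", "type": "note"}]}
with tempfile.TemporaryDirectory() as tmp:
    paths = save_process(process, ["map_orders"], Path(tmp), vcs)
```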

3. A method comprising:
receiving, from a data integration client device, a selection of an object from a data integration process to be added to a distributed version control system, wherein each client device is in communication with a local instance of the distributed version control system, and wherein each instance of the distributed version control system maintains one or more objects identified by a client device in the data integration process for version control, wherein the data integration process defines one or more transforms to be performed on one or more data sources and defines one or more target data stores to which the transformed data is loaded, wherein one or more objects define the one or more transforms;
identifying one or more parent objects of the selected object;
determining if the one or more parent objects of the selected object are version controlled;
adding one or more artifacts associated with the selected object to a local data store at the data integration client;
exporting the one or more artifacts from the local data store to the remote centralized version control system;
maintaining version metadata information of the selected object in the data store at the data integration client; and
deleting the one or more artifacts from the local data store.
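The parent check in claim 3 can be sketched in Python under a few stated assumptions: objects form a tree given by a hypothetical `parent_of` mapping, the remote repository is modeled as a dictionary of revision lists, and, as a policy that the claim does not specify, any parent that is not yet version controlled is added before the selected object. `DiRepository`, `parent_chain`, and `add_object` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DiRepository:
    # Hypothetical client-side view of the data integration repository.
    parent_of: Dict[str, Optional[str]]  # object id -> parent id (None for a root)
    artifacts: Dict[str, str]            # object id -> exported artifact text
    version_table: Dict[str, int] = field(default_factory=dict)  # version metadata


def parent_chain(repo: DiRepository, object_id: str) -> List[str]:
    # Walk up the (assumed acyclic) hierarchy and return parents root-first.
    chain: List[str] = []
    parent = repo.parent_of.get(object_id)
    while parent is not None:
        chain.append(parent)
        parent = repo.parent_of.get(parent)
    chain.reverse()
    return chain


def add_object(repo: DiRepository, object_id: str, remote: Dict[str, List[str]]) -> List[str]:
    # Determine which parents are not version controlled yet (assumed: add them first).
    unversioned = [p for p in parent_chain(repo, object_id) if p not in repo.version_table]
    to_add = unversioned + [object_id]
    # Stage artifacts in a local data store.
    local_store: Dict[str, str] = {oid: repo.artifacts[oid] for oid in to_add}
    for oid, artifact in local_store.items():
        history = remote.setdefault(oid, [])
        history.append(artifact)               # export to the remote repository
        repo.version_table[oid] = len(history)  # maintain version metadata locally
    local_store.clear()  # delete the staged artifacts
    return to_add


repo = DiRepository(
    parent_of={"project": None, "folder": "project", "mapping": "folder"},
    artifacts={"project": "P", "folder": "F", "mapping": "M"},
    version_table={"project": 1},
)
remote = {"project": ["P"]}
added = add_object(repo, "mapping", remote)
```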

4. A method, comprising:
receiving, from a data integration client, a selection of an object from a data integration process to be added to a distributed version control system, wherein each client device is in communication with a local instance of the distributed version control system, and wherein each instance of the distributed version control system maintains one or more objects identified by a client device in the data integration process for version control, wherein the data integration process defines one or more transforms to be performed on one or more data sources and defines one or more target data stores to which the transformed data is loaded;
identifying one or more child objects of the selected object;
receiving a selection of the one or more child objects to be version controlled;
exporting artifacts associated with the selected object and the selected one or more child objects to local storage;
adding the artifacts to a local version control system data store;
synchronizing the artifacts stored in the local version control system data store with a distributed version control system data store;
maintaining version metadata information of the selected object and selected one or more child objects in the data store at the data integration client device; and
deleting the one or more artifacts from the local storage.
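A minimal sketch of claim 4, assuming a temporary directory serves as the local storage and that both the local and the distributed version control data stores can be modeled as dictionaries of revision lists. Synchronization here simply pushes local revisions that the remote store lacks; `version_with_children` and its parameters are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Set

History = Dict[str, List[str]]  # object id -> ordered list of revisions


def version_with_children(object_id: str, children_of: Dict[str, List[str]],
                          artifacts: Dict[str, str], selected_children: Set[str],
                          local_vcs: History, remote_vcs: History,
                          version_table: Dict[str, int]) -> List[str]:
    # Identify child objects and keep only those the user selected.
    chosen = [c for c in children_of.get(object_id, []) if c in selected_children]
    ids = [object_id] + chosen
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp)
        for oid in ids:  # export artifacts to local storage
            (staging / f"{oid}.txt").write_text(artifacts[oid], encoding="utf-8")
        for oid in ids:  # add the exported artifacts to the local VCS data store
            text = (staging / f"{oid}.txt").read_text(encoding="utf-8")
            local_vcs.setdefault(oid, []).append(text)
    # Leaving the with-block deletes the staged artifacts from local storage.
    for oid in ids:  # synchronize local revisions the remote store is missing
        remote_history = remote_vcs.setdefault(oid, [])
        remote_history.extend(local_vcs[oid][len(remote_history):])
        version_table[oid] = len(local_vcs[oid])  # version metadata at the client
    return ids


local_vcs: History = {}
remote_vcs: History = {}
version_table: Dict[str, int] = {}
ids = version_with_children(
    "workflow",
    {"workflow": ["map_a", "map_b", "map_c"]},
    {"workflow": "W", "map_a": "A", "map_b": "B", "map_c": "C"},
    {"map_a", "map_c"},
    local_vcs, remote_vcs, version_table,
)
```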

5. A method comprising:
restoring a data integration repository from a backup;
configuring the restored data integration repository based on previously configured trunk/branch data;
removing entries from a version table;
importing artifacts from a branch of a distributed version control system data store to a local directory;
importing the artifacts as objects in a data store associated with a data integration tool; and
persisting version metadata information of the imported data integration objects in the version table.
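The restore sequence of claim 5 can be sketched with Python's built-in `sqlite3` module standing in for the relational repository; `sqlite3.Connection.backup` is a real standard-library method, but the table names (`objects`, `version_table`, `vcs_config`), the `.xml` artifact extension, and the dictionary model of the branch store are assumptions made for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Dict, Tuple

# Hypothetical branch store: branch name -> object id -> (revision id, serialized artifact)
BranchStore = Dict[str, Dict[str, Tuple[str, str]]]


def restore_from_branch(backup: sqlite3.Connection, branch: str,
                        vcs: BranchStore) -> sqlite3.Connection:
    repo = sqlite3.connect(":memory:")
    backup.backup(repo)  # restore the data integration repository from the backup
    # Reapply the previously configured trunk/branch setting.
    repo.execute("UPDATE vcs_config SET branch = ?", (branch,))
    # Version rows carried over from the backup may be stale, so remove them.
    repo.execute("DELETE FROM version_table")
    with tempfile.TemporaryDirectory() as tmp:
        local_dir = Path(tmp)
        # Import the branch's artifacts into a local directory.
        for object_id, (_revision, artifact) in vcs[branch].items():
            (local_dir / f"{object_id}.xml").write_text(artifact, encoding="utf-8")
        # Import each artifact as an object and persist its version metadata.
        for path in sorted(local_dir.glob("*.xml")):
            object_id = path.stem
            repo.execute("INSERT OR REPLACE INTO objects(id, body) VALUES (?, ?)",
                         (object_id, path.read_text(encoding="utf-8")))
            repo.execute("INSERT INTO version_table(id, revision) VALUES (?, ?)",
                         (object_id, vcs[branch][object_id][0]))
    repo.commit()
    return repo


backup = sqlite3.connect(":memory:")
backup.executescript(
    """
    CREATE TABLE objects(id TEXT PRIMARY KEY, body TEXT);
    CREATE TABLE version_table(id TEXT, revision TEXT);
    CREATE TABLE vcs_config(branch TEXT);
    INSERT INTO objects VALUES ('map_orders', 'stale');
    INSERT INTO version_table VALUES ('map_orders', 'r1');
    INSERT INTO vcs_config VALUES ('trunk');
    """
)
backup.commit()
repo = restore_from_branch(
    backup, "release-1",
    {"release-1": {"map_orders": ("r7", "<mapping v7/>"), "map_items": ("r3", "<mapping v3/>")}},
)
```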

6. A method of synchronizing a data integration tool with a distributed version control system, the method comprising:
locking a data store associated with a data integration tool;
disabling version management operations provided by the distributed version control system;
identifying a plurality of version controlled container objects in the data store associated with the data integration tool, and for each version controlled container object in the plurality of version controlled container objects:
identifying version controlled child objects which are updated in the data store of the data integration tool;
synchronizing the updated version controlled child objects with a remote centralized version control system repository; and
updating version information for each of the updated version controlled child objects maintained in a version table;
identifying a plurality of non-version controlled objects in the data store associated with the data integration tool, and for each non-version controlled object in the plurality of non-version controlled objects:
adding the non-versioned objects to a local version control system repository;
synchronizing the non-versioned objects with the remote centralized version control system repository; and
adding version information of the added non-versioned objects to the remote centralized version control system repository in the version table.
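The synchronization method of claim 6 can be sketched in Python, assuming a `threading.Lock` represents locking the data store, a boolean flag represents disabling version management operations, and repositories are modeled as dictionaries of revision lists. An object counts as "updated" here when its content differs from the last synchronized content; `DataIntegrationStore` and `synchronize` are hypothetical names.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DataIntegrationStore:
    # Hypothetical data store of the data integration tool.
    objects: Dict[str, str]            # object id -> current content
    containers: Dict[str, List[str]]   # container id -> child object ids
    version_table: Dict[str, int] = field(default_factory=dict)
    last_synced: Dict[str, str] = field(default_factory=dict)
    lock: threading.Lock = field(default_factory=threading.Lock)
    vcs_ops_enabled: bool = True


def synchronize(store: DataIntegrationStore, local_repo: Dict[str, List[str]],
                remote_repo: Dict[str, List[str]]) -> Set[str]:
    synced: Set[str] = set()
    with store.lock:  # lock the data store
        store.vcs_ops_enabled = False  # disable version management operations
        try:
            # Version-controlled containers: push their updated versioned children.
            for container, children in store.containers.items():
                if container not in store.version_table:
                    continue
                for child in children:
                    body = store.objects[child]
                    if child in store.version_table and body != store.last_synced.get(child):
                        history = remote_repo.setdefault(child, [])
                        history.append(body)
                        store.version_table[child] = len(history)
                        store.last_synced[child] = body
                        synced.add(child)
            # Non-version-controlled objects: add locally, then push to the remote.
            for object_id, body in store.objects.items():
                if object_id in store.version_table:
                    continue
                local_repo.setdefault(object_id, []).append(body)
                history = remote_repo.setdefault(object_id, [])
                history.extend(local_repo[object_id][len(history):])
                store.version_table[object_id] = len(history)
                store.last_synced[object_id] = body
                synced.add(object_id)
        finally:
            store.vcs_ops_enabled = True  # re-enable operations even on failure
    return synced


store = DataIntegrationStore(
    objects={"proj": "P", "m1": "new", "m2": "same", "m3": "fresh"},
    containers={"proj": ["m1", "m2", "m3"]},
    version_table={"proj": 1, "m1": 1, "m2": 1},
    last_synced={"proj": "P", "m1": "old", "m2": "same"},
)
local_repo: Dict[str, List[str]] = {}
remote_repo = {"proj": ["P"], "m1": ["old"], "m2": ["same"]}
changed = synchronize(store, local_repo, remote_repo)
```

The `try`/`finally` is a design choice of the sketch: it keeps the tool from being left with version operations disabled if a synchronization step raises.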

8. A system, comprising:
a data integration tool executing in a centralized development environment, wherein a plurality of client devices access the data integration tool to define one or more data integration processes, wherein a data integration process defines one or more transforms to be performed on one or more data sources and defines one or more target data stores to which the transformed data is loaded;
a distributed version control system, wherein each client device is in communication with an instance of the distributed version control system, and wherein each instance of the distributed version control system maintains one or more objects identified by a client device in the data integration process for version control, wherein the one or more objects define the one or more transforms;
wherein, when a client device sends a request to save the data integration process, the one or more objects are serialized to a file system in the centralized development environment and uploaded from the centralized development environment to an instance of the version control system in communication with the client device; and
wherein, when the data integration process is complete, the one or more objects stored in each instance of the distributed version control system are merged, wherein merging comprises:
generating a merge table, wherein the merge table maintains metadata information for the merge operation;
adding information for each of the one or more objects to the merge table;
identifying one or more merge conflicts associated with the one or more objects;
resolving the one or more merge conflicts;
updating a status of the merge table as each merge conflict is resolved; and
updating the status of the merge table after all merge conflicts are resolved.

Specification