Partitioning target data to improve data replication performance
Abstract
The presently claimed invention relates to a system and method for organizing data replicated in a target data repository. The method may receive data from a primary data store for replication in the target data repository. The method may then determine that the received data should be organized and stored according to one or more priority metrics, organize the received data according to those metrics, and store it accordingly. Higher priority data may be stored on faster data storage devices or in smaller files, whereas lower priority data may be stored on slower data storage devices or in larger files.
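The placement idea in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Placement` type, the two-level `PLACEMENTS` table, the file-size caps, and the use of a record's change rate as the priority metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    tier: str            # hypothetical label for a class of storage device
    max_file_bytes: int  # hypothetical cap on file size for this priority


# Assumed policy: high priority data goes to faster devices and smaller files,
# low priority data goes to slower devices and larger files.
PLACEMENTS = {
    "high": Placement(tier="fast", max_file_bytes=1 * 1024 * 1024),
    "low": Placement(tier="slow", max_file_bytes=64 * 1024 * 1024),
}


def priority_of(change_rate: float, threshold: float = 0.5) -> str:
    """Assumed priority metric: the fraction of time a record changes (0.0 to 1.0)."""
    return "high" if change_rate >= threshold else "low"


def place(change_rate: float) -> Placement:
    """Look up where data with the given change rate would be stored."""
    return PLACEMENTS[priority_of(change_rate)]
```

Under these assumptions, `place(0.9)` selects the fast tier with the smaller file cap, and `place(0.1)` selects the slow tier with the larger one. A real system could use any priority metric and more than two levels.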
20 Claims
1. A method for organizing replicated data stored in a target data repository, the method comprising:
receiving data for storage in the target data repository from a primary data repository, wherein the received data is associated with a data structure, wherein the primary data repository includes a relational database, wherein the target data repository includes a write-once file system;
separating the received data into a first data set and a second data set according to one or more priority metrics, wherein the first data set is assigned a first priority level and the second data set is assigned a second priority level;
splitting the first data set into a plurality of data files;
separating the plurality of data files into a first data file and a second data file, wherein the first data file is assigned a higher probability of changing with respect to the first data set, than the second data file;
storing the received data based on the priority metrics into different data files that include the first data file of the plurality of data files;
identifying that data associated with the first data set has changed;
updating the first data set with the changed data, wherein the updating of the first data set includes re-writing at least a portion of the first data file; and
rebuilding the data associated with the data structure by:
combining the first data set with the second data set; and
writing the data associated with the data structure as a new file in the write-once file system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
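The steps of claim 1 can be walked through with a small in-memory Python sketch. It is an illustration under stated assumptions, not the patented implementation: `WriteOnceStore`, the per-key `change_prob` values, both thresholds, and the `table.v{n}` naming scheme are all hypothetical. The sketch also assumes the small data files live in a mutable staging area, which the claim does not specify.

```python
class WriteOnceStore:
    """Toy stand-in for a write-once file system: each name is written once."""

    def __init__(self):
        self._files = {}

    def write(self, name, rows):
        if name in self._files:
            raise FileExistsError(name)
        self._files[name] = dict(rows)

    def read(self, name):
        return dict(self._files[name])


def separate(rows, change_prob, threshold=0.5):
    """Separate rows into a first (higher priority) and a second data set."""
    first = {k: v for k, v in rows.items() if change_prob.get(k, 0.0) >= threshold}
    second = {k: v for k, v in rows.items() if k not in first}
    return first, second


def split_into_files(first, change_prob, hot_threshold=0.8):
    """Split the first data set into a likely-to-change file and a steadier file."""
    hot = {k: v for k, v in first.items() if change_prob.get(k, 0.0) >= hot_threshold}
    cold = {k: v for k, v in first.items() if k not in hot}
    return {"first_file": hot, "second_file": cold}


def apply_change(files, key, value):
    """Re-write only the data file that holds the changed key; return its name."""
    for name, contents in files.items():
        if key in contents:
            files[name] = {**contents, key: value}
            return name
    raise KeyError(key)


def rebuild(files, second, store, version):
    """Combine both data sets and write the result as a new write-once file."""
    combined = {}
    for contents in files.values():
        combined.update(contents)
    combined.update(second)
    name = f"table.v{version}"
    store.write(name, combined)
    return name


# Example run with made-up rows and change probabilities.
rows = {"a": 1, "b": 2, "c": 3, "d": 4}
change_prob = {"a": 0.9, "b": 0.6, "c": 0.1, "d": 0.2}

first, second = separate(rows, change_prob)
files = split_into_files(first, change_prob)
store = WriteOnceStore()
rebuild(files, second, store, version=1)

touched = apply_change(files, "a", 10)  # only one small file is re-written
latest = rebuild(files, second, store, version=2)
```

The design point the sketch tries to show is that a change to a frequently changing key touches only one small file, while the full data structure is reassembled as a fresh file instead of being modified in place.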
9. A non-transitory computer readable storage medium having embodied thereon a program executable by a processor for organizing replicated data stored in a target data repository, the method comprising:
receive data for storage in the target data repository from a primary data repository, wherein the received data is associated with a data structure, wherein the primary data repository includes a relational database, wherein the target data repository includes a write-once file system;
separate the received data into a first data set and a second set according to one or more priority metrics, wherein the first data set is assigned a first priority level and the second data set is assigned a second priority level;
split the first data set into a plurality of data files;
separate the plurality of data files into a first data file and a second data file, wherein the first data file is assigned a higher probability of changing with respect to the first data set, than the second data file;
store the received data based on the priority metrics into different data files that include the first data file;
identify that data associated with the first data set has changed;
updating the first data set with the changed data, wherein the updating of the first data set includes re-writing at least a portion of the first data file; and
rebuild the data associated with the data structure by:
combining the first data set with the second data set; and
writing the data associated with the data structure as a new file in the write-once file system.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. An apparatus for organizing replicated data stored in a target data repository, the apparatus comprising:
a memory;
a data communication interface that receives data for storage in the target data repository from a primary data repository, wherein the received data is associated with a data structure, wherein the primary data repository includes a relational database, wherein the target data repository includes a write-once file system; and
a processor that executes instructions stored in memory to:
separate the received data into a first data set and a second data set according to one or more priority metrics, wherein the first data set is assigned a first priority level and the second data set is assigned a second priority level;
split the first data set into a plurality of data files;
separate the plurality of data files into a first data file and a second data file, wherein the first data file is assigned a higher probability of changing with respect to the first data set, than the second data file;
organize the received data for storage in the target data repository based on the priority metrics into different data files that include the first data file;
identify that data associated with the first data set has changed;
update the first data set with the changed data, wherein the updating of the first data set includes re-writing at least a portion of the first data file; and
rebuild the data associated with the data structure by:
combining the first data set with the second data set; and
writing the data associated with the data structure as a new file in the write-once file system.
Dependent claims: 18, 19, 20
Specification