Partitioning target data to improve data replication performance

US 10,671,565 B2
Filed: 04/24/2015
Issued: 06/02/2020
Est. Priority Date: 04/24/2015
Status: Active Grant

Abstract

The presently claimed invention relates to a system and method for organizing data replicated in a target data repository. The method of the presently claimed invention may receive data from a primary data store for replication in the target data repository. The method may then determine that the received data should be organized and stored according to one or more priority metrics. The method may then organize the received data according to the one or more priority metrics, and store the received data based on the priority metrics. Higher priority data may be stored on faster data storage devices or in smaller files, while lower priority data may be stored on slower data storage devices or in larger files.
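
As an illustration only, the layout the abstract describes can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `PlannedFile`, `chunk`, and `plan_layout`, the "fast" and "slow" tier labels, the row-count file sizes, and the example priority metric (rows from the most recent year) are all hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PlannedFile:
    tier: str          # hypothetical label: "fast" or "slow" storage
    rows: List[dict]   # rows that would be written to this file


def chunk(rows: List[dict], size: int) -> List[List[dict]]:
    """Split rows into consecutive groups of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def plan_layout(
    rows: Iterable[dict],
    is_high_priority: Callable[[dict], bool],
    small_file_rows: int = 2,
    large_file_rows: int = 1000,
) -> List[PlannedFile]:
    """Plan files: high-priority rows go to small files on the fast tier,
    low-priority rows go to large files on the slow tier."""
    high, low = [], []
    for row in rows:
        (high if is_high_priority(row) else low).append(row)
    plan = [PlannedFile("fast", part) for part in chunk(high, small_file_rows)]
    plan += [PlannedFile("slow", part) for part in chunk(low, large_file_rows)]
    return plan


if __name__ == "__main__":
    # Assumed priority metric: rows from 2015 are treated as high priority.
    table = [{"id": i, "year": 2015 if i < 3 else 2010} for i in range(8)]
    for planned in plan_layout(table, lambda r: r["year"] == 2015):
        print(planned.tier, [r["id"] for r in planned.rows])
```

Keeping the high-priority files small means that a change to one row touches only a small file, which is the performance argument the abstract makes.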

20 Claims

1. A method for organizing replicated data stored in a target data repository, the method comprising:
- receiving data for storage in the target data repository from a primary data repository, wherein the received data is associated with a data structure, wherein the primary data repository includes a relational database, wherein the target data repository includes a write-once file system;
  
  separating the received data into a first data set and a second data set according to one or more priority metrics, wherein the first data set is assigned a first priority level and the second data set is assigned a second priority level;
  
  splitting the first data set into a plurality of data files;
  
  separating the plurality of data files into a first data file and a second data file, wherein the first data file is assigned a higher probability of changing with respect to the first data set, than the second data file;
  
  storing the received data based on the priority metrics into different data files that include the first data file of the plurality of data files;
  
  identifying that data associated with the first data set has changed;
  
  updating the first data set with the changed data, wherein the updating of the first data set includes re-writing at least a portion of the first data file; and
  
  rebuilding the data associated with the data structure by:
  
  combining the first data set with the second data set; and
  
  writing the data associated with the data structure as a new file in the write-once file system.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, wherein the first data set is stored on a first type of data storage device, and the data structure is a table, wherein the changed data is associated with information stored in at least one row of the table.
  - 3. The method of claim 2, wherein the second data set is stored on a second type of data storage device.
  - 4. The method of claim 1, wherein the first data set is organized in the first data file and the second data set is organized in a second data file, wherein the first file is split into the plurality of data files, wherein at least one of the first data file or each data file of the plurality of data files are smaller than the second data file.
  - 5. The method of claim 1, wherein the primary data storage repository is a relational database management system (RDBMS) and the target data storage repository includes a Hadoop distributed file system (HDFS).
  - 6. The method of claim 1, wherein the primary and the target data storage repositories are both relational database management systems (RDBMS).
  - 7. The method of claim 1, further comprising:
    - receiving data for storage in the target data repository from a primary data repository;
      
      identifying that the received data should be re-organized according to one or more priority metrics;
      
      re-organizing the received data according to the one or more priority metrics; and
      
      storing the received data based on the priority metrics.
  - 8. The method of claim 1, further comprising:
    - receiving data for storage in the target data repository from a primary data repository; and
      
      storing the received data based on current priority metrics.
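
The steps of claim 1 can be walked through with a small Python sketch. It is illustrative only and makes several assumptions that are not in the claim: `WriteOnceStore` is an in-memory stand-in for a write-once file system, the path names and `.vN` version suffix are hypothetical, rows are dictionaries keyed by `id`, and "re-writing" a file is modeled as writing a new version of that file because existing files cannot be modified in place.

```python
from typing import Callable, Dict, List, Tuple


class WriteOnceStore:
    """In-memory stand-in for a write-once file system: a path can be
    created once and never modified afterwards."""

    def __init__(self) -> None:
        self._files: Dict[str, List[dict]] = {}

    def write_new(self, path: str, rows: List[dict]) -> None:
        if path in self._files:
            raise FileExistsError(f"{path} already exists and cannot be modified")
        self._files[path] = [dict(r) for r in rows]

    def read(self, path: str) -> List[dict]:
        return [dict(r) for r in self._files[path]]


def split(rows: List[dict], rows_per_file: int) -> List[List[dict]]:
    """Split rows into consecutive groups of at most `rows_per_file` rows."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def replicate(store: WriteOnceStore, rows: List[dict],
              is_high_priority: Callable[[dict], bool],
              rows_per_file: int = 2) -> Tuple[List[str], str]:
    """Store the first (high-priority) data set as several small files and
    the second data set as one file. Returns (hot file paths, cold file path)."""
    high = [r for r in rows if is_high_priority(r)]
    low = [r for r in rows if not is_high_priority(r)]
    hot_paths = []
    for n, part in enumerate(split(high, rows_per_file)):
        path = f"hot/part-{n}.v0"
        store.write_new(path, part)
        hot_paths.append(path)
    store.write_new("cold/part-0.v0", low)
    return hot_paths, "cold/part-0.v0"


def apply_change(store: WriteOnceStore, hot_paths: List[str],
                 changed_row: dict, version: int) -> str:
    """Re-write only the small file that holds the changed row, as a new version."""
    for i, path in enumerate(hot_paths):
        rows = store.read(path)
        if any(r["id"] == changed_row["id"] for r in rows):
            rows = [changed_row if r["id"] == changed_row["id"] else r for r in rows]
            new_path = f"{path.rsplit('.v', 1)[0]}.v{version}"
            store.write_new(new_path, rows)
            hot_paths[i] = new_path  # later reads use the new version
            return new_path
    raise KeyError(changed_row["id"])


def rebuild(store: WriteOnceStore, hot_paths: List[str],
            cold_path: str, out_path: str) -> List[dict]:
    """Combine both data sets and write the whole table as one new file."""
    combined = [r for p in hot_paths for r in store.read(p)] + store.read(cold_path)
    combined.sort(key=lambda r: r["id"])
    store.write_new(out_path, combined)
    return combined
```

In this sketch a change to one row costs one small file write (`apply_change`), and the full table is only materialized when `rebuild` is called, which writes a new file rather than editing an existing one.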

9. A non-transitory computer readable storage medium having embodied thereon a program executable by a processor for organizing replicated data stored in a target data repository, the method comprising:
- receive data for storage in the target data repository from a primary data repository, wherein the received data is associated with a data structure, wherein the primary data repository includes a relational database, wherein the target data repository includes a write-once file system;
  
  separate the received data into a first data set and a second data set according to one or more priority metrics, wherein the first data set is assigned a first priority level and the second data set is assigned a second priority level;
  
  split the first data set into a plurality of data files;
  
  separate the plurality of data files into a first data file and a second data file, wherein the first data file is assigned a higher probability of changing with respect to the first data set, than the second data file;
  
  store the received data based on the priority metrics into different data files that include the first data file;
  
  identify that data associated with the first data set has changed;
  
  updating the first data set with the changed data, wherein the updating of the first data set includes re-writing at least a portion of the first data file; and
  
  rebuild the data associated with the data structure by:
  
  combining the first data set with the second data set; and
  
  writing the data associated with the data structure as a new file in the write-once file system.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
  - 10. The non-transitory computer readable storage medium of claim 9, wherein the first data set is stored on a first type of data storage device, and the data structure is a table, wherein the changed data is associated with information stored in at least one row of the table.
  - 11. The non-transitory computer readable storage medium of claim 10, wherein the second data set is stored on a second type of data storage device.
  - 12. The non-transitory computer readable storage medium of claim 9, wherein the first data set is organized in the first data file and the second data set is organized in a second data file, wherein the first file is split into the plurality of data files, wherein at least one of the first file or each data file of the plurality of data files are smaller than the second data file.
  - 13. The non-transitory computer readable storage medium of claim 9, wherein the primary data storage repository is a relational database management system (RDBMS) and the target data storage repository includes a Hadoop distributed file system (HDFS).
  - 14. The non-transitory computer readable storage medium of claim 9, wherein the primary and the target data storage repositories are both relational database management systems (RDBMS).
  - 15. The non-transitory computer readable storage medium of claim 9, the program further executable to:
    - receive data for storage in the target data repository from a primary data repository;
      
      identify that the received data should be re-organized according to one or more priority metrics;
      
      re-organize the received data according to the one or more priority metrics; and
      
      store the received data based on the priority metrics.
  - 16. The non-transitory computer readable storage medium of claim 9, the program further executable to:
    - receive data for storage in the target data repository from a primary data repository; and
      
      store the received data based on current priority metrics.
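
Claims 15 and 16 distinguish between re-organizing data according to the priority metrics and simply storing new data under the current metrics. One possible reading is sketched below in Python. It is an assumption-laden illustration, not the claimed method: the `Layout` dictionary with "first" and "second" keys, the function names, and the test for whether re-organization is needed (any stored row that the metric would now place in the other set) are all hypothetical.

```python
from typing import Callable, Dict, List

Layout = Dict[str, List[dict]]  # hypothetical shape: {"first": [...], "second": [...]}


def assign(rows: List[dict], metric: Callable[[dict], bool]) -> Layout:
    """Group rows into a first (high-priority) and second (low-priority) set."""
    layout: Layout = {"first": [], "second": []}
    for row in rows:
        layout["first" if metric(row) else "second"].append(row)
    return layout


def needs_reorganizing(layout: Layout, metric: Callable[[dict], bool]) -> bool:
    """True when any stored row sits in a set the metric no longer agrees with."""
    return (any(not metric(r) for r in layout["first"])
            or any(metric(r) for r in layout["second"]))


def store_batch(layout: Layout, batch: List[dict],
                metric: Callable[[dict], bool]) -> Layout:
    """Add a received batch, re-grouping existing rows only if required."""
    if needs_reorganizing(layout, metric):
        # Re-organize: regroup stored and received rows under the metric.
        return assign(layout["first"] + layout["second"] + batch, metric)
    # Otherwise keep the existing grouping and place only the new rows.
    new = assign(batch, metric)
    return {"first": layout["first"] + new["first"],
            "second": layout["second"] + new["second"]}
```

For example, if the metric is "rows from the current year" and the year rolls over, rows that were high priority under the old cutoff would be moved to the second set the next time a batch arrives.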

17. An apparatus for organizing replicated data stored in a target data repository, the apparatus comprising:
- a memory;
  
  a data communication interface that receives data for storage in the target data repository from a primary data repository, wherein the received data is associated with a data structure, wherein the primary data repository includes a relational database, wherein the target data repository includes a write-once file system; and
  
  a processor that executes instructions stored in memory to:
  
  separate the received data into a first data set and a second data set according to one or more priority metrics, wherein the first data set is assigned a first priority level and the second data set is assigned a second priority level;
  
  split the first data set into a plurality of data files;
  
  separate the plurality of data files into a first data file and a second data file, wherein the first data file is assigned a higher probability of changing with respect to the first data set, than the second data file;
  
  organize the received data for storage in the target data repository based on the priority metrics into different data files that include the first data file;
  
  identify that data associated with the first data set has changed;
  
  update the first data set with the changed data, wherein the updating of the first data set includes re-writing at least a portion of the first data file; and
  
  rebuild the data associated with the data structure by:
  
  combining the first data set with the second data set; and
  
  writing the data associated with the data structure as a new file in the write-once file system.
- Dependent claims (18, 19, 20):
  - 18. The apparatus of claim 17, wherein the first data set is stored on a first type of data storage device, and the data structure is a table, wherein the changed data is associated with information stored in at least one row of the table.
  - 19. The apparatus of claim 18, wherein the second data set is stored on a second type of data storage device.
  - 20. The apparatus of claim 17, wherein the first data set is organized in the first data file and the second data set is organized in a second data file, wherein the first file is split into the plurality of data files, wherein at least one of the first data file or each data file of the plurality of data files are smaller than the second data file.
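
The independent claims assign the first data file a higher probability of changing than the second data file, but they do not say how that probability is obtained. The Python sketch below shows one hypothetical way to estimate it: the fraction of past replication cycles in which each file changed. The function names and the `history` structure are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def change_probability(updates: int, observations: int) -> float:
    """Fraction of observed replication cycles in which a file changed."""
    return updates / observations if observations else 0.0


def rank_files(history: Dict[str, Tuple[int, int]]) -> List[str]:
    """Order file names from most to least likely to change.

    `history` maps a file name to (cycles with a change, cycles observed).
    """
    return sorted(history,
                  key=lambda name: change_probability(*history[name]),
                  reverse=True)


if __name__ == "__main__":
    history = {"part-a": (9, 10), "part-b": (1, 10), "part-c": (5, 10)}
    ranked = rank_files(history)
    # The file ranked first plays the role of the "first data file".
    print(ranked)
```

Under this assumption, the files at the front of the ranking are the ones worth keeping small, since they are the ones most likely to be re-written.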

Current Assignee: Quest Software, Inc.
Original Assignee: Quest Software, Inc.
Inventors: Romine, William James
Primary Examiner(s): Hu, Jensen
Application Number: US14/695,681
Publication Number: US 20160314147A1
Time in Patent Office: 1,866 Days
Field of Search: 707740
US Class Current:
CPC Class Codes:
G06F 16/113   Details of archiving lifecy...
G06F 16/214   Database migration support
G06F 16/22   Indexing; Data structures t...
