Optimizing data processing across server clusters and data centers using checkpoint-based data replication
First Claim
1. A computing platform, comprising:
at least one processor;
a communication interface communicatively coupled to the at least one processor; and
memory storing computer-readable instructions that, when executed by the at least one processor, cause the computing platform to:
determine to initiate a data processing job associated with identifying one or more features of a source dataset, the data processing job comprising multiple processing steps;
based on determining to initiate the data processing job, generate one or more first commands directing one or more first cluster server nodes associated with a first data center to execute the multiple processing steps associated with the data processing job to identify the one or more features of the source dataset, the one or more first commands further directing the one or more first cluster server nodes associated with the first data center to update a checkpoint table as each processing step of the multiple processing steps associated with the data processing job is completed, and the one or more first commands further directing the one or more first cluster server nodes associated with the first data center to replicate processing results data to at least one other data center different from the first data center as each processing step of the multiple processing steps associated with the data processing job is completed; and
send, via the communication interface, to the one or more first cluster server nodes associated with the first data center, the one or more first commands.
Abstract
Aspects of the disclosure relate to optimizing data processing across server clusters and data centers using checkpoint-based data replication. A computing platform may determine to initiate a data processing job associated with identifying one or more features of a source dataset, and the data processing job may include multiple processing steps. Based on determining to initiate the data processing job, the computing platform may generate one or more commands directing one or more cluster server nodes associated with a data center to execute the multiple processing steps. The one or more commands may direct the one or more cluster server nodes to update a checkpoint table as each processing step is completed, and may further direct the one or more cluster server nodes to replicate processing results data to at least one other data center. Subsequently, the computing platform may send the generated commands to the cluster server nodes.
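The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation. Every name in it (`Command`, `DataCenter`, `generate_command`, `execute_command`, the step names, and the "COMPLETED" status string) is a hypothetical chosen for illustration, and in-memory dictionaries stand in for the checkpoint table and for the other data center's storage.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

StepKey = Tuple[str, str]  # (job id, step name); a hypothetical key layout


@dataclass
class DataCenter:
    """Hypothetical stand-in for a data center that receives replicated results."""
    name: str
    results: Dict[StepKey, Any] = field(default_factory=dict)


@dataclass
class Command:
    """Hypothetical command the computing platform sends to cluster server nodes."""
    job_id: str
    steps: List[str]         # ordered processing steps of the job
    replicate_to: List[str]  # names of other data centers to replicate to


def generate_command(job_id: str, steps: List[str], replicas: List[str]) -> Command:
    """Platform side: build the command once the job is to be initiated."""
    return Command(job_id=job_id, steps=list(steps), replicate_to=list(replicas))


def execute_command(
    command: Command,
    step_fns: Dict[str, Callable[[Any], Any]],
    source: Any,
    checkpoint_table: Dict[StepKey, str],
    data_centers: Dict[str, DataCenter],
) -> Any:
    """Node side: run each step, then checkpoint it and replicate its result."""
    data = source
    for step in command.steps:
        data = step_fns[step](data)
        key = (command.job_id, step)
        # Update the checkpoint table as each step completes.
        checkpoint_table[key] = "COMPLETED"
        # Replicate this step's results to every other data center named.
        for name in command.replicate_to:
            data_centers[name].results[key] = data
    return data


# Usage with two toy steps (assumed for illustration only).
step_fns = {"tokenize": str.split, "count": len}
checkpoints: Dict[StepKey, str] = {}
centers = {"dc2": DataCenter("dc2")}
cmd = generate_command("job-1", ["tokenize", "count"], ["dc2"])
features = execute_command(cmd, step_fns, "a b c", checkpoints, centers)
```

Because the checkpoint entry and the replicated result are written after every step rather than once at the end of the job, the other data center holds the output of each completed step together with a record of which steps finished.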
20 Claims
1. A computing platform, comprising:
at least one processor;
a communication interface communicatively coupled to the at least one processor; and
memory storing computer-readable instructions that, when executed by the at least one processor, cause the computing platform to:
determine to initiate a data processing job associated with identifying one or more features of a source dataset, the data processing job comprising multiple processing steps;
based on determining to initiate the data processing job, generate one or more first commands directing one or more first cluster server nodes associated with a first data center to execute the multiple processing steps associated with the data processing job to identify the one or more features of the source dataset, the one or more first commands further directing the one or more first cluster server nodes associated with the first data center to update a checkpoint table as each processing step of the multiple processing steps associated with the data processing job is completed, and the one or more first commands further directing the one or more first cluster server nodes associated with the first data center to replicate processing results data to at least one other data center different from the first data center as each processing step of the multiple processing steps associated with the data processing job is completed; and
send, via the communication interface, to the one or more first cluster server nodes associated with the first data center, the one or more first commands.
Dependent claims: 2-15.
16. A method, comprising:
at a computing platform comprising at least one processor, memory, and a communication interface:
determining, by the at least one processor, to initiate a data processing job associated with identifying one or more features of a source dataset, the data processing job comprising multiple processing steps;
based on determining to initiate the data processing job, generating, by the at least one processor, one or more first commands directing one or more first cluster server nodes associated with a first data center to execute the multiple processing steps associated with the data processing job to identify the one or more features of the source dataset, the one or more first commands further directing the one or more first cluster server nodes associated with the first data center to update a checkpoint table as each processing step of the multiple processing steps associated with the data processing job is completed, and the one or more first commands further directing the one or more first cluster server nodes associated with the first data center to replicate processing results data to at least one other data center different from the first data center as each processing step of the multiple processing steps associated with the data processing job is completed; and
sending, by the at least one processor, via the communication interface, to the one or more first cluster server nodes associated with the first data center, the one or more first commands.
Dependent claims: 17-19.
20. One or more non-transitory computer-readable media storing instructions that, when executed by a computing platform comprising at least one processor, memory, and a communication interface, cause the computing platform to:
determine to initiate a data processing job associated with identifying one or more features of a source dataset, the data processing job comprising multiple processing steps;
based on determining to initiate the data processing job, generate one or more first commands directing one or more first cluster server nodes associated with a first data center to execute the multiple processing steps associated with the data processing job to identify the one or more features of the source dataset, the one or more first commands further directing the one or more first cluster server nodes associated with the first data center to update a checkpoint table as each processing step of the multiple processing steps associated with the data processing job is completed, and the one or more first commands further directing the one or more first cluster server nodes associated with the first data center to replicate processing results data to at least one other data center different from the first data center as each processing step of the multiple processing steps associated with the data processing job is completed; and
send, via the communication interface, to the one or more first cluster server nodes associated with the first data center, the one or more first commands.
Specification