Data processing and data movement in cloud computing environment
First Claim
1. A method for moving data from a source site to a target site in a cloud computing platform, comprising:
receiving a plurality of data sets to be moved from the source site to the target site at a plurality of containerized data ingest components located at the source site;
providing the received plurality of data sets from the plurality of data ingest components to a staging cluster comprising a plurality of containerized broker components located at the source site, wherein the plurality of containerized broker components queue the plurality of data sets, wherein the staging cluster replicates one or more partitions of the received data set between broker components, and wherein each broker component performs a data deduplication operation;
providing the queued plurality of data sets from the plurality of containerized broker components to a processing cluster comprising a plurality of containerized data processing components, wherein the plurality of containerized data processing components process the plurality of data sets, wherein the processing stage performs one or more of data encryption, data reduction, and data indexing prior to a data set being transmitted to the target site;
transmitting the plurality of data sets from the plurality of containerized data processing components to the target site;
wherein, for each data ingest component of the plurality of data ingest components, a respective pipeline is formed through the staging cluster and the processing cluster, and wherein the staging cluster and the processing cluster are scalable such that the method further comprises:
adding an additional pipeline comprising a given containerized broker component in the staging cluster and a given containerized data processing component in the processing cluster when a data ingest component is added; and
removing an existing pipeline comprising a given containerized broker component in the staging cluster and a given containerized data processing component in the processing cluster when an existing data ingest component is removed;
wherein the staging cluster and the processing cluster perform a two-phase acknowledgment procedure comprising an acknowledge step and a commit step to confirm that a data set has been fully processed by the processing cluster; and
wherein the staging cluster removes the data set upon receiving confirmation that the data set has been fully processed;
wherein the source site and the target site are implemented via one or more processing devices operatively coupled via a communication network.
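The two-phase acknowledgment recited above can be illustrated with a short sketch: the processing cluster first acknowledges receipt of a data set, then commits once processing is complete, and only the commit lets the staging cluster discard its retained copy. The class and method names below are illustrative assumptions, not terms from the patent.

```python
# Hedged sketch of the claimed two-phase acknowledgment: an acknowledge
# step (phase 1) followed by a commit step (phase 2); the staging
# cluster retains the data set until the commit confirms full processing.

class StagingCluster:
    def __init__(self):
        self.retained = {}          # data sets held until commit
        self.acknowledged = set()   # keys that completed phase 1

    def stage(self, key, data_set):
        self.retained[key] = data_set

    def on_acknowledge(self, key):
        """Phase 1: processing cluster confirms receipt; data is kept."""
        self.acknowledged.add(key)

    def on_commit(self, key):
        """Phase 2: fully processed; now safe to remove the data set."""
        if key in self.acknowledged:
            self.retained.pop(key, None)

staging = StagingCluster()
staging.stage("ds1", b"payload")
staging.on_acknowledge("ds1")   # after phase 1 the data is still retained
assert "ds1" in staging.retained
staging.on_commit("ds1")        # after phase 2 the staging copy is removed
assert "ds1" not in staging.retained
```

A commit that arrives without a prior acknowledge leaves the data set in place, which is the retention guarantee the claim's final "wherein" clauses describe.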
Abstract
A plurality of data sets to be moved from a source site to a target site in a cloud computing platform is received at a plurality of containerized data ingest components located at the source site. The received plurality of data sets are provided from the plurality of data ingest components to a staging cluster comprising a plurality of containerized broker components located at the source site, wherein the plurality of containerized broker components queue the plurality of data sets. The queued plurality of data sets are provided from the plurality of containerized broker components to a processing cluster comprising a plurality of containerized data processing components, wherein the plurality of containerized data processing components process the plurality of data sets. The plurality of data sets is transmitted from the plurality of containerized data processing components to the target site.
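The staged flow the abstract describes (ingest components hand data sets to queueing brokers, a processing cluster drains and transforms them, then transmits) can be sketched with in-memory queues standing in for the containerized components. All names and the round-robin placement are illustrative assumptions, not details from the patent.

```python
# Minimal sketch of the abstract's data path: ingest -> broker queues
# (staging cluster) -> processing cluster -> ready for transmission.
from queue import Queue

class BrokerComponent:
    """Stands in for one containerized broker: queues incoming data sets."""
    def __init__(self):
        self.queue = Queue()

    def enqueue(self, data_set):
        self.queue.put(data_set)

def ingest(data_sets, brokers):
    """Ingest components hand data sets to the staging cluster,
    placed round-robin across broker components (an assumption)."""
    for i, data_set in enumerate(data_sets):
        brokers[i % len(brokers)].enqueue(data_set)

def process(brokers):
    """Processing cluster drains the broker queues and applies a
    processing step (a trivial placeholder transform here)."""
    processed = []
    for broker in brokers:
        while not broker.queue.empty():
            processed.append(broker.queue.get().upper())
    return processed

brokers = [BrokerComponent(), BrokerComponent()]
ingest(["ds1", "ds2", "ds3"], brokers)
print(process(brokers))  # → ['DS1', 'DS3', 'DS2']
```

Draining broker 0 before broker 1 explains the output order: `ds1` and `ds3` landed on the first broker, `ds2` on the second.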
Citations
20 Claims
1. A method for moving data from a source site to a target site in a cloud computing platform, comprising:
receiving a plurality of data sets to be moved from the source site to the target site at a plurality of containerized data ingest components located at the source site;
providing the received plurality of data sets from the plurality of data ingest components to a staging cluster comprising a plurality of containerized broker components located at the source site, wherein the plurality of containerized broker components queue the plurality of data sets, wherein the staging cluster replicates one or more partitions of the received data set between broker components, and wherein each broker component performs a data deduplication operation;
providing the queued plurality of data sets from the plurality of containerized broker components to a processing cluster comprising a plurality of containerized data processing components, wherein the plurality of containerized data processing components process the plurality of data sets, wherein the processing stage performs one or more of data encryption, data reduction, and data indexing prior to a data set being transmitted to the target site;
transmitting the plurality of data sets from the plurality of containerized data processing components to the target site;
wherein, for each data ingest component of the plurality of data ingest components, a respective pipeline is formed through the staging cluster and the processing cluster, and wherein the staging cluster and the processing cluster are scalable such that the method further comprises:
adding an additional pipeline comprising a given containerized broker component in the staging cluster and a given containerized data processing component in the processing cluster when a data ingest component is added; and
removing an existing pipeline comprising a given containerized broker component in the staging cluster and a given containerized data processing component in the processing cluster when an existing data ingest component is removed;
wherein the staging cluster and the processing cluster perform a two-phase acknowledgment procedure comprising an acknowledge step and a commit step to confirm that a data set has been fully processed by the processing cluster; and
wherein the staging cluster removes the data set upon receiving confirmation that the data set has been fully processed;
wherein the source site and the target site are implemented via one or more processing devices operatively coupled via a communication network.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
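The scaling limitation in claim 1 pairs each ingest component with its own broker and processor, adding or removing that pipeline as ingest components come and go. The sketch below illustrates that one-pipeline-per-ingest-component bookkeeping; all identifiers are illustrative, not defined by the patent.

```python
# Sketch of per-ingest-component pipeline scaling: adding an ingest
# component creates a broker/processor pair; removing it tears the
# pair down. Names like "broker-<id>" are illustrative assumptions.

class Pipeline:
    def __init__(self, ingest_id):
        self.ingest_id = ingest_id
        self.broker = f"broker-{ingest_id}"        # staging-cluster member
        self.processor = f"processor-{ingest_id}"  # processing-cluster member

class PipelineManager:
    def __init__(self):
        self.pipelines = {}

    def add_ingest_component(self, ingest_id):
        """Adding an ingest component adds a matching pipeline."""
        self.pipelines[ingest_id] = Pipeline(ingest_id)

    def remove_ingest_component(self, ingest_id):
        """Removing an ingest component removes its pipeline."""
        self.pipelines.pop(ingest_id, None)

mgr = PipelineManager()
mgr.add_ingest_component("a")
mgr.add_ingest_component("b")
mgr.remove_ingest_component("a")
print(sorted(mgr.pipelines))  # → ['b']
```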
13. A system for moving data from a source site to a target site in a cloud computing platform, the system comprising:
at least one processor, coupled to a memory, and configured to:
receive a plurality of data sets to be moved from the source site to the target site at a plurality of containerized data ingest components located at the source site;
provide the received plurality of data sets from the plurality of data ingest components to a staging cluster comprising a plurality of containerized broker components located at the source site, wherein the plurality of containerized broker components queue the plurality of data sets, wherein the staging cluster replicates one or more partitions of the received data set between broker components, and wherein each broker component performs a data deduplication operation;
provide the queued plurality of data sets from the plurality of containerized broker components to a processing cluster comprising a plurality of containerized data processing components, wherein the plurality of containerized data processing components process the plurality of data sets, wherein the processing stage performs one or more of data encryption, data reduction, and data indexing prior to a data set being transmitted to the target site;
transmit the plurality of data sets from the plurality of containerized data processing components to the target site;
wherein, for each data ingest component of the plurality of data ingest components, a respective pipeline is formed through the staging cluster and the processing cluster, and wherein the staging cluster and the processing cluster are scalable such that the processor is further configured to:
add an additional pipeline comprising a given containerized broker component in the staging cluster and a given containerized data processing component in the processing cluster when a data ingest component is added; and
remove an existing pipeline comprising a given containerized broker component in the staging cluster and a given containerized data processing component in the processing cluster when an existing data ingest component is removed;
wherein the staging cluster and the processing cluster perform a two-phase acknowledgment procedure comprising an acknowledge step and a commit step to confirm that a data set has been fully processed by the processing cluster; and
wherein the staging cluster removes the data set upon receiving confirmation that the data set has been fully processed;
wherein the source site and the target site are operatively coupled via a communication network.

Dependent claims: 14, 15, 18, 19.
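Claim 13 repeats the broker-side behavior from claim 1: each broker deduplicates incoming data sets, and the staging cluster replicates partitions between brokers. A small sketch of both behaviors follows; the hash-based dedup and full-replication policy are assumptions, as the claims do not specify a mechanism.

```python
# Illustrative sketch of the claimed broker behavior: per-broker
# deduplication of incoming data sets, plus staging-cluster replication
# of a partition across brokers. SHA-256 digests are an assumption.
import hashlib

class Broker:
    def __init__(self):
        self.seen = set()       # digests of data sets already queued
        self.partitions = []    # this broker's copy of partitions

    def ingest(self, data_set: bytes) -> bool:
        """Queue a data set unless an identical one was already seen."""
        digest = hashlib.sha256(data_set).hexdigest()
        if digest in self.seen:
            return False        # duplicate dropped by the dedup operation
        self.seen.add(digest)
        self.partitions.append(data_set)
        return True

def replicate(brokers, partition: bytes):
    """Staging cluster replicates a partition to every broker
    (replication factor equal to cluster size, for illustration)."""
    for b in brokers:
        b.partitions.append(partition)

b = Broker()
assert b.ingest(b"chunk") is True
assert b.ingest(b"chunk") is False  # dedup drops the repeat
```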
16. An article of manufacture for moving data from a source site to a target site in a cloud computing platform, the article of manufacture comprising a processor-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by at least one processing device implement the steps of:
receiving a plurality of data sets to be moved from the source site to the target site at a plurality of containerized data ingest components located at the source site;
providing the received plurality of data sets from the plurality of data ingest components to a staging cluster comprising a plurality of containerized broker components located at the source site, wherein the plurality of containerized broker components queue the plurality of data sets, wherein the staging cluster replicates one or more partitions of the received data set between broker components, and wherein each broker component performs a data deduplication operation;
providing the queued plurality of data sets from the plurality of containerized broker components to a processing cluster comprising a plurality of containerized data processing components, wherein the plurality of containerized data processing components process the plurality of data sets, wherein the processing stage performs one or more of data encryption, data reduction, and data indexing prior to a data set being transmitted to the target site;
transmitting the plurality of data sets from the plurality of containerized data processing components to the target site;
wherein, for each data ingest component of the plurality of data ingest components, a respective pipeline is formed through the staging cluster and the processing cluster, and wherein the staging cluster and the processing cluster are scalable such that the implemented steps further comprise:
adding an additional pipeline comprising a given containerized broker component in the staging cluster and a given containerized data processing component in the processing cluster when a data ingest component is added; and
removing an existing pipeline comprising a given containerized broker component in the staging cluster and a given containerized data processing component in the processing cluster when an existing data ingest component is removed;
wherein the staging cluster and the processing cluster perform a two-phase acknowledgment procedure comprising an acknowledge step and a commit step to confirm that a data set has been fully processed by the processing cluster; and
wherein the staging cluster removes the data set upon receiving confirmation that the data set has been fully processed;
wherein the source site and the target site are operatively coupled via a communication network.

Dependent claims: 17, 20.
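The processing stage named across the claims applies one or more of data reduction, data encryption, and data indexing before transmission. The sketch below chains toy stand-ins for all three; zlib compression and the XOR "cipher" are illustrative placeholders only, not the patent's mechanisms.

```python
# Sketch of the claimed processing stage: data reduction, (placeholder)
# encryption, and indexing applied before a data set is transmitted.
import zlib

def reduce_data(data: bytes) -> bytes:
    """Data reduction stand-in: lossless compression."""
    return zlib.compress(data)

def encrypt(data: bytes, key: int = 0x5A) -> bytes:
    """Toy XOR placeholder; a real system would use an actual cipher."""
    return bytes(b ^ key for b in data)

def index(data_set_id: str, data: bytes, catalog: dict) -> None:
    """Data indexing stand-in: record the outgoing size for lookup."""
    catalog[data_set_id] = len(data)

catalog = {}
payload = b"example payload " * 8
outgoing = encrypt(reduce_data(payload))
index("ds1", outgoing, catalog)
# XOR is involutive, so applying encrypt again undoes it:
assert zlib.decompress(encrypt(outgoing)) == payload
```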
Specification