Processing of data using a database system in communication with a data processing framework
First Claim
1. A data processing system, comprising:
a data processing framework having at least one processor and configured to receive a data processing task for processing;
a plurality of database systems having at least another processor, the plurality of database systems being distinct from and are coupled to the data processing framework, wherein the database systems are configured to perform a data processing task, the at least one processor of the data processing framework is distinct from the at least another processor of the plurality of database systems;
wherein the data processing task is configured to be partitioned into a plurality of partitions;
a distributed file system in communication with the data processing framework and the plurality of database systems, and being distinct from each of the data processing framework and the plurality of database systems, the distributed file system optionally stores at least one output data associated with at least one partition of the data processing task being processed by the data processing framework;
each database system in the plurality of database systems is configured to process a partition of the data processing task assigned for processing to that database system by the data processing framework based on at least a processing capacity of that database system, wherein the processing capacity is determined based on at least one of the following:
whether that database system is still processing a previously assigned partition of the data processing task, and having each database system determine and provide an indication of its processing capacity to the data processing framework while the data processing task is being processed;
each database system in the plurality of database systems is configured to perform processing of its assigned partition of the data processing task in parallel with another database system in the plurality of database systems processing another partition of the data processing task assigned to the another database system;
wherein the data processing framework is configured to process the at least one partition of the data processing task;
a storage component in communication with the data processing framework and the plurality of database systems, configured to store information about each partition of the data processing task being processed by each database system in the plurality of database systems and the data processing framework, wherein the storage component stores at least one connection parameter specific to each database system and information about at least one data partition property of data stored in at least one database system, wherein, using the at least one connection parameter and the at least one data partition property, processing of each partition of the data processing task is optimized in accordance with at least one requirement of each database system; and
a database connector component configured to provide a communication interface between the plurality of database systems and the data processing framework.
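The storage component recited above holds, for each database system, connection parameters and a data partition property. A minimal Python sketch of such a catalog follows. It is an illustration only, not the claimed implementation: the class names `NodeEntry` and `Catalog`, the method `can_process_locally`, the host addresses, and the idea of using the partition key to decide whether work can run inside each database without moving rows are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeEntry:
    """Catalog record for one database system (all fields are illustrative)."""
    host: str
    port: int
    database: str
    partition_key: str  # column on which this node's data is partitioned


class Catalog:
    """Hypothetical storage component shared by the framework and databases."""

    def __init__(self):
        self._nodes = {}

    def register(self, node_id, entry):
        self._nodes[node_id] = entry

    def connection_params(self, node_id):
        """Return the connection parameters specific to one database system."""
        entry = self._nodes[node_id]
        return {"host": entry.host, "port": entry.port, "database": entry.database}

    def can_process_locally(self, key):
        """True if every registered node partitions its data on `key`."""
        return bool(self._nodes) and all(
            entry.partition_key == key for entry in self._nodes.values()
        )


# Hypothetical two-node deployment, both partitioned on customer_id.
catalog = Catalog()
catalog.register("node-1", NodeEntry("10.0.0.1", 5432, "sales", "customer_id"))
catalog.register("node-2", NodeEntry("10.0.0.2", 5432, "sales", "customer_id"))
```

In this sketch `catalog.can_process_locally("customer_id")` is true, so a grouping on that key could be sent to each database system unchanged; for any other key the framework would first have to redistribute rows.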
Abstract
A system, method, and computer program product for processing data are disclosed. The system includes a data processing framework configured to receive a data processing task for processing, a plurality of database systems coupled to the data processing framework, and a storage component in communication with the data processing framework and the plurality of database systems. The database systems perform a data processing task. The data processing task is partitioned into a plurality of partitions and each database system processes a partition of the data processing task assigned for processing to that database system. Each database system performs processing of its assigned partition of the data processing task in parallel with another database system processing another partition of the data processing task assigned to the another database system. The data processing framework performs at least one partition of the data processing task.
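The scheduling behaviour summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `partition_task` and `run_task` and the summing workload are hypothetical, and threads stand in for separate database systems. Each simulated database system accepts a new partition only once it has finished its previous one, which is one simple way to model capacity-based assignment, and the framework processes one partition itself.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def partition_task(rows, n_partitions):
    """Split the input rows into n_partitions round-robin partitions."""
    return [rows[i::n_partitions] for i in range(n_partitions)]


def run_task(rows, db_names, n_partitions):
    """Sum all rows using simulated database systems plus the framework.

    Assumption: "processing" a partition means summing it. Each database
    system is a thread that pulls the next pending partition only once it
    is no longer busy with a previously assigned one.
    """
    partitions = partition_task(rows, n_partitions)
    framework_partition = partitions.pop()  # the framework keeps one partition

    pending = queue.Queue()
    for idx, part in enumerate(partitions):
        pending.put((idx, part))

    lock = threading.Lock()
    assignments = {}  # partition index -> database system that processed it
    partials = {}     # partition index -> partial sum

    def database_system(name):
        while True:
            try:
                idx, part = pending.get_nowait()  # idle, so accept more work
            except queue.Empty:
                return
            result = sum(part)
            with lock:
                assignments[idx] = name
                partials[idx] = result

    with ThreadPoolExecutor(max_workers=len(db_names)) as pool:
        futures = [pool.submit(database_system, name) for name in db_names]
        framework_result = sum(framework_partition)  # runs alongside the workers
        for future in futures:
            future.result()  # re-raise any worker error

    return framework_result + sum(partials.values()), assignments


total, assignments = run_task(list(range(100)), ["db-a", "db-b", "db-c"], 8)
```

Which database system ends up with which partition depends on thread timing, so `assignments` varies between runs while `total` does not.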
40 Claims
1. A data processing system, comprising:
a data processing framework having at least one processor and configured to receive a data processing task for processing;
a plurality of database systems having at least another processor, the plurality of database systems being distinct from and are coupled to the data processing framework, wherein the database systems are configured to perform a data processing task, the at least one processor of the data processing framework is distinct from the at least another processor of the plurality of database systems;
wherein the data processing task is configured to be partitioned into a plurality of partitions;
a distributed file system in communication with the data processing framework and the plurality of database systems, and being distinct from each of the data processing framework and the plurality of database systems, the distributed file system optionally stores at least one output data associated with at least one partition of the data processing task being processed by the data processing framework;
each database system in the plurality of database systems is configured to process a partition of the data processing task assigned for processing to that database system by the data processing framework based on at least a processing capacity of that database system, wherein the processing capacity is determined based on at least one of the following:
whether that database system is still processing a previously assigned partition of the data processing task, and having each database system determine and provide an indication of its processing capacity to the data processing framework while the data processing task is being processed;
each database system in the plurality of database systems is configured to perform processing of its assigned partition of the data processing task in parallel with another database system in the plurality of database systems processing another partition of the data processing task assigned to the another database system;
wherein the data processing framework is configured to process the at least one partition of the data processing task;
a storage component in communication with the data processing framework and the plurality of database systems, configured to store information about each partition of the data processing task being processed by each database system in the plurality of database systems and the data processing framework, wherein the storage component stores at least one connection parameter specific to each database system and information about at least one data partition property of data stored in at least one database system, wherein, using the at least one connection parameter and the at least one data partition property, processing of each partition of the data processing task is optimized in accordance with at least one requirement of each database system; and
a database connector component configured to provide a communication interface between the plurality of database systems and the data processing framework.
Dependent claims: 2-18, 38
19. A computer implemented method for processing data using a data processing system having a data processing framework having at least one processor, a plurality of database systems having at least another processor, the plurality of database systems being distinct from and are coupled to the data processing framework, the at least one processor of the data processing framework is distinct from the at least another processor of the plurality of database systems, a distributed file system in communication with the data processing framework and the plurality of database systems, and being distinct from each of the data processing framework and the plurality of database systems, the distributed file system optionally stores at least one output data associated with at least one partition of the data processing task being processed by the data processing framework, and a storage component in communication with the data processing framework and the plurality of database systems, the method comprising:
receiving a data processing task for processing using the data processing framework;
partitioning the data processing task into a plurality of partitions;
determining and assigning, using the data processing framework, a partition of the data processing task to at least one database system in the plurality of database systems for processing, wherein the data processing framework assigns partitions of the data processing task to database systems based on at least a processing capacity of a database system in the plurality of database systems, wherein the processing capacity is determined based on at least one of the following:
whether that database system is still processing a previously assigned partition of the data processing task, and having each database system determine and provide an indication of its processing capacity to the data processing framework while the data processing task is being processed;
processing, using the at least one database system, the assigned partitions in parallel with processing of at least another partition of the data processing task by another database system in the plurality of database systems;
processing at least one partition of the data processing task using the data processing framework; and
using the storage component, storing information about each partition of the data processing task being processed by each database system in the plurality of database systems and the data processing framework, wherein the storage component stores at least one connection parameter specific to each database system and information about at least one data partition property of data stored in at least one database system, wherein, using the at least one connection parameter and the at least one data partition property, processing of each partition of the data processing task is optimized in accordance with at least one requirement of each database system;
wherein the data processing system further includes a database connector component configured to provide a communication interface between the plurality of database systems and the data processing framework.
Dependent claims: 20-36, 39
37. A non-transitory computer program product, tangibly embodied in a computer-readable medium, the computer program product being operable to cause a data processing system having a data processing framework, a plurality of database systems being distinct from and are coupled to the data processing framework, the at least one processor of the data processing framework is distinct from the at least another processor of the plurality of database systems, a distributed file system in communication with the data processing framework and the plurality of database systems, and being distinct from each of the data processing framework and the plurality of database systems, the distributed file system optionally stores at least one output data associated with at least one partition of the data processing task being processed by the data processing framework, and a storage component in communication with the data processing framework and the plurality of database systems, to perform operations comprising:
receiving a data processing task for processing using the data processing framework;
partitioning the data processing task into a plurality of partitions;
determining and assigning, using the data processing framework, a partition of the data processing task to at least one database system in the plurality of database systems for processing, wherein the data processing framework assigns partitions of the data processing task to database systems based on at least a processing capacity of a database system in the plurality of database systems, wherein the processing capacity is determined based on at least one of the following:
whether that database system is still processing a previously assigned partition of the data processing task, and having each database system determine and provide an indication of its processing capacity to the data processing framework while the data processing task is being processed;
processing, using the at least one database system, the assigned partitions in parallel with processing of at least another partition of the data processing task by another database system in the plurality of database systems;
processing at least one partition of the data processing task using the data processing framework; and
using the storage component, storing information about each partition of the data processing task being processed by each database system in the plurality of database systems and the data processing framework, wherein the storage component stores at least one connection parameter specific to each database system and information about at least one data partition property of data stored in at least one database system, wherein, using the at least one connection parameter and the at least one data partition property, processing of each partition of the data processing task is optimized in accordance with at least one requirement of each database system;
wherein the data processing system further includes a database connector component configured to provide a communication interface between the plurality of database systems and the data processing framework.
Dependent claims: 40
Specification