Processing of data using a database system in communication with a data processing framework

  • US 9,495,427 B2
  • Filed: 02/22/2011
  • Issued: 11/15/2016
  • Est. Priority Date: 06/04/2010
  • Status: Active Grant
First Claim

1. A data processing system, comprising:

  • a data processing framework having at least one processor and configured to receive a data processing task for processing;

    a plurality of database systems having at least another processor, the plurality of database systems being distinct from and are coupled to the data processing framework, wherein the database systems are configured to perform a data processing task, the at least one processor of the data processing framework is distinct from the at least another processor of the plurality of database systems;

    wherein the data processing task is configured to be partitioned into a plurality of partitions;

    a distributed file system in communication with the data processing framework and the plurality of database systems, and being distinct from each of the data processing framework and the plurality of database systems, the distributed file system optionally stores at least one output data associated with at least one partition of the data processing task being processed by the data processing framework;

    each database system in the plurality of database systems is configured to process a partition of the data processing task assigned for processing to that database system by the data processing framework based on at least a processing capacity of that database system, wherein the processing capacity is determined based on at least one of the following;

    whether that database system is still processing a previously assigned partition of the data processing task, and having each database system determine and provide an indication of its processing capacity to the data processing framework while the data processing task is being processed;

    each database system in the plurality of database systems is configured to perform processing of its assigned partition of the data processing task in parallel with another database system in the plurality of database systems processing another partition of the data processing task assigned to the another database system;

    wherein the data processing framework is configured to process the at least one partition of the data processing task;

    a storage component in communication with the data processing framework and the plurality of database systems, configured to store information about each partition of the data processing task being processed by each database system in the plurality of database systems and the data processing framework, wherein the storage component stores at least one connection parameter specific to each database system and information about at least one data partition property of data stored in at least one database system, wherein, using the at least one connection parameter and the at least one data partition property, processing of each partition of the data processing task is optimized in accordance with at least one requirement of each database system; and

    a database connector component configured to provide a communication interface between the plurality of database systems and the data processing framework.
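
The capacity-based assignment recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the names `DatabaseNode`, `has_capacity`, and `assign_partitions` are hypothetical, the only capacity signal modeled is whether a node is still processing a previously assigned partition, and the parallel work of the database systems is simulated as sequential rounds. The distributed file system, storage component, and database connector are omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DatabaseNode:
    """Hypothetical stand-in for one database system in the claim."""
    name: str
    busy: bool = False   # True while a previously assigned partition is still running
    current: str = ""    # partition currently being processed, if any
    done: list = field(default_factory=list)

    def has_capacity(self) -> bool:
        # The node itself determines and reports its capacity to the framework.
        return not self.busy

    def start(self, partition: str) -> None:
        self.busy = True
        self.current = partition

    def finish(self) -> None:
        self.done.append(self.current)
        self.current = ""
        self.busy = False


def assign_partitions(partitions, nodes):
    """Assign each partition only to a node that reports free capacity.

    Returns (partition, node name) pairs in the order they were assigned.
    """
    if not nodes:
        raise ValueError("at least one database node is required")
    pending = deque(partitions)
    log = []
    while pending:
        # Ask every node for its capacity while the task is in progress.
        for node in [n for n in nodes if n.has_capacity()]:
            if not pending:
                break
            part = pending.popleft()
            node.start(part)
            log.append((part, node.name))
        # Simulated round of work: every busy node completes its partition.
        for node in nodes:
            if node.busy:
                node.finish()
    return log
```

With two nodes, one of which is still busy with an earlier partition, the first new partition goes only to the idle node; the second node receives work once it reports capacity again.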
