Optimizing data partitioning for data-parallel computing

  • US 9,235,396 B2
  • Filed: 12/13/2011
  • Issued: 01/12/2016
  • Est. Priority Date: 12/13/2011
  • Status: Active Grant
First Claim

1. A system for optimizing data partitioning for a distributed execution engine, the system comprising:

    a memory; and

    a processing unit coupled to the memory that is configured to operate:

    a code/EPG analysis module for deriving properties of the data-parallel program code at each vertex of a corresponding execution plan graph (EPG) compiled from the data-parallel program code, using at least one attribute of a user-defined function provided by a user and a predefined set of callback application program interfaces (APIs) that enables the user to specify data attributes for partitioning the data-parallel program code and to define, based on input, how computational complexity is measured for partitioning the data-parallel program code;

    a complexity module for at least deriving the computational complexity of each vertex in the EPG;

    a data analysis module that functions concurrently and cooperatively with the code/EPG analysis module to generate a plurality of compact data representations corresponding to input data to be processed by the data-parallel program code, wherein the data analysis module, in conjunction with the code/EPG analysis module, samples the input data and estimates data statistics;

    a statistics and samples module for determining the relationship between the input data and the computational and input-output (I/O) costs based at least in part on the estimated data statistics;

    a cost modeling and estimation module for estimating the runtime cost of each vertex in the EPG and the overall runtime cost represented by the EPG; and

    a cost optimization module for determining a data partitioning plan.
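
The claim does not say what the callback APIs look like or how the input data is sampled. Purely as an illustration, the Python sketch below assumes a hypothetical `VertexCallbacks` record through which a user attaches a complexity function and a partitioning key to a user-defined function. It also swaps in reservoir sampling (Algorithm R), a technique the claim does not name, to estimate simple input statistics in a single pass. None of these names come from the patent.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class VertexCallbacks:
    """Hypothetical callback bundle a user attaches to a user-defined function (not from the patent)."""
    name: str
    # Estimated number of operations needed to process n input records.
    complexity: Callable[[int], float]
    # Extracts the key used to partition a record (identity by default).
    partition_key: Callable[[str], str] = lambda record: record


def reservoir_sample(records: Iterable[str], k: int, seed: int = 0) -> Tuple[List[str], int]:
    """Single pass: uniform sample of up to k records (Algorithm R) plus the total record count."""
    rng = random.Random(seed)
    sample: List[str] = []
    count = 0
    for record in records:
        count += 1
        if len(sample) < k:
            sample.append(record)
        else:
            j = rng.randrange(count)  # uniform in [0, count)
            if j < k:
                sample[j] = record
    return sample, count


def estimate_stats(records: Iterable[str], vertex: VertexCallbacks, k: int = 100) -> Dict[str, float]:
    """Estimate input statistics that a downstream cost model could consume."""
    sample, count = reservoir_sample(records, k)
    if not sample:
        return {"records": 0, "avg_record_bytes": 0.0, "distinct_key_ratio": 0.0, "est_ops": 0.0}
    avg_bytes = sum(len(r.encode("utf-8")) for r in sample) / len(sample)
    # Fraction of distinct partition keys in the sample: a crude skew indicator.
    distinct_ratio = len({vertex.partition_key(r) for r in sample}) / len(sample)
    return {
        "records": count,
        "avg_record_bytes": avg_bytes,
        "distinct_key_ratio": distinct_ratio,
        # The user's complexity callback evaluated at the observed input size.
        "est_ops": vertex.complexity(count),
    }


if __name__ == "__main__":
    records = [f"user{i % 10},{i}" for i in range(1000)]
    sort_vertex = VertexCallbacks(
        "sort", lambda n: n * math.log2(max(n, 2)), partition_key=lambda r: r.split(",")[0]
    )
    print(estimate_stats(records, sort_vertex, k=50))
```

Reservoir sampling is chosen here only because it needs one pass over input of unknown size; a real system might instead build histograms or sketches as its compact data representations.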

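The cost modeling and cost optimization modules are likewise described only by what they produce. The minimal sketch below makes several assumptions of its own. Each vertex costs a CPU term (its complexity callback times a per-operation rate) plus a linear I/O term. Stages run back to back. The vertices of a stage run in waves across a fixed number of machines. Scheduling overhead grows with the number of vertices. The `Stage` type, the rate constants, and the wave model are hypothetical, and the optimizer is a plain exhaustive search over partition counts.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Stage:
    """Hypothetical EPG stage: one vertex per data partition (not from the patent)."""
    name: str
    complexity: Callable[[float], float]  # operations needed to process n records
    bytes_per_record: float               # bytes read plus written per record


def vertex_cost(stage: Stage, records: float, cpu_sec_per_op: float, io_sec_per_byte: float) -> float:
    """Runtime estimate for one vertex: CPU term from complexity plus a linear I/O term."""
    cpu = stage.complexity(records) * cpu_sec_per_op
    io = records * stage.bytes_per_record * io_sec_per_byte
    return cpu + io


def plan_cost(stages: Sequence[Stage], total_records: float, partitions: int, machines: int,
              cpu_sec_per_op: float = 1e-8, io_sec_per_byte: float = 1e-9,
              sched_sec_per_vertex: float = 0.01) -> float:
    """Overall EPG cost: stages run one after another; a stage's vertices run in waves."""
    per_partition = total_records / partitions  # assumes equal-sized, skew-free partitions
    waves = math.ceil(partitions / machines)
    total = 0.0
    for stage in stages:
        total += waves * vertex_cost(stage, per_partition, cpu_sec_per_op, io_sec_per_byte)
        total += sched_sec_per_vertex * partitions  # overhead grows with the vertex count
    return total


def choose_partitions(stages: Sequence[Stage], total_records: float, machines: int,
                      max_partitions: int = 1024, **rates: float) -> int:
    """Search partition counts 1..max_partitions and return the cheapest one."""
    return min(
        range(1, max_partitions + 1),
        key=lambda p: plan_cost(stages, total_records, p, machines, **rates),
    )
```

For example, with a single linear-complexity stage over 10^8 records and ample machines, the default rates give a cost of 1/p + 0.01p seconds, so the search settles at 10 partitions, where shrinking per-vertex work balances growing scheduling overhead. Limiting the job to four machines pushes the choice to four partitions, because any additional partitions only add waves. Exhaustive search is reasonable only because the candidate range is small, and the equal-partition assumption ignores data skew, which is exactly what sampled statistics would need to correct.
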
  • 2 Assignments