In-Memory Dataflow Execution with Dynamic Placement of Cache Operations and Action Execution Ordering

  • US 20200133859A1
  • Filed: 10/30/2018
  • Published: 04/30/2020
  • Est. Priority Date: 10/30/2018
  • Status: Active Grant
First Claim

1. A method, comprising the steps of:

    obtaining a cost model for the execution of operations of a dataflow in a parallel processing framework with a given infrastructure and input dataset;

    obtaining a current cache placement plan for the dataflow, wherein the current cache placement plan comprises a combination of output datasets of a subset of the operations in the dataflow to cache, using one or more cache operations in the dataflow, based on an estimated reduction in a total execution cost for the dataflow in conjunction with the current cache placement plan being implemented given an input dataset;

    obtaining a current cache gain estimate for the current cache placement plan;

    selecting an action of the dataflow to execute from a plurality of remaining actions in the dataflow based on a predefined next action policy that selects the next action of the dataflow to execute from the plurality of remaining actions in the dataflow;

    executing one or more operations in a lineage of the selected action of the dataflow;

    determining, using at least one processing device, an alternative cache placement plan for the dataflow following the execution in conjunction with a predefined new plan determination criteria being satisfied, wherein the alternative cache placement plan comprises an alternative combination of output datasets of a second subset of the operations in the dataflow to cache, using one or more alternative cache operations in the dataflow, relative to the current cache placement plan;

    obtaining an alternative cache gain estimate for the alternative cache placement plan;

    implementing, using the at least one processing device, the alternative cache placement plan in conjunction with the predefined new plan implementation criteria being satisfied; and

    selecting a next action of the dataflow to execute from a plurality of remaining actions in the dataflow based on the predefined next action policy.
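
The steps of the claim can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the patented implementation: the toy dataflow in `OPS`, the flat `CACHE_COST`, the additive cost model, the cheapest-first next action policy, the "re-plan after every action" determination criterion, and the "switch only if the alternative gain is strictly higher" implementation criterion are all hypothetical choices made for illustration. None of these names or values come from the patent text.

```python
from itertools import combinations

# Hypothetical toy dataflow: operation -> (execution cost, parent operations).
OPS = {
    "load":   (10, []),
    "clean":  (5,  ["load"]),
    "join":   (8,  ["clean"]),
    "count":  (1,  ["clean"]),   # action
    "report": (2,  ["join"]),    # action
    "export": (3,  ["join"]),    # action
}
ACTIONS = ["count", "report", "export"]
CACHE_COST = 1  # assumed flat cost of caching one output dataset


def needed_ops(action, materialized):
    """Operations in the action's lineage that must run, stopping at cached outputs."""
    needed, stack = set(), [action]
    while stack:
        op = stack.pop()
        if op in needed or op in materialized:
            continue
        needed.add(op)
        stack.extend(OPS[op][1])
    return needed


def simulate(actions, plan, materialized):
    """Toy cost model: total cost of running the actions in order under a cache plan."""
    done, total = set(materialized), 0
    for action in actions:
        ops = needed_ops(action, done)
        newly_cached = plan & ops
        total += sum(OPS[op][0] for op in ops) + CACHE_COST * len(newly_cached)
        done |= newly_cached
    return total, done


def cache_gain(plan, remaining, materialized):
    """Estimated cost reduction of a plan versus caching nothing further."""
    baseline, _ = simulate(remaining, set(), materialized)
    planned, _ = simulate(remaining, plan, materialized)
    return baseline - planned


def best_plan(remaining, materialized):
    """Brute-force search over combinations of non-action outputs to cache."""
    candidates = [op for op in OPS if op not in ACTIONS]
    best, best_gain = set(), 0
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            gain = cache_gain(set(combo), remaining, materialized)
            if gain > best_gain:
                best, best_gain = set(combo), gain
    return best, best_gain


def next_action(remaining, materialized):
    """Assumed next action policy: pick the currently cheapest remaining action."""
    return min(remaining, key=lambda a: simulate([a], set(), materialized)[0])


def run(actions):
    remaining, materialized = list(actions), set()
    plan, _ = best_plan(remaining, materialized)          # current plan and gain
    order, total = [], 0
    while remaining:
        action = next_action(remaining, materialized)     # select an action
        cost, materialized = simulate([action], plan, materialized)  # run its lineage
        total += cost
        order.append(action)
        remaining.remove(action)
        # Assumed criteria: re-plan after every action, switch on strictly higher gain.
        alt_plan, alt_gain = best_plan(remaining, materialized)
        if alt_gain > cache_gain(plan, remaining, materialized):
            plan = alt_plan
    return order, total


if __name__ == "__main__":
    print(run(ACTIONS))
```

In this toy setting, caching nothing costs 67 units across the three actions, while the loop caches the outputs of `clean` and `join` and finishes at 31 units. The brute-force search in `best_plan` is exponential in the number of operations and is only practical for small examples like this one.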
