
Concurrent data processing in a distributed system

  • US 8,266,289 B2
  • Filed: 04/23/2009
  • Issued: 09/11/2012
  • Est. Priority Date: 04/23/2009
  • Status: Active Grant
First Claim

1. One or more computer storage devices having computer-useable instructions embodied thereon for performing a method for scheduling vertices in a cluster, the method comprising:

  • receiving a data job;

    dividing the data job into a plurality of vertices;

    assigning the plurality of vertices to one or more process nodes that comprise the cluster;

    receiving resource usage information for one or more vertices, wherein the vertices have run to completion, and wherein resource usage information for the vertices has been determined; and

    for each of the plurality of vertices for which resource usage information has not been received:

    estimating resource usage of the vertex from the received resource usage information for the completed vertices, wherein estimating resource usage comprises:

    (A) estimating an input data size range;

    (B) dividing the input data size range into data size buckets, wherein the data size buckets are subsets of the data size range;

    (C) storing resource usage information for each completed vertex in the corresponding data size bucket; and

    (D) for each data size bucket, calculating estimated resource usage information for uncompleted vertices with an input data size within the data size bucket's range; and

    transmitting the estimated resource usage of the vertex to the process node in the cluster to which the vertex is assigned, wherein the process node allocates computing resources to the vertex based on the estimated resource usage.
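Steps (A) through (D) amount to a histogram-style lookup: completed vertices are binned by input data size, and each uncompleted vertex borrows the resource usage observed in its bin. The Python below is a minimal sketch of that idea under stated assumptions, not the patented implementation. Everything beyond the claim is assumed: the names `Vertex`, `bucket_index`, `estimate_usage`, and `assign_and_send` are hypothetical; memory is the only resource tracked; the input data size range is taken as the minimum and maximum input size over all vertices; buckets are equal-width; a bucket's estimate is the mean of its completed samples, falling back to the mean over all completed vertices when a bucket is empty; vertices are assigned to process nodes round-robin; and "transmitting" is modeled as building a per-node message list.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass
class Vertex:
    # Hypothetical vertex record; memory_used stays None until the vertex completes.
    vertex_id: str
    input_size: int
    memory_used: Optional[float] = None


def bucket_index(size: int, lo: int, hi: int, num_buckets: int) -> int:
    """Map an input size to one of num_buckets equal-width buckets over [lo, hi]."""
    if hi == lo:
        return 0
    width = (hi - lo) / num_buckets
    # min() keeps size == hi in the last bucket instead of one past it.
    return min(int((size - lo) / width), num_buckets - 1)


def estimate_usage(vertices: List[Vertex], num_buckets: int = 4) -> Dict[str, float]:
    """Estimate memory usage of every vertex that has not yet completed."""
    completed = [v for v in vertices if v.memory_used is not None]
    pending = [v for v in vertices if v.memory_used is None]
    if not completed or not pending:
        return {}
    # (A) Input data size range, assumed to be the min/max over all vertices.
    lo = min(v.input_size for v in vertices)
    hi = max(v.input_size for v in vertices)
    # (B) + (C) Bucket the range and file each completed vertex's usage by input size.
    buckets: Dict[int, List[float]] = defaultdict(list)
    for v in completed:
        buckets[bucket_index(v.input_size, lo, hi, num_buckets)].append(v.memory_used)
    # Assumed fallback for pending vertices whose bucket has no completed samples.
    fallback = mean(v.memory_used for v in completed)
    # (D) Per bucket, estimate usage as the mean of the bucket's completed samples.
    estimates: Dict[str, float] = {}
    for v in pending:
        samples = buckets.get(bucket_index(v.input_size, lo, hi, num_buckets))
        estimates[v.vertex_id] = mean(samples) if samples else fallback
    return estimates


def assign_and_send(
    vertices: List[Vertex], nodes: List[str], estimates: Dict[str, float]
) -> Dict[str, List[Tuple[str, float]]]:
    """Assign vertices round-robin and collect the estimate messages per node."""
    outbox: Dict[str, List[Tuple[str, float]]] = {node: [] for node in nodes}
    for i, v in enumerate(vertices):
        node = nodes[i % len(nodes)]
        if v.vertex_id in estimates:
            # Stand-in for transmitting the estimate to the node running this vertex.
            outbox[node].append((v.vertex_id, estimates[v.vertex_id]))
    return outbox
```

With four buckets, an uncompleted vertex whose input size falls in a bucket that already holds completed vertices inherits their average usage, while one that lands in an empty bucket gets the job-wide average. The node-side step, in which the process node actually allocates computing resources from the estimate, is not modeled here.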

  • 2 Assignments