Automatic clustering for self-organizing grids

US 8,041,773 B2
Filed: 09/23/2008
Issued: 10/18/2011
Est. Priority Date: 09/24/2007
Status: Active Grant

Abstract

Computational grids have traditionally not scaled effectively due to administrative hurdles to resource and user participation. Most production grids are essentially multi-site supercomputer centers, rather than truly open and heterogeneous sets of resources that can join and leave dynamically, and that can provide support for an equally dynamic set of users. Large-scale grids containing individual resources with more autonomy about when and how they join and leave will require self-organizing grid middleware services that do not require centralized administrative control. Dynamic discovery of high-performance variable-size clusters of grid nodes provides an effective solution for implementation of grids. A brute-force approach to the problem of identifying these “ad-hoc clusters” would require excessive overhead in terms of both message exchange and computation. Therefore, a scalable solution is provided that uses a delay-based overlay structure to organize nodes based on their proximity to one another, using a small number of delay experiments. This overlay can then be used to provide a variable-size set of promising candidate nodes that can then be used as a cluster, or tested further to improve the selection. Simulation results show that this approach results in effective clustering with acceptable overhead.
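
To give a sense of why the brute-force approach is costly, the short sketch below counts the work it implies. It is only an illustration: the function name `brute_force_cost` and the example sizes are assumptions, not figures from the patent. It relies on two standard counting facts: measuring every pair among n nodes takes n(n-1)/2 delay experiments, and there are C(n, k) distinct k-node clusters to compare.

```python
from math import comb

def brute_force_cost(n, k):
    """Return (pairwise delay probes, candidate k-node clusters) for n nodes."""
    probes = comb(n, 2)    # one delay experiment per unordered pair of nodes
    clusters = comb(n, k)  # every possible k-node subset would be a candidate
    return probes, clusters

# Hypothetical grid of 1000 nodes looking for an 8-node cluster.
probes, clusters = brute_force_cost(1000, 8)
print(probes)  # 499500
```

The number of candidate clusters grows far faster than the number of probes, which is the overhead the delay-based overlay is meant to avoid.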

214 Citations

35 Claims

1. A method for clustering of nodes for a distributed task, comprising automatically partitioning a set of nodes into a branched hierarchy of subsets based at least on a relative proximity according to at least one node characteristic metric, each subset having a supernode selected based on an automatic ranking of nodes within the same subset, each node within the subset being adapted to communicate control information with the supernode, and the supernodes of respective subnets which are hierarchically linked being adapted to communicate control information with each other;
- and outputting a set of preferred nodes for allocation of portions of a distributed task, wherein the output set of preferred nodes is dependent on the hierarchy and the distributed task.
- Dependent claims (2 to 17):
  - 2. The method according to claim 1, wherein the nodes are partitioned into the branched hierarchy based on a link delay metric.
  - 3. The method according to claim 1, wherein the at least one node characteristic metric comprises a pairwise communication latency between respective nodes.
  - 4. The method according to claim 1, wherein said automatically partitioning is initiated prior to allocating portions of the task, and wherein the hierarchy is modified based on dynamically changing conditions by proactive communications.
  - 5. The method according to claim 4, wherein said proactively communicating comprises transmitting a heartbeat signal.
  - 6. The method according to claim 5, wherein the heartbeat signal is provided as part of a communication between respective nodes provided for at least one other purpose.
  - 7. The method according to claim 1, wherein the automatically partitioning occurs dynamically while a distributed task is in progress.
  - 8. The method according to claim 1, wherein a supernode status is selected dynamically.
  - 9. The method according to claim 1, wherein the hierarchy is established based at least in part on proactive communications.
  - 10. The method according to claim 9, wherein a genetic algorithm controls the proactive communications to estimate a network state representing the set of nodes, substantially without testing each potential communication link therein.
  - 11. The method according to claim 1, further comprising placing a new node within the hierarchy while the distributed task is in progress and allocating the new node a portion of the distributed task.
  - 12. The method according to claim 1, further comprising splitting a subset containing nodes performing a portion of the distributed task into a plurality of subsets, each subset having a node selected to be a supernode, while the distributed task is in progress.
  - 13. The method according to claim 1, wherein a number of nodes within a subset is dependent on a threshold number.
  - 14. The method according to claim 1, further comprising moving a node from one subset to another subset while the node is allocated a portion of the distributed task, wherein a respective supernode for the node is changed.
  - 15. The method according to claim 1, further comprising promoting a node within a subset allocated a portion of the distributed task to a supernode if a respective previous supernode is unavailable, wherein said promoting occurs automatically without communications with the previous supernode while the distributed task is in progress.
  - 16. The method according to claim 1, wherein the set of nodes comprises at least a portion of a grid of computing resources.
  - 17. The method according to claim 16, wherein the grid of computing resources is self-organizing.
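
The partitioning described in claims 1, 3, and 13 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the splitting rule (seed each half with the two most distant nodes), the ranking rule (lowest total latency to peers), and all names (`Subset`, `partition`, `pick_supernode`, `delay`) are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Latency = Dict[Tuple[str, str], float]

def delay(lat: Latency, a: str, b: str) -> float:
    """Symmetric lookup of a measured pairwise latency (0 for a == b)."""
    if a == b:
        return 0.0
    return lat[(a, b)] if (a, b) in lat else lat[(b, a)]

def pick_supernode(nodes: List[str], lat: Latency) -> str:
    """Rank nodes by total latency to their peers; lowest total wins, ties by name."""
    return min(nodes, key=lambda n: (sum(delay(lat, n, m) for m in nodes), n))

@dataclass
class Subset:
    supernode: str
    members: List[str]
    children: List["Subset"] = field(default_factory=list)

def partition(nodes: List[str], lat: Latency, threshold: int) -> Subset:
    """Recursively split a subset until it holds at most `threshold` nodes."""
    subset = Subset(pick_supernode(nodes, lat), sorted(nodes))
    if len(nodes) <= max(threshold, 1):
        return subset
    # Seed the split with the two most distant nodes, then attach every
    # node to the nearer seed so that close nodes stay together.
    pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
    s1, s2 = max(pairs, key=lambda p: delay(lat, *p))
    near1 = [n for n in nodes if delay(lat, n, s1) <= delay(lat, n, s2)]
    near2 = [n for n in nodes if n not in near1]
    if not near2:  # all nodes equidistant: nothing useful to split
        return subset
    subset.children = [partition(near1, lat, threshold),
                       partition(near2, lat, threshold)]
    return subset

# Hypothetical measurements: a and b are close, c and d are close.
lat = {("a", "b"): 1, ("c", "d"): 1, ("a", "c"): 10,
       ("a", "d"): 10, ("b", "c"): 10, ("b", "d"): 11}
root = partition(["a", "b", "c", "d"], lat, threshold=2)
print([c.members for c in root.children])  # [['a', 'b'], ['c', 'd']]
```

The sketch probes every pair, which the abstract explicitly tries to avoid; it is meant only to show how a size threshold and a within-subset ranking produce a branched hierarchy with one supernode per subset.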

18. A cluster of nodes adapted to perform a distributed task, comprising:
- a branched hierarchy of nodes, partitioned into subsets of nodes based at least on a relative proximity according to at least one node characteristic metric, each subset having a supernode selected based on an automatic ranking of nodes within the same subset, each node within the subset being adapted to communicate control information with the supernode, and the supernodes of respective subnets which are hierarchically linked being adapted to communicate control information with each other;
- at least one processor adapted to determine a set of preferred nodes for allocation of portions of a distributed task, wherein the set of preferred nodes is dependent on the hierarchy and the distributed task.
- Dependent claims (19 to 33):
  - 19. The cluster of nodes according to claim 18, wherein said at least one processor comprises a distributed control.
  - 20. The cluster of nodes according to claim 18, wherein said at least one processor comprises a plurality of processors which are part of respective nodes, wherein the allocation of portions of the distributed task to the at least a portion of the nodes is tolerant to a loss of at least one of said processors from the set of nodes.
  - 21. The cluster of nodes according to claim 18, wherein the nodes are partitioned into the branched hierarchy based on a link delay metric.
  - 22. The cluster of nodes according to claim 18, wherein the at least one node characteristic metric comprises a pairwise communication latency between respective nodes.
  - 23. The cluster of nodes according to claim 18, wherein the nodes are initially partitioned prior to allocating portions of the task, and wherein the hierarchy is thereafter modified based on dynamically changing conditions by proactive communications.
  - 24. The cluster of nodes according to claim 23, wherein each node proactively communicates a heartbeat signal to at least one other node.
  - 25. The cluster of nodes according to claim 24, wherein the heartbeat signal is provided as part of a communication between respective nodes provided for at least one other purpose.
  - 26. The cluster of nodes according to claim 18, wherein the set of nodes is partitioned into the hierarchy of subsets dynamically while a distributed task is in progress based on communicated control information.
  - 27. The cluster of nodes according to claim 18, wherein at least one node has a processor which executes a genetic algorithm which controls proactive communications between nodes to estimate a network state representing the set of nodes, substantially without testing each potential communication link therein.
  - 28. The cluster of nodes according to claim 18, wherein a processor is provided adapted to place a new node within the hierarchy while the distributed task is in progress.
  - 29. The cluster of nodes according to claim 18, wherein a processor is provided adapted to split a subset containing nodes performing a portion of the distributed task into a plurality of subsets.
  - 30. The cluster of nodes according to claim 18, wherein a processor is provided adapted to move a node from one subset to another subset while the node is allocated a portion of the distributed task, wherein a respective supernode for the node is changed.
  - 31. The cluster of nodes according to claim 18, wherein a processor is provided adapted to promote a node within a subset allocated a portion of the distributed task to a supernode if a respective previous supernode is unavailable.
  - 32. The cluster of nodes according to claim 18, wherein the set of nodes comprises at least a portion of a grid of computing resources.
  - 33. The cluster of nodes according to claim 32, wherein the grid of computing resources is self-organizing based on logic executed by a respective processor associated with each node.
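
Claims 24 and 31 combine heartbeats with promotion of a replacement supernode. A minimal sketch of that idea follows, assuming that every member of a subset sees the same heartbeat timestamps and the same ranking (real deployments cannot take this for granted). The function names, the timeout, and the "lower rank value is better" convention are hypothetical.

```python
from typing import Dict, List, Optional

def alive_nodes(last_heartbeat: Dict[str, float], now: float,
                timeout: float) -> List[str]:
    """Nodes whose most recent heartbeat is no older than `timeout` seconds."""
    return sorted(n for n, t in last_heartbeat.items() if now - t <= timeout)

def current_supernode(supernode: str, rank: Dict[str, float],
                      last_heartbeat: Dict[str, float], now: float,
                      timeout: float) -> Optional[str]:
    """Keep the supernode while it is heard; otherwise promote the best live node."""
    alive = alive_nodes(last_heartbeat, now, timeout)
    if supernode in alive:
        return supernode
    if not alive:
        return None
    # Every member applies the same deterministic rule to the same ranking,
    # so a successor is chosen without contacting the previous supernode.
    return min(alive, key=lambda n: (rank[n], n))

# Hypothetical subset: supernode "a" was last heard 10 s ago, timeout is 3 s.
rank = {"a": 1.0, "b": 2.0, "c": 3.0}
beats = {"a": 0.0, "b": 9.0, "c": 9.5}
print(current_supernode("a", rank, beats, now=10.0, timeout=3.0))  # b
```

Because the rule depends only on locally held state, promotion does not need a message from the failed supernode, which matches the condition stated in claim 15.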

34. A non-transitory computer readable medium, storing instructions for controlling a programmable processor to output a set of preferred nodes for allocation of portions of a distributed task, wherein the output set of preferred nodes is dependent on a branched hierarchy of nodes and the distributed task, wherein the branched hierarchy of nodes is formed by automatically partitioning a set of nodes into a branched hierarchy of subsets based at least on a relative proximity according to at least one node characteristic metric, each subset having a supernode selected based on an automatic ranking of nodes within the same subset, each node within the subset being adapted to communicate control information with the supernode, and the supernodes of respective subnets which are hierarchically linked being adapted to communicate control information with each other.

35. A method of controlling a distributed processing of a task, comprising:
- automatically partitioning a set of nodes into a branched hierarchy of subsets based at least on a relative proximity according to at least one inter-node communication limiting metric, each subset within a branch of the branched hierarchy having a supernode selected based on an automatic ranking of nodes within the same subset with respect to the at least one inter-node communication limiting metric;
- communicating control information from each node within a subset with a supernode of the respective subset;
- communicating between the supernode of the respective subnet and supernodes of other subnets which are linked through the branched hierarchy to the supernode of the respective subnet; and
- allocating portions of a distributed task to nodes within the branched hierarchy selectively dependent on at least:
  - an arrangement of the hierarchy; and
  - at least one characteristic of the distributed task.
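
The final step of claim 35, allocation that depends on both the arrangement of the hierarchy and a characteristic of the task, might look like the sketch below. The task characteristic used here (whether the task is tightly coupled), the `Subset` structure, and the selection rule are assumptions made for illustration; the claim does not prescribe them.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Subset:
    supernode: str
    members: List[str]  # assumed to include the supernode itself
    children: List["Subset"] = field(default_factory=list)

def preferred_nodes(root: Subset, k: int, tightly_coupled: bool) -> List[str]:
    """Pick k nodes; tightly coupled tasks stay inside the deepest subset that fits."""
    if k > len(root.members):
        raise ValueError("task needs more nodes than the hierarchy holds")
    if not tightly_coupled:
        return root.members[:k]  # independent work: any k nodes will do
    current = root
    while True:
        fitting = [c for c in current.children if len(c.members) >= k]
        if not fitting:
            break
        # Prefer the smallest child that still fits, leaving larger ones free.
        current = min(fitting, key=lambda c: len(c.members))
    # List the supernode first so it can coordinate the allocation.
    others = [n for n in current.members if n != current.supernode]
    return ([current.supernode] + others)[:k]

# Hypothetical two-level hierarchy.
root = Subset("a", ["a", "b", "c", "d", "e"],
              [Subset("a", ["a", "b"]), Subset("c", ["c", "d", "e"])])
print(preferred_nodes(root, 3, tightly_coupled=True))  # ['c', 'd', 'e']
```

Descending the hierarchy keeps a communication-heavy task inside one low-latency subset, while a loosely coupled task can take nodes from anywhere.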

Current Assignee: The Research Foundation for The State University of New York (State University of New York)
Original Assignee: The Research Foundation for The State University of New York (State University of New York)
Inventors: Lewis, Michael; Abu-Ghazaleh, Nael; Yang, Weishuai
Primary Examiner(s): Chang, Jungwon
Application Number: US12/236,396
Publication Number: US 20090083390A1
Time in Patent Office: 1,120 Days
Field of Search: 709/226, 709/209, 709/224, 370/335, 710/242
US Class Current: 709/209
CPC Class Codes:
- G06F 15/16   Combinations of two or more...
- G06Q 10/06   Resources, workflows, human...
- H04L 41/12   Discovery or management of ...
- H04L 43/10   Active monitoring, e.g. hea...
- H04L 45/12   Shortest path evaluation
- H04L 45/121   by minimising delays
- H04L 45/122   by minimising distances, e....
- H04L 47/70   Admission control; Resource...
- H04L 47/783   Distributed allocation of r...
- H04L 67/02   based on web technology, e....
- H04L 67/10   in which an application is ...
- H04L 67/1044   Group management mechanisms...
- H04L 67/51   Discovery or management the...
