File Block Placement in a Distributed Network
First Claim
1. A method for use in distributing a file block in a distributed file system network that includes a plurality of data storage nodes, the method comprising:
- identifying a first set of links, each link in the first set of links being from a node having the file block to another node in the distributed file system network;
calculating a first set of link costs, each link cost in the first set of link costs being indicative of congestion on the associated link;
calculating a first set of candidate pipeline costs for a first set of candidate pipelines, each candidate pipeline in the first set of candidate pipelines including a link in the first set of links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the first set of candidate pipeline costs being based on the corresponding link cost in the first set of link costs;
selecting a pipeline from the first set of candidate pipelines based on the first set of candidate pipeline costs;
storing, in a candidate pipeline store, information about the candidate pipelines in the set of candidate pipelines other than the selected pipeline; and
iterativelyidentifying a set of immediate links;
each link in the set of immediate links being from the endpoint of the selected pipeline to another node in the distributed file system network,calculating a set of link costs, each link cost in the set of link costs being indicative of congestion on the associated link,calculating a set of candidate pipeline costs for a set of candidate pipelines, each candidate pipeline in the set of candidate pipelines including the selected pipeline and a link in the set of immediate links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the set of candidate pipeline costs being based on the candidate pipeline cost of the selected pipeline and the corresponding link cost in the set of link costs,selecting a candidate pipeline from the set of candidate pipelines based on the calculated set of candidate pipeline costs,storing information about the unselected candidate pipelines in the set of candidate pipelines in the candidate pipeline store, andselecting a new selected pipeline for use in a subsequent iteration based at least in part on the candidate pipeline costs associated the selected candidate pipeline,until the endpoint of the selected pipeline is one of the plurality of data storage nodes.
2 Assignments
0 Petitions
Accused Products
Abstract
Pipelines for distributing file block in distributed file system network can be determined using a crawler algorithm. The crawler algorithm can iteratively identify links in a pipeline from for a starting node to one or more data storage nodes. In each iteration the pipeline can be extended based on the costs associated with the links on the pipeline with the resulting cost propagated as the pipeline is extended. The link costs indicate congestion on the links. Costs may also be back propagate from the data storage nodes.
66 Citations
20 Claims
-
1. A method for use in distributing a file block in a distributed file system network that includes a plurality of data storage nodes, the method comprising:
-
identifying a first set of links, each link in the first set of links being from a node having the file block to another node in the distributed file system network; calculating a first set of link costs, each link cost in the first set of link costs being indicative of congestion on the associated link; calculating a first set of candidate pipeline costs for a first set of candidate pipelines, each candidate pipeline in the first set of candidate pipelines including a link in the first set of links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the first set of candidate pipeline costs being based on the corresponding link cost in the first set of link costs; selecting a pipeline from the first set of candidate pipelines based on the first set of candidate pipeline costs; storing, in a candidate pipeline store, information about the candidate pipelines in the set of candidate pipelines other than the selected pipeline; and iteratively identifying a set of immediate links;
each link in the set of immediate links being from the endpoint of the selected pipeline to another node in the distributed file system network,calculating a set of link costs, each link cost in the set of link costs being indicative of congestion on the associated link, calculating a set of candidate pipeline costs for a set of candidate pipelines, each candidate pipeline in the set of candidate pipelines including the selected pipeline and a link in the set of immediate links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the set of candidate pipeline costs being based on the candidate pipeline cost of the selected pipeline and the corresponding link cost in the set of link costs, selecting a candidate pipeline from the set of candidate pipelines based on the calculated set of candidate pipeline costs, storing information about the unselected candidate pipelines in the set of candidate pipelines in the candidate pipeline store, and selecting a new selected pipeline for use in a subsequent iteration based at least in part on the candidate pipeline costs associated the selected candidate pipeline, until the endpoint of the selected pipeline is one of the plurality of data storage nodes. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A computing device for distributing a file block in a distributed file system network that includes a plurality of data storage nodes, the computing device comprising:
-
a memory configured to store data and processing instructions; and a processor configured to retrieve and execute the processing instructions stored in the memory to cause the processor to perform the steps of; identifying a first set of links, each link in the first set of links being from a node having the file block to another node in the distributed file system network; calculating a first set of link costs, each link cost in the first set of link costs being indicative of congestion on the associated link; calculating a first set of candidate pipeline costs for a first set of candidate pipelines, each candidate pipeline in the first set of candidate pipelines including a link in the first set of links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the first set of candidate pipeline costs being based on the corresponding link cost in the first set of link costs; selecting a pipeline from the first set of candidate pipelines based on the first set of candidate pipeline costs; storing, in a candidate pipeline store, information about the candidate pipelines in the set of candidate pipelines other than the selected pipeline; and iteratively identifying a set of immediate links;
each link in the set of immediate links being from the endpoint of the selected pipeline to another node in the distributed file system network,calculating a set of link costs, each link cost in the set of link costs being indicative of congestion on the associated link, calculating a set of candidate pipeline costs for a set of candidate pipelines, each candidate pipeline in the set of candidate pipelines including the selected pipeline and a link in the set of immediate links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the set of candidate pipeline costs being based on the candidate pipeline cost of the selected pipeline and the corresponding link cost in the set of link costs, selecting a candidate pipeline from the set of candidate pipelines based on the calculated set of candidate pipeline costs, storing information about the unselected candidate pipelines in the set of candidate pipelines in the candidate pipeline store, and selecting a new selected pipeline for use in a subsequent iteration based at least in part on the candidate pipeline costs associated the selected candidate pipeline, until the endpoint of the selected pipeline is one of the plurality of data storage nodes. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19)
-
-
20. A non-transitory computer readable medium storing instructions that, when executed by a processor, perform a method for use in distributing a file block in a distributed file system network that includes a plurality of data storage nodes, the method comprising:
-
identifying a first set of links, each link in the first set of links being from a node having the file block to another node in the distributed file system network; calculating a first set of link costs, each link cost in the first set of link costs being indicative of congestion on the associated link; calculating a first set of candidate pipeline costs for a first set of candidate pipelines, each candidate pipeline in the first set of candidate pipelines including a link in the first set of links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the first set of candidate pipeline costs being based on the corresponding link cost in the first set of link costs; selecting a pipeline from the first set of candidate pipelines based on the first set of candidate pipeline costs; storing, in a candidate pipeline store, information about the candidate pipelines in the set of candidate pipelines other than the selected pipeline; and iteratively identifying a set of immediate links;
each link in the set of immediate links being from the endpoint of the selected pipeline to another node in the distributed file system network,calculating a set of link costs, each link cost in the set of link costs being indicative of congestion on the associated link, calculating a set of candidate pipeline costs for a set of candidate pipelines, each candidate pipeline in the set of candidate pipelines including the selected pipeline and a link in the set of immediate links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the set of candidate pipeline costs being based on the candidate pipeline cost of the selected pipeline and the corresponding link cost in the set of link costs, selecting a candidate pipeline from the set of candidate pipelines based on the calculated set of candidate pipeline costs, storing information about the unselected candidate pipelines in the set of candidate pipelines in the candidate pipeline store, and selecting a new selected pipeline for use in a subsequent iteration based at least in part on the candidate pipeline costs associated the selected candidate pipeline, until the endpoint of the selected pipeline is one of the plurality of data storage nodes.
-
Specification