File Block Placement in a Distributed Network

US 20170026263A1
Filed: 03/03/2016
Published: 01/26/2017
Est. Priority Date: 09/26/2013
Status: Active Grant


2 Assignments

0 Petitions

Abstract

Pipelines for distributing a file block in a distributed file system network can be determined using a crawler algorithm. The crawler algorithm can iteratively identify links in a pipeline from a starting node to one or more data storage nodes. In each iteration, the pipeline can be extended based on the costs associated with the links on the pipeline, with the resulting cost propagated as the pipeline is extended. The link costs indicate congestion on the links. Costs may also be back-propagated from the data storage nodes.
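The crawler described above (claims 1 to 5 and 8 below) can be sketched as a best-first search. This is a minimal illustration, not the patent's implementation, and it rests on several assumptions: the topology is a hypothetical adjacency map of directed links with non-negative costs that already reflect congestion; a candidate pipeline's cost is the maximum of the selected pipeline's cost and the new link's cost (the option recited in claim 2); the candidate pipeline store is a min-heap; and a node that has already been the endpoint of a selected pipeline is not extended to again. The names `crawl_pipeline` and `Graph` are illustrative only.

```python
import heapq
import itertools
from typing import Dict, List, Set, Tuple

# Hypothetical topology format: node -> {neighbour: link_cost}. The link cost is
# assumed to already express congestion on that link.
Graph = Dict[str, Dict[str, float]]


def crawl_pipeline(graph: Graph, source: str, storage_nodes: Set[str]) -> Tuple[float, List[str]]:
    """Grow a pipeline from `source` until its endpoint is a data storage node."""
    tie = itertools.count()                          # tie-breaker so the heap never compares paths
    store: List[Tuple[float, int, List[str]]] = []   # candidate pipeline store, cheapest first
    visited: Set[str] = {source}                     # endpoints of pipelines already selected

    def prune() -> None:
        # Discard stored pipelines whose endpoint has since been reached another way.
        while store and store[0][2][-1] in visited:
            heapq.heappop(store)

    cost, path = 0.0, [source]
    while path[-1] not in storage_nodes:
        # Immediate links from the endpoint; candidate cost = max(pipeline cost, link cost).
        cands = sorted(
            ((max(cost, c), path + [n]) for n, c in graph.get(path[-1], {}).items() if n not in visited),
            key=lambda cp: cp[0],
        )
        for c, p in cands[1:]:                       # unselected candidates go to the store
            heapq.heappush(store, (c, next(tie), p))
        prune()
        # cands[0] is the minimum-cost candidate (claim 8). Take it if the cost did not
        # rise (claim 3) or if the store holds nothing cheaper (claim 4 search fails).
        if cands and (cands[0][0] <= cost or not store or store[0][0] >= cands[0][0]):
            cost, path = cands[0]
        else:
            if cands:                                # keep the rejected extension for later
                heapq.heappush(store, (cands[0][0], next(tie), cands[0][1]))
            if not store:
                raise ValueError("no data storage node is reachable from the source")
            cost, _, path = heapq.heappop(store)     # backtrack to a cheaper stored pipeline (claim 5)
        visited.add(path[-1])
    return cost, path


if __name__ == "__main__":
    example = {
        "client": {"s1": 3, "s2": 1},
        "s1": {"dn1": 1},
        "s2": {"s3": 1, "dn2": 5},
        "s3": {"dn1": 4, "dn2": 2},
        "dn1": {},
        "dn2": {},
    }
    print(crawl_pipeline(example, "client", {"dn1", "dn2"}))  # (2, ['client', 's2', 's3', 'dn2'])
```

In the example, the route through `s1` is cheaper on its last hop but has a bottleneck of 3, so the sketch prefers `client -> s2 -> s3 -> dn2`, whose most congested link costs 2. Because costs only grow under the maximum, each stored pipeline is a safe point to backtrack to when an extension raises the cost.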

66 Citations

20 Claims

1. A method for use in distributing a file block in a distributed file system network that includes a plurality of data storage nodes, the method comprising:
- identifying a first set of links, each link in the first set of links being from a node having the file block to another node in the distributed file system network;
  
  calculating a first set of link costs, each link cost in the first set of link costs being indicative of congestion on the associated link;
  
  calculating a first set of candidate pipeline costs for a first set of candidate pipelines, each candidate pipeline in the first set of candidate pipelines including a link in the first set of links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the first set of candidate pipeline costs being based on the corresponding link cost in the first set of link costs;
  
  selecting a pipeline from the first set of candidate pipelines based on the first set of candidate pipeline costs;
  
  storing, in a candidate pipeline store, information about the candidate pipelines in the set of candidate pipelines other than the selected pipeline; and
  
  iteratively:
  
  identifying a set of immediate links, each link in the set of immediate links being from the endpoint of the selected pipeline to another node in the distributed file system network,
  
  calculating a set of link costs, each link cost in the set of link costs being indicative of congestion on the associated link,
  
  calculating a set of candidate pipeline costs for a set of candidate pipelines, each candidate pipeline in the set of candidate pipelines including the selected pipeline and a link in the set of immediate links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the set of candidate pipeline costs being based on the candidate pipeline cost of the selected pipeline and the corresponding link cost in the set of link costs,
  
  selecting a candidate pipeline from the set of candidate pipelines based on the calculated set of candidate pipeline costs,
  
  storing information about the unselected candidate pipelines in the set of candidate pipelines in the candidate pipeline store, and
  
  selecting a new selected pipeline for use in a subsequent iteration based at least in part on the candidate pipeline costs associated with the selected candidate pipeline,
  
  until the endpoint of the selected pipeline is one of the plurality of data storage nodes.
  - 2. The method of claim 1, wherein calculating the set of candidate pipeline costs includes, for each candidate pipeline in the set of candidate pipelines, calculating the maximum of the candidate pipeline cost of the selected pipeline and the corresponding link cost.
  - 3. The method of claim 1, wherein selecting the new selected pipeline for use in a subsequent iteration comprises, in the case that the cost associated with the selected candidate pipeline equals the cost associated with the current selected pipeline, selecting the selected candidate pipeline.
  - 4. The method of claim 1, wherein selecting the new selected pipeline for use in a subsequent iteration further comprises, in the case that the cost associated with the selected candidate pipeline is greater than the cost associated with the current selected pipeline, searching the candidate pipelines store for candidate pipelines with associated costs less than the cost associated with the selected candidate pipeline.
  - 5. The method of claim 4, wherein selecting the new selected pipeline for use in a subsequent iteration further comprises, in the case that the cost associated with the selected candidate pipeline is greater than the cost associated with the current selected pipeline and there are one or more pipelines in the candidate pipelines store with associated costs less than the cost associated with the selected candidate pipeline, selecting one of the one or more pipelines in the candidate pipelines store with associated costs less than the cost associated with the selected candidate pipeline.
  - 6. The method of claim 1, wherein identifying the set of immediate links comprises selecting links based on topology of the distributed file system network.
  - 7. The method of claim 1, wherein the cost of a link is based on the number of active flows on that link.
  - 8. The method of claim 1, wherein selecting the candidate pipeline comprises determining the minimum cost in the set of candidate pipeline costs.
  - 9. The method of claim 1, further comprising back propagating pipeline costs from at least some of the plurality of data storage nodes to other nodes in the distributed file system network, and using the back propagated pipeline costs in translating the set of link costs.
  - 10. The method of claim 9, wherein back propagating pipeline costs comprises:
    - for a link ending at one of the plurality of data storage nodes, determining a back-propagated link cost using the cost associated with the corresponding link;
    - for a network node, determining a back-propagated node cost using a minimum of the costs associated with links connected to the network node; and
    - for a link not ending at one of the plurality of data storage nodes, determining a back-propagated link cost using a maximum of the cost associated with the link and the back-propagated node cost associated with the corresponding network node.
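The back propagation of claims 9 and 10 can be sketched as a repeated relaxation from the data storage nodes. This is a minimal sketch under assumptions not stated in the claims: the topology is the same hypothetical adjacency map of directed links with non-negative costs, "links connected to the network node" is read as the node's outgoing links, and the claim-10 rules are simply re-applied until no node cost changes (a Bellman-Ford-style loop chosen for simplicity, not taken from the patent). `back_propagate` is an illustrative name.

```python
import math
from typing import Dict, Set

# Hypothetical topology format: node -> {neighbour: link_cost}, directed links.
Graph = Dict[str, Dict[str, float]]


def back_propagate(graph: Graph, storage_nodes: Set[str]) -> Dict[str, float]:
    """Return a back-propagated node cost for every node that is not a storage node."""
    node_cost = {n: math.inf for n in graph if n not in storage_nodes}

    def back_link_cost(dst: str, cost: float) -> float:
        if dst in storage_nodes:
            return cost                                  # link ends at a storage node: its own cost
        return max(cost, node_cost.get(dst, math.inf))   # otherwise: max(link cost, node cost)

    # A simple path has at most len(graph) - 1 links, so this many rounds suffice.
    for _ in range(len(graph)):
        changed = False
        for node in node_cost:
            links = graph.get(node, {})
            # Node cost = minimum of the back-propagated costs of its links.
            best = min((back_link_cost(d, c) for d, c in links.items()), default=math.inf)
            if best < node_cost[node]:
                node_cost[node] = best
                changed = True
        if not changed:
            break
    return node_cost


if __name__ == "__main__":
    example = {
        "client": {"s1": 3, "s2": 1},
        "s1": {"dn1": 1},
        "s2": {"s3": 1, "dn2": 5},
        "s3": {"dn1": 4, "dn2": 2},
        "dn1": {},
        "dn2": {},
    }
    print(back_propagate(example, {"dn1", "dn2"}))  # {'client': 2, 's1': 1, 's2': 2, 's3': 2}
```

Under these rules each node's value is the smallest bottleneck cost of any route from that node to a data storage node; nodes with no such route keep an infinite cost.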

11. A computing device for distributing a file block in a distributed file system network that includes a plurality of data storage nodes, the computing device comprising:
- a memory configured to store data and processing instructions; and
  
  a processor configured to retrieve and execute the processing instructions stored in the memory to cause the processor to perform the steps of:
  
  identifying a first set of links, each link in the first set of links being from a node having the file block to another node in the distributed file system network;
  
  calculating a first set of link costs, each link cost in the first set of link costs being indicative of congestion on the associated link;
  
  calculating a first set of candidate pipeline costs for a first set of candidate pipelines, each candidate pipeline in the first set of candidate pipelines including a link in the first set of links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the first set of candidate pipeline costs being based on the corresponding link cost in the first set of link costs;
  
  selecting a pipeline from the first set of candidate pipelines based on the first set of candidate pipeline costs;
  
  storing, in a candidate pipeline store, information about the candidate pipelines in the set of candidate pipelines other than the selected pipeline; and
  
  iteratively:
  
  identifying a set of immediate links, each link in the set of immediate links being from the endpoint of the selected pipeline to another node in the distributed file system network,
  
  calculating a set of link costs, each link cost in the set of link costs being indicative of congestion on the associated link,
  
  calculating a set of candidate pipeline costs for a set of candidate pipelines, each candidate pipeline in the set of candidate pipelines including the selected pipeline and a link in the set of immediate links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the set of candidate pipeline costs being based on the candidate pipeline cost of the selected pipeline and the corresponding link cost in the set of link costs,
  
  selecting a candidate pipeline from the set of candidate pipelines based on the calculated set of candidate pipeline costs,
  
  storing information about the unselected candidate pipelines in the set of candidate pipelines in the candidate pipeline store, and
  
  selecting a new selected pipeline for use in a subsequent iteration based at least in part on the candidate pipeline costs associated with the selected candidate pipeline,
  
  until the endpoint of the selected pipeline is one of the plurality of data storage nodes.
  - 12. The computing device of claim 11, wherein calculating the set of candidate pipeline costs includes, for each candidate pipeline in the set of candidate pipelines, calculating the maximum of the candidate pipeline cost of the selected pipeline and the corresponding link cost.
  - 13. The computing device of claim 11, wherein selecting the new selected pipeline for use in a subsequent iteration comprises, in the case that the cost associated with the selected candidate pipeline equals the cost associated with the current selected pipeline, selecting the selected candidate pipeline.
  - 14. The computing device of claim 11, wherein selecting the new selected pipeline for use in a subsequent iteration further comprises, in the case that the cost associated with the selected candidate pipeline is greater than the cost associated with the current selected pipeline, searching the candidate pipelines store for candidate pipelines with associated costs less than the cost associated with the selected candidate pipeline.
  - 15. The computing device of claim 14, wherein selecting the new selected pipeline for use in a subsequent iteration further comprises, in the case that the cost associated with the selected candidate pipeline is greater than the cost associated with the current selected pipeline and there are one or more pipelines in the candidate pipelines store with associated costs less than the cost associated with the selected candidate pipeline, selecting one of the one or more pipelines in the candidate pipelines store with associated costs less than the cost associated with the selected candidate pipeline.
  - 16. The computing device of claim 11, wherein the cost of a link is based on the number of active flows on that link.
  - 17. The computing device of claim 11, wherein selecting the candidate pipeline comprises determining the minimum cost in the set of candidate pipeline costs.
  - 18. The computing device of claim 11, further comprising back propagating pipeline costs from at least some of the plurality of data storage nodes to other nodes in the distributed file system network, and using the back propagated pipeline costs in translating the set of link costs.
  - 19. The computing device of claim 18, wherein back propagating pipeline costs comprises:
    - for a link ending at one of the plurality of data storage nodes, determining a back-propagated link cost using the cost associated with the corresponding link;
    - for a network node, determining a back-propagated node cost using a minimum of the costs associated with links connected to the network node; and
    - for a link not ending at one of the plurality of data storage nodes, determining a back-propagated link cost using a maximum of the cost associated with the link and the back-propagated node cost associated with the corresponding network node.
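Claims 7 and 16 state only that a link's cost is based on its number of active flows. One possible cost function is sketched below together with the maximum-based pipeline cost of claims 2 and 12. It assumes fair bandwidth sharing, so that a new flow would receive `capacity / (active_flows + 1)`; that model, the functions `link_cost` and `pipeline_cost`, and the `capacity_gbps` parameter are all hypothetical and do not come from the patent.

```python
from typing import Sequence


def link_cost(active_flows: int, capacity_gbps: float = 1.0) -> float:
    """Hypothetical congestion cost of a link, derived from its active flow count.

    Assumes fair sharing: a new flow would get capacity_gbps / (active_flows + 1),
    and the cost is the inverse of that share (more flows -> higher cost).
    """
    if active_flows < 0 or capacity_gbps <= 0:
        raise ValueError("active_flows must be >= 0 and capacity_gbps > 0")
    return (active_flows + 1) / capacity_gbps


def pipeline_cost(flows_per_hop: Sequence[int], capacity_gbps: float = 1.0) -> float:
    """Bottleneck pipeline cost: the maximum link cost over the pipeline's hops."""
    return max(link_cost(f, capacity_gbps) for f in flows_per_hop)
```

With unit capacity, a three-hop pipeline whose links carry 0, 3 and 1 active flows has link costs 1.0, 4.0 and 2.0, so `pipeline_cost([0, 3, 1])` is 4.0: the pipeline is only as good as its most congested hop.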

20. A non-transitory computer readable medium storing instructions that, when executed by a processor, perform a method for use in distributing a file block in a distributed file system network that includes a plurality of data storage nodes, the method comprising:
- identifying a first set of links, each link in the first set of links being from a node having the file block to another node in the distributed file system network;
  
  calculating a first set of link costs, each link cost in the first set of link costs being indicative of congestion on the associated link;
  
  calculating a first set of candidate pipeline costs for a first set of candidate pipelines, each candidate pipeline in the first set of candidate pipelines including a link in the first set of links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the first set of candidate pipeline costs being based on the corresponding link cost in the first set of link costs;
  
  selecting a pipeline from the first set of candidate pipelines based on the first set of candidate pipeline costs;
  
  storing, in a candidate pipeline store, information about the candidate pipelines in the set of candidate pipelines other than the selected pipeline; and
  
  iteratively:
  
  identifying a set of immediate links, each link in the set of immediate links being from the endpoint of the selected pipeline to another node in the distributed file system network,
  
  calculating a set of link costs, each link cost in the set of link costs being indicative of congestion on the associated link,
  
  calculating a set of candidate pipeline costs for a set of candidate pipelines, each candidate pipeline in the set of candidate pipelines including the selected pipeline and a link in the set of immediate links and having an endpoint at the corresponding other node in the distributed file system network, each candidate pipeline cost in the set of candidate pipeline costs being based on the candidate pipeline cost of the selected pipeline and the corresponding link cost in the set of link costs,
  
  selecting a candidate pipeline from the set of candidate pipelines based on the calculated set of candidate pipeline costs,
  
  storing information about the unselected candidate pipelines in the set of candidate pipelines in the candidate pipeline store, and
  
  selecting a new selected pipeline for use in a subsequent iteration based at least in part on the candidate pipeline costs associated with the selected candidate pipeline,
  
  until the endpoint of the selected pipeline is one of the plurality of data storage nodes.

Current Assignee
Taiwan Semiconductor Manufacturing Company Limited
Original Assignee
Wi-LAN Labs, Inc. (Wi-LAN Inc.)
Inventors
Gell, David; ElArabawy, Ahmed; Bao, Yiliang

Granted Patent

US 10,291,503 B2
US Class Current

1/1
CPC Class Codes

G06F 16/1844   Management specifically ada...

G06F 3/0605   by facilitating the interac...

G06F 3/064   Management of blocks

G06F 3/0643   Management of files

G06F 3/067   Distributed or networked st...

H04L 41/5025   by proactively reacting to ...

H04L 43/04   Processing captured monitor...

H04L 43/0882   Utilisation of link capacity

H04L 45/125   based on throughput or band...

H04L 47/127   by using congestion prediction

H04L 47/822   Collecting or measuring res...

H04L 67/1023   based on a hash applied to ...

H04L 67/1095   Replication or mirroring of...

H04L 67/1097   for distributed storage of ...
