Systems and methods for distributed system scanning

US 7,788,303 B2
Filed: 10/21/2005
Issued: 08/31/2010
Est. Priority Date: 10/21/2005
Status: Active Grant

14 Assignments

0 Petitions

Abstract

Systems and methods are provided for scanning files and directories in a distributed file system on a network of nodes. The nodes include metadata with attribute information corresponding to files and directories distributed on the nodes. In one embodiment, the files and directories are scanned by commanding the nodes to search their respective metadata for a selected attribute. At least two of the nodes are capable of searching their respective metadata in parallel. In one embodiment, the distributed file system commands the nodes to search for metadata data structures having location information corresponding to a failed device on the network. The metadata data structures identified in the search may then be used to reconstruct lost data that was stored on the failed device.
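The parallel scan described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `NODE_METADATA` table, the record fields `file_id` and `block_nodes`, and the functions `scan_node` and `scan_cluster` are all hypothetical names, and threads in one process stand in for separate nodes on a network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for per-node metadata: node id -> metadata records.
# Each record names a file and the set of nodes holding that file's blocks.
NODE_METADATA = {
    1: [{"file_id": "a", "block_nodes": {1, 2, 3}},
        {"file_id": "b", "block_nodes": {1, 4}}],
    2: [{"file_id": "c", "block_nodes": {2, 3}}],
    4: [{"file_id": "d", "block_nodes": {3, 4}}],
}


def scan_node(node_id, failed_node):
    """Search one node's local metadata for files with blocks on failed_node."""
    return [record["file_id"]
            for record in NODE_METADATA[node_id]
            if failed_node in record["block_nodes"]]


def scan_cluster(failed_node):
    """Command every surviving node to scan its own metadata concurrently."""
    survivors = [n for n in NODE_METADATA if n != failed_node]
    with ThreadPoolExecutor(max_workers=len(survivors)) as pool:
        per_node_hits = list(pool.map(lambda n: scan_node(n, failed_node),
                                      survivors))
    # Merge the file identifiers reported back by each node.
    return sorted(fid for hits in per_node_hits for fid in hits)
```

Calling `scan_cluster(3)` in this sketch returns the identifiers of files whose metadata lists node 3 as a block location; those identifiers are what a later reconstruction step would consume.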

341 Citations

9 Claims

1. A method for identifying selected attributes in files stored in a distributed file system, the method comprising:
- providing a plurality of nodes in a network, wherein each node comprises:
  
  a processor and a memory device for locally storing data, and wherein files are distributed across the nodes such that one or more of the files are stored in the memory devices, in parts, among the plurality of nodes;
  
  a plurality of metadata data blocks each associated with one of the files and comprising file attribute data related to the corresponding file, a file identifier, and location information for one or more content data blocks of the file, the attribute data including data indicating which nodes are used to store the file's content data blocks, the metadata data blocks distributed across the nodes and stored in the memory devices among the plurality of nodes such that, for at least one of the metadata data blocks, at least one of the content data blocks of the file associated with the metadata data block is stored on a different node than the at least one metadata data block; and
  
  a metadata map data structure providing an indication of where metadata data blocks are stored on the respective node and comprising a plurality of entries, each of the entries corresponding to a memory location of the memory device and indicating whether a metadata data block is stored in that memory location;
  
  instructing, by the processor of the respective node, each of the nodes that locally stores metadata data blocks to determine each memory location where a metadata data block is locally stored using the respective node's metadata map data structure, read the respective metadata data blocks, and search the locally stored metadata data blocks for files which include data blocks stored on a particular node that is unavailable such that one or more of the nodes performs at least a portion of the search in parallel with at least a portion of the search of one or more of the other nodes;
  
  receiving from the nodes that store metadata data blocks, file identifiers related to files that include data blocks stored on the node that is unavailable;
  
  accessing the metadata data blocks corresponding to one of the file identifiers to determine the location of at least one accessible content data block and at least one accessible parity data block corresponding to one of the files that include data blocks stored on an unavailable node;
  
  reading the at least one accessible content data block and the at least one accessible parity data block from their respective locations in the memory devices of available nodes; and
  
  processing the at least one accessible content data block and the at least one accessible parity data block to generate recovered data blocks corresponding to the one or more data blocks stored on the unavailable node, wherein processing the at least one accessible content data block and the at least one accessible parity data block comprises performing an exclusive or (XOR) operation on the at least one accessible content data block and the at least one accessible parity data block.
- 2. The method of claim 1, further comprising:
  - restriping the recovered data blocks among available nodes.
- 3. The method of claim 1, wherein each of the respective nodes performs the search by sequentially traversing its memory device to read its respective metadata data blocks after determining which memory locations store metadata data blocks.
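The XOR step at the end of claim 1 can be illustrated with a short sketch. It assumes a single-parity layout in which the parity block is the byte-wise XOR of every content block in a stripe, so that one missing block equals the XOR of the surviving content blocks and the parity block. The claim itself only states that an XOR is performed on the accessible blocks; the function name `xor_blocks` and the stripe layout are assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of bytes per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Build a stripe of three content blocks and its parity block.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
parity = xor_blocks([d0, d1, d2])

# Suppose the node holding d1 is unavailable: XOR the accessible content
# blocks with the accessible parity block to regenerate it.
recovered = xor_blocks([d0, d2, parity])
```

Because XOR is its own inverse, the same function both produces the parity block and regenerates the missing block, which is why a single lost block per stripe is recoverable under this assumption.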

4. A system for identifying selected attributes in files stored in a distributed file system, the system comprising:
- a plurality of nodes in a network, wherein each node comprises:
  
  a processor and a memory device for locally storing data, and wherein files are distributed across the nodes such that one or more of the files are stored in the memory devices, in parts, among the plurality of nodes;
  
  a plurality of metadata data blocks each associated with one of the files and comprising file attribute data related to the corresponding file, a file identifier, and location information for one or more content data blocks of the file, the attribute data including data indicating which nodes are used to store the file's content data blocks, the metadata data blocks distributed across the nodes and stored in the memory devices among the plurality of nodes such that, for at least one of the metadata data blocks, at least one of the content data blocks of the file associated with the metadata data block is stored on a different node than the at least one metadata data block; and
  
  a metadata map data structure providing an indication of where metadata data blocks are stored on the respective node and comprising a plurality of entries, each of the entries corresponding to a memory location of the memory device and indicating whether a metadata data block is stored in that memory location, wherein each of the plurality of nodes are configured to:
  
  instruct each of the nodes that locally stores metadata data blocks to determine each memory location where a metadata data block is locally stored using the respective node's metadata map data structure, read the respective metadata data blocks, and search the locally stored metadata data blocks for files which include data blocks stored on a particular node that is unavailable such that one or more of the nodes performs at least a portion of the search in parallel with at least a portion of the search of one or more of the other nodes;
  
  receive from the nodes that store metadata data blocks, file identifiers related to files that include data blocks stored on the node that is unavailable;
  
  access the metadata data blocks corresponding to one of the file identifiers to determine the location of at least one accessible content data block and at least one accessible parity data block corresponding to one of the files that include data blocks stored on an unavailable node;
  
  read the at least one accessible content data block and the at least one accessible parity data block from their respective locations in the memory devices of available nodes; and
  
  process the at least one accessible content data block and the at least one accessible parity data block to generate recovered data blocks corresponding to the one or more data blocks stored on the unavailable node by performing an exclusive-or (XOR) operation on the at least one accessible content data block and the at least one accessible parity data block.
- 5. The system of claim 4, wherein each of the plurality of nodes is further configured to:
  - restripe the recovered data blocks among available nodes.
- 6. The system of claim 4, wherein each of the respective nodes performs the search by sequentially traversing its memory device to read its respective metadata data blocks after determining which memory locations store metadata data blocks.
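The metadata map of claim 4 and the sequential traversal of claim 6 can be sketched together. In this illustration the map is modeled as a list of booleans, one entry per memory location, and the memory device as a Python list indexed by location. Both representations, the record fields, and the function names `metadata_locations` and `scan_local_metadata` are assumptions; the claims do not prescribe an encoding.

```python
def metadata_locations(metadata_map):
    """Return, in ascending order, the locations flagged as holding metadata."""
    return [location for location, has_metadata in enumerate(metadata_map)
            if has_metadata]


def scan_local_metadata(device, metadata_map, failed_node):
    """Read only the flagged locations, in device order, and collect the
    identifiers of files that have blocks on failed_node."""
    hits = []
    # First determine which locations hold metadata, then traverse them
    # sequentially so the device is read in a single forward pass.
    for location in metadata_locations(metadata_map):
        block = device[location]
        if failed_node in block["block_nodes"]:
            hits.append(block["file_id"])
    return hits


# Hypothetical device: locations 1 and 3 hold metadata blocks, while
# locations 0 and 2 hold content data the scan never needs to read.
device = [
    b"content",
    {"file_id": 7, "block_nodes": [1, 3]},
    b"content",
    {"file_id": 9, "block_nodes": [2]},
]
metadata_map = [False, True, False, True]
```

Consulting the map first means content blocks are skipped entirely, and visiting the flagged locations in ascending order approximates the sequential traversal the dependent claim describes.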

7. A computer readable medium storing program code that, in response to execution by a processor of one of a plurality of nodes in a network, causes the processor to perform operations for identifying selected attributes in files stored in a distributed file system, the operations comprising:
- instructing, by a processor of one of a plurality of nodes in a network wherein each node comprises:
  
  a processor and a memory device for locally storing data, and wherein files are distributed across the nodes such that one or more of the files are stored in the memory devices, in parts, among the plurality of nodes; and
  
  a plurality of metadata data blocks each associated with one of the files and comprising file attribute data related to the corresponding file, a file identifier, and location information for one or more content data blocks of the file, the attribute data including data indicating which nodes are used to store the file's content data blocks, the metadata data blocks distributed across the nodes and stored in the memory devices among the plurality of nodes such that, for at least one of the metadata data blocks, at least one of the content data blocks of the file associated with the metadata data block is stored on a different node than the at least one metadata data block; and
  
  a metadata map data structure providing an indication of where metadata data blocks are stored on the respective node and comprising a plurality of entries, each of the entries corresponding to a memory location of the memory device and indicating whether a metadata data block is stored in that memory location, each of the nodes that locally stores metadata data blocks to determine each memory location where a metadata data block is locally stored using the respective node's metadata map data structure, read the respective metadata data blocks, and search the locally stored metadata data blocks for files which include data blocks stored on a particular node that is unavailable such that one or more of the nodes performs at least a portion of the search in parallel with at least a portion of the search of one or more of the other nodes;
  
  receiving from the nodes that store metadata data blocks, file identifiers related to files that include data blocks stored on the node that is unavailable;
  
  accessing the metadata data blocks corresponding to one of the file identifiers to determine the location of at least one accessible content data block and at least one accessible parity data block corresponding to one of the files that include data blocks stored on an unavailable node;
  
  reading the at least one accessible content data block and the at least one accessible parity data block from their respective locations in the memory devices of available nodes; and
  
  processing the at least one accessible content data block and the at least one accessible parity data block to generate recovered data blocks corresponding to the one or more data blocks stored on the unavailable node, wherein processing the at least one accessible content data block and the at least one accessible parity data block comprises performing an exclusive-or (XOR) operation on the at least one accessible content data block and the at least one accessible parity data block.
- 8. The computer-readable medium of claim 7, wherein the program code is further configured to cause the processor to perform operations comprising:
  - restriping the recovered data blocks among available nodes.
- 9. The computer-readable medium of claim 7, wherein each of the respective nodes performs the search by sequentially traversing its memory device to read its respective metadata data blocks after determining which memory locations store metadata data blocks.
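The restriping step named in claims 2, 5, and 8 is not detailed in the claims, so the following is only one possible reading: recovered blocks are dealt out round-robin to the nodes that remain available. The round-robin policy and the function name `restripe` are hypothetical choices made for illustration, not something the patent record confirms.

```python
from itertools import cycle


def restripe(recovered_blocks, available_nodes):
    """Assign recovered blocks to available nodes in round-robin order.

    Returns a dict mapping each available node to the blocks placed on it.
    """
    if not available_nodes:
        raise ValueError("no available nodes to restripe onto")
    placement = {node: [] for node in available_nodes}
    # cycle() repeats the node list indefinitely; zip() stops once the
    # recovered blocks are exhausted.
    for block, node in zip(recovered_blocks, cycle(available_nodes)):
        placement[node].append(block)
    return placement
```

For example, `restripe([b"a", b"b", b"c"], [1, 2])` places the first and third blocks on node 1 and the second on node 2. A real system would presumably also weigh free space and protection level, which this sketch ignores.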

Current Assignee: EMC IP Holding Company LLC (Dell Technologies Inc.)
Original Assignee: Isilon Systems Incorporated (Dell Technologies Inc.)
Inventors: Dire, Nathan E.; Schack, Darren P.; Godman, Peter J.; Anderson, Robert J.; Mikesell, Paul A.
Primary Examiner(s): Ali; Mohammad
Assistant Examiner(s): Shmatov; Alexey
Application Number: US11/255,817
Publication Number: US 20070094269A1
Time in Patent Office: 1,775 Days
Field of Search: 707/10, 707/2
US Class Current: 707/828
CPC Class Codes:
G06F 11/1435   using file system or storag...
G06F 11/1464   for networked environments
G06F 16/134   Distributed indices
G06F 16/1834   implemented based on peer-t...
