Systems and methods for distributed system scanning

  • US 7,788,303 B2
  • Filed: 10/21/2005
  • Issued: 08/31/2010
  • Est. Priority Date: 10/21/2005
  • Status: Active Grant
First Claim

1. A method for identifying selected attributes in files stored in a distributed file system, the method comprising:

  • providing a plurality of nodes in a network, wherein each node comprises:

    a processor and a memory device for locally storing data, and wherein files are distributed across the nodes such that one or more of the files are stored in the memory devices, in parts, among the plurality of nodes;

    a plurality of metadata data blocks each associated with one of the files and comprising file attribute data related to the corresponding file, a file identifier, and location information for one or more content data blocks of the file, the attribute data including data indicating which nodes are used to store the file's content data blocks, the metadata data blocks distributed across the nodes and stored in the memory devices among the plurality of nodes such that, for at least one of the metadata data blocks, at least one of the content data blocks of the file associated with the metadata data block is stored on a different node than the at least one metadata data block; and

    a metadata map data structure providing an indication of where metadata data blocks are stored on the respective node and comprising a plurality of entries, each of the entries corresponding to a memory location of the memory device and indicating whether a metadata data block is stored in that memory location;

    instructing, by the processor of the respective node, each of the nodes that locally stores metadata data blocks to determine each memory location where a metadata data block is locally stored using the respective node's metadata map data structure, read the respective metadata data blocks, and search the locally stored metadata data blocks for files which include data blocks stored on a particular node that is unavailable such that one or more of the nodes performs at least a portion of the search in parallel with at least a portion of the search of one or more of the other nodes;

    receiving from the nodes that store metadata data blocks, file identifiers related to files that include data blocks stored on the node that is unavailable;

    accessing the metadata data blocks corresponding to one of the file identifiers to determine the location of at least one accessible content data block and at least one accessible parity data block corresponding to one of the files that include data blocks stored on an unavailable node;

    reading the at least one accessible content data block and the at least one accessible parity data block from their respective locations in the memory devices of available nodes; and

    processing the at least one accessible content data block and the at least one accessible parity data block to generate recovered data blocks corresponding to the one or more data blocks stored on the unavailable node, wherein processing the at least one accessible content data block and the at least one accessible parity data block comprises performing an exclusive or (XOR) operation on the at least one accessible content data block and the at least one accessible parity data block.
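
The steps recited in claim 1 can be illustrated with a small in-memory sketch. It is not the patented implementation. It assumes a single XOR parity block per file, at most one lost block per file, and equal-length blocks. The names `Metadata`, `Node`, `scan`, `xor_blocks`, `recover`, and `demo` are hypothetical, and the metadata map is modeled as a plain dictionary from memory location to a flag. Each available node walks its own metadata map, reads the metadata blocks it finds, and reports the identifiers of files with a block on the unavailable node. The scans run in parallel, and the lost block is then rebuilt by XOR of the surviving content and parity blocks.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class Metadata:
    """Hypothetical metadata block: file id plus block locations."""
    file_id: str
    content: list   # [(node_id, address), ...] for content blocks
    parity: tuple   # (node_id, address) of the single parity block (assumption)


@dataclass
class Node:
    """Hypothetical node: local storage plus a metadata map."""
    node_id: int
    blocks: dict = field(default_factory=dict)        # address -> bytes or Metadata
    metadata_map: dict = field(default_factory=dict)  # address -> True if metadata

    def scan(self, down: int) -> list:
        """Return ids of locally described files with a block on node `down`."""
        hits = []
        for address, is_metadata in self.metadata_map.items():
            if not is_metadata:
                continue  # the map lets the node skip non-metadata locations
            meta = self.blocks[address]
            locations = meta.content + [meta.parity]
            if any(node_id == down for node_id, _ in locations):
                hits.append(meta.file_id)
        return hits


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover(nodes, meta, down):
    """Rebuild the one block of `meta` that was stored on node `down`."""
    survivors = [
        nodes[node_id].blocks[address]
        for node_id, address in meta.content + [meta.parity]
        if node_id != down
    ]
    return xor_blocks(survivors)


def demo():
    d0, d1 = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"
    parity = xor_blocks([d0, d1])
    # The metadata block lives on node 2, away from the content on nodes 0 and 1.
    meta = Metadata("file-A", content=[(0, 0), (1, 0)], parity=(2, 0))
    nodes = {
        0: Node(0, blocks={0: d0}, metadata_map={0: False}),
        1: Node(1, blocks={0: d1}, metadata_map={0: False}),
        2: Node(2, blocks={0: parity, 1: meta}, metadata_map={0: False, 1: True}),
    }
    down = 1  # node 1 is treated as unavailable
    available = [n for n in nodes.values() if n.node_id != down]
    with ThreadPoolExecutor() as pool:  # each node scans its own metadata in parallel
        results = list(pool.map(lambda n: n.scan(down), available))
    file_ids = sorted({fid for hits in results for fid in hits})
    return file_ids, recover(nodes, meta, down)
```

Calling `demo()` returns the identifiers reported by the scanning nodes together with the rebuilt block. In this sketch the threads stand in for separate nodes. In a real cluster each scan would run on the node that holds the metadata, so only file identifiers cross the network.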
