Systems and methods for distributed system scanning
First Claim
1. A method for identifying selected attributes in files stored in a distributed file system, the method comprising:
providing a plurality of nodes in a network, wherein each node comprises:
a processor and a memory device for locally storing data, and wherein files are distributed across the nodes such that one or more of the files are stored in the memory devices, in parts, among the plurality of nodes;
a plurality of metadata data blocks each associated with one of the files and comprising file attribute data related to the corresponding file, a file identifier, and location information for one or more content data blocks of the file, the attribute data including data indicating which nodes are used to store the file's content data blocks, the metadata data blocks distributed across the nodes and stored in the memory devices among the plurality of nodes such that, for at least one of the metadata data blocks, at least one of the content data blocks of the file associated with the metadata data block is stored on a different node than the at least one metadata data block; and
a metadata map data structure providing an indication of where metadata data blocks are stored on the respective node and comprising a plurality of entries, each of the entries corresponding to a memory location of the memory device and indicating whether a metadata data block is stored in that memory location;
instructing, by the processor of the respective node, each of the nodes that locally stores metadata data blocks to determine each memory location where a metadata data block is locally stored using the respective node's metadata map data structure, read the respective metadata data blocks, and search the locally stored metadata data blocks for files which include data blocks stored on a particular node that is unavailable such that one or more of the nodes performs at least a portion of the search in parallel with at least a portion of the search of one or more of the other nodes;
receiving from the nodes that store metadata data blocks, file identifiers related to files that include data blocks stored on the node that is unavailable;
accessing the metadata data blocks corresponding to one of the file identifiers to determine the location of at least one accessible content data block and at least one accessible parity data block corresponding to one of the files that include data blocks stored on an unavailable node;
reading the at least one accessible content data block and the at least one accessible parity data block from their respective locations in the memory devices of available nodes; and
processing the at least one accessible content data block and the at least one accessible parity data block to generate recovered data blocks corresponding to the one or more data blocks stored on the unavailable node, wherein processing the at least one accessible content data block and the at least one accessible parity data block comprises performing an exclusive or (XOR) operation on the at least one accessible content data block and the at least one accessible parity data block.
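The final step of claim 1 recovers lost data by XOR-ing accessible content blocks with a parity block. The Python sketch below illustrates that arithmetic under assumptions the claim does not spell out: a single-parity stripe in which the parity block is the byte-wise XOR of all content blocks (as in RAID 5), all blocks in the stripe have equal length, and exactly one block of the stripe is lost. The function names `xor_blocks`, `compute_parity`, and `recover_missing_block` are hypothetical.

```python
from typing import Iterable


def xor_blocks(blocks: Iterable[bytes]) -> bytes:
    """Return the byte-wise XOR of equal-length blocks."""
    blocks = list(blocks)
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    result = bytearray(length)
    for block in blocks:
        if len(block) != length:
            raise ValueError("all blocks in a stripe must have the same length")
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def compute_parity(content_blocks: Iterable[bytes]) -> bytes:
    """Assumed single-parity scheme: P = D0 ^ D1 ^ ... ^ Dn-1."""
    return xor_blocks(content_blocks)


def recover_missing_block(accessible_content_blocks: Iterable[bytes],
                          parity_block: bytes) -> bytes:
    """Rebuild the one lost content block by XOR-ing the survivors with the parity."""
    return xor_blocks(list(accessible_content_blocks) + [parity_block])


if __name__ == "__main__":
    stripe = [b"ABCD", b"efgh", b"1234"]  # content blocks on three different nodes
    parity = compute_parity(stripe)        # parity block stored on a fourth node
    # Suppose the node holding stripe[1] becomes unavailable.
    rebuilt = recover_missing_block([stripe[0], stripe[2]], parity)
    print(rebuilt == stripe[1])  # True
```

Because XOR is its own inverse, XOR-ing the parity with every surviving content block cancels those blocks out and leaves the missing one; with a single parity block, this recovers at most one lost block per stripe.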
14 Assignments
0 Petitions
Abstract
Systems and methods are provided for scanning files and directories in a distributed file system on a network of nodes. The nodes include metadata with attribute information corresponding to files and directories distributed on the nodes. In one embodiment, the files and directories are scanned by commanding the nodes to search their respective metadata for a selected attribute. At least two of the nodes are capable of searching their respective metadata in parallel. In one embodiment, the distributed file system commands the nodes to search for metadata data structures having location information corresponding to a failed device on the network. The metadata data structures identified in the search may then be used to reconstruct lost data that was stored on the failed device.
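To make the scanning idea concrete, the Python sketch below models each node's metadata map as a list of booleans (one entry per memory location, `True` where a metadata block is stored) and has each available node read only the flagged locations and test them against a selected attribute. A thread pool stands in for nodes searching in parallel; in CPython, threads only approximate true parallelism, whereas real nodes would scan their own storage independently. The names `MetadataBlock`, `Node`, `parallel_scan`, `files_touching_failed_node`, and the `content_nodes` attribute are hypothetical, and the sketch does not address metadata stored only on the failed node.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(frozen=True)
class MetadataBlock:
    file_id: int
    # Hypothetical attribute: ids of nodes holding this file's content/parity blocks.
    content_nodes: frozenset


@dataclass
class Node:
    node_id: int
    # Simplified memory device: location index -> stored block (metadata or raw bytes).
    memory: dict = field(default_factory=dict)
    # Metadata map: one entry per memory location; True means a metadata block lives there.
    metadata_map: list = field(default_factory=list)

    def store(self, location: int, block, is_metadata: bool) -> None:
        if location >= len(self.metadata_map):
            self.metadata_map.extend([False] * (location + 1 - len(self.metadata_map)))
        self.memory[location] = block
        self.metadata_map[location] = is_metadata

    def scan(self, predicate: Callable[[MetadataBlock], bool]) -> List[int]:
        """Read only the locations flagged in the metadata map; return matching file ids."""
        matches = []
        for location, is_metadata in enumerate(self.metadata_map):
            if is_metadata:
                block = self.memory[location]
                if predicate(block):
                    matches.append(block.file_id)
        return matches


def parallel_scan(nodes: List[Node], predicate: Callable[[MetadataBlock], bool]) -> Set[int]:
    """Run every node's local scan concurrently and merge the returned file ids."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        per_node = list(pool.map(lambda n: n.scan(predicate), nodes))
    return {file_id for hits in per_node for file_id in hits}


def files_touching_failed_node(nodes: List[Node], failed_node_id: int) -> Set[int]:
    """Ask every available node for files that keep data blocks on the failed node."""
    available = [n for n in nodes if n.node_id != failed_node_id]
    return parallel_scan(available, lambda md: failed_node_id in md.content_nodes)


if __name__ == "__main__":
    n0, n1, n2 = Node(0), Node(1), Node(2)
    n0.store(0, MetadataBlock(101, frozenset({1, 2})), True)
    n0.store(1, b"raw content", False)  # not metadata, so it is never read by the scan
    n1.store(0, MetadataBlock(102, frozenset({0, 2})), True)
    n2.store(3, MetadataBlock(103, frozenset({0, 1})), True)
    n2.store(5, MetadataBlock(104, frozenset({0})), True)
    print(sorted(files_touching_failed_node([n0, n1, n2], failed_node_id=1)))  # [101, 103]
```

Passing a different predicate to `parallel_scan` searches for any other selected attribute, which is the more general scan the abstract describes; the failed-node search is one such predicate.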
341 Citations
9 Claims
4. A system for identifying selected attributes in files stored in a distributed file system, the system comprising:
a plurality of nodes in a network, wherein each node comprises:
a processor and a memory device for locally storing data, and wherein files are distributed across the nodes such that one or more of the files are stored in the memory devices, in parts, among the plurality of nodes;
a plurality of metadata data blocks each associated with one of the files and comprising file attribute data related to the corresponding file, a file identifier, and location information for one or more content data blocks of the file, the attribute data including data indicating which nodes are used to store the file's content data blocks, the metadata data blocks distributed across the nodes and stored in the memory devices among the plurality of nodes such that, for at least one of the metadata data blocks, at least one of the content data blocks of the file associated with the metadata data block is stored on a different node than the at least one metadata data block; and
a metadata map data structure providing an indication of where metadata data blocks are stored on the respective node and comprising a plurality of entries, each of the entries corresponding to a memory location of the memory device and indicating whether a metadata data block is stored in that memory location, wherein each of the plurality of nodes is configured to:
instruct each of the nodes that locally stores metadata data blocks to determine each memory location where a metadata data block is locally stored using the respective node's metadata map data structure, read the respective metadata data blocks, and search the locally stored metadata data blocks for files which include data blocks stored on a particular node that is unavailable such that one or more of the nodes performs at least a portion of the search in parallel with at least a portion of the search of one or more of the other nodes;
receive from the nodes that store metadata data blocks, file identifiers related to files that include data blocks stored on the node that is unavailable;
access the metadata data blocks corresponding to one of the file identifiers to determine the location of at least one accessible content data block and at least one accessible parity data block corresponding to one of the files that include data blocks stored on an unavailable node;
read the at least one accessible content data block and the at least one accessible parity data block from their respective locations in the memory devices of available nodes; and
process the at least one accessible content data block and the at least one accessible parity data block to generate recovered data blocks corresponding to the one or more data blocks stored on the unavailable node by performing an exclusive or (XOR) operation on the at least one accessible content data block and the at least one accessible parity data block.
7. A computer readable medium storing program code that, in response to execution by a processor of one of a plurality of nodes in a network, causes the processor to perform operations for identifying selected attributes in files stored in a distributed file system, the operations comprising:
instructing, by a processor of one of a plurality of nodes in a network, wherein each node comprises:
a processor and a memory device for locally storing data, and wherein files are distributed across the nodes such that one or more of the files are stored in the memory devices, in parts, among the plurality of nodes; and
a plurality of metadata data blocks each associated with one of the files and comprising file attribute data related to the corresponding file, a file identifier, and location information for one or more content data blocks of the file, the attribute data including data indicating which nodes are used to store the file's content data blocks, the metadata data blocks distributed across the nodes and stored in the memory devices among the plurality of nodes such that, for at least one of the metadata data blocks, at least one of the content data blocks of the file associated with the metadata data block is stored on a different node than the at least one metadata data block; and
a metadata map data structure providing an indication of where metadata data blocks are stored on the respective node and comprising a plurality of entries, each of the entries corresponding to a memory location of the memory device and indicating whether a metadata data block is stored in that memory location,
each of the nodes that locally stores metadata data blocks to determine each memory location where a metadata data block is locally stored using the respective node's metadata map data structure, read the respective metadata data blocks, and search the locally stored metadata data blocks for files which include data blocks stored on a particular node that is unavailable such that one or more of the nodes performs at least a portion of the search in parallel with at least a portion of the search of one or more of the other nodes;
receiving from the nodes that store metadata data blocks, file identifiers related to files that include data blocks stored on the node that is unavailable;
accessing the metadata data blocks corresponding to one of the file identifiers to determine the location of at least one accessible content data block and at least one accessible parity data block corresponding to one of the files that include data blocks stored on an unavailable node;
reading the at least one accessible content data block and the at least one accessible parity data block from their respective locations in the memory devices of available nodes; and
processing the at least one accessible content data block and the at least one accessible parity data block to generate recovered data blocks corresponding to the one or more data blocks stored on the unavailable node, wherein processing the at least one accessible content data block and the at least one accessible parity data block comprises performing an exclusive or (XOR) operation on the at least one accessible content data block and the at least one accessible parity data block.
Specification