On-disk file format for a serverless distributed file system
Abstract
A file format for a serverless distributed file system is composed of two parts: a primary data stream and a metadata stream. The data stream contains a file that is divided into multiple blocks. Each block is encrypted using a hash of the block as the encryption key. The metadata stream contains a header, a structure for indexing the encrypted blocks in the primary data stream, and some user information. The indexing structure defines leaf nodes for each of the blocks. Each leaf node consists of an access value used for decryption of the associated block and a verification value used to verify the encrypted block independently of other blocks. In one implementation, the access value is formed by hashing the file block and encrypting the resultant hash value using a randomly generated key. The key is then encrypted using the user's key as the encryption key. The verification value is formed by hashing the associated encrypted block using a one-way hash function. The file format supports verification of individual file blocks without knowledge of the randomly generated key or any user keys. To verify a block of the file, the file system traverses the tree to the appropriate leaf node associated with a target block to be verified. The file system hashes the target block and, if the hash matches the verification value contained in the leaf node, the block is authentic.
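The per-block scheme in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the function names, the use of SHA-256 as the hash, and in particular `keystream_xor`, a toy stand-in cipher built from the standard library that is not secure and is not the cipher the patent specifies.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (NOT secure): XOR data with a SHA-256 keystream.

    Hypothetical stand-in for a real block cipher; applying it twice with
    the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def make_leaf(block: bytes, file_key: bytes):
    """Encrypt one block and build its leaf node (access + verification value)."""
    block_hash = hashlib.sha256(block).digest()           # hash of the plaintext block
    encrypted_block = keystream_xor(block_hash, block)    # block hash is the encryption key
    leaf = {
        # access value: block hash encrypted under the randomly generated key
        "access": keystream_xor(file_key, block_hash),
        # verification value: one-way hash of the encrypted block
        "verify": hashlib.sha256(encrypted_block).digest(),
    }
    return encrypted_block, leaf


def verify_block(encrypted_block: bytes, leaf: dict) -> bool:
    """Check a stored block without the random key or any user key."""
    return hashlib.sha256(encrypted_block).digest() == leaf["verify"]


def decrypt_block(encrypted_block: bytes, leaf: dict, file_key: bytes) -> bytes:
    """Recover the block hash from the access value, then decrypt the block."""
    block_hash = keystream_xor(file_key, leaf["access"])
    return keystream_xor(block_hash, encrypted_block)


def wrap_file_key(file_key: bytes, user_key: bytes) -> bytes:
    """Encrypt the random key under the user's key, as the abstract describes."""
    return keystream_xor(user_key, file_key)
```

In this sketch, `verify_block` needs only the encrypted block and the leaf node, which mirrors the abstract's statement that verification works without the randomly generated key or any user keys, while `decrypt_block` requires the random key to unlock the access value.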
17 Claims
1. A method implemented at least in part by a machine comprising:
segmenting a sparse file into multiple blocks;
differentiating non-data blocks in the sparse file that contain no substantive content from data blocks in the sparse file that contain substantive data;
creating an indexing structure to index individual blocks; and
deallocating storage of both the non-data blocks and portions of the indexing structure that reference the non-data blocks that contain no substantive content, wherein the sparse file and the indexing structure are reduced in size.
Dependent claims: 2, 3, 4, 5
6. A method comprising:
segmenting a sparse file into multiple blocks, the sparse file containing at least one non-data block that contains no substantive data;
differentiating the non-data blocks from data blocks of the sparse file that contain substantive data;
computing hashes of each of the data blocks to produce block hash values;
encrypting the data blocks using their corresponding block hash values as encryption keys to produce encrypted data blocks;
creating an indexing structure to index individual blocks, the indexing structure containing first leaf nodes for each corresponding encrypted data block and second leaf nodes for each corresponding non-data block, the first leaf nodes containing an access value formed by encrypting the block hash value for the corresponding encrypted block using an access key and a verification value formed by hashing the corresponding encrypted block; and
setting the second leaf nodes to a first binary value.
Dependent claims: 7, 8, 9
10. One or more computer readable storage media comprising computer-executable instructions that, when executed, direct a computing device to:
segment a sparse file into multiple blocks, the sparse file containing at least one non-data block that contains no substantive data;
differentiate the non-data blocks from data blocks of the sparse file that contain substantive data;
compute hashes of each of the data blocks to produce block hash values;
encrypt the data blocks using their corresponding block hash values as encryption keys to produce encrypted data blocks;
create an indexing structure to index the non-data blocks and the encrypted data blocks; and
deallocate portions of the indexing structure that reference the non-data blocks.
11. A component in a device in a distributed file system in which files are stored across multiple distributed computers, the component comprising:
a segmenting module to divide a sparse file into multiple blocks, the sparse file containing at least one non-data block that contains no substantive data;
a control module to differentiate the non-data blocks from data blocks of the sparse file that contain substantive data;
a hash module to hash each of the data blocks to produce block hash values;
a cryptographic engine to encrypt the data blocks using their corresponding block hash values as encryption keys to produce encrypted blocks; and
an index builder to create an indexing structure to index individual blocks, the indexing structure containing first leaf nodes for each corresponding encrypted block and second leaf nodes for each corresponding non-data block, the first leaf nodes containing an access value formed by encrypting the block hash value for the corresponding encrypted block using an access key and a verification value formed by hashing the corresponding encrypted block, the second leaf nodes being set to a first binary value.
Dependent claims: 12, 13
14. A method implemented at least in part by a machine comprising:
segmenting a sparse file into multiple blocks;
differentiating non-data blocks in the sparse file that contain no substantive content from data blocks in the sparse file that contain substantive data;
creating an indexing structure to index individual blocks;
deallocating storage of both the non-data blocks and portions of the indexing structure that reference the non-data blocks that contain no substantive data, wherein the sparse file is reduced in size;
computing a hash of each of the data blocks to produce block hash values; and
encrypting the data blocks using their corresponding block hash values as encryption keys to produce encrypted data blocks.
Dependent claims: 15, 16, 17
Specification