ZFS block-level deduplication at cloud scale
First Claim
1. A method of deduplicating data blocks on a cloud object store that is remote from a block storage system, the method comprising:
receiving, at an application layer of the block storage system and through a system call interface of an interface layer of the block storage system, a first request to store or modify a file, the first request including file data;
generating, at a transactional object layer of the block storage system, a plurality of data blocks, each data block of the plurality of data blocks corresponding to at least a portion of the file data;
generating, at the transactional object layer of the block storage system and for each data block of the plurality of data blocks, a generated name using a naming protocol, the generated name being based on a content of the data block;
determining, at a data management unit, that a generated name of a first data block of the plurality of data blocks is equivalent to an existing name associated with an existing data block, the existing data block corresponding to an existing cloud storage object stored in the cloud object store, and the existing name generated using the naming protocol;
generating, at the transactional object layer of the block storage system, a set of data blocks, the set of data blocks including the plurality of data blocks while excluding the first data block;
generating, at the transactional object layer of the block storage system, a plurality of metadata blocks corresponding to the existing data block and each data block of the set of data blocks, the plurality of metadata blocks being configured to hierarchically point to lower-level blocks associated with the file and thereby correspond to at least part of a tree hierarchy for the file, wherein:
each metadata block of the plurality of metadata blocks includes one or more address pointers, each address pointer of the one or more address pointers pointing to the existing data block, a data block of the set of data blocks, or to a metadata block in the plurality of metadata blocks;
the plurality of metadata blocks includes a root block that is positioned at a top of the tree hierarchy for the file and one or more non-root metadata blocks;
each non-root metadata block of the plurality of metadata blocks being pointed to by at least one metadata block of the plurality of metadata blocks of the tree hierarchy for the file; and
each data block of the set of data blocks is pointed to by a metadata block of the plurality of metadata blocks of the tree hierarchy for the file;
causing a set of cloud storage objects to be stored in the cloud object store by transmitting the set of data blocks and the plurality of metadata blocks to a hybrid cloud storage system, the hybrid cloud storage system managing data storage in the cloud object store, wherein causing a set of cloud storage objects to be stored includes:
generating the set of cloud storage blocks based on the data blocks of the set of data blocks; and
generating, for each cloud storage object, an address pointer that points to the cloud storage object, the address pointer generated based on an identifier of the cloud storage object and a path specification of the cloud storage object;
transmitting, to the hybrid cloud storage system, one or more second requests for a set of addresses, each address of the set of addresses corresponding to a cloud storage object of the set of cloud storage objects that correspond to the set of data blocks;
receiving, from the hybrid cloud storage system, one or more responses to the one or more second requests, each response of the one or more responses identifying an address corresponding to a data block of the set of data blocks or a metadata block of the plurality of metadata blocks, the address identifying a storage location in the cloud object store; and
generating, using the tree hierarchy and the set of addresses, a mapping between each data block of the plurality of data blocks to a cloud storage object of the set of cloud storage objects.
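The name-based deduplication steps of this claim can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the naming protocol is assumed to be a SHA-256 digest of the block content (SHA-256 is also the checksum ZFS uses by default when deduplication is enabled), the block size is a hypothetical tiny value, and a plain set of names stands in for the index of existing cloud storage objects. The helpers `split_into_blocks`, `content_name` and `dedup` are hypothetical names.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example stays readable


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut file data into fixed-size data blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def content_name(block: bytes) -> str:
    """Assumed naming protocol: the SHA-256 hex digest of the block content."""
    return hashlib.sha256(block).hexdigest()


def dedup(blocks: list[bytes], existing_names: set[str]) -> tuple[list[str], dict[str, bytes]]:
    """Name every block and keep only those whose name is not already stored.

    Returns all generated names in file order (later used to build the tree)
    and the set of data blocks still to upload, keyed by name.
    """
    names = [content_name(b) for b in blocks]
    to_upload: dict[str, bytes] = {}
    for name, block in zip(names, blocks):
        if name in existing_names:
            continue  # an equivalent cloud storage object already exists
        to_upload[name] = block  # repeats within this write collapse to one entry
    return names, to_upload
```

With `existing = {content_name(b"abcd")}`, calling `dedup(split_into_blocks(b"abcdefgh"), existing)` returns two names but only the `b"efgh"` block to upload, mirroring the set of data blocks that excludes the first (duplicate) data block. Keeping every name, including the deduplicated one, matters because the metadata tree must still point at the existing block.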
1 Assignment
0 Petitions
Abstract
Techniques described herein relate to systems and methods of data storage, and more particularly to layering file system functionality on an object interface. In certain embodiments, file system functionality may be layered on cloud object interfaces to provide cloud-based storage while preserving the functionality that legacy applications expect. For instance, POSIX interfaces and semantics may be layered on cloud-based storage while still providing file-based access to data organized in name hierarchies. Various embodiments may also provide memory mapping of data, so that changes made through a memory map are reflected in persistent storage and remain consistent with writes. For example, by transforming a ZFS file system from disk-based storage into cloud-based storage, the ZFS file system gains the elasticity of cloud storage.
56 Citations
20 Claims
1. A method of deduplicating data blocks on a cloud object store that is remote from a block storage system, the method comprising:
receiving, at an application layer of the block storage system and through a system call interface of an interface layer of the block storage system, a first request to store or modify a file, the first request including file data;
generating, at a transactional object layer of the block storage system, a plurality of data blocks, each data block of the plurality of data blocks corresponding to at least a portion of the file data;
generating, at the transactional object layer of the block storage system and for each data block of the plurality of data blocks, a generated name using a naming protocol, the generated name being based on a content of the data block;
determining, at a data management unit, that a generated name of a first data block of the plurality of data blocks is equivalent to an existing name associated with an existing data block, the existing data block corresponding to an existing cloud storage object stored in the cloud object store, and the existing name generated using the naming protocol;
generating, at the transactional object layer of the block storage system, a set of data blocks, the set of data blocks including the plurality of data blocks while excluding the first data block;
generating, at the transactional object layer of the block storage system, a plurality of metadata blocks corresponding to the existing data block and each data block of the set of data blocks, the plurality of metadata blocks being configured to hierarchically point to lower-level blocks associated with the file and thereby correspond to at least part of a tree hierarchy for the file, wherein:
each metadata block of the plurality of metadata blocks includes one or more address pointers, each address pointer of the one or more address pointers pointing to the existing data block, a data block of the set of data blocks, or to a metadata block in the plurality of metadata blocks;
the plurality of metadata blocks includes a root block that is positioned at a top of the tree hierarchy for the file and one or more non-root metadata blocks;
each non-root metadata block of the plurality of metadata blocks being pointed to by at least one metadata block of the plurality of metadata blocks of the tree hierarchy for the file; and
each data block of the set of data blocks is pointed to by a metadata block of the plurality of metadata blocks of the tree hierarchy for the file;
causing a set of cloud storage objects to be stored in the cloud object store by transmitting the set of data blocks and the plurality of metadata blocks to a hybrid cloud storage system, the hybrid cloud storage system managing data storage in the cloud object store, wherein causing a set of cloud storage objects to be stored includes:
generating the set of cloud storage blocks based on the data blocks of the set of data blocks; and
generating, for each cloud storage object, an address pointer that points to the cloud storage object, the address pointer generated based on an identifier of the cloud storage object and a path specification of the cloud storage object;
transmitting, to the hybrid cloud storage system, one or more second requests for a set of addresses, each address of the set of addresses corresponding to a cloud storage object of the set of cloud storage objects that correspond to the set of data blocks;
receiving, from the hybrid cloud storage system, one or more responses to the one or more second requests, each response of the one or more responses identifying an address corresponding to a data block of the set of data blocks or a metadata block of the plurality of metadata blocks, the address identifying a storage location in the cloud object store; and
generating, using the tree hierarchy and the set of addresses, a mapping between each data block of the plurality of data blocks to a cloud storage object of the set of cloud storage objects.
Dependent claims: 2, 3, 4, 5, 6, 7
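The tree of metadata blocks recited in this claim resembles a Merkle tree: each metadata block holds address pointers to lower-level blocks, and a single root sits at the top. The sketch below is illustrative only. It assumes, hypothetically, a fan-out of two pointers per metadata block and names each metadata block by hashing its children's names, so a change to any data block changes the root's name. Names of deduplicated blocks are passed in like any other leaf, so the tree still points at the existing data block. `MetadataBlock`, `build_tree` and `FANOUT` are illustrative names, not terms from the claims.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field

FANOUT = 2  # hypothetical number of address pointers per metadata block


@dataclass
class MetadataBlock:
    pointers: list[str]  # names of the child blocks (data or metadata) it points to
    name: str = field(init=False)

    def __post_init__(self) -> None:
        # Assumption: a metadata block is named by hashing its children's names,
        # so any change in a lower-level block propagates up to the root.
        self.name = hashlib.sha256("\n".join(self.pointers).encode()).hexdigest()


def build_tree(leaf_names: list[str], fanout: int = FANOUT) -> list[list[MetadataBlock]]:
    """Build metadata levels bottom-up; the last level holds only the root block."""
    if not leaf_names:
        raise ValueError("a file needs at least one data block")
    if fanout < 2:
        raise ValueError("fanout must be at least 2 for the tree to converge")
    levels: list[list[MetadataBlock]] = []
    current = leaf_names
    while True:
        # Group names so that each metadata block points to up to `fanout` children.
        level = [MetadataBlock(current[i:i + fanout]) for i in range(0, len(current), fanout)]
        levels.append(level)
        if len(level) == 1:
            return levels  # a single block at this level is the root
        current = [block.name for block in level]
```

For three data block names `["a", "b", "c"]`, the bottom level holds two metadata blocks pointing to `["a", "b"]` and `["c"]`, and the root points to those two blocks. Every non-root metadata block and every data block is therefore pointed to by exactly one metadata block, matching the structure the claim describes.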
8. One or more non-transitory tangible computer-readable storage media storing computer-executable instructions for deduplicating blocks on a cloud object store that is remote from a block storage system on a computing system, the computer-executable instructions, when executed by one or more processors, perform operations including:
receiving, at an application layer of the block storage system and through a system call interface of an interface layer of the block storage system, a first request to store or modify a file, the first request including file data;
generating, at a transactional object layer of the block storage system, a plurality of data blocks, each data block of the plurality of data blocks corresponding to at least a portion of the file data;
generating, at the transactional object layer of the block storage system and for each data block of the plurality of data blocks, a generated name using a naming protocol, the generated name being based on a content of the data block;
determining, at a data management unit, that a generated name of a first data block of the plurality of data blocks is equivalent to an existing name associated with an existing data block, the existing data block corresponding to an existing cloud storage object stored in the cloud object store, and the existing name generated using the naming protocol;
generating, at the transactional object layer of the block storage system, a set of data blocks, the set of data blocks including the plurality of data blocks while excluding the first data block;
generating, at the transactional object layer of the block storage system, a plurality of metadata blocks corresponding to the existing data block and each data block of the set of data blocks, the plurality of metadata blocks being configured to hierarchically point to lower-level blocks associated with the file and thereby correspond to at least part of a tree hierarchy for the file, wherein:
each metadata block of the plurality of metadata blocks includes one or more address pointers, each address pointer of the one or more address pointers pointing to the existing data block, a data block of the set of data blocks, or to a metadata block in the plurality of metadata blocks;
the plurality of metadata blocks includes a root block that is positioned at a top of the tree hierarchy for the file and one or more non-root metadata blocks;
each non-root metadata block of the plurality of metadata blocks being pointed to by at least one metadata block of the plurality of metadata blocks of the tree hierarchy for the file; and
each data block of the set of data blocks is pointed to by a metadata block of the plurality of metadata blocks of the tree hierarchy for the file;
causing a set of cloud storage objects to be stored in the cloud object store by transmitting the data blocks of the set of data blocks and the plurality of metadata blocks to a hybrid cloud storage system, the hybrid cloud storage system managing data storage in the cloud object store, wherein causing a set of cloud storage objects to be stored includes:
generating the set of cloud storage blocks based on the data blocks of the set of data blocks; and
generating, for each cloud storage object, an address pointer that points to the cloud storage object, the address pointer generated based on an identifier of the cloud storage object and a path specification of the cloud storage object;
transmitting, to the hybrid cloud storage system, one or more second requests for a set of addresses, each address of the set of addresses corresponding to a cloud storage object of the set of cloud storage objects that correspond to the set of data blocks;
receiving, from the hybrid cloud storage system, one or more responses to the one or more second requests, each response of the one or more responses identifying an address corresponding to a data block of the set of data blocks or a metadata block of the plurality of metadata blocks, the address identifying a storage location in the cloud object store; and
generating, using the tree hierarchy and the set of addresses, a mapping between each data block of the plurality of data blocks to a cloud storage object of the set of cloud storage objects.
Dependent claims: 9, 10, 11, 12, 13, 14
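The step of generating, for each cloud storage object, an address pointer from an identifier and a path specification can be sketched as follows. This is a hedged illustration, not the patented implementation or any real cloud SDK: it assumes the identifier is the block's content hash and the path specification is a key prefix, and it uses a dictionary as a hypothetical stand-in for the hybrid cloud storage system and the cloud object store. `ObjectAddress`, `InMemoryObjectStore` and `store_blocks` are illustrative names.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectAddress:
    """Hypothetical address pointer built from a path specification and an object identifier."""
    path_spec: str  # assumed: a key prefix such as a bucket or dataset path
    object_id: str  # assumed: the block's content-derived name

    def key(self) -> str:
        # Join the path specification and identifier into one flat object key.
        return f"{self.path_spec.rstrip('/')}/{self.object_id}"


class InMemoryObjectStore:
    """Stand-in for the cloud object store: a flat mapping from key to bytes."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, address: ObjectAddress, data: bytes) -> ObjectAddress:
        # Store the object under its key and hand the address back to the caller.
        self.objects[address.key()] = data
        return address


def store_blocks(store: InMemoryObjectStore, path_spec: str,
                 blocks: list[bytes]) -> dict[str, ObjectAddress]:
    """Store each block as one object named by its SHA-256 digest; return name -> address."""
    addresses: dict[str, ObjectAddress] = {}
    for block in blocks:
        object_id = hashlib.sha256(block).hexdigest()
        addresses[object_id] = store.put(ObjectAddress(path_spec, object_id), block)
    return addresses
```

Object stores expose a flat key space, so a path-like prefix is one common way to recreate a name hierarchy on top of them; whether the claimed system composes its keys this way is an assumption of this sketch.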
15. A processor-based system for deduplicating data blocks on a cloud object store that is remote from a block storage system, the processor-based system performing operations including:
receiving, at an application layer of the block storage system and through a system call interface of an interface layer of the block storage system, a first request to store or modify a file, the first request including file data;
generating, at a transactional object layer of the block storage system, a plurality of data blocks, each data block of the plurality of data blocks corresponding to at least a portion of the file data;
generating, at the transactional object layer of the block storage system, a generated name using a naming protocol, the generated name being based on a content of the data block;
determining, at a data management unit, that a generated name of a first data block of the plurality of data blocks is equivalent to an existing name associated with an existing data block, the existing data block corresponding to a cloud storage block stored in the cloud object store, and the existing name generated using the naming protocol;
generating, at the transactional object layer of the block storage system, a set of data blocks, the set of data blocks including the plurality of data blocks while excluding the first data block;
generating, at the transactional object layer of the block storage system, a plurality of metadata blocks corresponding to the existing data block and each data block of the set of data blocks, the plurality of metadata blocks being configured to hierarchically point to lower-level blocks associated with the file and thereby correspond to at least part of a tree hierarchy for the file, wherein:
each metadata block of the plurality of metadata blocks includes one or more address pointers, each address pointer of the one or more address pointers pointing to the existing data block, a data block of the set of data blocks, or to a metadata block of the plurality of metadata blocks;
the plurality of metadata blocks includes a root block that is positioned at a top of the tree hierarchy for the file and one or more non-root metadata blocks;
each non-root metadata block of the plurality of metadata blocks being pointed to by at least one metadata block of the plurality of metadata blocks of the tree hierarchy for the file; and
each data block of the set of data blocks is pointed to by a metadata block of the plurality of metadata blocks of the tree hierarchy for the file;
causing a set of cloud storage objects to be stored in the cloud object store by transmitting the data blocks of the set of data blocks and the plurality of metadata blocks to a hybrid cloud storage system, the hybrid cloud storage system managing data storage in the cloud object store, wherein causing a set of cloud storage objects to be stored includes:
generating the set of cloud storage blocks based on the data blocks of the set of data blocks; and
generating, for each cloud storage object, an address pointer that points to the cloud storage object, the address pointer generated based on an identifier of the cloud storage object and a path specification of the cloud storage object;
transmitting, to the hybrid cloud storage system, one or more second requests for a set of addresses, each address of the set of addresses corresponding to a cloud storage object of the set of cloud storage objects that correspond to the set of data blocks;
receiving, from the hybrid cloud storage system, one or more responses to the one or more second requests, each response of the one or more responses identifying an address corresponding to a data block of the set of data blocks or a metadata block of the plurality of metadata blocks, the address identifying a storage location in the cloud object store; and
generating, using the tree hierarchy and the set of addresses, a mapping between each data block of the plurality of data blocks to a cloud storage object of the set of cloud storage objects.
Dependent claims: 16, 17, 18, 19, 20
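The final claimed step, mapping every data block of the file (including the deduplicated one that was never re-uploaded) to a cloud storage object, can be sketched by walking the leaves of the tree in order. Here, as an assumption for illustration, nested lists stand in for metadata blocks, plain strings stand in for addresses, `new_addresses` stands in for the addresses returned in the responses from the hybrid cloud storage system, and `existing_addresses` stands in for the address of the existing cloud storage object found during deduplication. The function names are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Union

# Hypothetical tree shape: a leaf is a data block name (str); an inner node,
# standing in for a metadata block, is a list of child nodes.
Node = Union[str, list]


def leaves_in_order(node: Node) -> Iterator[str]:
    """Yield data block names left to right via a depth-first walk of the tree."""
    if isinstance(node, str):
        yield node
        return
    for child in node:
        yield from leaves_in_order(child)


def map_blocks_to_objects(tree: Node, new_addresses: dict[str, str],
                          existing_addresses: dict[str, str]) -> list[tuple[str, str]]:
    """Map each data block name, in file order, to the address of its cloud object."""
    mapping: list[tuple[str, str]] = []
    for name in leaves_in_order(tree):
        # Prefer an address returned for a newly stored block; otherwise fall
        # back to the address of the existing object found during deduplication.
        address = new_addresses.get(name, existing_addresses.get(name))
        if address is None:
            raise KeyError(f"no cloud storage object known for block {name!r}")
        mapping.append((name, address))
    return mapping
```

For example, with `tree = [["h1", "h2"], ["h3"]]`, new addresses for `h2` and `h3`, and an existing address for the deduplicated block `h1`, the result lists all three blocks in file order, each paired with a cloud object address, even though only two objects were uploaded.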
Specification