Deduplicated host cache flush to remote storage
First Claim
1. A method comprising:
caching writes in a persistent cache of a host device, wherein the writes indicate write targets and data units to write;
after detection of a cache flush trigger, determining a first set of the data units that are each unique among the data units;
communicating the first set of data units to a remote, distributed storage system;
generating a command for each of a subset of the cached writes, wherein the command indicates a corresponding one of the first set of data units as a donor and the write target of the cached write of the subset as a recipient, wherein generating the command for each of the subset of the cached writes comprises generating a copy command in accordance with a network file system protocol and indicating in the copy command that a share should be performed instead of a copy if a front end device of the distributed storage system is capable of performing the share; and
communicating the commands to the remote, distributed storage system.
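The flow recited in claim 1 can be sketched as a toy host-side cache. This is a minimal sketch under stated assumptions, not the patented implementation. `HostCache`, `CopyCommand`, the `send` transport callback, the count-based `flush_threshold` trigger, SHA-256 as the uniqueness test, and the `prefer_share` flag are all hypothetical. In this sketch the "subset" of cached writes that receive commands is simply every cached write.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyCommand:
    # Hypothetical command shape, not an actual NFS wire format.
    donor: int                 # index into the unique data units sent this flush
    recipient: str             # write target of the cached write
    prefer_share: bool = True  # hypothetical "share instead of copy" hint


class HostCache:
    """Toy write-back cache. An in-memory list stands in for persistent media."""

    def __init__(self, send, flush_threshold=4):
        self.send = send                        # hypothetical transport callback
        self.flush_threshold = flush_threshold  # hypothetical flush trigger
        self.writes = []                        # cached (target, data) pairs

    def write(self, target, data):
        self.writes.append((target, bytes(data)))
        # Flush trigger: here, simply a count of cached writes.
        if len(self.writes) >= self.flush_threshold:
            self.flush()

    def flush(self):
        unique, index_of, commands = [], {}, []
        for target, data in self.writes:
            fp = hashlib.sha256(data).digest()
            if fp not in index_of:
                # First time this content is seen: it joins the unique set.
                index_of[fp] = len(unique)
                unique.append(data)
            commands.append(CopyCommand(index_of[fp], target))
        self.send("data", unique)        # first set of (unique) data units
        self.send("commands", commands)  # then donor/recipient commands
        self.writes.clear()
```

With a threshold of 3, writing `b"AA"`, `b"BB"`, `b"AA"` to three targets sends only two data units, followed by three commands whose donors are indices 0, 1, 0. Sending the data before the commands ensures every donor exists remotely before any command refers to it.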
Abstract
In addition to caching I/O operations at a host, at least some data management can migrate to the host. With host-side caching, data sharing or deduplication can be applied to the cached writes before those writes are supplied to front end storage elements. When the host detects a trigger to flush its cache to distributed storage, it deduplicates the cached writes. The host aggregates the unique data identified by deduplication into a “change set file” (i.e., a file that contains the aggregated unique data from the cached writes). The host supplies the change set file to the distributed storage system and then sends the system commands, each of which identifies the part of the change set file to be used for a target of the cached writes.
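The change set file described in the abstract can be illustrated by packing unique data into one buffer and describing each target by an (offset, length) part of that buffer. This is a minimal sketch, not the patent's file format. `build_change_set`, `apply_commands`, the tuple command shape, and SHA-256 fingerprints are hypothetical choices.

```python
import hashlib


def build_change_set(writes):
    """Aggregate the unique data of cached writes into one change set.

    `writes` is a list of (target, data) pairs. Returns the change set bytes
    and one (target, offset, length) command per write, naming the part of
    the change set to use for that target.
    """
    change_set = bytearray()
    part_of = {}  # SHA-256 fingerprint -> (offset, length) in change_set
    commands = []
    for target, data in writes:
        fp = hashlib.sha256(data).digest()
        if fp not in part_of:
            # First occurrence of this content: append it to the change set.
            part_of[fp] = (len(change_set), len(data))
            change_set.extend(data)
        offset, length = part_of[fp]
        commands.append((target, offset, length))
    return bytes(change_set), commands


def apply_commands(change_set, commands):
    """Reconstruct each target's data from its part of the change set."""
    return {target: change_set[off:off + n] for target, off, n in commands}
```

For writes of `b"hello"`, `b"world"`, `b"hello"` to `/a`, `/b`, `/c`, the change set holds only `b"helloworld"`. The third command points back at offset 0, so the duplicated data crosses the network once.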
19 Claims
14. A non-transitory machine-readable medium comprising program code for efficient flushing of cached writes from a host device to a front end device of a storage system, the program code to:
cache writes in a persistent cache of the host device, wherein the writes indicate write targets and data units to write;
after detection of a cache flush trigger, determine whether any of the data units have a fingerprint that matches an entry in a fingerprint map or a fingerprint of any of the other data units;
update a change set file to include any of the data units that have a fingerprint that does not match an entry in the fingerprint map and that does not match a fingerprint of any of the other data units;
update the fingerprint map to indicate any fingerprint of the data units that was unique among the data units and unique with respect to the fingerprint map;
for each of the write targets, create a mapping between the write target and a part of the change set file that matches the data unit associated with the write target;
communicate any part of the change set file to the storage system that has not yet been communicated to the storage system;
for each of a subset of the cached writes, generate a command that indicates the write target of the cached write as a recipient and that indicates the part of the change set file mapped to the write target as a donor, wherein the program code to generate the command comprises program code to generate a copy command in accordance with a network file system protocol and to indicate in the copy command that a share should be performed instead of a copy if a front end device of the distributed storage system is capable of performing the share; and
communicate the commands to the storage system.
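Claim 14 adds a fingerprint map that outlives a single flush, so only the parts of the change set file not yet communicated are sent. The sketch below illustrates this under stated assumptions and is not the claimed program code. `FlushState`, `flush`, the `sent_upto` counter, and SHA-256 fingerprints are hypothetical. The two uniqueness checks in the claim (against the fingerprint map and against the other data units) collapse into one lookup here, because each new fingerprint enters the map as soon as it is seen.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FlushState:
    # Hypothetical host-side state that survives across flushes.
    change_set: bytearray = field(default_factory=bytearray)
    fingerprint_map: dict = field(default_factory=dict)  # fp -> (offset, length)
    sent_upto: int = 0  # bytes of change_set already communicated


def flush(state, writes):
    """Process one flush of cached (target, data) writes.

    Returns (bytes_to_send, commands), where each command is
    (recipient_target, donor_offset, donor_length).
    """
    mapping = {}  # write target -> part of the change set
    for target, data in writes:
        fp = hashlib.sha256(data).digest()
        if fp not in state.fingerprint_map:
            # Unique within this flush and unique with respect to the map:
            # add the data to the change set and record its fingerprint.
            state.fingerprint_map[fp] = (len(state.change_set), len(data))
            state.change_set.extend(data)
        # A later write to the same target replaces the earlier mapping.
        mapping[target] = state.fingerprint_map[fp]
    # Send only the tail of the change set not yet communicated.
    to_send = bytes(state.change_set[state.sent_upto:])
    state.sent_upto = len(state.change_set)
    commands = [(t, off, n) for t, (off, n) in mapping.items()]
    return to_send, commands
```

On a second flush, data already present in the fingerprint map is not resent. Only the new bytes travel, and the commands refer back to the earlier parts of the change set file.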
17. An apparatus comprising:
a processor;
a persistent cache; and
a machine-readable medium having program code executable by the processor to cause the apparatus to, cache writes in the persistent cache, wherein the writes indicate write targets and data units to write;
after detection of a cache flush trigger, determine a first set of the data units that are each unique among the data units;
communicate the first set of data units to a remote, distributed storage system;
generate a command for each of a subset of the cached writes, wherein the command indicates a corresponding one of the first set of data units as a donor and the write target of the cached write of the subset as a recipient, wherein generating the command for each of the subset of the cached writes comprises generating a copy command in accordance with a network file system protocol and indicating in the copy command that a share should be performed instead of a copy if a front end device of the distributed storage system is capable of performing the share;
communicate the commands to the remote, distributed storage system.
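The claims also recite a copy command that asks the front end to share rather than copy when it is able to. A toy front end shows why that hint matters for capacity. This is a minimal sketch, not the behavior of any real NFS server. `FrontEnd`, `CopyCommand`, its `prefer_share` field, and the single-extent file model are hypothetical and not part of any NFS wire format. A share points the recipient at the donor's stored bytes, while a copy materializes a private duplicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyCommand:
    # Hypothetical command shape, not an actual NFS wire format.
    donor: str                 # donor file (e.g. the change set file)
    offset: int                # start of the donor part
    length: int                # length of the donor part
    recipient: str             # write target
    prefer_share: bool = True  # "share instead of copy if capable"


class FrontEnd:
    """Toy front end storing each file as one extent (blob_id, start, length)."""

    def __init__(self, can_share):
        self.can_share = can_share
        self.blobs = []  # blob_id (list index) -> stored bytes
        self.files = {}  # file name -> (blob_id, start, length)

    def write_file(self, name, data):
        self.blobs.append(bytes(data))
        self.files[name] = (len(self.blobs) - 1, 0, len(data))

    def read_file(self, name):
        blob_id, start, length = self.files[name]
        return self.blobs[blob_id][start:start + length]

    def handle(self, cmd):
        blob_id, start, _ = self.files[cmd.donor]
        begin = start + cmd.offset
        if cmd.prefer_share and self.can_share:
            # Share: the recipient references the donor's stored bytes.
            self.files[cmd.recipient] = (blob_id, begin, cmd.length)
        else:
            # Copy: store a private duplicate of the donor part.
            self.write_file(cmd.recipient, self.blobs[blob_id][begin:begin + cmd.length])

    def stored_bytes(self):
        return sum(len(b) for b in self.blobs)
```

After a 10-byte change set file is written and three 5-byte parts are applied, a share-capable front end still stores 10 bytes. A copy-only front end stores 25 bytes. Both front ends return the same data for every target.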
Specification