DEDUPLICATED HOST CACHE FLUSH TO REMOTE STORAGE
First Claim
1. A method comprising:
caching writes in a persistent cache of a host device, wherein the writes indicate write targets and data units to write;
after detection of a cache flush trigger, determining a first set of the data units that are each unique among the data units;
communicating the first set of data units to a remote, distributed storage system;
generating a command for each of a subset of the cached writes, wherein the command indicates a corresponding one of the first set of data units as a donor and the write target of the cached write of the subset as a recipient; and
communicating the commands to the remote, distributed storage system.
Abstract
In addition to caching I/O operations at a host, at least some data management can migrate to the host. With host side caching, data sharing or deduplication can be implemented with the cached writes before those writes are supplied to front end storage elements. When a host cache flush to distributed storage trigger is detected, the host deduplicates the cached writes. The host aggregates data based on the deduplication into a “change set file” (i.e., a file that includes the aggregation of unique data from the cached writes). The host supplies the change set file to the distributed storage system. The host then sends commands to the distributed storage system. Each of the commands identifies a part of the change set file to be used for a target of the cached writes.
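The flush described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented implementation: the names `CachedWrite`, `Command`, and `build_flush` are hypothetical, SHA-256 stands in for whatever fingerprint the host actually uses, and the change set file is modeled as an in-memory byte string. The sketch also emits one command per cached write and does not collapse repeated writes to the same target.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedWrite:
    target: str   # hypothetical write target, e.g. a file or block address
    data: bytes   # data unit to write


@dataclass(frozen=True)
class Command:
    donor_offset: int  # where the donor unit starts in the change set
    length: int        # size of the donor unit in bytes
    recipient: str     # write target that should receive the unit


def build_flush(cached_writes):
    """Deduplicate cached writes; return (change_set, commands)."""
    change_set = bytearray()
    offsets = {}   # fingerprint -> offset of the unique unit in the change set
    commands = []
    for write in cached_writes:
        fp = hashlib.sha256(write.data).digest()  # assumed fingerprint function
        if fp not in offsets:
            # First time this data unit is seen: append it to the change set.
            offsets[fp] = len(change_set)
            change_set += write.data
        # Every write gets a command pointing at the shared donor unit.
        commands.append(Command(offsets[fp], len(write.data), write.target))
    return bytes(change_set), commands


writes = [
    CachedWrite("a", b"AAAA"),
    CachedWrite("b", b"BBBB"),
    CachedWrite("c", b"AAAA"),  # duplicate of the unit written to "a"
]
change_set, commands = build_flush(writes)
```

Here the host would send the 8-byte change set once, followed by three commands; the commands for targets "a" and "c" name the same donor region, so the duplicate unit never crosses the network a second time.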
23 Citations
20 Claims
1. A method comprising:
caching writes in a persistent cache of a host device, wherein the writes indicate write targets and data units to write;
after detection of a cache flush trigger, determining a first set of the data units that are each unique among the data units;
communicating the first set of data units to a remote, distributed storage system;
generating a command for each of a subset of the cached writes, wherein the command indicates a corresponding one of the first set of data units as a donor and the write target of the cached write of the subset as a recipient; and
communicating the commands to the remote, distributed storage system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A non-transitory machine-readable medium comprising program code for efficient flushing of cached writes from a host device to a front end device of a storage system, the program code to:
cache writes in a persistent cache of the host device, wherein the writes indicate write targets and data units to write;
after detection of a cache flush trigger, determine whether any of the data units have a fingerprint that matches an entry in a fingerprint map or a fingerprint of any of the other data units;
update a change set file to include any of the data units that have a fingerprint that does not match an entry in the fingerprint map and that does not match a fingerprint of any of the other data units;
update the fingerprint map to indicate any fingerprint of the data units that was unique among the data units and unique with respect to the fingerprint map;
for each of the write targets, create a mapping between the write target and a part of the change set file that matches the data unit associated with the write target;
communicate any part of the change set file to the storage system that has not yet been communicated to the storage system;
for each of a subset of the cached writes, generate a command that indicates the write target of the cached write as a recipient and that indicates the part of the change set file mapped to the write target as a donor; and
communicate the commands to the storage system.
Dependent claims: 16, 17
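The fingerprint-map variant in claim 15 differs from a one-shot deduplication in that state survives between flushes, so only the not-yet-communicated tail of the change set file is sent. A minimal sketch of that bookkeeping follows. It rests on assumptions that are not in the claim text: the class name `FlushState`, the use of SHA-256 as the fingerprint, an append-only in-memory change set, and commands represented as `(target, offset, length)` tuples. When a target is written more than once in a flush, the sketch keeps only the last write, which is one possible reading of "a subset of the cached writes".

```python
import hashlib


class FlushState:
    """Hypothetical host-side state kept across cache flushes."""

    def __init__(self):
        self.fingerprint_map = {}      # fingerprint -> (offset, length)
        self.change_set = bytearray()  # stand-in for the change set file
        self.sent_upto = 0             # bytes already communicated

    def flush(self, cached_writes):
        """cached_writes: iterable of (write_target, data_unit) pairs.

        Returns (unsent_part, commands), where each command is a
        (recipient_target, donor_offset, donor_length) tuple.
        """
        mapping = {}  # write target -> part of the change set file
        for target, data in cached_writes:
            fp = hashlib.sha256(data).hexdigest()  # assumed fingerprint
            if fp not in self.fingerprint_map:
                # Unique within this flush and with respect to the map:
                # append to the change set and record the fingerprint.
                self.fingerprint_map[fp] = (len(self.change_set), len(data))
                self.change_set += data
            mapping[target] = self.fingerprint_map[fp]  # last write wins
        # Only the part not yet communicated is sent to storage.
        unsent = bytes(self.change_set[self.sent_upto:])
        self.sent_upto = len(self.change_set)
        commands = [(t, off, ln) for t, (off, ln) in mapping.items()]
        return unsent, commands


state = FlushState()
first_part, first_cmds = state.flush([("t1", b"xx"), ("t2", b"yy"), ("t3", b"xx")])
second_part, second_cmds = state.flush([("t4", b"yy"), ("t5", b"zz")])
```

In the second flush the unit `b"yy"` is already recorded in the fingerprint map, so only `b"zz"` is appended and sent, while the command for "t4" reuses the region communicated during the first flush.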
18. An apparatus comprising:
a processor;
a persistent cache; and
a machine-readable medium having program code executable by the processor to cause the apparatus to,
cache writes in the persistent cache, wherein the writes indicate write targets and data units to write;
after detection of a cache flush trigger, determine a first set of the data units that are each unique among the data units;
communicate the first set of data units to a remote, distributed storage system;
generate a command for each of a subset of the cached writes, wherein the command indicates a corresponding one of the first set of data units as a donor and the write target of the cached write of the subset as a recipient;
communicate the commands to the remote, distributed storage system.
Dependent claims: 19, 20
Specification