Shallow cache for content replication
Abstract
Embodiments relate to efficiently replicating data from a source storage space to a target storage space. The storage spaces share a common namespace of paths where content units are stored. A shallow cache is maintained for the target storage space. Each entry in the cache includes a hash of a content unit in the target storage space and associated hierarchy paths in the target storage space where the corresponding content unit is stored. When a set of content units in the source storage space is to be replicated at the target storage space, any content unit with a hash in the cache is replicated from one of the associated paths in the cache, thus avoiding having to replicate content from the source storage space.
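As an illustration of the cache described in the abstract, the following is a minimal sketch that indexes a target directory tree into a mapping from content hash to the set of paths holding that content. The function names (file_hash, build_shallow_cache, demo_build), the choice of SHA-256, and the use of root-relative paths with forward slashes are assumptions made for the example; the abstract does not specify them.

```python
import hashlib
import os
import tempfile


def file_hash(path):
    """Return the SHA-256 hex digest of a file's contents (hash choice is an assumption)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_shallow_cache(root):
    """Map each content hash to the set of root-relative paths storing that content."""
    cache = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(directory, name)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            cache.setdefault(file_hash(full_path), set()).add(relative)
    return cache


def demo_build():
    """Create a small target store containing one duplicated file, then index it."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a"))
        os.makedirs(os.path.join(root, "b"))
        samples = [("a/x.txt", b"same"), ("b/y.txt", b"same"), ("a/z.txt", b"other")]
        for relative, data in samples:
            with open(os.path.join(root, *relative.split("/")), "wb") as handle:
                handle.write(data)
        return build_shallow_cache(root)
```

In this sketch the cache holds only hashes and paths, never the content itself, so two files with identical bytes collapse into a single entry with two paths.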
20 Claims
1. A method performed by a computing device comprising processing hardware and storage hardware, the method comprising:
storing, in the storage hardware, a local data store comprised of local data items having respective unique storage paths where they are stored in the local data store;
maintaining a shallow cache comprised of cache entries, each cache entry representing a respective local data item in the local data store, each cache entry comprising a hash of the correspondingly represented local data item and comprising a set of paths where respective copies of the local data item are stored in the local data store;
determining to add a dataset from a remote device to the data store, the dataset represented by a manifest comprising a list of manifest entries, each manifest entry comprising a hash of a respective remote data item at the remote device and a set of paths where respective copies of the remote data item are stored on the remote device;
based on determining to add the dataset from the remote device to the data store, adding the dataset by:
for each manifest entry in the manifest, determining whether any cache entry contains the hash of the manifest entry; and
when a cache entry is determined to contain a hash of a manifest entry, copying a local data item (i) from a path in the cache entry's set of paths (ii) to each of the paths in the manifest entry's set of paths.
Dependent claims: 2, 3, 4, 5, 6, 7
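The manifest-driven steps of claim 1 can be sketched roughly as follows. This is an illustrative reading, not the patented implementation. The names add_dataset, to_full, and fetch_remote are hypothetical, the hashes "h1" and "h2" are placeholder strings standing in for real digests, and the fallback to a remote fetch on a cache miss is an assumption, since the claim recites only the cache-hit case.

```python
import os
import shutil
import tempfile


def to_full(root, relative):
    """Turn a forward-slash relative path into a path under root."""
    return os.path.join(root, *relative.split("/"))


def add_dataset(manifest, cache, root, fetch_remote):
    """Materialize every manifest path under root, copying locally on a cache hit.

    manifest: list of (hash, [paths]) entries describing the remote dataset.
    cache: dict mapping hash -> set of root-relative paths already stored locally.
    fetch_remote: hypothetical callback(hash, destination), used only on a miss.
    Returns the number of manifest entries satisfied from the local store.
    """
    local_hits = 0
    for content_hash, target_paths in manifest:
        cached_paths = cache.get(content_hash)
        # Any cached path will do; sorting only makes the choice deterministic.
        source = to_full(root, sorted(cached_paths)[0]) if cached_paths else None
        for relative in target_paths:
            destination = to_full(root, relative)
            os.makedirs(os.path.dirname(destination), exist_ok=True)
            if source is None:
                fetch_remote(content_hash, destination)  # assumed miss behavior
            elif os.path.abspath(source) != os.path.abspath(destination):
                shutil.copyfile(source, destination)
        if source is not None:
            local_hits += 1
    return local_hits


def demo_add_dataset():
    """Add a two-entry dataset where only the first entry is already held locally."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "old"))
        with open(os.path.join(root, "old", "lib.bin"), "wb") as handle:
            handle.write(b"shared bytes")
        cache = {"h1": {"old/lib.bin"}}
        manifest = [("h1", ["new/lib.bin", "new/copy/lib.bin"]), ("h2", ["new/app.bin"])]
        fetched = []

        def fetch_remote(content_hash, destination):
            fetched.append(content_hash)
            with open(destination, "wb") as handle:
                handle.write(b"remote bytes")

        hits = add_dataset(manifest, cache, root, fetch_remote)
        with open(to_full(root, "new/copy/lib.bin"), "rb") as handle:
            copied = handle.read()
        return hits, fetched, copied
```

In the demo, the entry with hash "h1" is served entirely by local copies to both of its manifest paths, while "h2" is the only item that reaches the stand-in remote fetch.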
8. A computing device comprising:
processing hardware configured to interoperate with storage hardware;
the storage hardware, the storage hardware storing instructions configured to, when executed by the processing hardware, cause the computing device to perform a process comprising:
providing a content-addressable storage that stores a hierarchy of files, each file stored at a respective unique path in the hierarchy of files, wherein instances of same files are stored at different respective unique paths in the hierarchy of files;
maintaining a shallow cache comprised of hashes of files in the hierarchy of files, the shallow cache further comprised of path lists associated with the hashes, respectively, each path list associated with a respective hash comprising a list of one or more paths in the hierarchy of files storing one or more respective instances of the file corresponding to the hash, wherein at least some of the path lists comprise multiple paths; and
adding files to respective new paths in the content-addressable storage by, each time a file is to be added to one or more new paths in the content-addressable storage, determining if a hash of the file is present in the shallow cache, wherein when a file's hash is determined to be present in the shallow cache the file is copied from a path in the corresponding path list to each of the corresponding new paths.
Dependent claims: 9, 10, 11, 12, 13, 14
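To make the path-list bookkeeping of claim 8 concrete, here is a small in-memory sketch. The class ShallowCacheStore, its add_file method, and the load_content callback are hypothetical names introduced for illustration. Recording newly written paths back into the path list is an assumption about how the cache might be maintained, not something the claim text spells out.

```python
import hashlib


class ShallowCacheStore:
    """In-memory stand-in for a file hierarchy plus its hash -> path-list cache."""

    def __init__(self):
        self.files = {}       # unique path -> file bytes
        self.path_lists = {}  # hash -> list of paths storing that file

    def add_file(self, content_hash, new_paths, load_content):
        """Add one file at each new path, copying locally when its hash is cached.

        load_content is a hypothetical zero-argument callback that supplies the
        bytes from outside the store; it is called only on a cache miss.
        Returns True when the file was served from an existing path.
        """
        known_paths = self.path_lists.get(content_hash)
        hit = bool(known_paths)
        data = self.files[known_paths[0]] if hit else load_content()
        for path in new_paths:
            self.files[path] = data
        # Assumption: record the new paths so later adds can reuse them.
        recorded = self.path_lists.setdefault(content_hash, [])
        for path in new_paths:
            if path not in recorded:
                recorded.append(path)
        return hit


def demo_store():
    """Add the same file twice; only the first add needs the outside source."""
    store = ShallowCacheStore()
    payload = b"logo"
    digest = hashlib.sha256(payload).hexdigest()
    loads = []

    def load_content():
        loads.append(1)
        return payload

    first = store.add_file(digest, ["/img/logo.png"], load_content)
    second = store.add_file(digest, ["/site/a/logo.png", "/site/b/logo.png"], load_content)
    return first, second, len(loads), store.path_lists[digest]
```

After the second add, the path list for the file holds three paths, which corresponds to the claim's condition that at least some path lists comprise multiple paths.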
15. Computer-readable storage hardware storing instructions configured to cause a computing device to perform a process, the storage hardware not comprising a signal, the process comprising:
maintaining a storage comprised of content units, each content unit stored at a full path, each full path belonging to a same namespace, wherein some of the full paths contain instances of a same respective content unit;
maintaining a location cache that indicates which content units are stored at which full paths in the storage, wherein, given an arbitrary content unit, full paths where respective instances of the arbitrary content unit are stored in the storage can be obtained based on a hash of the arbitrary content unit;
receiving a first request to add a first target content unit to the storage at first target full paths in the namespace;
responding to the first request by obtaining a first target hash of the first target content unit, obtaining a source full path from the location cache based on the target hash, and using the obtained source full path to copy a content unit at the source full path to each of the target full paths.
Dependent claims: 16, 17, 18, 19, 20
Specification