Intelligent deduplication data prefetching
First Claim
1. A method, comprising:
receiving a read input/output (I/O) request at a deduplication system node;
accessing a deduplication dictionary to determine a first location having data corresponding to the read I/O request;
caching headers for a plurality of datastores related to the first location, wherein cache is configured to cache the headers for the plurality of datastores related to the first location upon receiving the read I/O request associated with a single datastore at the first location;
receiving a subsequent read I/O request at the deduplication system node;
performing a lookup in cached headers before accessing any deduplication dictionary;
responding to the subsequent read I/O request.
Abstract
Deduplication dictionaries are used to maintain data chunk identifier and location pairings in a deduplication system. When access to a particular data chunk is requested, a deduplication dictionary is accessed to determine the location of the data chunk, and a datastore is accessed to retrieve the data chunk. However, deduplication dictionaries are large and typically maintained on disk, so dictionary access is expensive. Techniques and mechanisms of the present invention allow prefetches or read-aheads of datastore (DS) headers. For example, if a dictionary hit results in datastore DS(X), then headers for DS(X+1), DS(X+2), and so on through DS(X+read-ahead-window) are prefetched. These datastore headers are cached in memory and indexed by datastore identifier. Before going to the dictionary, a lookup is first performed in the cached headers to reduce deduplication data access request latency.
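The read-ahead behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The class name `HeaderPrefetchCache`, the in-memory dicts standing in for the on-disk dictionary and the datastore headers, the `capacity` limit with least-recently-used eviction, and the choice to cache the header of DS(X) itself alongside DS(X+1) through DS(X+read-ahead-window) are all assumptions made for illustration.

```python
from collections import OrderedDict


class HeaderPrefetchCache:
    """Hypothetical in-memory cache of datastore headers, indexed by datastore id."""

    def __init__(self, dictionary, datastore_headers, read_ahead_window=2, capacity=64):
        # Assumption: dictionary maps chunk id -> datastore id and stands in
        # for the large on-disk deduplication dictionary.
        self.dictionary = dictionary
        # Assumption: each header maps chunk id -> offset inside its datastore.
        self.datastore_headers = datastore_headers
        self.read_ahead_window = read_ahead_window
        self.capacity = capacity
        self.cached = OrderedDict()      # datastore id -> header
        self.dictionary_accesses = 0     # counts the expensive lookups

    def _cache_header(self, ds_id):
        header = self.datastore_headers.get(ds_id)
        if header is None:
            return  # no such datastore, nothing to prefetch
        self.cached[ds_id] = header
        self.cached.move_to_end(ds_id)
        while len(self.cached) > self.capacity:
            self.cached.popitem(last=False)  # evict least recently used

    def locate(self, chunk_id):
        """Return (datastore id, offset) for a chunk."""
        # Step 1: look in the cached headers before touching the dictionary.
        hit = next((ds for ds, h in self.cached.items() if chunk_id in h), None)
        if hit is not None:
            self.cached.move_to_end(hit)
            return hit, self.cached[hit][chunk_id]
        # Step 2: fall back to the dictionary on a cache miss.
        self.dictionary_accesses += 1
        ds_id = self.dictionary[chunk_id]
        # Step 3: cache DS(X) and prefetch DS(X+1) .. DS(X+read_ahead_window).
        for neighbor in range(ds_id, ds_id + self.read_ahead_window + 1):
            self._cache_header(neighbor)
        return ds_id, self.datastore_headers[ds_id][chunk_id]
```

In this sketch, a read that resolves to datastore 0 with a window of 2 also loads the headers of datastores 1 and 2, so later reads for chunks in those datastores are answered from memory without incrementing `dictionary_accesses`.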
20 Claims
1. A method, comprising:
receiving a read input/output (I/O) request at a deduplication system node;
accessing a deduplication dictionary to determine a first location having data corresponding to the read I/O request;
caching headers for a plurality of datastores related to the first location, wherein cache is configured to cache the headers for the plurality of datastores related to the first location upon receiving the read I/O request associated with a single datastore at the first location;
receiving a subsequent read I/O request at the deduplication system node;
performing a lookup in cached headers before accessing any deduplication dictionary;
responding to the subsequent read I/O request.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system, comprising:
an interface configured to receive a read input/output (I/O) request at a deduplication system node;
a processor configured to access a deduplication dictionary to determine a first location having data corresponding to the read I/O request;
cache configured to maintain headers for a plurality of datastores related to the first location, wherein cache is configured to cache the headers for the plurality of datastores related to the first location upon receiving the read I/O request associated with a single datastore at the first location;
wherein the interface is operable to receive a subsequent read I/O request at the deduplication system node and respond to the subsequent read I/O request upon performing a lookup in cached headers before accessing any deduplication dictionary.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A system, comprising:
means for receiving a read input/output (I/O) request at a deduplication system node;
means for accessing a deduplication dictionary to determine a first location having data corresponding to the read I/O request, wherein cache is configured to cache the headers for the plurality of datastores related to the first location upon receiving the read I/O request associated with a single datastore at the first location;
means for caching headers for a plurality of datastores related to the first location;
means for receiving a subsequent read I/O request at the deduplication system node;
means for performing a lookup in cached headers before accessing any deduplication dictionary;
means for responding to the subsequent read I/O request.
Dependent claims: 20
Specification