Intelligent deduplication data prefetching
Abstract
Deduplication dictionaries maintain pairings of data chunk identifiers and locations in a deduplication system. When access to a particular data chunk is requested, a deduplication dictionary is accessed to determine the location of the data chunk, and a datastore is accessed to retrieve the data chunk. However, deduplication dictionaries are large and typically maintained on disk, so dictionary access is expensive. Techniques and mechanisms of the present invention allow prefetches, or read-aheads, of datastore (DS) headers. For example, if a dictionary hit results in datastore DS(X), then headers for DS(X+1), DS(X+2), ..., DS(X+read-ahead-window) are prefetched. These datastore headers are cached in memory and indexed by datastore identifier. Before going to the dictionary, a lookup is first performed in the cached headers to reduce deduplication data access request latency.
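The read-ahead behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The class name `HeaderPrefetchCache`, the field names, and the representation of a datastore header as a set of chunk identifiers are all assumptions made for illustration; the abstract does not specify the header layout, and cache eviction is omitted.

```python
class HeaderPrefetchCache:
    """Hypothetical sketch: cache of datastore headers, indexed by datastore id."""

    def __init__(self, dictionary, headers_on_disk, read_ahead_window=2):
        # dictionary: chunk id -> datastore id (stands in for the on-disk dictionary)
        self.dictionary = dictionary
        # headers_on_disk: datastore id -> set of chunk ids (assumed header layout)
        self.headers_on_disk = headers_on_disk
        self.read_ahead_window = read_ahead_window
        self.cached_headers = {}       # datastore id -> header, held in memory
        self.dictionary_accesses = 0   # counts the expensive lookups

    def locate(self, chunk_id):
        """Return the datastore id holding chunk_id, or None if it is unknown."""
        # Look in the cached headers before going to the dictionary.
        for ds_id, header in self.cached_headers.items():
            if chunk_id in header:
                return ds_id

        # Cache miss: fall back to the expensive dictionary access.
        self.dictionary_accesses += 1
        ds_id = self.dictionary.get(chunk_id)
        if ds_id is None:
            return None

        # Dictionary hit on DS(X): prefetch headers for DS(X+1) .. DS(X+window).
        for ahead in range(ds_id + 1, ds_id + self.read_ahead_window + 1):
            header = self.headers_on_disk.get(ahead)
            if header is not None:
                self.cached_headers[ahead] = header
        return ds_id


# Illustrative data only: four chunks spread over four datastores.
dictionary = {"a": 7, "b": 8, "c": 9, "d": 12}
headers = {7: {"a"}, 8: {"b"}, 9: {"c"}, 12: {"d"}}
cache = HeaderPrefetchCache(dictionary, headers, read_ahead_window=2)

first = cache.locate("a")    # dictionary access; headers for DS(8), DS(9) prefetched
second = cache.locate("b")   # answered from the cached header of DS(8)
print(first, second, cache.dictionary_accesses)
```

In this sketch the second request is served without touching the dictionary because its chunk lives in a datastore whose header was prefetched after the first request. A real system would also need to bound the cache size and invalidate stale headers, which the sketch leaves out.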
20 Claims
1. A method, comprising:
- receiving a first input/output (I/O) request at a deduplication system node;
- performing a first lookup in a cache before accessing a deduplication dictionary;
- accessing the deduplication dictionary to determine a first location having data corresponding to the read I/O request;
- maintaining in the cache a plurality of headers for a plurality of datastores related to the first location, wherein the cache is configured to maintain the plurality of headers for the plurality of datastores related to the first location upon receiving the read I/O request associated with a single datastore at the first location;
- receiving a second I/O request at the deduplication system node;
- performing a second lookup in cache before accessing the deduplication dictionary;
- responding to the second I/O request without accessing the deduplication dictionary.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system, comprising:
- a hardware processor;
- an interface configured to receive a first input/output (I/O) request and a second (I/O) request at a deduplication system node;
- a deduplication dictionary configured to provide a first location having data corresponding to the first I/O request;
- a cache configured to maintain a plurality of headers for a plurality of datastores related to a first location after a first lookup is performed, wherein cache is configured to maintain the plurality of headers for the plurality of datastores related to the first location upon receiving the read I/O request associated with a single datastore at the first location, wherein the cache provides response data for the deduplication system node to respond to the second I/O request without accessing the deduplication dictionary.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A non-transitory computer readable medium, comprising:
- computer code for receiving a first input/output (I/O) request at a deduplication system node;
- computer code for performing a first lookup in a cache before accessing a deduplication dictionary;
- computer code for accessing the deduplication dictionary to determine a first location having data corresponding to the read I/O request;
- computer code for maintaining in the cache a plurality of headers for a plurality of datastores related to the first location, wherein the cache is configured to maintain the plurality of headers for the plurality of datastores related to the first location upon receiving the read I/O request associated with a single datastore at the first location;
- computer code for receiving a second I/O request at the deduplication system node;
- computer code for performing a second lookup in cache before accessing the deduplication dictionary;
- computer code for responding to the second I/O request without accessing the deduplication dictionary.
Dependent claims: 20
Specification