System and method for high performance deduplication indexing
First Claim
1. A computer system comprising:
- a server comprising a processor and a memory;
an index comprising a plurality of blocks, wherein each block comprises a plurality of entries, each entry including a fingerprint and an identifier which identifies where a corresponding data segment is stored within a data storage subsystem;
wherein in response to receiving a storage access request corresponding to a given fingerprint, the server is configured to;
access a data structure comprising a plurality of entries, wherein each entry of the plurality of entries identifies a block of the plurality of blocks, and a fingerprint corresponding to a first entry of the block;
responsive to said access of the data structure, identify a particular block of the plurality of blocks corresponding to the storage access request; and
responsive to identifying the particular block, send a query correspondingto the particular block to the index;
wherein in response to the query, the index is configured to convey a response to the server that indicates whether the given fingerprint is included within the particular block;
wherein the plurality of blocks of the index are sorted based on a value of an entry in a given block, and wherein responsive to said access of the data structure the particular block identified is one of;
a block in the index whose first entry includes a fingerprint value that is a smallest value of all first entry fingerprint values that are greater than the given fingerprint; and
a block with a first entry including a fingerprint value that is a largest value of all first entry fingerprint values that are less than the given fingerprint.
7 Assignments
0 Petitions
Accused Products
Abstract
A system and method for efficiently reducing latency of accessing an index for a data segment stored on a server. A server both removes duplicate data and prevents duplicate data from being stored in a shared data storage. The file server is coupled to an index storage subsystem holding fingerprint and pointer value pairs corresponding to a data segment stored in the shared data storage. The pairs are stored in a predetermined order. The file server utilizes an ordered binary search tree to identify a particular block of multiple blocks within the index storage subsystem corresponding to a received memory access request. The index storage subsystem determines whether an entry corresponding to the memory access request is located within the identified block. Based on at least this determination, the file server processes the memory access request accordingly. In one embodiment, the index storage subsystem is a solid-state disk (SSD).
192 Citations
17 Claims
-
1. A computer system comprising:
-
a server comprising a processor and a memory; an index comprising a plurality of blocks, wherein each block comprises a plurality of entries, each entry including a fingerprint and an identifier which identifies where a corresponding data segment is stored within a data storage subsystem; wherein in response to receiving a storage access request corresponding to a given fingerprint, the server is configured to; access a data structure comprising a plurality of entries, wherein each entry of the plurality of entries identifies a block of the plurality of blocks, and a fingerprint corresponding to a first entry of the block; responsive to said access of the data structure, identify a particular block of the plurality of blocks corresponding to the storage access request; and responsive to identifying the particular block, send a query corresponding to the particular block to the index; wherein in response to the query, the index is configured to convey a response to the server that indicates whether the given fingerprint is included within the particular block; wherein the plurality of blocks of the index are sorted based on a value of an entry in a given block, and wherein responsive to said access of the data structure the particular block identified is one of; a block in the index whose first entry includes a fingerprint value that is a smallest value of all first entry fingerprint values that are greater than the given fingerprint; and a block with a first entry including a fingerprint value that is a largest value of all first entry fingerprint values that are less than the given fingerprint. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A computer implemented method comprising:
-
maintaining an index comprising a plurality of blocks, wherein each block comprises a plurality of entries, each entry including a fingerprint and an identifier which identifies where a corresponding data segment is stored within a data storage subsystem; receiving a memory access request at a server; accessing a data structure comprising a plurality of entries, wherein each entry of the plurality of entries identifies a block of the plurality of blocks, and a fingerprint corresponding to a first entry of the block; responsive to said access of the data structure, identifying a particular block of the plurality of blocks corresponding to the storage access request; responsive to identifying the particular block the server sending a query corresponding to the particular block to the index; the index conveying a response to the server that indicates whether the given fingerprint is included within the particular block, responsive to the query; wherein the plurality of blocks of the index are sorted based on a value of an entry in a given block, and wherein responsive to said accessing of the data structure the particular block identified is one of; a block in the index whose first entry includes a fingerprint value that is a smallest value of all first entry fingerprint values that are greater than the given fingerprint; and a block with a first entry including a fingerprint value that is a largest value of all first entry fingerprint values that are less than the given fingerprint. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A non-transitory computer-readable medium storing program instructions that are executable to:
-
maintain an index comprising a plurality of blocks, wherein each block comprises a plurality of entries, each entry including a fingerprint and an identifier which identifies where a corresponding data segment is stored within a data storage subsystem; receive a storage access request at a server; access a data structure comprising a plurality of entries, wherein each entry of the plurality of entries identifies a block of the plurality of blocks, and a fingerprint corresponding to a first entry of the block; responsive to said access of the data structure, identify a particular block of the plurality of blocks corresponding to the storage access request; responsive to identifying the particular block, cause the server to send a query corresponding to the particular block to the index; and cause the index to convey a response to the server that indicates whether the given fingerprint is included within the particular block, responsive to the query; wherein the instructions are further executable to sort the plurality of blocks of the index based on a value of an entry in a given block, and wherein responsive to said access of the data structure the particular block identified is one of; a block in the index whose first entry includes a fingerprint value that is a smallest value of all first entry fingerprint values that are greater than the given fingerprint; and a block with a first entry including a fingerprint value that is a largest value of all first entry fingerprint values that are less than the given fingerprint. - View Dependent Claims (16, 17)
-
Specification