Pre-cache similarity-based delta compression for use in a data storage system
First Claim
1. A method comprising:
- processing a data block in a data storage system with a processor, producing a plurality of signatures, wherein processing the data block comprises generating a signature for every two or more consecutive bytes of the data block;
determining similarity of the data block to at least one reference data block using at least a portion of the plurality of signatures; and
generating, with the processor, cache data that represents differences between the data block and the at least one reference data block;
wherein generating the signature for every two or more consecutive bytes of the data block comprises generating the signature for every three consecutive bytes of the data block, wherein the data block comprises a four kilo-byte (4 KB) block, and wherein a number of signatures from the data block in the plurality of signatures is 4K−2.
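The signature count recited in the claim follows from sliding a three-byte window over the block one byte at a time: a 4096-byte block has 4096 − 3 + 1 = 4094 window positions, i.e. 4K−2. The sketch below only illustrates that arithmetic. The function name `shingle_signatures` and the choice of signature (the window's bytes read as an integer) are assumptions made for the example; the claim does not specify a signature function.

```python
def shingle_signatures(block: bytes, width: int = 3) -> list[int]:
    """Return one signature per window of `width` consecutive bytes (stride 1)."""
    if len(block) < width:
        return []
    # Hypothetical signature: the window's bytes interpreted as a big-endian
    # integer. Any hash of the window could be substituted here.
    return [
        int.from_bytes(block[i:i + width], "big")
        for i in range(len(block) - width + 1)
    ]


block = bytes(range(256)) * 16        # a 4 KB (4096-byte) block
signatures = shingle_signatures(block)
print(len(signatures))                # 4096 - 3 + 1 = 4094, i.e. 4K - 2
```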
Abstract
A data storage caching architecture supports using native local memory such as host-based RAM and, if available, Solid State Disk (SSD) memory for storing pre-cache delta-compression based delta, reference, and independent data by exploiting content locality, temporal locality, and spatial locality of data accesses to primary (e.g., disk-based) storage. The architecture makes excellent use of the physical properties of the different types of memory available (fast r/w RAM, low cost fast read SSD, etc.) by applying algorithms to determine what types of data to store in each type of memory. Algorithms include similarity detection, delta compression, least popularly used cache management, conservative insertion and promotion cache replacement, and the like.
20 Claims
1. A method comprising:
- processing a data block in a data storage system with a processor, producing a plurality of signatures, wherein processing the data block comprises generating a signature for every two or more consecutive bytes of the data block;
determining similarity of the data block to at least one reference data block using at least a portion of the plurality of signatures; and
generating, with the processor, cache data that represents differences between the data block and the at least one reference data block;
wherein generating the signature for every two or more consecutive bytes of the data block comprises generating the signature for every three consecutive bytes of the data block, wherein the data block comprises a four kilo-byte (4 KB) block, and wherein a number of signatures from the data block in the plurality of signatures is 4K−2.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 20
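The three steps of claim 1 (signatures, similarity, difference data) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the similarity measure (Jaccard overlap of three-byte windows), the byte-wise delta format, and all function names are hypothetical choices made for the example, and the blocks are assumed to have equal length.

```python
def window_signatures(block: bytes, width: int = 3) -> set[bytes]:
    """Distinct windows of `width` consecutive bytes, used here as signatures."""
    return {block[i:i + width] for i in range(len(block) - width + 1)}


def similarity(block: bytes, reference: bytes, width: int = 3) -> float:
    """Assumed measure: Jaccard overlap of the two signature sets."""
    a = window_signatures(block, width)
    b = window_signatures(reference, width)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_delta(block: bytes, reference: bytes) -> list[tuple[int, int]]:
    """Assumed delta format: (offset, new byte) for each differing position."""
    if len(block) != len(reference):
        raise ValueError("this sketch only handles equal-length blocks")
    return [(i, x) for i, (x, y) in enumerate(zip(block, reference)) if x != y]


def apply_delta(reference: bytes, delta: list[tuple[int, int]]) -> bytes:
    """Rebuild the original block from the reference block plus the delta."""
    out = bytearray(reference)
    for offset, value in delta:
        out[offset] = value
    return bytes(out)


reference = bytes(4096)                 # 4 KB reference block of zero bytes
changed = bytearray(reference)
changed[100] = 7                        # one byte differs from the reference
block = bytes(changed)

delta = make_delta(block, reference)
print(similarity(block, reference))     # 0.25
print(delta)                            # [(100, 7)]
assert apply_delta(reference, delta) == block
```

In this example the delta holds a single entry, so caching the delta alongside a shared reference block takes far less space than caching the full 4 KB block.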
9. A method comprising:
- processing a plurality of data blocks with a processor to produce a plurality of signatures that facilitate determining similarity of the plurality of data blocks in a data storage system;
calculating with the processor a signature heat map of a portion of the plurality of signatures to facilitate determining a reference block for similarity-based delta compression of pre-cache data; and
generating, with the processor, cache data that represents differences between a portion of the plurality of data blocks and the determined reference block;
wherein generating the signature for every two or more consecutive bytes of the data block comprises generating the signature for every three consecutive bytes of the data block, wherein the data block comprises a four kilo-byte (4 KB) block, and wherein a number of signatures from the data block in the plurality of signatures is 4K−2.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
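One possible reading of the "signature heat map" in claim 9 is a popularity count: how many blocks each signature appears in. The block whose signatures are most popular overall is then a natural candidate for the reference block. The sketch below follows that reading only as an assumption; the scoring rule and the names `heat_map` and `pick_reference` are hypothetical and are not taken from the claim text.

```python
from collections import Counter


def block_signatures(block: bytes, width: int = 3) -> set[bytes]:
    """Distinct windows of `width` consecutive bytes, used here as signatures."""
    return {block[i:i + width] for i in range(len(block) - width + 1)}


def heat_map(blocks: list[bytes], width: int = 3) -> Counter:
    """Count, for each signature, the number of blocks that contain it."""
    heat = Counter()
    for block in blocks:
        heat.update(block_signatures(block, width))
    return heat


def pick_reference(blocks: list[bytes], width: int = 3) -> int:
    """Return the index of the block whose signatures are 'hottest' in total."""
    heat = heat_map(blocks, width)
    scores = [
        sum(heat[sig] for sig in block_signatures(block, width))
        for block in blocks
    ]
    # max() returns the first index on ties.
    return max(range(len(blocks)), key=scores.__getitem__)


blocks = [b"abcdef", b"abcdxx", b"zzzzzz", b"abcdef"]
print(heat_map(blocks)[b"abc"])   # 3
print(pick_reference(blocks))     # 0
```

Summing heat over a block's distinct signatures favors blocks with many distinct signatures; a real system would likely normalize the score or sample only a portion of the signatures, as the claim language allows.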
Specification