DATA DEDUPLICATION FOR STREAMING SEQUENTIAL DATA STORAGE APPLICATIONS
Abstract
Data deduplication compression in a streaming storage application is provided. The disclosed deduplication process provides a deduplication archive that enables storage of the archive to, and extraction from, a streaming storage medium. One implementation involves compressing fully sequential data stored in a data repository to a sequential streaming storage, by: splitting fully sequential data into data blocks; hashing content of each data block and comparing each hash to an in-memory lookup table for a match, the in-memory lookup table storing all hashes that have been encountered during the compression of the fully sequential data; for each data block without a hash match, adding the data block as a new data block for compression of fully sequential data; and encoding duplicate data blocks using the in-memory lookup table into data segments.
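The compression steps summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: fixed-size blocks (4096 bytes by default), SHA-256 as the content hash, a Python dict as the in-memory lookup table, and a hypothetical record format in which a duplicate segment is written as ("dup", start, count), meaning `count` consecutive previously stored blocks beginning at unique-block index `start`. Coalescing consecutive duplicates into one segment is only one possible reading of "encoding duplicate data blocks ... into data segments". The sketch treats equal digests as equal blocks, relying on SHA-256 collisions being negligible in practice.

```python
import hashlib
from typing import Dict, List, Tuple, Union

# Hypothetical record format (an assumption, not taken from the patent):
#   ("new", data)          -> a block seen for the first time, stored literally
#   ("dup", start, count)  -> `count` blocks equal to unique blocks start..start+count-1
Record = Union[Tuple[str, bytes], Tuple[str, int, int]]


def dedup_compress(data: bytes, block_size: int = 4096) -> List[Record]:
    """Split `data` into fixed-size blocks, hash each one, and emit records in order."""
    lookup: Dict[bytes, int] = {}  # in-memory lookup table: digest -> unique-block index
    records: List[Record] = []
    next_index = 0  # index that the next never-seen block will receive
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]  # the last block may be shorter
        digest = hashlib.sha256(block).digest()
        if digest not in lookup:
            # No hash match: store the block as a new block and remember its hash.
            lookup[digest] = next_index
            next_index += 1
            records.append(("new", block))
            continue
        index = lookup[digest]
        last = records[-1] if records else None
        # Extend the previous duplicate segment if this block continues its run.
        if last is not None and last[0] == "dup" and last[1] + last[2] == index:
            records[-1] = ("dup", last[1], last[2] + 1)
        else:
            records.append(("dup", index, 1))
    return records
```

For example, with a 4-byte block size the input b"AAAABBBBAAAABBBB" yields two new blocks followed by a single duplicate segment ("dup", 0, 2). Because records are only ever appended, the output can be written to a sequential medium without seeking.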
20 Claims
1. A method for data deduplication compression in a streaming storage application, comprising compressing fully sequential data stored in a data repository to a sequential streaming storage, by:
splitting fully sequential data into data blocks;
hashing content of each data block and comparing each hash to an in-memory lookup table for a match, the in-memory lookup table storing all hashes that have been encountered during the compression of the fully sequential data;
for each data block without a hash match, adding the data block as a new data block for compression of fully sequential data; and
encoding duplicate data blocks using the in-memory lookup table into data segments.
(Dependent claims: 2, 3, 4, 5, 6, 7)
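Because the lookup table keeps every distinct hash encountered during compression, its size grows with the amount of unique data in the stream. A rough worst-case estimate follows, where $D$ is the input size, $B$ the block size, $N$ the number of distinct blocks, $h$ the digest size, $r$ the size of a per-entry block index, and $M$ the table memory. The concrete values (4 KiB blocks, SHA-256, an 8-byte index) are illustrative assumptions, and hash-table overhead is ignored.

```latex
\begin{aligned}
N &\le \left\lceil D / B \right\rceil, \qquad M \approx N\,(h + r) \\
D &= 2^{40}\ \text{bytes (1 TiB)},\quad B = 2^{12}\ \text{bytes (4 KiB)} \;\Rightarrow\; N \le 2^{28} \\
h &= 32\ \text{(SHA-256)},\quad r = 8 \;\Rightarrow\; M \lesssim 2^{28} \times 40\ \text{bytes} = 10\ \text{GiB}
\end{aligned}
```

So a 1 TiB stream in which every block is unique could need on the order of 10 GiB of RAM for the table alone; duplicate-heavy data needs less, since only distinct hashes are stored.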
8. A computer program product for data deduplication compression in a streaming storage application, the computer program product comprising:
a computer readable storage medium having computer readable program code embodied therewith, wherein the computer readable program code, when executed on a computer, causes the computer to provide a deduplication archive that enables storage of the archive to, and extraction from, a streaming storage medium by compressing fully sequential data stored in a data repository to a sequential streaming storage, by:
splitting fully sequential data into data blocks;
hashing content of each data block and comparing each hash to an in-memory lookup table for a match, the in-memory lookup table storing all hashes that have been encountered during the compression of the fully sequential data;
for each data block without a hash match, adding the data block as a new data block for compression of fully sequential data; and
encoding duplicate data blocks using the in-memory lookup table into data segments.
(Dependent claims: 9, 10, 11, 12, 13, 14)
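Extraction from the streaming medium can proceed in a single forward pass, since a duplicate can only refer to a block whose hash was already encountered, that is, a block that appeared earlier in the stream. The sketch below assumes a hypothetical record format that the claims do not specify: ("new", data) for a block stored literally, and ("dup", start, count) for `count` consecutive blocks equal to previously stored unique blocks `start` through `start + count - 1`. The function name `dedup_extract` is illustrative, and keeping every unique block in memory is a simplification.

```python
from typing import Iterable, Iterator, List, Tuple, Union

# Hypothetical record format (an assumption, not taken from the patent):
#   ("new", data)          -> a literal block, seen for the first time
#   ("dup", start, count)  -> repeat unique blocks start..start+count-1
Record = Union[Tuple[str, bytes], Tuple[str, int, int]]


def dedup_extract(records: Iterable[Record]) -> Iterator[bytes]:
    """Rebuild the original byte stream from records read strictly in order."""
    unique_blocks: List[bytes] = []  # every "new" block seen so far, in arrival order
    for record in records:
        if record[0] == "new":
            block = record[1]
            unique_blocks.append(block)
            yield block
        elif record[0] == "dup":
            _, start, count = record
            # Duplicate segments only point backwards, so these blocks
            # were already read earlier in this same forward pass.
            for index in range(start, start + count):
                yield unique_blocks[index]
        else:
            raise ValueError(f"unknown record type: {record[0]!r}")
```

For instance, joining the output for [("new", b"AAAA"), ("new", b"BBBB"), ("dup", 0, 2)] with b"".join restores b"AAAABBBBAAAABBBB".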
15. A data deduplication compression system for a streaming storage application, comprising a deduplication module configured for compressing fully sequential data stored in a data repository to a sequential streaming storage, the deduplication module comprising a deduplication compression module configured for:
splitting fully sequential data into data blocks;
hashing content of each data block and comparing each hash to an in-memory lookup table for a match, the in-memory lookup table storing all hashes that have been encountered during the compression of the fully sequential data;
for each data block without a hash match, adding the data block as a new data block for compression of fully sequential data; and
encoding duplicate data blocks using the in-memory lookup table into data segments.
(Dependent claims: 16, 17, 18, 19, 20)
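Writing the archive to a sequential streaming storage, for example magnetic tape, means the output must be produced strictly in order, without seeking back to patch earlier bytes. The following sketch serializes hypothetical deduplication records (("new", data) for a literal block and ("dup", start, count) for a run of earlier unique blocks) with an invented binary framing, then reads them back in one forward pass. The tag bytes, field widths, and the names `write_records` and `read_records` are illustrative assumptions, and `io.BytesIO` merely stands in for the streaming device.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Tuple, Union

# Hypothetical on-medium framing (an assumption for illustration only):
#   b"N" + uint32 length + payload      -> a new block
#   b"D" + uint64 start + uint32 count  -> a duplicate segment
# All integers are big-endian, with no padding.
Record = Union[Tuple[str, bytes], Tuple[str, int, int]]


def write_records(stream: BinaryIO, records: Iterable[Record]) -> None:
    """Append records to `stream` strictly in order; never seeks or rewrites."""
    for record in records:
        if record[0] == "new":
            payload = record[1]
            stream.write(b"N" + struct.pack(">I", len(payload)) + payload)
        elif record[0] == "dup":
            _, start, count = record
            stream.write(b"D" + struct.pack(">QI", start, count))
        else:
            raise ValueError(f"unknown record type: {record[0]!r}")


def read_records(stream: BinaryIO) -> Iterator[Record]:
    """Read records back in a single forward pass until end of stream."""
    while True:
        tag = stream.read(1)
        if not tag:
            return  # end of stream
        if tag == b"N":
            (length,) = struct.unpack(">I", stream.read(4))
            yield ("new", stream.read(length))
        elif tag == b"D":
            start, count = struct.unpack(">QI", stream.read(12))
            yield ("dup", start, count)
        else:
            raise ValueError(f"corrupt stream: unexpected tag {tag!r}")


# Usage: an in-memory buffer stands in for the streaming device.
tape = io.BytesIO()
write_records(tape, [("new", b"AAAA"), ("dup", 0, 1)])
tape.seek(0)  # "rewind" before reading back
print(list(read_records(tape)))  # prints [('new', b'AAAA'), ('dup', 0, 1)]
```

The writer only ever appends, which is the property a sequential medium requires; the reader likewise consumes bytes front to back, so no random access is needed on extraction either.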
Specification