Data compression and storage techniques
First Claim
1. A method for use in computerized data storage, wherein a computerized system is operative to utilize computer readable media to back-up a data set, comprising:
- generating hash signatures including an identifier hash associated with identifying data and a content hash associated with content of individual portions of an initial data set;
- transferring the initial data set to a storage location via a network interface;
- at a time subsequent to transferring the initial data set, performing a back-up of a subsequent data set associated with the initial data set, wherein performing the back-up comprises:
- generating hash signatures including an identifier hash associated with identifying data and a content hash associated with content of individual portions of the subsequent data;
- comparing the identifier hashes of corresponding portions of the initial data set and the subsequent data set and, upon failing to match identifier hashes, comparing content hashes of said corresponding portions to determine if a corresponding content hash exists for the initial data set and to identify changed portions of the subsequent data set;
- obtaining corresponding portions of the initial dataset that correspond to the changed portions of the subsequent data set;
- preloading a dictionary-based compression engine with one of the corresponding portions of the initial data set, wherein the one corresponding portion of the initial data set is loaded in the dictionary-based compression engine and defines an individual dictionary block;
- compressing a corresponding one of the changed portions of the subsequent data set using the dictionary-based compression engine as loaded with the corresponding portion of the initial data set as a dictionary, wherein a compressed data portion is generated; and
- storing the compressed data portion to the storage location via the network interface to define a back-up version of the subsequent data set.
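The preloading and compressing steps recited above can be illustrated with a minimal sketch. It assumes Python's standard `zlib` module as the dictionary-based compression engine and uses its preset-dictionary (`zdict`) parameter; the claim does not name any particular engine, and the helper names and sample data here are hypothetical. Note that zlib can only reference the most recent 32 KiB of a preset dictionary, so this sketch suits portions no larger than that.

```python
import hashlib
import zlib


def compress_with_dictionary(changed: bytes, earlier: bytes) -> bytes:
    """Compress `changed` with the earlier portion preloaded as the dictionary."""
    engine = zlib.compressobj(level=9, zdict=earlier)
    return engine.compress(changed) + engine.flush()


def decompress_with_dictionary(blob: bytes, earlier: bytes) -> bytes:
    """Restore the changed portion; the same earlier portion must be supplied."""
    engine = zlib.decompressobj(zdict=earlier)
    return engine.decompress(blob) + engine.flush()


# Hypothetical 8 KiB portion of poorly compressible data, and a later
# version of it that differs by a six-byte in-place edit.
earlier = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(256))
changed = earlier[:4000] + b"EDITED" + earlier[4006:]

delta = compress_with_dictionary(changed, earlier)
plain = zlib.compress(changed, 9)

print(len(changed), len(plain), len(delta))
assert decompress_with_dictionary(delta, earlier) == changed
```

Because nearly every byte of the changed portion can be encoded as a back-reference into the preloaded dictionary, the compressed output in this sketch is far smaller than what the same engine produces without the dictionary. Restoring the portion requires the earlier portion to still be available.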
Abstract
Provided are systems and methods for use in data archiving. In one arrangement, compression techniques are provided wherein an earlier version of a data set (e.g., a file folder) is utilized as a dictionary of a compression engine to compress a subsequent version of the data set. This compression identifies changes between data sets and allows for storing these differences without duplicating many common portions of the data sets. For a given version of a data set, new information is stored along with metadata used to reconstruct the version from each individual segment saved at different points in time. In this regard, the earlier data set and one or more references to stored segments of a subsequent data set may be utilized to reconstruct the subsequent data set.
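The reconstruction idea in the abstract, where a version is rebuilt from segments saved at different points in time plus per-version metadata, might be sketched as follows. This is an illustrative simplification, not the patented implementation: the `SegmentStore` class, its methods, and the use of SHA-256 as the segment key are all assumptions, and new segments are stored whole here rather than as dictionary-compressed deltas.

```python
import hashlib


class SegmentStore:
    """Hypothetical content-addressed store with one manifest per version."""

    def __init__(self):
        self.segments = {}   # content hash -> segment bytes
        self.manifests = {}  # version label -> ordered list of content hashes

    def save_version(self, label, portions):
        """Store only segments not seen before; return how many were added."""
        added = 0
        keys = []
        for portion in portions:
            key = hashlib.sha256(portion).hexdigest()
            if key not in self.segments:
                self.segments[key] = portion
                added += 1
            keys.append(key)
        self.manifests[label] = keys  # metadata used later for reconstruction
        return added

    def reconstruct(self, label):
        """Rebuild a version from segments saved at any earlier point in time."""
        return [self.segments[key] for key in self.manifests[label]]


store = SegmentStore()
v1 = [b"alpha", b"beta", b"gamma"]
v2 = [b"alpha", b"BETA", b"gamma", b"delta"]

print(store.save_version("v1", v1))  # 3 new segments stored
print(store.save_version("v2", v2))  # 2 new segments stored
assert store.reconstruct("v2") == v2
```

The second version adds only the two segments that differ, yet its manifest still lists all four, so either version can be rebuilt independently.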
20 Claims
1. Independent claim 1 is recited in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A method for use in computerized data storage, wherein a computerized system is operative to utilize computer readable media to back-up a data set, comprising:
- delineating an initial data set into a first set of data portions having a predetermined size;
- generating a hash signature including an identifier hash and a content hash associated with each data portion of the initial data set;
- storing the data portions of the initial data set;
- at a time subsequent to storing the initial data set, performing a back-up of a subsequent data set associated with the initial data set, wherein performing the back-up comprises:
- delineating the subsequent data set into a second set of data portions having the same predetermined size as the data portions of the first data set;
- generating a hash signature including an identifier hash and a content hash associated with each data portion of the subsequent data set;
- comparing identifier hashes of the initial data set and the subsequent data set and, upon failing to match identifier hashes, comparing content hashes to determine if a corresponding content hash exists for the initial data set and to identify data portions of the subsequent data set that are different from corresponding data portions of the first data set;
- preloading a dictionary-based compression engine with one of the corresponding data portions of the initial data set;
- compressing a corresponding one of the changed data portions of the subsequent data set using the dictionary-based compression engine as loaded with the one corresponding portion of the initial data set as a dictionary, wherein a compressed data portion is generated; and
- storing the compressed data portion to at least partially define a back-up version of the subsequent data set.

Dependent claims: 15, 16, 17, 18, 19, 20
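The delineating and two-level hash comparison steps of claim 14 can be sketched roughly as below. Several details are assumptions made for illustration only: the 4096-byte portion size, the choice of SHA-256, and in particular the identifying data fed to the identifier hash (a hypothetical source name, block index, block length, and modification time), which the claim does not specify.

```python
import hashlib
from typing import NamedTuple

BLOCK_SIZE = 4096  # hypothetical predetermined portion size


class Signature(NamedTuple):
    identifier: str  # hash of identifying data (assumed fields, see sign())
    content: str     # hash of the portion's bytes


def delineate(data, size=BLOCK_SIZE):
    """Split a data set into fixed-size portions (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def sign(blocks, name, mtime_ns):
    """Generate an identifier hash and a content hash for each portion."""
    signatures = []
    for index, block in enumerate(blocks):
        ident = f"{name}:{index}:{len(block)}:{mtime_ns}".encode()
        signatures.append(Signature(hashlib.sha256(ident).hexdigest(),
                                    hashlib.sha256(block).hexdigest()))
    return signatures


def changed_blocks(initial, subsequent):
    """Return indices of subsequent portions whose content is new."""
    known_content = {sig.content for sig in initial}
    changed = []
    for index, sig in enumerate(subsequent):
        if index < len(initial) and sig.identifier == initial[index].identifier:
            continue  # identifier hashes match: portion treated as unchanged
        if sig.content in known_content:
            continue  # content hash already exists for the initial data set
        changed.append(index)
    return changed


# Four distinct 4096-byte portions, then a one-byte edit inside portion 1.
initial_data = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
subsequent_data = bytearray(initial_data)
subsequent_data[5000] = 0xFF

initial_sigs = sign(delineate(initial_data), "f.bin", mtime_ns=1)
subsequent_sigs = sign(delineate(bytes(subsequent_data)), "f.bin", mtime_ns=2)

print(changed_blocks(initial_sigs, subsequent_sigs))  # -> [1]
```

In this sketch the changed modification time makes every identifier hash fail to match, so the content hashes decide the outcome and only the edited portion is flagged. Each flagged portion would then be compressed against its counterpart from the initial data set.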
Specification