Systems and methods for creating copies of data, such as archive copies
Abstract
A system and method of creating archive copies of data sets is described. In some examples, the system creates an archive copy from an original data set. In some examples, the system creates an archive copy when creating a recovery copy for a data set. In some examples, the system creates a copy without redundant data, and then encrypts the data set.
600 Citations
13 Claims
1. A system for rebuilding at least a portion of a signature database that reflects contents of an archive copy of a data set, comprising:
a signature component, wherein the signature component generates a substantially unique identifier for all data objects within the data set and stores the substantially unique identifiers in a signature database, wherein the substantially unique identifier for a data object reflects contents of the data object;
an encryption component, wherein the encryption component encrypts at least some of the data objects of the data set;
a copy component, wherein the copy component:
uses the generated substantially unique identifiers to identify redundant data objects in the data set and deduplicate the redundant data objects in order to create a deduplicated archive copy of the data set that comprises the encrypted data objects;
wherein the archive copy is physically stored on sequential media; and
stores the archive copy as one or more data chunks stored on the sequential media, wherein each chunk is stored with header information that includes at least one substantially unique identifier; and
stores information related to locations of the encrypted data objects on the sequential media in a location database separate from the signature database; and
a database rebuilding component, wherein the database rebuilding component:
receives an indication that the signature database is unrecoverable or unavailable;
accesses header information of at least one chunk in order to determine at least one substantially unique identifier within the header information; and
uses the determined at least one substantially unique identifier from the header information in order to rebuild at least part of the signature database.
Dependent claims: 2, 3, 4, 5, 6
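As an illustration only, the arrangement recited in claim 1 might be sketched as follows. Nothing in this sketch comes from the patent text itself: the `Chunk` class, the function names, the use of SHA-256 as the "substantially unique identifier", and the two-objects-per-chunk limit are all assumptions, and `toy_encrypt` is a placeholder rather than real encryption. The point it tries to show is that the signature database can be recovered from chunk headers alone, without the location database and without decrypting any payload.

```python
import hashlib
from dataclasses import dataclass


def toy_encrypt(data: bytes, key: int = 0x5A) -> bytes:
    # Placeholder only: a one-byte XOR is NOT real encryption.
    return bytes(b ^ key for b in data)


@dataclass
class Chunk:
    header_ids: list   # identifiers of the objects held in this chunk
    payload: bytes     # concatenated encrypted objects


def write_archive(objects, per_chunk=2):
    """Deduplicate, encrypt, and pack objects into chunks in write order."""
    signature_db = set()   # identifiers seen so far (the database rebuilt later)
    location_db = {}       # identifier -> (chunk number, byte offset), kept separately
    chunks = []
    ids, payload = [], b""
    for data in objects:
        sig = hashlib.sha256(data).hexdigest()  # assumed identifier function
        if sig in signature_db:        # redundant object: store nothing new
            continue
        signature_db.add(sig)
        location_db[sig] = (len(chunks), len(payload))
        ids.append(sig)
        payload += toy_encrypt(data)
        if len(ids) == per_chunk:      # chunk is full: emit header + payload
            chunks.append(Chunk(ids, payload))
            ids, payload = [], b""
    if ids:                            # emit the final, partially filled chunk
        chunks.append(Chunk(ids, payload))
    return signature_db, location_db, chunks


def rebuild_signature_db(chunks):
    """Recover identifiers from chunk headers alone, without reading payloads."""
    rebuilt = set()
    for chunk in chunks:               # one sequential pass over the media
        rebuilt.update(chunk.header_ids)
    return rebuilt


objects = [b"alpha", b"beta", b"alpha", b"gamma"]
signature_db, location_db, chunks = write_archive(objects)
rebuilt_db = rebuild_signature_db(chunks)
```

In this sketch the rebuild is a single front-to-back pass, which suits sequential media where random access is expensive.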
7. A non-transitory computer-readable medium whose contents cause a data storage system to perform a method of rebuilding a deduplication index that reflects contents of an archive of data objects, the method comprising:
identifying a data object to be stored in an archive of data objects that form a data set;
creating a hash value for the data object, wherein creating the hash value includes calculating a hash value that represents contents of the data object;
deduplicating the data set by:
comparing the hash value with other hash values for data objects already stored in the archive of data objects;
when the comparison determines that the hash value for the data object is different than the other hash values:
encrypting a copy of the data object, and
transferring the encrypted copy of the data object and the hash value to the archive of data objects, and
storing in a file on sequential media, the transferred encrypted copy of the data object and the transferred hash value, wherein a header region of the file stores the hash value; or
when the comparison determines that the hash value for the data object is identical to one or more of the other hash values:
transferring the hash value that represents contents of the data object to the archive of data objects; and
storing in a file on sequential media, the transferred hash value, wherein a header region of the file stores the hash value;
updating an entry in a deduplication index to reflect the identification of the data object, wherein the entry is updated using the hash value;
upon receiving an indication that the deduplication index is unavailable or unrecoverable, accessing the hash value from the header region of a data file stored on sequential media; and
using the accessed hash value to rebuild a portion of a new, rebuilt version of the deduplication index.
Dependent claims: 8, 9, 10
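The two branches of claim 7 (new object versus already-stored object) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the patented implementation: the record layout (`header`/`body` dictionaries in a list standing in for files on sequential media), the reference-count form of the index entry, the function names, and SHA-256 as the hash are all hypothetical, and `toy_encrypt` is a placeholder that provides no real security.

```python
import hashlib
from collections import Counter


def toy_encrypt(data: bytes, key: int = 0x5A) -> bytes:
    # Placeholder only: a one-byte XOR is NOT real encryption.
    return bytes(b ^ key for b in data)


def ingest(data: bytes, index: Counter, tape: list) -> None:
    """Store one data object, writing its hash to the record header either way."""
    digest = hashlib.sha256(data).hexdigest()
    if digest not in index:
        # Hash differs from all stored hashes: keep an encrypted copy plus the hash.
        tape.append({"header": digest, "body": toy_encrypt(data)})
    else:
        # Hash matches a stored object: keep the hash only, no second copy.
        tape.append({"header": digest, "body": None})
    # Assumed index entry: a count of how many times the object was identified.
    index[digest] += 1


def rebuild_index(tape: list) -> Counter:
    """Rebuild the index by reading only the header of each record, in order."""
    return Counter(record["header"] for record in tape)


index = Counter()
tape = []
for obj in [b"report.doc", b"photo.jpg", b"report.doc"]:
    ingest(obj, index, tape)

rebuilt = rebuild_index(tape)  # as if the original index had become unavailable
```

Because every record carries its hash in the header, including the reference-only records, this sketch can recover the counts as well as the set of hashes.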
11. A method for rebuilding at least a portion of a single instancing index containing hash values that represent contents of a single instanced data set, comprising:
single instancing a data set in order to create a single instanced data set organized as an archive file and physically stored on one or more magnetic tapes, wherein the single instancing further comprises:
calculating substantially unique hash values that represent the data set,
storing at least some of the calculated hash values that represent the data set in a single instancing index, and
storing the calculated hash values within headers of one or more data files that form part of the archive file, wherein the one or more data files are separate from the single instancing index and also store at least a subset of the data set, and wherein the one or more data files are stored on the one or more tapes;
receiving an indication that at least part of the single instancing index storing hash values that represent the data set is unrecoverable or unavailable;
in response to receiving the indication, identifying at least one data file that forms part of the archive file on the one or more tapes;
extracting stored hash value information from a header of the identified at least one data file that forms part of the archive file; and
adding the extracted hash value information to a new, rebuilt version of the single instancing index.
Dependent claims: 12, 13
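To make the header-extraction step of claim 11 concrete, the following sketch assumes a simple binary layout for each data file: a 4-byte count, that many 32-byte SHA-256 digests, an 8-byte payload length, then the payload. The layout, the function names, and the use of an in-memory `io.BytesIO` stream in place of a tape device are all assumptions made for illustration; the claim does not specify any of them. The sketch handles the partial-loss case by merging extracted hashes into whatever part of the index survived.

```python
import hashlib
import io
import struct

DIGEST_LEN = 32  # SHA-256 digest size in bytes


def pack_data_file(blocks):
    """Serialize one data file: [count][digests...][payload length][payload]."""
    digests = [hashlib.sha256(b).digest() for b in blocks]
    payload = b"".join(blocks)
    header = struct.pack(">I", len(digests)) + b"".join(digests)
    return header + struct.pack(">Q", len(payload)) + payload


def read_headers(stream):
    """Walk the stream front to back, returning the digests in each file header."""
    headers = []
    while True:
        raw = stream.read(4)
        if len(raw) < 4:               # end of stream
            break
        (count,) = struct.unpack(">I", raw)
        digests = [stream.read(DIGEST_LEN) for _ in range(count)]
        (payload_len,) = struct.unpack(">Q", stream.read(8))
        stream.seek(payload_len, io.SEEK_CUR)  # skip the stored data itself
        headers.append(digests)
    return headers


def rebuild_index(stream, surviving):
    """Add hashes extracted from file headers to the surviving part of the index."""
    rebuilt = set(surviving)
    for digests in read_headers(stream):
        rebuilt.update(digests)
    return rebuilt


file_a = pack_data_file([b"block-1", b"block-2"])
file_b = pack_data_file([b"block-3"])
tape = io.BytesIO(file_a + file_b)

full_index = {hashlib.sha256(b).digest() for b in (b"block-1", b"block-2", b"block-3")}
surviving = {hashlib.sha256(b"block-1").digest()}   # pretend the rest was lost
rebuilt = rebuild_index(tape, surviving)
```

Storing the payload length in the header lets the reader skip over data without interpreting it, so only header bytes are parsed during the rebuild.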
Specification