Archive and backup virtualization
Abstract
A data storage and protection system includes secondary storage and at least one instance of a high efficiency storage application (“HESA”). The HESA backs up and archives client data stored in primary storage of a client computer system or client node to secondary storage. Archive files generated by the HESA re-use previously backed up client data stored in the secondary storage. In one embodiment, previously backed up client data is re-used for an archive file by organizing the archive file as a hash tree having hash values pointing to the previously backed up client data. In addition, the HESA can maximize available space in the primary storage by replacing previously backed up and/or archived client data in the primary storage with pointers that point to the previously backed up and/or archived client data in secondary storage.
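The re-use described above can be illustrated with a minimal sketch. It is not the HESA implementation; it assumes a hypothetical content-addressed store (`SecondaryStorage`), fixed-size chunks, and SHA-256 hashes, none of which are specified in the abstract. The sketch shows a backup followed by an archive of the inactive data, where the archive writes no new chunks because every chunk it references was already stored by the backup, and the inactive data in primary storage is then replaced with a pointer.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, chosen only for illustration


class SecondaryStorage:
    """Hypothetical content-addressed store: each unique chunk is kept once."""

    def __init__(self):
        self.chunks = {}           # hash value -> raw bytes
        self.new_chunk_writes = 0  # counts chunks actually written

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.chunks:  # de-duplication: skip known chunks
            self.chunks[digest] = data
            self.new_chunk_writes += 1
        return digest

    def get(self, digest):
        return self.chunks[digest]


def store_data_set(store, data):
    """Chunk the data, store each chunk, and return the list of hash values."""
    return [store.put(data[i:i + CHUNK_SIZE])
            for i in range(0, len(data), CHUNK_SIZE)]


def restore(store, recipe):
    """Rebuild the original bytes from a list of hash values."""
    return b"".join(store.get(digest) for digest in recipe)


store = SecondaryStorage()
active = b"AAAABBBB"    # assumed active portion of the raw data set
inactive = b"CCCCDDDD"  # assumed inactive portion of the raw data set

# Backup: the raw data set is chunked and stored in secondary storage.
backup_recipe = store_data_set(store, active + inactive)
writes_after_backup = store.new_chunk_writes

# Archive: built from the inactive portion; its chunks are already stored,
# so the archive only references them.
archive_recipe = store_data_set(store, inactive)
writes_after_archive = store.new_chunk_writes

# Primary storage: the inactive data is replaced with a pointer (stub).
primary = {"active.dat": active, "inactive.dat": {"stub": archive_recipe}}
```

In this sketch the backup and the archive remain separate data sets (two distinct lists of hash values), yet both refer to the same stored chunks, which is the space saving the abstract describes.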
20 Claims
1. A method for providing data protection in a computer architecture including a client computer system and secondary storage, the method comprising:
integrating a data backup process together with a data archive process, where the data backup process is different from the data archive process, and the integration of the data backup process with the data archive process is performed by:
creating a de-duplicated backup data set from a raw data set that includes both active and inactive portions stored in primary storage of a client computer system, wherein the de-duplicated backup data set includes part of the active portion of the raw data set;
storing the de-duplicated backup data set in a secondary storage remote from the client computer system;
creating a de-duplicated archive data set from at least the inactive portion of the raw data set stored in the primary storage, wherein the archive data set re-uses one or more pieces of raw data currently stored in the secondary storage by referencing the one or more pieces of raw data in the de-duplicated backup data set, the one or more pieces of raw data having been stored in the secondary storage during one or more previous backups; and
storing the de-duplicated archive data set in the secondary storage such that at least some of the raw data included in the de-duplicated archive data set is the same raw data included in the de-duplicated backup data set.
9. A method for reducing redundant data in a data storage system by re-using backup data for archiving, the method comprising:
integrating a data backup process together with a data archive process, where the data backup process is different from the data archive process, and the integration of the data backup process with the data archive process is performed by:
storing client data that includes both active and inactive portions in primary storage of a client system, the client data including data previously backed up and stored in secondary storage and data not previously backed up or stored in secondary storage;
generating a backup file including active data from the active portion of the client data, wherein the backup file includes the not previously backed up or stored client data and a hash tree organization of the client data from which the backup file is generated, the backup file hash tree organization including one or more hash values pointing to one or more corresponding pieces of client data;
storing the backup file in secondary storage;
generating an archive file from at least the inactive portion of the client data stored in primary storage, the archive file including a hash tree organization of the at least the first portion of the client data, wherein the archive file hash tree organization includes one or more hash values pointing to one or more corresponding pieces of client data previously backed up and stored in secondary storage, wherein at least some of the one or more hash values included in the archive file hash tree organization are included in the backup file hash tree organization;
storing the archive file in secondary storage such that at least some of the raw data included in the archive file is the same raw data included in the backup file; and
removing the first portion of the client data that is inactive from the primary storage.
15. A data storage and protection system, comprising:
a computer system that includes a secondary storage, the computer system including a high efficiency storage application configured to integrate a data backup process together with a data archive process, where the data backup process is different from the data archive process, wherein in operation, the high efficiency storage application operates to:
create a de-duplicated backup data set from a raw data set that includes both active and inactive portions stored in primary storage of a client computer system;
store the de-duplicated backup data set in a secondary storage remote from the client computer system;
create a de-duplicated archive data set from at least the inactive portion of the raw data set stored in the primary storage, wherein the de-duplicated archive data set re-uses one or more pieces of raw data currently stored in the secondary storage by referencing the one or more pieces of raw data in the de-duplicated backup data set, the one or more pieces of raw data having been stored in the secondary storage during one or more previous backups; and
store the de-duplicated archive data set in the secondary storage, wherein the de-duplicated backup data set and the de-duplicated archive data set are different data sets and are configured to reference some of the same raw data such that at least some of the raw data included in the de-duplicated archive data set is the same raw data included in the de-duplicated backup data set.
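
The hash tree organization recited in claim 9 can be sketched as follows. This is an illustrative assumption, not the claimed implementation: the pairwise tree shape, the SHA-256 hash, the `build_hash_tree` helper, and the sample chunk contents are all hypothetical. The sketch shows that when an archive file is built from chunks that a backup file already covers, the hash values in the archive tree also appear in the backup tree, and generating the archive stores nothing new.

```python
import hashlib


def hash_value(data):
    """Hash a piece of data (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def build_hash_tree(chunks, store):
    """Build a pairwise hash tree over a non-empty list of chunks.

    Leaves are hashes of raw chunks; each interior node is the hash of its
    children's concatenated hash values. Every node is stored once, keyed
    by its hash. Returns (root_hash, set_of_all_hash_values_in_tree).
    """
    level = []
    for chunk in chunks:
        digest = hash_value(chunk)
        store.setdefault(digest, chunk)      # keep existing entry if present
        level.append(digest)
    seen = set(level)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            composite = "".join(level[i:i + 2]).encode()
            digest = hash_value(composite)
            store.setdefault(digest, composite)
            next_level.append(digest)
        seen.update(next_level)
        level = next_level
    return level[0], seen


store = {}  # stands in for secondary storage: hash value -> stored bytes

# Backup file over four chunks; the last two are assumed to go inactive later.
backup_root, backup_hashes = build_hash_tree(
    [b"act1", b"act2", b"old1", b"old2"], store)
entries_after_backup = len(store)

# Archive file over the inactive chunks only.
archive_root, archive_hashes = build_hash_tree([b"old1", b"old2"], store)
entries_after_archive = len(store)

shared_hashes = archive_hashes & backup_hashes
```

With this chunk alignment the archive root coincides with an interior node of the backup tree, so the whole archive tree is shared. With a different alignment only some hash values would overlap, which is still consistent with the "at least some" language of the claim.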
Specification