REDUCING DATA DUPLICATION IN CLOUD STORAGE
First Claim
1. A method for reducing data duplication in cloud storage, the method comprising:
receiving at least one first snapshot of one or more remote volumes via a network, the at least one first snapshot including at least one copy of the one or more remote volumes at a first instant in time, individual ones of the one or more remote volumes including a plurality of clusters, individual ones of the plurality of clusters being identified as valid or invalid, valid clusters containing data to be backed up, and invalid clusters being devoid of data to be backed up;
identifying, responsive to and based on the at least one first snapshot, unique clusters and duplicate clusters among the valid clusters, the duplicate clusters being valid clusters in the one or more remote volumes containing identical data;
storing, in a backup file, the unique clusters and single instances of the duplicate clusters such that the backup file is devoid of duplicate clusters;
receiving at least one second snapshot of the one or more remote volumes via the network, the at least one second snapshot including at least one copy of the one or more remote volumes at a second instant in time, the second instant in time being after the first instant in time;
identifying, responsive to and based on the at least one second snapshot, valid clusters in the one or more remote volumes not yet stored in the backup file and clusters in the backup file that are no longer valid; and
utilizing, responsive to the at least one second snapshot, the clusters in the backup file that are no longer valid to store the valid clusters in the one or more remote volumes not yet stored in the backup file.
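The steps of claim 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the claim does not say how identical data is detected, so SHA-256 digests are assumed here (hash collisions are ignored); a snapshot is modeled as a list of volumes, each a list of clusters, with None standing for an invalid cluster; and the backup file is modeled as an in-memory list of slots. The names BackupFile and apply_snapshot are hypothetical.

```python
import hashlib
from typing import Optional

Cluster = Optional[bytes]          # None models an invalid cluster (assumption)
Snapshot = list[list[Cluster]]     # one list of clusters per remote volume


class BackupFile:
    """In-memory stand-in for the backup file: a list of cluster slots."""

    def __init__(self) -> None:
        self.slots: list[Optional[bytes]] = []
        self.index: dict[str, int] = {}   # digest -> slot number

    def apply_snapshot(self, snapshot: Snapshot) -> None:
        # Keep a single instance of each distinct valid cluster in the snapshot.
        wanted: dict[str, bytes] = {}
        for volume in snapshot:
            for cluster in volume:
                if cluster is not None:
                    key = hashlib.sha256(cluster).hexdigest()
                    wanted.setdefault(key, cluster)

        # Stored clusters that no longer appear in the snapshot are treated
        # as no longer valid; their slots are freed for reuse.
        free = sorted(s for d, s in self.index.items() if d not in wanted)
        self.index = {d: s for d, s in self.index.items() if d in wanted}
        for slot in free:
            self.slots[slot] = None

        # Store valid clusters not yet in the backup file, reusing freed
        # slots first and appending only when none are left.
        for key, data in wanted.items():
            if key in self.index:
                continue
            if free:
                slot = free.pop(0)
                self.slots[slot] = data
            else:
                slot = len(self.slots)
                self.slots.append(data)
            self.index[key] = slot


backup = BackupFile()
# First snapshot: "A" and "B" each occur twice, "C" once, one cluster is invalid.
backup.apply_snapshot([[b"A", b"B", None, b"A"], [b"B", b"C"]])
# Second snapshot: "B" has disappeared and "D" is new.
backup.apply_snapshot([[b"A", b"D", None, b"A"], [b"D", b"C"]])
print(backup.slots)
```

In this run the first snapshot occupies three slots, and after the second snapshot the slot that held "B" holds "D", so the modeled backup file does not grow.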
Abstract
Data duplication may be reduced in cloud storage. First snapshots of one or more remote volumes may be received via a network. The first snapshots may be copies of the one or more remote volumes at a first instant in time. Responsive to and/or based on the first snapshots, unique clusters and duplicate clusters may be identified among the valid clusters of the remote volumes. The unique clusters and single instances of the duplicate clusters may be stored in a backup file, such that the backup file is devoid of duplicate clusters. Second snapshots of the one or more remote volumes may be received via the network. The second snapshots may be copies of the one or more remote volumes at a second instant in time, wherein the second instant in time is after the first instant in time. Responsive to the second snapshots, the clusters in the backup file that are no longer valid may be utilized to store the valid clusters in the one or more remote volumes not yet stored in the backup file.
22 Claims
1. A method for reducing data duplication in cloud storage (recited in full under First Claim above).
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system for reducing data duplication in cloud storage, the system comprising:
one or more processors configured to execute computer program modules, the computer program modules comprising:
a snapshot retrieval module configured to receive at least one first snapshot of one or more remote volumes via a network, the at least one first snapshot including at least one copy of the one or more remote volumes at a first instant in time, individual ones of the one or more remote volumes including a plurality of clusters, individual ones of the plurality of clusters being identified as valid or invalid, valid clusters containing data to be backed up, and invalid clusters being devoid of data to be backed up;
a cluster identification module configured to identify, responsive to and based on the at least one first snapshot, unique clusters and duplicate clusters among the valid clusters, the duplicate clusters being valid clusters in the one or more remote volumes containing identical data; and
a backup module configured to store, in a backup file, the unique clusters and single instances of the duplicate clusters such that the backup file is devoid of duplicate clusters;
wherein the snapshot retrieval module is further configured to receive at least one second snapshot of the one or more remote volumes via the network, the at least one second snapshot including at least one copy of the one or more remote volumes at a second instant in time, the second instant in time being after the first instant in time;
wherein the cluster identification module is further configured to identify, responsive to and based on the at least one second snapshot, valid clusters in the one or more remote volumes not yet stored in the backup file and clusters in the backup file that are no longer valid; and
wherein the backup module is further configured to utilize, responsive to the at least one second snapshot, the clusters in the backup file that are no longer valid to store the valid clusters in the one or more remote volumes not yet stored in the backup file.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
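The division of work among the three modules of claim 12 might look roughly like the Python sketch below. It is a hedged illustration only: the class and method names (receive, classify, diff, store) are hypothetical, the network transfer is replaced by a caller-supplied fetch function, identical data is assumed to be detected with SHA-256 digests, and the backup file is an in-memory list of slots. None of these choices is stated in the claim.

```python
import hashlib
from collections import Counter
from typing import Callable, Optional

Cluster = Optional[bytes]          # None models an invalid cluster (assumption)
Snapshot = list[list[Cluster]]     # one list of clusters per remote volume


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class SnapshotRetrievalModule:
    def __init__(self, fetch: Callable[[], Snapshot]) -> None:
        self._fetch = fetch        # hypothetical stand-in for the network

    def receive(self) -> Snapshot:
        return self._fetch()


class ClusterIdentificationModule:
    def classify(self, snapshot: Snapshot) -> tuple[set[str], set[str]]:
        """Return (unique digests, duplicate digests) among valid clusters."""
        counts = Counter(digest(c) for v in snapshot for c in v if c is not None)
        unique = {d for d, n in counts.items() if n == 1}
        duplicate = {d for d, n in counts.items() if n > 1}
        return unique, duplicate

    def diff(self, snapshot: Snapshot, stored: set[str]) -> tuple[dict[str, bytes], set[str]]:
        """Return (valid clusters not yet stored, stored digests no longer valid)."""
        current: dict[str, bytes] = {}
        for volume in snapshot:
            for cluster in volume:
                if cluster is not None:
                    current.setdefault(digest(cluster), cluster)
        missing = {d: c for d, c in current.items() if d not in stored}
        stale = stored - set(current)
        return missing, stale


class BackupModule:
    def __init__(self) -> None:
        self.slots: list[Optional[bytes]] = []
        self.index: dict[str, int] = {}   # digest -> slot number

    def store(self, missing: dict[str, bytes], stale: set[str]) -> None:
        # Free the slots of clusters that are no longer valid, then reuse them.
        free = sorted(self.index.pop(d) for d in stale)
        for slot in free:
            self.slots[slot] = None
        for d, data in missing.items():
            if free:
                slot = free.pop(0)
                self.slots[slot] = data
            else:
                slot = len(self.slots)
                self.slots.append(data)
            self.index[d] = slot


# Two snapshots taken at successive instants; "B" disappears and "D" appears.
snapshots = iter([
    [[b"A", b"B", None, b"A"], [b"B", b"C"]],
    [[b"A", b"D", None], [b"C"]],
])
retrieval = SnapshotRetrievalModule(lambda: next(snapshots))
identification = ClusterIdentificationModule()
backup = BackupModule()

for _ in range(2):
    snapshot = retrieval.receive()
    missing, stale = identification.diff(snapshot, set(backup.index))
    backup.store(missing, stale)
print(backup.slots)
```

Keeping identification separate from storage means the backup module only ever sees two inputs, the clusters to add and the digests to retire, which mirrors how the claim splits the "identify" and "utilize" steps between modules.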
Specification