ELIMINATING DUPLICATE DATA BY SHARING FILE SYSTEM EXTENTS
First Claim
1. A method performed by a computer system, the method comprising:
- receiving a data set written to a virtual storage device;
- partitioning the received data set into a plurality of sections of data;
- for each section of data, determining whether the section of data includes duplicate data, wherein duplicate data is data that was previously written to another virtual storage device and stored on a physical storage device at a storage location;
- in response to determining that a section of data includes duplicate data, performing deduplication in relation to the duplicate data; and
- generating a descriptor for the received data set that identifies the storage location of each section of data previously written to another virtual storage device and stored on the physical storage device, rather than storing the corresponding duplicate data at another storage location on the physical storage device.
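The steps recited in this claim can be illustrated with a minimal sketch. It is not the patented implementation: the `CommonStore` class, the fixed `SECTION_SIZE`, the use of SHA-256 to detect duplicates, and the list-of-locations descriptor are all assumptions chosen for illustration. The claim itself only requires partitioning, duplicate detection, and a descriptor that points at already-stored data.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List

SECTION_SIZE = 4  # bytes per section; deliberately tiny, an illustrative assumption


@dataclass
class CommonStore:
    """Hypothetical stand-in for the physical storage device shared by all virtual devices."""
    blocks: List[bytes] = field(default_factory=list)      # stored sections; index = storage location
    by_hash: Dict[bytes, int] = field(default_factory=dict)  # SHA-256 digest -> storage location

    def write(self, data_set: bytes) -> List[int]:
        """Store a data set and return its descriptor (one storage location per section)."""
        descriptor = []
        # Partition the received data set into fixed-size sections (assumed scheme).
        for start in range(0, len(data_set), SECTION_SIZE):
            section = data_set[start:start + SECTION_SIZE]
            digest = hashlib.sha256(section).digest()
            location = self.by_hash.get(digest)
            # Treat the section as a duplicate only if the stored bytes really match.
            if location is None or self.blocks[location] != section:
                location = len(self.blocks)   # new data: store it at a fresh location
                self.blocks.append(section)
                self.by_hash[digest] = location
            # For a duplicate, the descriptor simply reuses the existing location.
            descriptor.append(location)
        return descriptor

    def read(self, descriptor: List[int]) -> bytes:
        """Reassemble a data set from its descriptor."""
        return b"".join(self.blocks[loc] for loc in descriptor)


store = CommonStore()
tape_a = store.write(b"AAAABBBBCCCC")  # data set written to one virtual device
tape_b = store.write(b"BBBBCCCCDDDD")  # data set written to another virtual device
print(tape_a, tape_b, len(store.blocks))
```

In this sketch the two data sets contain six sections in total, but only four are stored, because the second descriptor reuses the locations of the two sections already written through the first virtual device.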
Abstract
A hardware and/or software facility to enable emulated storage devices to share data stored on physical storage resources of a storage system. The facility may be implemented on a virtual tape library (VTL) system configured to back up data sets that have a high level of redundancy on multiple virtual tapes. The facility organizes all or a portion of the physical storage resources according to a common store data layout. By enabling emulated storage devices to share data stored on physical storage resources, the facility enables deduplication across the emulated storage devices irrespective of the emulated storage device to which the data is or was originally written, thereby eliminating duplicate data on the physical storage resources and improving the storage consumption of the emulated storage devices on the physical storage resources.
20 Claims
1. A method performed by a computer system, the method comprising:
- receiving a data set written to a virtual storage device;
- partitioning the received data set into a plurality of sections of data;
- for each section of data, determining whether the section of data includes duplicate data, wherein duplicate data is data that was previously written to another virtual storage device and stored on a physical storage device at a storage location;
- in response to determining that a section of data includes duplicate data, performing deduplication in relation to the duplicate data; and
- generating a descriptor for the received data set that identifies the storage location of each section of data previously written to another virtual storage device and stored on the physical storage device, rather than storing the corresponding duplicate data at another storage location on the physical storage device.
Dependent claims: 2, 3, 4, 5, 6
7. A physical storage device comprising:
- a first data set written to a first emulated storage device, wherein the first data set includes a first section of data; and
- a second data set written to a second emulated storage device, wherein the second data set includes the first section of data, and wherein the first section of data is written to a storage location on the physical storage device that is shared by the first and second emulated storage devices.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16
17. A computer-readable storage medium encoded with a data structure that is used by a storage system to share data written to different virtual storage devices, the data structure comprising:
- a plurality of record entries, each record entry corresponding to a data set written to a virtual storage device and identifying a plurality of extents that include the corresponding data set, wherein each extent identifies a storage location at which a portion of data is stored on a physical storage device, and wherein at least one extent is identified by two or more record entries in the data structure, each of the two or more record entries corresponding to a data set written to different virtual storage devices; and
- a plurality of fingerprint entries, each fingerprint entry corresponding to a portion of data stored on the physical storage device and identifying a hash value of the portion of data.
Dependent claims: 18, 19
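The data structure recited in claim 17 can be pictured with a short sketch. The class and field names (`Extent`, `RecordEntry`, `FingerprintEntry`, `offset`, `length`), the device and data set labels, and the choice of SHA-256 as the hash are hypothetical. The claim specifies only record entries that identify extents, extents that identify storage locations, and fingerprint entries that identify hash values.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Extent:
    """A storage location on the physical device (offset/length layout is an assumption)."""
    offset: int
    length: int


@dataclass
class RecordEntry:
    """One data set written to one virtual storage device, described by its extents."""
    virtual_device: str
    data_set: str
    extents: List[Extent]


@dataclass
class FingerprintEntry:
    """Hash value of the portion of data stored at an extent."""
    extent: Extent
    hash_value: str


def shared_extents(records: List[RecordEntry]) -> Set[Extent]:
    """Return extents identified by record entries from two or more different virtual devices."""
    devices_by_extent = {}
    for record in records:
        for extent in record.extents:
            devices_by_extent.setdefault(extent, set()).add(record.virtual_device)
    return {extent for extent, devices in devices_by_extent.items() if len(devices) >= 2}


# Toy physical device holding three 4-byte portions of data.
physical = b"AAAABBBBCCCC"
e0, e1, e2 = Extent(0, 4), Extent(4, 4), Extent(8, 4)

records = [
    RecordEntry("virtual_tape_1", "backup_monday", [e0, e1]),
    RecordEntry("virtual_tape_2", "backup_tuesday", [e1, e2]),
]
fingerprints = [
    FingerprintEntry(e, hashlib.sha256(physical[e.offset:e.offset + e.length]).hexdigest())
    for e in (e0, e1, e2)
]
print(shared_extents(records))
```

Here the extent at offset 4 is identified by both record entries, which correspond to data sets on different virtual devices, so it is the one extent the function reports as shared.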
20. A method comprising:
- storing an item of data in a storage system that includes a plurality of virtual storage devices in the storage system, according to a data layout that enables the item of data to be shared by a plurality of virtual storage devices in the storage system; and
- deduplicating at least a portion of the storage system, including deduplicating data across the plurality of virtual storage devices.
Specification