BLOCK-LEVEL SINGLE INSTANCING
First Claim
1. A computing system for reclaiming storage space on one or more storage devices having native file systems, wherein the storage space is utilized by one or more logical containers to store deduplicated blocks of data, wherein locations of the deduplicated blocks of data in the logical containers are not tracked by the native file systems of the storage devices, the computing system comprising:
- one or more storage devices storing on physical media—
one or more logical containers that include multiple deduplicated blocks of data that correspond to data objects; and
one or more data structures that indicate whether the blocks of data are referred to;
one or more databases storing information indicating whether the blocks of data are referred to; and
a secondary storage computing device programmed to—
receive an indication to remove a first set of blocks of data from a first logical container;
for each of the blocks of data in the first set—
determine, from the databases, whether the block of data is referred to; and
if the block of data is not referred to, update the data structures to indicate that the block of data is not referred to;
determine from the data structures that a threshold number of contiguous blocks of data in the first logical container that are not referred to has been reached; and
make available for storage portions of the one or more physical media corresponding to the contiguous blocks of data in the first logical container, wherein the data structures and the databases are not part of the native file systems of the storage devices.
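The reclamation steps recited in this claim can be illustrated with a minimal sketch. It is not the patented implementation: the names `remove_blocks` and `find_reclaimable_runs`, the use of a reference-count dictionary to stand in for the databases, and the use of a list of booleans to stand in for the data structures are all assumptions made for illustration. The sketch only reports which block ranges qualify; actually releasing the underlying physical media is not shown.

```python
from typing import Dict, List, Tuple


def find_reclaimable_runs(referred: List[bool], threshold: int) -> List[Tuple[int, int]]:
    """Return (start, length) for every run of contiguous unreferenced
    blocks whose length is at least `threshold`."""
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current unreferenced run began
    for i, is_referred in enumerate(referred):
        if not is_referred:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= threshold:
                runs.append((start, i - start))
            start = None
    # Close a run that extends to the end of the container.
    if start is not None and len(referred) - start >= threshold:
        runs.append((start, len(referred) - start))
    return runs


def remove_blocks(
    block_ids: List[int],
    ref_counts: Dict[int, int],   # hypothetical stand-in for the databases
    referred: List[bool],         # hypothetical stand-in for the data structures
    threshold: int,
) -> List[Tuple[int, int]]:
    """Process a removal request for a set of blocks in one container."""
    for block_id in block_ids:
        # Ask the "database" whether anything still refers to the block.
        if ref_counts.get(block_id, 0) == 0:
            # Record in the "data structure" that the block is unreferenced.
            referred[block_id] = False
    # Report ranges that reach the threshold and could be made available.
    return find_reclaimable_runs(referred, threshold)


if __name__ == "__main__":
    counts = {0: 1, 1: 0, 2: 0, 3: 0, 4: 2, 5: 0}
    flags = [True] * 6
    print(remove_blocks([1, 2, 3, 4, 5], counts, flags, threshold=3))
```

In this example blocks 1 through 3 form a run of three unreferenced blocks, which meets the threshold, while the single unreferenced block 5 does not.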
Abstract
Described in detail herein are systems and methods for single instancing blocks of data in a data storage system. For example, the data storage system may include multiple computing devices (e.g., client computing devices) that store primary data. The data storage system may also include a secondary storage computing device, a single instance database, and one or more storage devices that store copies of the primary data (e.g., secondary copies, tertiary copies, etc.). The secondary storage computing device receives blocks of data from the computing devices and accesses the single instance database to determine whether the blocks of data are unique (meaning that no instances of the blocks of data are stored on the storage devices). If a block of data is unique, the single instance database stores it on a storage device. If not, the secondary storage computing device can avoid storing the block of data on the storage devices.
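The uniqueness check described in the abstract can be sketched as follows. This is an illustrative assumption rather than the described system: the class name `SingleInstanceStore`, the in-memory dictionary standing in for the single instance database, the list standing in for a storage device, and the choice of SHA-256 as the block identifier are all hypothetical.

```python
import hashlib
from typing import Dict, List


class SingleInstanceStore:
    """Toy model of block-level single instancing."""

    def __init__(self) -> None:
        # Hypothetical single instance database: identifier -> location.
        self.database: Dict[str, int] = {}
        # Hypothetical storage device: one entry per unique block.
        self.container: List[bytes] = []

    def put(self, block: bytes) -> int:
        """Store the block only if it is unique; return its location."""
        # Assumption: a SHA-256 digest serves as the block identifier.
        identifier = hashlib.sha256(block).hexdigest()
        location = self.database.get(identifier)
        if location is None:
            # No instance is stored yet, so store the block and record it.
            self.container.append(block)
            location = len(self.container) - 1
            self.database[identifier] = location
        # Otherwise the block is a duplicate and is not stored again.
        return location


if __name__ == "__main__":
    store = SingleInstanceStore()
    print([store.put(b) for b in (b"aaaa", b"bbbb", b"aaaa")])
    print(len(store.container))
```

Here the third block repeats the first, so only two blocks end up stored and the repeated block resolves to the existing location.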
353 Citations
36 Claims
1. A computing system for reclaiming storage space on one or more storage devices having native file systems, wherein the storage space is utilized by one or more logical containers to store deduplicated blocks of data, wherein locations of the deduplicated blocks of data in the logical containers are not tracked by the native file systems of the storage devices, the computing system comprising:
one or more storage devices storing on physical media— one or more logical containers that include multiple deduplicated blocks of data that correspond to data objects; and one or more data structures that indicate whether the blocks of data are referred to; one or more databases storing information indicating whether the blocks of data are referred to; and a secondary storage computing device programmed to— receive an indication to remove a first set of blocks of data from a first logical container; for each of the blocks of data in the first set— determine, from the databases, whether the block of data is referred to; and if the block of data is not referred to, update the data structures to indicate that the block of data is not referred to; determine from the data structures that a threshold number of contiguous blocks of data in the first logical container that are not referred to has been reached; and make available for storage portions of the one or more physical media corresponding to the contiguous blocks of data in the first logical container, wherein the data structures and the databases are not part of the native file systems of the storage devices - View Dependent Claims (2, 3, 4, 5, 6, 7)
8. A method of reclaiming storage space on one or more storage devices, wherein the storage space is utilized by one or more logical containers to store deduplicated blocks of data, and wherein the method is performed by a computing system having a processor and memory, the method comprising:
receiving an indication to remove a first data object, wherein the first data object is stored as multiple first blocks of data in at least a first logical container; accessing, by the computing system, a first data structure that indicates whether the first data object is referred to; determining that the first data object is not referred to; determining from a second data structure that a first number of multiple contiguous second blocks of data in the first logical container that are not referred to has been reached; and after determining that the first number has been reached, specifying as available for storage a portion of the first logical container corresponding to the multiple contiguous second blocks of data. - View Dependent Claims (9, 10, 11, 12, 13, 14, 15, 16)
17. A computing system for reclaiming storage space on one or more means for storing, wherein the storage space is utilized by one or more logical containers to store deduplicated blocks of data, the computing system comprising:
means for storing— one or more logical containers that include multiple deduplicated blocks of data that correspond to data objects; one or more first data structures that indicate whether the blocks of data are referred to by other data objects; and one or more second data structures that indicate whether the blocks of data are referred to by other data objects; means for receiving an indication to remove a first set of blocks of data from a first logical container; means for determining whether a block of data is referred to, for each of the blocks of data in the first set; means for updating the second data structures to indicate that a block of data is not referred to, for each of the blocks of data in the first set that is not referred to; means for determining from the second data structures that a first number of contiguous blocks of data in the first logical container that are not referred to has been reached; and means for specifying as available for storage portions of the one or more physical media corresponding to the contiguous blocks of data in the first logical container. - View Dependent Claims (18, 19, 20, 21, 22)
23. A system for storing a single instance of a block of data in a data storage network, wherein the data storage network includes multiple storage devices coupled via a computer network, and wherein the computer network also couples to one or more computing devices having file systems on which files are stored, the system comprising:
one or more storage devices storing multiple blocks of data in one or more container files, wherein a file stored on a file system of a computing device is comprised in one or more blocks of data; one or more single instance databases storing, for at least some of the multiple blocks of data, an identifier of the block of data and a location of the block of data in a container file; and a secondary storage computing device configured to— receive data corresponding to one or more data storage jobs from the one or more computing devices, wherein a data storage job is performed on one or more files stored on the file systems of the one or more computing devices, and wherein the received data includes— multiple blocks of data indicated to be eligible for single instancing; and multiple items of data not indicated to be eligible for single instancing; for at least some of the multiple blocks of data indicated to be eligible for single instancing— determine an identifier of the block of data; determine if the single instance database already stores the identifier; when the single instance database already stores the identifier, determine a location of the corresponding block of data in a container file and store a reference to the location; and when the single instance database does not already store the identifier, store the block of data in a container file; for at least some of the multiple items of data not indicated to be eligible for single instancing, store the items of data in a metadata file; and maintain an index file that indicates whether the multiple blocks of data in the one or more container files are referred to. - View Dependent Claims (24, 25)
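The handling of eligible and non-eligible data recited in this claim can be sketched roughly as below. Everything here is a hypothetical illustration: the `Item` and `JobState` types, the in-memory lists standing in for the container file and metadata file, the SHA-256 identifier, and the use of reference counts in the index are assumptions, not details taken from the claim. The sketch also records a reference for newly stored blocks, which the claim does not require.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    data: bytes
    eligible: bool  # indicated as eligible for single instancing or not


@dataclass
class JobState:
    container: List[bytes] = field(default_factory=list)    # stands in for a container file
    metadata: List[bytes] = field(default_factory=list)     # stands in for the metadata file
    database: Dict[str, int] = field(default_factory=dict)  # identifier -> container location
    index: Dict[int, int] = field(default_factory=dict)     # container location -> reference count
    references: List[int] = field(default_factory=list)     # locations referenced by this job

    def ingest(self, items: List[Item]) -> None:
        for item in items:
            if not item.eligible:
                # Non-eligible items go to the metadata file unchanged.
                self.metadata.append(item.data)
                continue
            # Assumption: a SHA-256 digest serves as the block identifier.
            identifier = hashlib.sha256(item.data).hexdigest()
            location = self.database.get(identifier)
            if location is None:
                # Identifier not yet known: store the block in the container.
                self.container.append(item.data)
                location = len(self.container) - 1
                self.database[identifier] = location
            # Store a reference to the location and update the index.
            self.references.append(location)
            self.index[location] = self.index.get(location, 0) + 1


if __name__ == "__main__":
    job = JobState()
    job.ingest([Item(b"x", True), Item(b"hdr", False), Item(b"x", True), Item(b"y", True)])
    print(job.container, job.metadata, job.references, job.index)
```

A block counts as "referred to" in this sketch when its index entry is greater than zero; an index that stores a single flag per block would serve the same purpose.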
26-32. (canceled)
33. A method of single instancing multiple blocks of data, wherein the method is performed by a first computing device having a processor and memory, the method comprising:
for at least some of multiple blocks of data indicated to be eligible for single instancing included in data received from a set of one or more computing devices distinct from the first computing device, wherein the one or more computing devices have file systems storing files that comprise one or more blocks of data— determining an identifier of the block of data; accessing, by the first computing device, a data structure that stores, for a set of blocks of data stored in one or more logical containers on one or more storage devices, an identifier of the block of data and a location of the block of data in a logical container; determining, based upon the identifier of the block of data and access to the data structure, if the block of data should be stored; when the block of data should not be stored, determining a location of an already stored instance of the block of data in a logical container and storing a reference to the location; and when the block of data should be stored, storing the block of data in a logical container; and maintaining a block reference data structure that indicates whether blocks of data in the one or more logical containers are referred to. - View Dependent Claims (34, 35)
36-47. (canceled)
Specification