Block-level single instancing
First Claim
1. A system for storing a single instance of a block of data in a data storage network, wherein the data storage network includes multiple storage devices coupled via a computer network, and wherein the computer network also couples to one or more computing devices having file systems on which files are stored, the system comprising:
one or more storage devices storing multiple blocks of data in one or more container files, wherein a file stored on a file system of a computing device is comprised of one or more blocks of data;
one or more single instance databases storing, for at least some of the multiple blocks of data, an identifier of the stored block of data, and a location of the stored block of data in a container file;
one or more index files storing, for at least some of the multiple blocks of data, a single flag indicating whether the stored block of data is referred to in one or more metadata files on the one or more storage devices; and
a secondary storage computing device configured to—
receive data corresponding to one or more data storage jobs from the one or more computing devices, wherein a data storage job is performed on one or more files stored on the file systems of the one or more computing devices, and wherein the received data includes multiple blocks of data; and
for at least some of the multiple blocks of data in the received data—
determine an identifier of the received block of data;
determine if one of the single instance databases already stores the identifier;
when one of the single instance databases already stores the identifier, determine the corresponding block of data in a container file, store a reference to the corresponding block of data in one of the metadata files, and updating the flag for the corresponding block of data in one of the index files; and
when none of the single instance databases already stores the identifier, store the received block of data in a container file, wherein a container file includes stored blocks of data from more than one file stored on the one or more computing devices, storing a reference to the received block of data in one of the metadata files, and creating a new entry for the received block in one of the index files.
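The per-block flow recited in claim 1 can be illustrated with a minimal in-memory sketch. This is not the patented implementation. The class name, the use of SHA-256 as the block identifier, the fixed block size, and the dictionary-based stand-ins for the single instance database, index file, and metadata files are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SingleInstanceStore:
    """Hypothetical in-memory stand-ins for the structures named in claim 1."""
    container: bytearray = field(default_factory=bytearray)  # one container file
    sidb: dict = field(default_factory=dict)      # identifier -> (offset, length)
    index: dict = field(default_factory=dict)     # identifier -> "is referenced" flag
    metadata: dict = field(default_factory=dict)  # file name -> list of identifiers

    def store_block(self, file_name: str, block: bytes) -> bool:
        """Process one received block; return True if new data was written."""
        ident = hashlib.sha256(block).hexdigest()  # assumed identifier scheme
        refs = self.metadata.setdefault(file_name, [])
        if ident in self.sidb:
            # Identifier already known: reference the stored block, update its flag.
            refs.append(ident)
            self.index[ident] = True
            return False
        # Identifier unknown: write the block, record its location, add a
        # reference, and create a new index entry.
        self.sidb[ident] = (len(self.container), len(block))
        self.container += block
        refs.append(ident)
        self.index[ident] = True
        return True

    def store_file(self, file_name: str, data: bytes, block_size: int = 4) -> int:
        """Split data into fixed-size blocks; return how many were newly stored."""
        return sum(
            self.store_block(file_name, data[i:i + block_size])
            for i in range(0, len(data), block_size)
        )

    def read_file(self, file_name: str) -> bytes:
        """Rebuild a file by following its metadata references into the container."""
        out = bytearray()
        for ident in self.metadata[file_name]:
            offset, length = self.sidb[ident]
            out += self.container[offset:offset + length]
        return bytes(out)
```

In this sketch, storing `b"AAAABBBBCCCC"` and then `b"BBBBDDDD"` with 4-byte blocks writes only four blocks to the shared container, because the second file's `BBBB` block becomes a reference to the instance written for the first file.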
Abstract
Described in detail herein are systems and methods for single instancing blocks of data in a data storage system. For example, the data storage system may include multiple computing devices (e.g., client computing devices) that store primary data. The data storage system may also include a secondary storage computing device, a single instance database, and one or more storage devices that store copies of the primary data (e.g., secondary copies, tertiary copies, etc.). The secondary storage computing device receives blocks of data from the computing devices and accesses the single instance database to determine whether the blocks of data are unique (meaning that no instances of the blocks of data are stored on the storage devices). If a block of data is unique, the single instance database stores it on a storage device. If not, the secondary storage computing device can avoid storing the block of data on the storage devices.
17 Claims
1. Independent claim, recited in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method of single instancing multiple blocks of data, wherein the method is performed by a first computing device having a processor and memory, the method comprising:
for at least some of multiple blocks of data included in data received from a set of one or more computing devices distinct from the first computing device, wherein the one or more computing devices have file systems storing files, and wherein each file comprises at least one block of data:
determining an identifier of each received block of data;
accessing, by the first computing device, one or more data structures that store, for each block of data stored in one or more logical containers on one or more storage devices, an identifier of the stored block of data, a location of the stored block of data in a logical container, and a single indicator of whether the stored block of data is referred to in one or more metadata containers on the one or more storage devices, wherein the logical container includes stored blocks of data from more than one file received from the set of one or more computing devices;
determining, based upon the identifier of the received block of data and based upon access to the one or more data structures, if the received block of data should be stored;
when the received block of data should not be stored, then determining an already stored instance of the received block of data in a logical container, storing a reference to that instance in one of the metadata containers, and updating the indicator for that instance in the one or more data structures; and
when the received block of data should be stored, then storing the received block of data in one of the logical containers, storing a reference to the received block of data in one of the metadata containers, and creating a new entry for the received block in the one or more data structures.
Dependent claims: 12, 13, 14, 15, 16
17. A method for reducing duplication of stored data, wherein the method is performed by a computing system having a processor and memory, the method comprising:
receiving an indication to perform a data storage operation on data from one or more files stored at one or more computing devices, wherein the data includes one or more blocks;
for at least some of the blocks:
determining whether a block is eligible for single instancing;
if the block is not eligible for single instancing, then storing the block in a first container file, wherein the first container file stores blocks that are not eligible for single instancing, and wherein the first container file also includes at least one data structure that stores references to blocks that are eligible for single instancing;
if the block is eligible for single instancing, then determining if an instance of the block has already been stored on a storage device distinct from the computing device;
if an instance of the block has already been stored on the storage device, then storing in the first container file a reference to the already stored instance of the block in the at least one data structure; and
if an instance of the block has not already been stored on the storage device, then:
storing the block in a second container file, wherein the second container file stores only a single instance of each block, wherein the second container file stores blocks from more than one file stored at one or more computing devices, wherein the second container file includes multiple portions available for storing blocks, and wherein the block is stored in one or more portions;
storing in the first container file a reference to the block in the second container file, wherein the reference to the block is stored in the at least one data structure; and
storing in the at least one data structure an indication that the one or more portions in the second container file are not available for storing blocks.
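The two-container arrangement of claim 17 can be sketched as follows. This is an illustration under stated assumptions, not the claimed method itself. The eligibility rule (a minimum block size), the portion size, the SHA-256 identifier, and every class and attribute name are hypothetical. The claim does not specify how eligibility is decided or how portions are allocated.

```python
import hashlib

PORTION_SIZE = 4       # hypothetical size of one second-container portion, in bytes
MIN_ELIGIBLE_SIZE = 4  # hypothetical rule: smaller blocks are not single instanced


class TwoContainerStore:
    def __init__(self, portions: int = 8):
        # First container: ineligible blocks, plus a data structure holding
        # references and a record of which second-container portions are taken.
        self.first_inline = []                  # blocks stored without single instancing
        self.first_refs = []                    # (first_portion, portion_count, length)
        self.used_portions = [False] * portions
        # Second container: a single instance of each eligible block.
        self.second = bytearray(portions * PORTION_SIZE)
        self.known = {}                         # identifier -> existing reference

    def store(self, block: bytes) -> str:
        """Store one block and report which path it took."""
        if len(block) < MIN_ELIGIBLE_SIZE:
            self.first_inline.append(block)     # not eligible: keep in first container
            return "inline"
        ident = hashlib.sha256(block).hexdigest()
        if ident in self.known:
            self.first_refs.append(self.known[ident])  # already stored: reference only
            return "reference"
        needed = -(-len(block) // PORTION_SIZE)  # ceiling division
        start = self._find_free_run(needed)
        offset = start * PORTION_SIZE
        self.second[offset:offset + len(block)] = block
        for i in range(start, start + needed):
            self.used_portions[i] = True         # mark portions as unavailable
        ref = (start, needed, len(block))
        self.known[ident] = ref
        self.first_refs.append(ref)
        return "stored"

    def _find_free_run(self, needed: int) -> int:
        """Return the index of the first run of `needed` free portions."""
        run = 0
        for i, used in enumerate(self.used_portions):
            run = 0 if used else run + 1
            if run == needed:
                return i - needed + 1
        raise RuntimeError("second container has no free run of that length")
```

Here a 6-byte block occupies two 4-byte portions of the second container, and storing the same block again only appends another reference in the first container's data structure.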
Specification