Backup using a client-side signature repository in a networked storage system
Abstract
A storage system according to certain embodiments includes a client-side signature repository that includes information representative of a set of data blocks stored in primary storage. During copy or backup operations, the system can use the client-side signature repository to identify data blocks located in primary storage that are new or that have changed. The system can also use the client-side signature repository to identify multiple locations within primary storage where different instances of the data blocks are located.
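One way to picture such a repository is as a map from a block signature to every primary-storage location where a block with that signature is stored. The sketch below is illustrative only and is not taken from the patent: the use of SHA-256 as the signature, the `BlockLocation` fields, and the names `SignatureRepository`, `record`, `locations`, and `unseen` are all assumptions.

```python
import hashlib
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Set


class BlockLocation(NamedTuple):
    """Hypothetical description of where one instance of a block lives."""
    store: str   # name of a primary data store
    path: str    # file that contains the block
    offset: int  # byte offset of the block within that file


class SignatureRepository:
    """Maps a block signature to every known location of that block."""

    def __init__(self) -> None:
        self._locations = defaultdict(set)

    @staticmethod
    def signature(block: bytes) -> str:
        # Assumption: a SHA-256 digest stands in for the block signature.
        return hashlib.sha256(block).hexdigest()

    def record(self, block: bytes, location: BlockLocation) -> str:
        """Register one stored instance of a block and return its signature."""
        sig = self.signature(block)
        self._locations[sig].add(location)
        return sig

    def locations(self, sig: str) -> Set[BlockLocation]:
        """All recorded locations holding a block with this signature."""
        return set(self._locations.get(sig, ()))

    def unseen(self, sigs: Iterable[str]) -> List[str]:
        """Signatures the repository has never recorded (new or changed blocks)."""
        return [s for s in sigs if s not in self._locations]


repo = SignatureRepository()
block = b"quarterly report header"
sig_a = repo.record(block, BlockLocation("store-1", "/docs/a.doc", 0))
sig_b = repo.record(block, BlockLocation("store-2", "/docs/b.doc", 4096))
other_sig = SignatureRepository.signature(b"some other block")
```

Here the same block recorded from two data stores yields one signature with two locations, while a signature that was never recorded is reported by `unseen`.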
20 Claims
1. A method of generating a backup data set for a client computing device by using a signature repository residing in a primary storage subsystem, the method comprising:
for each respective client computing device of one or more client computing devices in a primary storage subsystem:
monitoring storage of a plurality of files formed of data blocks generated by one or more software applications running on the respective client computing device, wherein the plurality of files are stored as primary data in a primary data store associated with the respective client computing device;
maintaining, by a repository agent executing on one or more processors in the primary storage subsystem, a repository indicating at least which data blocks of the monitored files are stored in the primary storage subsystem, wherein at least a first data block of the data blocks stored in a first primary data store of the primary storage subsystem forms at least a portion of a first file stored in the first primary data store and a second data block that matches the first data block is stored in a second primary data store of the primary storage subsystem and forms at least a portion of a second file stored in the second primary data store, wherein the first file is generated by at least one software application running on a first client computing device associated with the first primary data store and the second file is generated by at least one software application running on a second client computing device associated with the second primary data store, and wherein the first data block and the second data block are stored in a native format of the at least one software application that generated the first data block and the second data block, respectively; and
in response to instructions to create a secondary copy of the first file in a secondary storage subsystem, identifying a set of data blocks that do not have corresponding secondary copy data blocks stored in the secondary storage subsystem, the set of data blocks forming at least a portion of the first file and including the first data block;
querying the repository to identify at least a first group of data blocks of the set of data blocks for which matching data blocks are stored in the primary storage subsystem, wherein the first group of data blocks includes the first data block;
identifying a location of the matching data blocks within the primary storage subsystem, wherein the matching data blocks include at least the second data block; and
retrieving the matching data blocks from one or more of the data stores associated with the one or more client computing devices, including the second data block stored in the second primary data store.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
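The last four steps of claim 1 can be read as a small pipeline: find the blocks of the first file that lack a secondary copy, query the repository for matches, locate those matches, and read them from another data store. The following is a minimal sketch under stated assumptions, not the claimed implementation: the repository is modeled as a plain dictionary, signatures are placeholder strings such as `"sigA"`, the data stores are in-memory byte strings, and the function names, the `Location` tuple layout, and the rule of preferring a store other than the source are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (data store name, file path, byte offset); a hypothetical layout
Location = Tuple[str, str, int]
BLOCK_SIZE = 4  # tiny block size so the demo data stays readable


def missing_from_secondary(file_sigs: List[str], secondary_sigs: Set[str]) -> List[str]:
    """Signatures of the file's blocks that have no secondary copy yet."""
    return [sig for sig in file_sigs if sig not in secondary_sigs]


def find_matches(
    missing: List[str],
    repository: Dict[str, List[Location]],
    source_store: str,
) -> Dict[str, List[Location]]:
    """Query the repository; keep blocks with a match in another primary store."""
    found: Dict[str, List[Location]] = {}
    for sig in missing:
        others = [loc for loc in repository.get(sig, []) if loc[0] != source_store]
        if others:
            found[sig] = others
    return found


def retrieve(
    matches: Dict[str, List[Location]],
    stores: Dict[str, Dict[str, bytes]],
) -> Dict[str, bytes]:
    """Read each matched block from the first alternate location listed."""
    blocks: Dict[str, bytes] = {}
    for sig, locations in matches.items():
        store, path, offset = locations[0]
        blocks[sig] = stores[store][path][offset:offset + BLOCK_SIZE]
    return blocks


# Two primary data stores; the block "AAAA" appears in a file on each.
stores = {
    "store-1": {"/a.doc": b"AAAABBBB"},
    "store-2": {"/b.doc": b"CCCCAAAA"},
}
repository = {
    "sigA": [("store-1", "/a.doc", 0), ("store-2", "/b.doc", 4)],
    "sigB": [("store-1", "/a.doc", 4)],
    "sigC": [("store-2", "/b.doc", 0)],
}
first_file_sigs = ["sigA", "sigB"]  # blocks of the first file, stored on store-1
secondary_sigs = {"sigB"}           # blocks that already have a secondary copy

missing = missing_from_secondary(first_file_sigs, secondary_sigs)
matches = find_matches(missing, repository, source_store="store-1")
blocks = retrieve(matches, stores)
```

In this toy run only `sigA` lacks a secondary copy, and its data is read from the matching block in `store-2` rather than from the first file's own store.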
14. A storage system for generating a backup data set for a client computing device using a signature repository, the system comprising:
a repository indicating which data blocks are stored in a primary storage subsystem, the primary storage subsystem including one or more client computing devices each having an associated data store, wherein the data blocks are generated by one or more software applications running on the one or more client computing devices, wherein at least a first data block of the data blocks stored in a first primary data store of the primary storage subsystem forms at least a portion of a first file stored in the first primary data store and a second data block that matches the first data block is stored in a second primary data store of the primary storage subsystem and forms at least a portion of a second file stored in the second primary data store, wherein the first file is generated by at least one software application running on a first client computing device associated with the first primary data store and the second file is generated by at least one software application running on a second client computing device associated with the second primary data store, and wherein the first data block and the second data block are stored in a native format of the at least one software application that generated the first data block and the second data block, respectively; and
a repository agent executing on one or more processors and configured to:
maintain the repository; and
in response to instructions to create a secondary copy in a secondary storage subsystem of the first file, identify a set of data blocks that do not have corresponding secondary copy data blocks stored in the secondary storage subsystem, the set of data blocks forming at least a portion of the first file and including the first data block;
query the repository to identify at least a first group of data blocks of the set of data blocks for which matching data blocks are stored in the primary storage subsystem; and
provide retrieval information usable to locate the matching data blocks within the primary storage subsystem, wherein the matching data blocks include at least the second data block.
Dependent claims: 15, 16, 17, 18
19. A computer-readable, non-transitory storage medium having one or more computer-executable modules for generating a backup data set for a client computing device, the one or more computer-executable modules comprising:
a first module in communication with one or more client computing devices and configured to:
maintain a repository indicating data blocks that are stored in a primary storage subsystem, the primary storage subsystem comprising one or more client computing devices each having an associated data store, wherein the data blocks are generated by one or more software applications running on the one or more client computing devices; wherein at least a first data block of the data blocks stored in a first primary data store of the primary storage subsystem forms at least a portion of a first file stored in the first primary data store and a second data block that matches the first data block is stored in a second primary data store of the primary storage subsystem and forms at least a portion of a second file stored in the second primary data store, wherein the first file is generated by at least one software application running on a first client computing device associated with the first primary data store and the second file is generated by at least one software application running on a second client computing device associated with the second primary data store; and
in response to instructions to create a secondary copy in a secondary storage subsystem of the first file, identify a set of data blocks that do not have corresponding secondary copy data blocks stored in the secondary storage subsystem, the set of data blocks forming at least a portion of the first file and including the first data block;
query the repository to identify at least a first group of data blocks of the set of data blocks for which matching data blocks are stored in the primary storage subsystem; and
provide retrieval information usable to locate the matching data blocks within the primary storage subsystem, wherein the matching data blocks include at least the second data block.
Dependent claims: 20
Specification