Client-side repository in a networked deduplicated storage system
First Claim
1. A method for restoring data to a client system from secondary storage, the method comprising:
providing one or more computer processors;
performing with a media agent executing in the one or more computer processors, a secondary copy operation that copies a plurality of data blocks associated with primary storage in a client system to secondary storage located remotely from the primary storage, wherein the client system communicates with the secondary storage via a wide area network and wherein the secondary copy operation creates a secondary copy of the plurality of data blocks in the secondary storage;
during performance of the secondary copy operation, creating with the media agent, for each data block of the plurality of data blocks a hash signature for each data block according to a deduplication scheme;
during performance of the secondary copy operation, further copying at least a portion of the data blocks and a first copy of hash signatures associated with the data blocks to a client-side repository comprising at least computer memory, wherein client-side repository is different than the secondary storage, and wherein the client system communicates with the client-side repository over a local area network;
during performance of the secondary copy operation, populating an index in communication with the media agent with a second copy of the hash signatures associated with the plurality of the data blocks stored in the secondary storage;
further creating with the media agent, age information associated with the time of the creation of the secondary copy of the plurality of data blocks in the secondary storage;
storing in at least computer memory, the age information about the time of creation of the secondary copy of the plurality of data blocks in secondary storage;
receiving at the media agent, a request to restore data to the client system;
consulting with the media agent, the age information to determine the time of the creation of the secondary copy in secondary storage of at least one data block associated with the restore data;
based on the age information of the time of the creation of the secondary copy in secondary storage of the at least one data block associated with the restore data, deciding with the media agent whether to query the client-side repository remote from the secondary storage as to whether the client-side repository is populated with a copy of the at least one data block associated with the restore data;
in response to determining that the age of creation of the secondary copy in secondary storage of the at least one data block associated with the restore data satisfies a threshold age, querying the client-side repository with the second copy of the hash signature from the index to determine whether the first copy of the hash signature is stored in the client-side repository;
receiving an answer from the client-side repository indicating a result of the query; and
in response to the answer, accessing the at least one data block associated with the restore data from secondary storage for transmission to the client system when the result indicates that the client-side repository is not populated with the first hash signature, wherein the at least one data block associated with the restore data is restored from the client-side repository to an information store of the client system via the local area network when the client-side repository is populated with the first hash signature; and
in response to determining that the age of creation of the secondary copy in secondary storage of the at least one data block associated with the restore data does not satisfy the threshold age, restoring the at least one data block from secondary storage for transmission to the client system via the wide area network.
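The restore-time decision recited in the claim can be illustrated with a minimal Python sketch. Everything named here (`MediaAgent`, `IndexEntry`, `restore_block`, the dictionaries standing in for the client-side repository and secondary storage, and the use of SHA-256) is hypothetical and not taken from the patent. In particular, the claim does not say what it means for an age to "satisfy" the threshold; the sketch assumes it means the secondary copy is recent enough (age at or below the threshold).

```python
import hashlib
from dataclasses import dataclass, field


def sign(block: bytes) -> str:
    # Hash signature for a data block (SHA-256 is an assumption).
    return hashlib.sha256(block).hexdigest()


@dataclass
class IndexEntry:
    signature: str     # second copy of the hash signature, held in the index
    created_at: float  # age information: creation time of the secondary copy


@dataclass
class MediaAgent:
    threshold_age: float                           # hypothetical policy value, in seconds
    index: dict = field(default_factory=dict)      # block id -> IndexEntry
    csr: dict = field(default_factory=dict)        # signature -> data (reached over the LAN)
    secondary: dict = field(default_factory=dict)  # signature -> data (reached over the WAN)

    def restore_block(self, block_id: str, now: float):
        entry = self.index[block_id]
        age = now - entry.created_at
        # ASSUMPTION: "satisfies the threshold age" is read as "recent enough".
        if age <= self.threshold_age:
            # Query the client-side repository using the signature from the index.
            data = self.csr.get(entry.signature)
            if data is not None:
                return data, "csr"
        # Either the age test failed or the repository lacks the block.
        return self.secondary[entry.signature], "secondary"


agent = MediaAgent(threshold_age=3600.0)
for block_id, payload, created, cached in [
    ("recent-cached", b"alpha", 9000.0, True),
    ("recent-uncached", b"beta", 9000.0, False),
    ("old-cached", b"gamma", 1000.0, True),
]:
    sig = sign(payload)
    agent.index[block_id] = IndexEntry(sig, created)
    agent.secondary[sig] = payload
    if cached:
        agent.csr[sig] = payload

results = {bid: agent.restore_block(bid, now=10000.0)[1] for bid in agent.index}
```

In this sketch the age check gates the repository query, so a block whose secondary copy falls outside the threshold is read from secondary storage without querying the repository at all, even if a copy happens to be held there.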
Abstract
A storage system according to certain embodiments includes a client-side repository (CSR). The CSR may communicate with a client at a higher data transfer rate than the rate used for communication between the client and secondary storage. During copy operations, for instance, some or all of the data being backed up or otherwise copied to secondary storage is stored in the CSR. During restore operations, copies of the data stored in the CSR are accessed from the CSR instead of from secondary storage, improving performance. Remaining data blocks not stored in the CSR can be restored from secondary storage.
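As a rough illustration of the copy and restore paths the abstract describes, the following Python sketch writes each block to a deduplicated secondary store and to a bounded client-side repository, then restores from the repository where possible. The class and function names, the SHA-256 signatures, and the evict-oldest policy are all assumptions made for the example; the abstract only says that some or all of the copied data is kept in the CSR.

```python
import hashlib
from collections import OrderedDict


class ClientSideRepository:
    """Bounded local store keyed by hash signature (eviction policy is an assumption)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def put(self, signature: str, data: bytes) -> None:
        self.blocks[signature] = data
        self.blocks.move_to_end(signature)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the oldest entry

    def get(self, signature: str):
        return self.blocks.get(signature)


def secondary_copy(blocks, secondary: dict, csr: ClientSideRepository):
    """Copy blocks to secondary storage and the CSR; return signatures in order."""
    signatures = []
    for data in blocks:
        sig = hashlib.sha256(data).hexdigest()
        secondary.setdefault(sig, data)  # deduplicated: each unique block stored once
        csr.put(sig, data)
        signatures.append(sig)
    return signatures


def restore(signatures, secondary: dict, csr: ClientSideRepository):
    """Rebuild the data, preferring the CSR and falling back to secondary storage."""
    out, from_csr = [], 0
    for sig in signatures:
        data = csr.get(sig)
        if data is None:
            data = secondary[sig]  # block not held locally
        else:
            from_csr += 1
        out.append(data)
    return b"".join(out), from_csr


secondary_store = {}
csr = ClientSideRepository(capacity=2)
sigs = secondary_copy([b"aa", b"bb", b"aa", b"cc"], secondary_store, csr)
restored, hits = restore(sigs, secondary_store, csr)
```

With a capacity of two, the repository ends up holding only the `aa` and `cc` blocks, so the `bb` block is the one fetched from secondary storage during the restore.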
16 Claims
1. A method for restoring data to a client system from secondary storage, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A storage system, comprising:
one or more computer processors;
primary storage in a client system wherein the primary storage stores a plurality of data blocks created by the client system;
secondary storage located remotely from the primary storage and is in communication with the client system via a wide area network, the secondary storage storing in response to a secondary copy operation, a secondary copy of the plurality of data blocks and according to a deduplication scheme that creates a hash signature for each of the data blocks;
a client-side repository comprising at least computer memory that is different than secondary storage and is in communication with the client system via a local area network, the client-side repository stores copies of at least a portion of the data blocks copied to secondary storage and stores a first copy of the hash signatures associated with the portion of the data blocks;
a media agent executing in the one or more computer processors, the media agent having an index comprising at least computer memory, the index storing at least a second copy of the hash signatures associated with the plurality of the data blocks stored in the secondary storage, the index further storing age information associated with the time of the creation of the secondary copy of the plurality of data blocks in the secondary storage;
in response to receiving a request to restore data to the client system, the media agent further configured to:
consult the age information to determine the time of the creation of the secondary copy in secondary storage of at least one data block associated with the restore data;
based on the age information of the time of the creation of the secondary copy in secondary storage of the at least one data block associated with the restore data, determine whether to query a client-side repository remote from the secondary storage as to whether the client-side repository is populated with a copy of the at least one data block associated with the restore data;
in response to a determination that the age of creation of the secondary copy in secondary storage of the at least one data block satisfies a threshold age, query the client-side repository with the second copy of the hash signature from the index to determine whether the first copy of the hash signature is stored in the client-side repository;
receive an answer from the client-side repository indicating a result of the query;
in response to the answer, access the at least one data block associated with the restore data from secondary storage for transmission to the client system when the result indicates that the client-side repository is not populated with the first hash signature, wherein the data block is restored from the client-side repository to an information store of the client system via the local area network when the client-side repository is populated with the first hash signature; and
in response to a determination that the age of creation of the secondary copy in secondary storage of the at least one data block associated with the restore data does not satisfy the threshold age, restoring the at least one data block from secondary storage for transmission to the client system via the wide area network.
Dependent claims: 11, 12, 13, 14, 15, 16
Specification