Stubbing systems and methods in a data replication environment
First Claim
1. A method for performing data management operations in a computer network, the method comprising:
monitoring operations associated with a source computing device, the operations operative to write data to a source storage device;
copying the data to a destination storage device based at least in part on the operations, the data comprising at least one first stub file, said copying to the destination storage device comprising processing at least one log file having a plurality of log entries indicative of the operations to replay the operations on the destination storage device;
with one or more computer processors, scanning the data of the destination storage device to identify a common data object repeated between multiple portions of the data on the destination storage device;
creating a copy of the common data object on a secondary storage device;
determining a last access time of each of the multiple data portions of the destination storage device having the common data object;
for each of the multiple data portions having a last access time at or before the time of the creation of the copy of the common data object, replacing the common data object of the particular data portion with a second stub file, wherein the second stub file comprises a tag value not possessed by, and used to distinguish the second stub file from, any of the at least one first stub file, and wherein the second stub file comprises information indicative of a location of the copy of the common data object on the secondary storage device; and
wherein the second stub file is used to recall the copy of the common data object from the secondary storage device during a restore operation.
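The scanning, copying, and stubbing steps of claim 1 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the `Portion` and `Stub` types, the `MIGRATION_TAG` value, the `secondary://` location scheme, and the use of a SHA-256 digest to detect a repeated object are all assumptions introduced here. A `Stub` with `tag=None` stands in for a "first" stub file replicated from the source.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union

# Hypothetical tag value; the claim only requires that replicated stubs lack it.
MIGRATION_TAG = "migrated-replica"


@dataclass
class Stub:
    location: str               # where the copy lives on secondary storage
    tag: Optional[str] = None   # None models a stub replicated from the source


@dataclass
class Portion:
    name: str
    content: Union[bytes, Stub]
    last_access: float          # seconds since some epoch


def stub_common_objects(
    portions: List[Portion],
    secondary: Dict[str, bytes],
    now: Callable[[], float],
) -> int:
    """Replace repeated objects with tagged stubs; return how many were stubbed."""
    # Scan: group data portions by a digest of their content (assumed technique).
    groups: Dict[str, List[Portion]] = {}
    for p in portions:
        if isinstance(p.content, bytes):
            digest = hashlib.sha256(p.content).hexdigest()
            groups.setdefault(digest, []).append(p)

    stubbed = 0
    for digest, group in groups.items():
        if len(group) < 2:
            continue  # not repeated between multiple portions
        # Create one copy of the common object on secondary storage.
        location = f"secondary://{digest}"
        secondary[location] = group[0].content
        copy_time = now()
        # Stub only portions last accessed at or before the copy was created.
        for p in group:
            if p.last_access <= copy_time:
                p.content = Stub(location=location, tag=MIGRATION_TAG)
                stubbed += 1
    return stubbed
```

A portion accessed after the copy was created keeps its data, since it may have changed since the copy was made; only the older portions are replaced by tagged stubs that point at the single secondary copy.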
Abstract
Stubbing systems and methods are provided for intelligent data management in a replication environment, such as by reducing the space occupied by replication data on a destination system. In certain examples, stub files or like objects replace migrated, de-duplicated or otherwise copied data that has been moved from the destination system to secondary storage. Access is further provided to the replication data in a manner that is transparent to the user and/or without substantially impacting the base replication process. In order to distinguish stub files representing migrated replication data from replicated stub files, priority tags or like identifiers can be used. Thus, when accessing a stub file on the destination system, such as to modify replication data or perform a restore process, the tagged stub files can be used to recall archived data prior to performing the requested operation so that an accurate copy of the source data is generated.
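To illustrate how a tag lets a restore process tell the two kinds of stub apart, the following minimal sketch recalls archived data only for tagged stubs and passes replicated stubs through unchanged. The dictionary layout, the `MIGRATION_TAG` value, and the `restore` function are hypothetical names chosen for this example and are not taken from the patent.

```python
from typing import Dict, Union

# Hypothetical tag marking stubs created on the destination system.
MIGRATION_TAG = "migrated-replica"

Entry = Union[bytes, Dict[str, str]]  # raw data, or a stub described by a dict


def restore(destination: Dict[str, Entry], secondary: Dict[str, bytes]) -> Dict[str, Entry]:
    """Build the data set to send back to the source system."""
    restored: Dict[str, Entry] = {}
    for name, entry in destination.items():
        if isinstance(entry, dict) and entry.get("tag") == MIGRATION_TAG:
            # Tagged stub: the data was moved to secondary storage, so recall it.
            restored[name] = secondary[entry["location"]]
        else:
            # Ordinary data, or a stub that already existed on the source:
            # copy it as-is so the restore matches the source data.
            restored[name] = entry
    return restored
```

Without the tag, the restore could not know whether a stub should be expanded (because the destination archived the data) or preserved (because the source itself held a stub).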
15 Claims
8. A system for performing data management operations, comprising:
a destination storage device storing data that comprises at least one stub file, the data copied to the destination storage device from a source storage device and corresponding to data write operations associated with a source computing device, wherein the data is copied to the destination storage device at least partly by processing at least one log file having a plurality of log entries indicative of the operations to replay the operations on the destination storage device;
an archiving module executing in one or more computer processors and configured to:
scan data of the destination storage device to identify a common data object repeated between multiple portions of the data on the destination storage device;
create a copy of the common data object on a secondary storage device;
determine a last access time of each of the multiple data portions of the destination storage device having the common data object;
for ones of the multiple data portions having a last access time at or before the time of the creation of the copy of the common data object, replace the common data object of the particular data portion with a second stub file, wherein the second stub file comprises a tag value not possessed by, and used to distinguish the second stub file from, any of the at least one first stub file, and wherein the second stub file comprises information indicative of a location of the copy of the common data object on the secondary storage device,
wherein the second stub file is used to recall the copy of the common data object from the secondary storage device during a restore operation.
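The log-based copying recited in the claims, in which log entries describing source write operations are replayed on the destination, can be sketched as below. The `LogEntry` fields and the `"write"`/`"delete"` operation names are assumptions made for illustration; the claims do not specify a log format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class LogEntry:
    op: str            # assumed operation names: "write" or "delete"
    path: str
    data: bytes = b""


def replay(log: Iterable[LogEntry], destination: Dict[str, bytes]) -> Dict[str, bytes]:
    """Apply logged source operations, in order, to the destination store."""
    for entry in log:
        if entry.op == "write":
            destination[entry.path] = entry.data
        elif entry.op == "delete":
            destination.pop(entry.path, None)
        else:
            raise ValueError(f"unknown operation: {entry.op!r}")
    return destination
```

Replaying entries in their logged order is what keeps the destination consistent with the source: a later write to the same path overrides an earlier one, just as it did on the source storage device.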
15. A non-transitory computer readable medium configured to store software code that is readable by a computing system, wherein the software code is executable on the computing system in order to cause the computing system to perform steps comprising:
monitoring operations associated with a source computing device, the operations operative to write data to a source storage device;
copying the data to a destination storage device based at least in part on the operations, the data comprising at least one first stub file, said copying to the destination storage device comprising processing at least one log file having a plurality of log entries indicative of the operations to replay the operations on the destination storage device;
with one or more computer processors, scanning the data of the destination storage device to identify a common data object repeated between multiple portions of the data on the destination storage device;
creating a copy of the common data object on a secondary storage device;
determining a last access time of each of the multiple data portions of the destination storage device having the common data object;
for each of the multiple data portions having a last access time at or before the time of creation of the copy of the common data object, replacing the common data object of the particular data portion with a second stub file, wherein the second stub file comprises a tag value not possessed by, and used to distinguish the second stub file from, any of the at least one first stub file, and wherein the second stub file comprises information indicative of a location of the copy of the common data object on the secondary storage device; and
wherein the second stub file is used to recall the copy of the common data object from the secondary storage device during a restore operation.
Specification