Stubbing systems and methods in a data replication environment

US 9,002,785 B2
Filed: 07/31/2013
Issued: 04/07/2015
Est. Priority Date: 03/30/2010
Status: Active Grant



Abstract

Stubbing systems and methods are provided for intelligent data management in a replication environment, such as by reducing the space occupied by replication data on a destination system. In certain examples, stub files or like objects replace migrated, de-duplicated or otherwise copied data that has been moved from the destination system to secondary storage. Access is further provided to the replication data in a manner that is transparent to the user and/or without substantially impacting the base replication process. In order to distinguish stub files representing migrated replication data from replicated stub files, priority tags or like identifiers can be used. Thus, when accessing a stub file on the destination system, such as to modify replication data or perform a restore process, the tagged stub files can be used to recall archived data prior to performing the requested operation so that an accurate copy of the source data is generated.
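The distinction the abstract draws between tagged and untagged stubs can be illustrated with a minimal sketch. Everything named here (`Stub`, `DESTINATION_TAG`, `restore`, and the dictionary-based stores) is a hypothetical stand-in chosen for illustration, not something the patent specifies. The sketch assumes that a stub created at the destination carries a tag, while a stub replicated from the source does not, and that a restore recalls archived data only for tagged stubs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

# Hypothetical tag value marking stubs created at the destination system.
DESTINATION_TAG = "dest-stub"


@dataclass
class Stub:
    location: str               # where the stubbed data actually lives
    tag: Optional[str] = None   # set only for stubs generated at the destination


Entry = Union[bytes, Stub]


def restore(replica: Dict[str, Entry], secondary: Dict[str, bytes]) -> Dict[str, Entry]:
    """Rebuild a copy of the source data from the replica on the destination."""
    restored: Dict[str, Entry] = {}
    for path, entry in replica.items():
        if isinstance(entry, Stub) and entry.tag == DESTINATION_TAG:
            # Tagged stub: the data was migrated by the destination, so recall it.
            restored[path] = secondary[entry.location]
        else:
            # Plain data, or a stub that already existed on the source: copy as-is.
            restored[path] = entry
    return restored
```

In this sketch an untagged stub is returned unchanged, because it was already a stub on the source system and therefore belongs in an accurate copy of the source data.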

18 Claims

1. A method of managing the storage of data in a computer network, the method comprising:
- receiving data from a source system comprising computer hardware at a destination system comprising computer hardware including one or more processors, the data comprising at least one first stub file comprising information indicative of a location of additional data on a storage device;
  
  storing the data at a destination storage device included with the destination system, wherein the destination storage device differs from the storage device;
  
  identifying a first portion of the data for storage at a secondary storage device, wherein the secondary storage device differs from the storage device and the destination storage device;
  
  providing a copy of the first portion of the data for storage in the secondary storage device;
  
  generating a second stub file representative of the first portion of the data;
  
  replacing the first portion of the data at the destination storage device with the second stub file; and
  
  tagging the second stub file with an identifier that distinguishes stub files generated at the destination system from stub files received with the data from the source system, wherein the second stub file comprises information indicative of a location of the copy of the first portion of the data on the secondary storage device, and wherein the at least one first stub file is not tagged with the identifier.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein the first portion of the data comprises a data object that is included by multiple portions of the data.
  - 3. The method of claim 2, further comprising:
    - identifying a second portion of the data comprising a copy of the data object; and
      
      replacing the second portion of the data with the second stub file without providing a copy of the second portion of the data to the secondary storage device.
  - 4. The method of claim 1, wherein the first portion of the data comprises data that has been designated for archival.
  - 5. The method of claim 1, further comprising using the second stub file to access the copy of the first portion of the data from the secondary storage device during a restore operation.
  - 6. The method of claim 1, further comprising maintaining an index for managing identifiers associated with stub files, the index including the identifier associated with the second stub file.
  - 7. The method of claim 1, further comprising determining whether the first portion of the data has a last access time at or before the time of creation of the copy of the first portion of the data, wherein said replacing the first portion of the data at the destination storage device with the second stub file occurs in response to determining that the last access time is at or before the time of creation of the copy of the first portion of the data.
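
One possible reading of the steps in claims 1, 3 and 7 is sketched below. The names (`migrate`, `FileEntry`, `Stub`, `DESTINATION_TAG`) and the use of a SHA-256 content hash as the secondary-storage location are assumptions made for illustration; the claims do not prescribe any of them. The sketch copies a portion to secondary storage, skips the copy when an identical data object is already there (claim 3), and replaces the portion with a tagged stub only if its last access time is at or before the time the copy was created (claim 7).

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple, Union

# Hypothetical tag value marking stubs generated at the destination system.
DESTINATION_TAG = "dest-stub"


@dataclass(frozen=True)
class Stub:
    location: str
    tag: Optional[str] = None


@dataclass
class FileEntry:
    content: bytes
    last_access: float  # seconds, any monotonic or epoch clock


Entry = Union[FileEntry, Stub]


def migrate(dest: Dict[str, Entry],
            secondary: Dict[str, Tuple[bytes, float]],
            paths: Iterable[str],
            now: float) -> int:
    """Stub out the given paths; return how many copies were written."""
    copies_made = 0
    for path in paths:
        entry = dest[path]
        if not isinstance(entry, FileEntry):
            continue  # already a stub, replicated or local: leave untouched
        key = hashlib.sha256(entry.content).hexdigest()
        if key not in secondary:
            # First time this data object is seen: provide a copy.
            secondary[key] = (entry.content, now)
            copies_made += 1
        copy_created = secondary[key][1]
        if entry.last_access <= copy_created:
            # Not accessed since the copy was made: safe to replace with a stub.
            dest[path] = Stub(key, DESTINATION_TAG)
    return copies_made
```

A portion that was accessed after the copy was created stays in place, which avoids replacing data that may have changed since it was copied.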

8. A system for managing the storage of data in a computer network, the system comprising:
- computer hardware configured to execute a data manager, the data manager configured to:
  
  scan data located at a destination storage device to identify a first portion of the data to store at a secondary storage device, the data comprising at least a first stub file indicative of a portion of the data stored at a storage device that differs from the destination storage device and the secondary storage device;
  
  provide a copy of the first portion of the data for storage in the secondary storage device;
  
  generate a second stub file representative of the first portion of the data;
  
  replace the first portion of the data at the destination storage device with the second stub file indicative of a location of the copy of the first portion of the data on the secondary storage device; and
  
  tag the second stub file with an identifier that distinguishes stub files generated by the data manager from stub files indicative of portions of the data stored at the storage device, wherein the first stub file is not tagged with the identifier.
- Dependent claims (9, 10, 11, 12, 13, 14, 15, 16, 17)
  - 9. The system of claim 8, wherein the first portion of the data comprises a data object that is included by multiple portions of the data.
  - 10. The system of claim 9, wherein the data manager is further configured to:
    - scan the data located at the destination storage device to identify a second portion of the data comprising the data object; and
      
      replace the second portion of the data with the second stub file.
  - 11. The system of claim 8, wherein the first portion of the data comprises a file.
  - 12. The system of claim 8, wherein the data manager is further configured to use the second stub file to access the copy of the first portion of the data from the secondary storage device during a restore operation.
  - 13. The system of claim 8, wherein the first portion of data satisfies a particular data size.
  - 14. The system of claim 13, wherein the particular data size is 64 KB.
  - 15. The system of claim 8, further comprising the destination storage device storing the data, wherein the data is provided to the destination storage device by a source system comprising computer hardware.
  - 16. The system of claim 8, wherein the data manager is further configured to maintain an index for managing identifiers associated with stub files, wherein each respective stub file includes an identifier used to distinguish the respective stub file from other ones of the stub files.
  - 17. The system of claim 8, wherein the data manager is further configured to determine whether the first portion of the data has a last access time at or before the time of creation of the copy of the first portion of data, and wherein the data manager replaces the first portion of the data with the second stub file in response to determining that the last access time is at or before the time of creation of the copy of the first portion of the data.
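
The scanning data manager of claims 8, 13, 14 and 16 might be sketched roughly as follows. `DataManager`, `scan`, `stub_out`, the `dm-N` identifier scheme and the `secondary/...` location strings are all hypothetical. The sketch also assumes that "satisfies a particular data size" means "is at least 64 KB", which is only one possible interpretation of the claim language.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# Assumed interpretation of claims 13 and 14: stub portions of at least 64 KB.
SIZE_THRESHOLD = 64 * 1024


@dataclass
class Stub:
    location: str
    identifier: Optional[str] = None  # None for stubs not made by the data manager


Entry = Union[bytes, Stub]


class DataManager:
    def __init__(self) -> None:
        self.index: Dict[str, str] = {}   # identifier -> path of the stub
        self._ids = itertools.count(1)    # source of unique identifiers

    def scan(self, dest: Dict[str, Entry]) -> List[str]:
        """Return paths of real data large enough to move to secondary storage."""
        return [path for path, entry in dest.items()
                if isinstance(entry, bytes) and len(entry) >= SIZE_THRESHOLD]

    def stub_out(self, dest: Dict[str, Entry],
                 secondary: Dict[str, bytes], path: str) -> Stub:
        """Copy one portion to secondary storage and replace it with a tagged stub."""
        identifier = f"dm-{next(self._ids)}"
        location = f"secondary/{identifier}"
        secondary[location] = dest[path]      # provide the copy
        stub = Stub(location, identifier)     # generate and tag the stub
        dest[path] = stub                     # replace the data with the stub
        self.index[identifier] = path         # record the identifier in the index
        return stub
```

Existing stubs are never returned by `scan`, so a stub that points at a different storage device keeps its untagged state.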

18. Non-transitory computer storage comprising instructions which, when executed, cause the computing system to perform steps comprising:
- receiving data at a first system comprising computer hardware including one or more processors from a second system comprising computer hardware including one or more processors, the data comprising a first stub file comprising information indicative of a location of additional data on a separate storage device, wherein the separate storage device is not included with the first system;
  
  storing the data at a first storage device included with the first system;
  
  identifying a first portion of the data for storage at a second storage device, wherein the second storage device differs from the separate storage;
  
  providing a copy of the first portion of the data for storage in the second storage device;
  
  generating a second stub file representative of the first portion of the data;
  
  replacing the first portion of the data at the first storage device with the second stub file; and
  
  tagging the second stub file with an identifier that distinguishes stub files generated at the first system from stub files received with the data from the second system, wherein the second stub file comprises information indicative of a location of the copy of the first portion of the data on the second storage device, and wherein the first stub file is not tagged with the identifier.


Current Assignee: CommVault Systems Incorporated
Original Assignee: CommVault Systems Incorporated
Inventors: Prahlad, Anand; Agrawal, Vijay H.
Primary Examiner(s): Ruiz, Angelica

Application Number: US13/955,445
Publication Number: US 20140067764A1
Time in Patent Office: 615 Days
Field of Search: 707/600-831, 707/899, 707/999.001-999.206
US Class Current: 707/609
CPC Class Codes:
G06F 16/1734   Details of monitoring file ...
G06F 16/1748   De-duplication implemented ...
G06F 16/182   Distributed file systems
G06F 16/22   Indexing; Data structures t...
G06F 16/81   Indexing, e.g. XML tags; Da...
G06F 16/951   Indexing; Web crawling tech...
G06F 2201/80   Database-specific techniques
