Systems and methods for performing data replication

US 8,489,656 B2
Filed: 05/27/2011
Issued: 07/16/2013
Est. Priority Date: 05/28/2010
Status: Active Grant

Abstract

Preparing source data to be replicated in a continuous data replication environment. Certain systems and methods populate a file name database with entries having a unique file identifier descriptor (FID), short name and a FID of the parent directory of each directory or file on a source storage device. Such information is advantageously gathered during scanning of a live file system without requiring a snapshot of the source storage device. The database can be further used to generate absolute file names associated with data operations to be replayed on a destination storage device. Based on the obtained FIDs, certain embodiments can further combine write operations to be replayed on the destination storage device and/or avoid replicating temporary files to the destination system.
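
The file name database described in the abstract can be pictured with a minimal sketch. It assumes an in-memory dictionary keyed by FID, a root entry whose parent is `None`, and `/` as the path separator. The names `Entry` and `absolute_name` are hypothetical and do not come from the patent.

```python
from typing import Dict, List, NamedTuple, Optional


class Entry(NamedTuple):
    """One file name database row: short name plus the parent directory's FID."""
    short_name: str
    parent_fid: Optional[int]  # assumption: None marks the root directory


def absolute_name(db: Dict[int, Entry], fid: int, sep: str = "/") -> str:
    """Walk parent FIDs up to the root and join the short names into a path."""
    parts: List[str] = []
    current: Optional[int] = fid
    while current is not None:
        entry = db[current]
        parts.append(entry.short_name)
        current = entry.parent_fid
    parts.reverse()
    # The root's short name is empty, so the join yields a leading separator.
    return sep.join(parts) or sep


# Hypothetical sample data: FID 1 is the root, 2 a directory, 3 a file.
db = {
    1: Entry("", None),
    2: Entry("projects", 1),
    3: Entry("report.txt", 2),
}
print(absolute_name(db, 3))
```

Because each entry stores only a short name and a parent FID, renaming a directory touches a single row, and the absolute names of its descendants are rebuilt on demand.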

17 Claims

1. A method for identifying data to be copied in a data replication system, the method comprising:
- obtaining with a scanning module executing on a computing device a first file identifier descriptor (FID) of a first directory on a live source file system, the first FID being one of a plurality of unique identifiers corresponding to a plurality of directories and files on the source file system;
  
  adding the first FID to a queue;
  
  storing a current journal sequence number from a file system filter driver identifying a first time;
  
  following said storing, accessing a current directory of the plurality of directories on the source file system that corresponds to a next FID stored in the queue;
  
  obtaining additional FIDs for each immediate child directory and immediate child file in the current directory;
  
  if no changes have been made to the current directory since the first time, populating a file name database with the additional FIDs of each immediate child directory and immediate child file in the current directory, adding the additional FIDs of each immediate child directory of the current directory to the queue, and removing the next FID from the queue; and
  
  if changes have been made to the first directory since the first time, repeating said storing, said accessing and said obtaining the additional FIDs.
  - 2. The method of claim 1, wherein said obtaining the first FID is performed without performing a snapshot on the source file system.
  - 3. The method of claim 1, wherein the first directory is the current directory.
  - 4. The method of claim 1, additionally comprising repeating said storing, said accessing and said obtaining the additional FIDs for each FID stored in the queue.
  - 5. The method of claim 1, wherein said populating the file name database comprises for each immediate child directory and immediate child file in the current directory and storing in the file name database:
    - the additional FID for the immediate child directory or immediate child file;
      
      a corresponding short name for the immediate child directory or immediate child file; and
      
      the next FID as a parent directory of the immediate child directory or immediate child file.
  - 6. The method of claim 1, wherein said changes comprise namespace changes to the current directory.
  - 7. The method of claim 1, wherein the first directory comprises a root directory of the live source file system.
  - 8. The method of claim 1, additionally comprising monitoring at least one data management operation directed to first data stored in the source file system.
  - 9. The method of claim 8, additionally comprising replaying the at least one data management operation on replication data stored on a destination file system.
  - 10. The method of claim 9, additionally comprising:
    - constructing, from information populated in the file name database, an absolute file name that corresponds to the location of the first data on the source file system; and
      
      transmitting the absolute file name to the destination system to direct said replaying of the at least one data management operation.
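
The scanning steps of claim 1 can be sketched roughly as a queue-driven, breadth-first walk that retries a directory whenever the journal shows a change to it. This is an illustration under stated assumptions, not the patented implementation: `ToyFileSystem`, `scan`, and the way a concurrent change is simulated are all hypothetical, and the journal is modeled as a simple list of changed directory FIDs.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Child = Tuple[int, str, bool]  # (FID, short name, is_directory)


class ToyFileSystem:
    """Hypothetical stand-in for a live file system plus a filter driver journal."""

    def __init__(self) -> None:
        self.children: Dict[int, List[Child]] = {
            1: [(2, "docs", True), (3, "a.txt", False)],
            2: [(4, "b.txt", False)],
        }
        self.journal: List[int] = []  # FIDs of changed directories, in order
        self._pending_change = True

    def current_sequence(self) -> int:
        return len(self.journal)

    def list_children(self, dir_fid: int) -> List[Child]:
        listing = list(self.children.get(dir_fid, []))
        if dir_fid == 2 and self._pending_change:
            # Simulate an application creating a file while the scan reads FID 2.
            self._pending_change = False
            self.children[2].append((5, "c.txt", False))
            self.journal.append(2)
        return listing

    def changed_since(self, dir_fid: int, sequence: int) -> bool:
        return dir_fid in self.journal[sequence:]


def scan(fs: ToyFileSystem, root_fid: int) -> Dict[int, Tuple[str, int]]:
    queue: Deque[int] = deque([root_fid])      # adding the first FID to a queue
    file_name_db: Dict[int, Tuple[str, int]] = {}
    while queue:
        next_fid = queue[0]
        first_time = fs.current_sequence()     # storing the journal sequence number
        listing = fs.list_children(next_fid)   # accessing and obtaining child FIDs
        if fs.changed_since(next_fid, first_time):
            continue                           # directory changed: repeat the steps
        for fid, short_name, is_directory in listing:
            file_name_db[fid] = (short_name, next_fid)  # child FID -> (name, parent)
            if is_directory:
                queue.append(fid)
        queue.popleft()                        # removing the next FID from the queue
    return file_name_db


result = scan(ToyFileSystem(), root_fid=1)
print(result)
```

In this toy run the first listing of directory 2 is discarded because the journal records a change after the stored sequence number, and the second pass picks up the newly created file. No snapshot of the file system is taken at any point.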

11. A system for preparing data for replication from a source computing device in a network, the system comprising:
- a queue configured to store a plurality of file identifier descriptors (FIDs) each comprising a unique identifier that corresponds to one of a plurality of directories and files on a source file system;
  
  a scanning module executing on a computing device and configured to scan the source file system while in a live state and to populate the queue with the plurality of FIDs;
  
  a database comprising file name data that associates each of the plurality of FIDs with a short name and a parent FID, wherein the scanning module is further configured to populate the database with the file name data based on said scan of the source file system in the live state; and
  
  at least one database thread configured to receive a data entry identifying a data management operation associated with at least one of the plurality of directories and files on the source file system and to construct from the FID associated with the at least one directory or file an absolute file name for transmission to a destination system along with a copy of the data management operation for replaying on the destination system.
  - 12. The system of claim 11, wherein the scanning module is further configured to:
    - access a current directory of the plurality of directories on the source file system that corresponds to a next FID in the queue; and
      
      obtain additional FIDs for each immediate child directory and immediate child file in the current directory.
  - 13. The system of claim 12, wherein the scanning module is further configured to:
    - populate the file name database with the additional FIDs of each immediate child directory and immediate child file in the current directory, and add the additional FIDs of each immediate child directory of the current directory to the queue.
  - 14. The system of claim 11, further comprising a filter driver situated between the source file system and at least one application configured to request the data management operation.
  - 15. The system of claim 14, wherein the filter driver is further configured to assign journal sequence numbers to each journal entry associated with a requested change to the source file system.
  - 16. The system of claim 15, wherein the scanning module is further configured to receive a current journal sequence number from the filter driver prior to accessing the current directory.
  - 17. The system of claim 16, wherein the scanning module is configured to repeat said accessing and obtaining when changes are detected to the current directory following a time of the current journal sequence number but prior to said obtaining.
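
The interaction between the filter driver and the database thread in claims 11 and 14 to 17 might look something like the sketch below. It assumes the journal is a thread-safe in-process queue, that sequence numbers start at 1, and that a `None` sentinel stops the worker. `FilterDriver`, `JournalEntry`, `database_thread`, and the sample data are hypothetical names chosen for illustration.

```python
import itertools
import queue
import threading
from typing import Dict, List, NamedTuple, Optional, Tuple


class JournalEntry(NamedTuple):
    sequence: int   # journal sequence number assigned by the filter driver
    operation: str  # the requested data management operation
    fid: int        # FID of the file or directory the operation targets


class FilterDriver:
    """Hypothetical stand-in that numbers each intercepted operation."""

    def __init__(self, journal: queue.Queue) -> None:
        self._journal = journal
        self._numbers = itertools.count(1)
        self._lock = threading.Lock()

    def intercept(self, operation: str, fid: int) -> int:
        with self._lock:  # keep numbering and enqueue order consistent
            entry = JournalEntry(next(self._numbers), operation, fid)
            self._journal.put(entry)
        return entry.sequence


def absolute_name(db: Dict[int, Tuple[str, Optional[int]]], fid: int) -> str:
    """Rebuild a path from (short name, parent FID) rows; the root's parent is None."""
    parts: List[str] = []
    current: Optional[int] = fid
    while current is not None:
        short_name, current = db[current]
        parts.append(short_name)
    return "/".join(reversed(parts)) or "/"


def database_thread(journal: queue.Queue, db, outbox: list) -> None:
    """Resolve each entry's FID to an absolute name and stage it for the destination."""
    while True:
        entry = journal.get()
        if entry is None:  # sentinel: no more entries
            break
        outbox.append((entry.sequence, absolute_name(db, entry.fid), entry.operation))


file_name_db = {1: ("", None), 2: ("docs", 1), 4: ("b.txt", 2)}
journal: queue.Queue = queue.Queue()
outbox: list = []
driver = FilterDriver(journal)
worker = threading.Thread(target=database_thread, args=(journal, file_name_db, outbox))
worker.start()
driver.intercept("write", 4)
driver.intercept("truncate", 4)
journal.put(None)
worker.join()
print(outbox)
```

Here `outbox` stands in for whatever transport carries the absolute file name and the copied operation to the destination system, which the claims do not specify.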

Current Assignee: CommVault Systems Incorporated
Original Assignee: CommVault Systems Incorporated
Inventors: Erofeev, Andrei
Primary Examiner(s): LEWIS, CHERYL RENEA
Application Number: US13/118,250
Publication Number: US 20110295805A1
Time in Patent Office: 781 Days
Field of Search: 707/615, 707/634, 707/636, 707/692, 707/812, 707/828
US Class Current: 707/828
CPC Class Codes:
G06F 11/1435   using file system or storag...
G06F 11/1458   Management of the backup or...
G06F 11/1471   involving logging of persis...
G06F 11/1662   the resynchronized componen...
G06F 11/2094   Redundant storage or storag...
G06F 16/1734   Details of monitoring file ...
G06F 16/1844   Management specifically ada...
G06F 2201/84   Using snapshots, i.e. a log...
H04L 67/1095   Replication or mirroring of...
