Systems and methods for performing data replication

US 8,745,105 B2
Filed: 09/26/2013
Issued: 06/03/2014
Est. Priority Date: 05/28/2010
Status: Active Grant

5 Assignments

0 Petitions

Abstract

Preparing source data to be replicated in a continuous data replication environment. Certain systems and methods populate a file name database with entries having a unique file identifier descriptor (FID), short name and a FID of the parent directory of each directory or file on a source storage device. Such information is advantageously gathered during scanning of a live file system without requiring a snapshot of the source storage device. The database can be further used to generate absolute file names associated with data operations to be replayed on a destination storage device. Based on the obtained FIDs, certain embodiments can further combine write operations to be replayed on the destination storage device and/or avoid replicating temporary files to the destination system.
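As an illustration of how entries holding a FID, a short name and a parent FID could be used to generate an absolute file name, the following is a minimal Python sketch. It is not the patented implementation: the names `NameEntry` and `absolute_name`, the in-memory dictionary standing in for the file name database, the `/` separator, and the convention that the root has an empty short name and no parent are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class NameEntry:
    """One row of a hypothetical file name database."""
    fid: int                   # unique file identifier descriptor
    short_name: str            # name within the parent directory
    parent_fid: Optional[int]  # None marks the root directory (assumption)


def absolute_name(db: Dict[int, NameEntry], fid: int) -> str:
    """Walk parent FIDs up to the root and join the short names."""
    parts = []
    seen = set()
    current: Optional[int] = fid
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parent chain at FID {current}")
        seen.add(current)
        entry = db[current]  # raises KeyError if the FID was never recorded
        parts.append(entry.short_name)
        current = entry.parent_fid
    # parts runs leaf-to-root; the root's short name is assumed to be ""
    return "/" + "/".join(p for p in reversed(parts) if p)


# Example data, invented for illustration only.
db = {e.fid: e for e in [
    NameEntry(1, "", None),
    NameEntry(2, "users", 1),
    NameEntry(3, "report.doc", 2),
]}
print(absolute_name(db, 3))
```

Storing only the short name and parent FID per entry means a rename of a directory touches one row, while the full path of any descendant is still recoverable by walking the parent chain.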

697 Citations

20 Claims

1. A method for identifying data to be copied in a data replication system, the method comprising:
- using a computing device, adding a first file identifier descriptor (FID) of a first directory on a live source file system to a queue, the first FID being one of a plurality of unique identifiers corresponding to a plurality of directories and files on the source file system;
  
  storing a current journal sequence number from a file system filter driver identifying a first time;
  
  following said storing, accessing a current directory of the plurality of directories on the source file system that corresponds to a next FID stored in the queue;
  
  obtaining additional FIDs for each immediate child directory and immediate child file in the current directory; and
  
  in response to determining that no changes have been made to the current directory since the first time:
  
  populating a file name database with the additional FIDs of each immediate child directory and immediate child file in the current directory;
  
  adding the additional FIDs of each immediate child directory of the current directory to the queue; and
  
  removing the next FID from the queue.
- - 2. The method of claim 1, wherein:
    - in response to determining that changes have been made to the first directory since the first time, said storing, said accessing and said obtaining the additional FIDs are repeated.
  - 3. The method of claim 1, wherein the first FID is obtained by a scanning module executing on a computing device.
  - 4. The method of claim 1, wherein the first FID is obtained without performing a snapshot on the source file system.
  - 5. The method of claim 1, wherein the first directory is the current directory.
  - 6. The method of claim 1, wherein said storing, said accessing and said obtaining the additional FIDs are repeated for each FID stored in the queue.
  - 7. The method of claim 1, wherein said populating the file name database comprises for each immediate child directory and immediate child file in the current directory, storing in the file name database:
    - the additional FID for the immediate child directory or immediate child file;
      
      a corresponding short name for the immediate child directory or immediate child file; and
      
      the next FID as a parent directory of the immediate child directory or immediate child file.
  - 8. The method of claim 1, wherein said changes comprise namespace changes to the current directory.
  - 9. The method of claim 1, wherein the first directory comprises a root directory of the live source file system.
  - 10. The method of claim 1, further comprising monitoring at least one data management operation directed to first data stored in the source file system.
  - 11. The method of claim 10, further comprising replaying the at least one data management operation on replication data stored on a destination file system.
  - 12. The method of claim 11, further comprising:
    - constructing, from information populated in the file name database, an absolute file name that corresponds to a location of the first data on the source file system; and
      
      transmitting the absolute file name to the destination file system to direct said replaying of the at least one data management operation.
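The scanning steps of claims 1, 2, 6 and 7 can be sketched roughly as a queue-driven, breadth-first walk. The Python below is a hedged illustration only, not the claimed implementation: `scan_live_tree`, the three callbacks (`list_children`, `current_sequence`, `changed_since`) and the `max_retries` guard are hypothetical names and choices introduced for this example, and a plain dictionary stands in for the file name database.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# (fid, short_name, is_directory) for one immediate child
Child = Tuple[int, str, bool]


def scan_live_tree(
    root_fid: int,
    list_children: Callable[[int], List[Child]],
    current_sequence: Callable[[], int],
    changed_since: Callable[[int, int], bool],
    max_retries: int = 5,  # assumption: the claims set no retry limit
) -> Dict[int, Tuple[str, int]]:
    """Return {child_fid: (short_name, parent_fid)} for everything under root_fid."""
    name_db: Dict[int, Tuple[str, int]] = {}
    queue = deque([root_fid])            # add the first FID to the queue
    retries = 0
    while queue:
        next_fid = queue[0]              # peek; removal happens only on success
        first_time = current_sequence()  # store the current journal sequence number
        children = list_children(next_fid)  # access directory, obtain child FIDs
        if changed_since(next_fid, first_time):
            # Directory changed while it was read: repeat the
            # store / access / obtain steps for the same FID.
            retries += 1
            if retries > max_retries:
                raise RuntimeError(f"directory {next_fid} kept changing")
            continue
        retries = 0
        for fid, short_name, is_dir in children:
            name_db[fid] = (short_name, next_fid)  # FID, short name, parent FID
            if is_dir:
                queue.append(fid)        # only child directories are queued
        queue.popleft()                  # remove the next FID from the queue
    return name_db


# Example with a static, invented tree and a journal that reports no changes.
tree = {1: [(2, "docs", True), (3, "a.txt", False)], 2: [(4, "b.txt", False)]}
result = scan_live_tree(1, lambda fid: tree.get(fid, []), lambda: 0,
                        lambda fid, seq: False)
print(result)
```

Because the sequence number is captured before the directory is read and checked afterwards, a directory's entries are committed only when no change was observed in between, which is what lets the walk run against a live file system without a snapshot.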

13. A system for preparing data for replication from a source computing device in a network, the system comprising:
- one or more memory devices containing:
  
  a queue including a plurality of file identifier descriptors (FIDs) each comprising a unique identifier that corresponds to one of a plurality of directories and files on a source file system; and
  
  a database comprising file name data that associates each of the plurality of FIDs with a short name and a parent FID;
  
  a computing system comprising one or more computing devices comprising computer hardware, the computing system configured to:
  
  scan the source file system while in a live state and to populate the queue with the plurality of FIDs;
  
  access a current directory of the plurality of directories on the source file system that corresponds to a next FID in the queue; and
  
  obtain additional FIDs for each immediate child directory and immediate child file in the current directory; and
  
  populate the database with the file name data based on said scan of the source file system in the live state.
- - 14. The system of claim 13, further comprising:
    - at least one database thread configured to receive a data entry identifying a data management operation associated with at least one of the plurality of directories and files on the source file system and to construct from the FID associated with the at least one directory or file an absolute file name for transmission to a destination system along with a copy of the data management operation for replaying on the destination system.
  - 15. The system of claim 14, further comprising a filter driver situated between the source file system and at least one application configured to request the data management operation.
  - 16. The system of claim 15, wherein the filter driver is further configured to assign journal sequence numbers to each journal entry associated with a requested change to the source file system.
  - 17. The system of claim 16, wherein the computing system is further configured to receive a current journal sequence number from the filter driver prior to accessing the current directory.
  - 18. The system of claim 17, wherein the computing system is further configured to repeat said accessing and obtaining in response to detecting changes to the current directory following a time of the current journal sequence number but prior to said obtaining.
  - 19. The system of claim 13, wherein the computing system is further configured to:
    - populate the file name database with the additional FIDs of each immediate child directory and immediate child file in the current directory; and
      
      add the additional FIDs of each immediate child directory of the current directory to the queue.
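The journal behavior described in claims 16 to 18 can be pictured with a small in-memory model. This Python sketch is an assumption-laden illustration, not a filter driver: `Journal`, `JournalEntry`, `record`, `current_sequence` and `changed_since` are hypothetical names, the operation labels are invented, and a real driver would sit in the I/O path rather than in application code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class JournalEntry:
    sequence: int    # journal sequence number assigned on arrival
    parent_fid: int  # directory whose namespace the operation touches
    operation: str   # e.g. "create", "rename", "delete" (labels assumed)


@dataclass
class Journal:
    entries: List[JournalEntry] = field(default_factory=list)
    _next: int = 1

    def record(self, parent_fid: int, operation: str) -> int:
        """Assign the next sequence number to a requested change."""
        entry = JournalEntry(self._next, parent_fid, operation)
        self.entries.append(entry)
        self._next += 1
        return entry.sequence

    def current_sequence(self) -> int:
        """Sequence number of the most recent entry (0 if none)."""
        return self._next - 1

    def changed_since(self, dir_fid: int, sequence: int) -> bool:
        """True if any entry after `sequence` touched directory `dir_fid`."""
        return any(e.sequence > sequence and e.parent_fid == dir_fid
                   for e in self.entries)


journal = Journal()
journal.record(parent_fid=1, operation="create")
mark = journal.current_sequence()        # taken before reading a directory
journal.record(parent_fid=2, operation="rename")
print(journal.changed_since(1, mark), journal.changed_since(2, mark))
```

A scanner would take `mark` before accessing a directory and call `changed_since` afterwards; a `True` result signals that the access and obtain steps should be repeated for that directory.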

20. A system for identifying data to be copied in a data replication system, the system comprising:
- one or more computing devices comprising computer hardware and configured to:
  
  add a first file identifier descriptor (FID) of a first directory on a live source file system to a queue, the first FID being one of a plurality of unique identifiers corresponding to a plurality of directories and files on the source file system;
  
  store a current journal sequence number from a file system filter driver identifying a first time;
  
  following said storing, access a current directory of the plurality of directories on the source file system that corresponds to a next FID stored in the queue;
  
  obtain additional FIDs for each immediate child directory and immediate child file in the current directory; and
  
  in response to determining that no changes have been made to the current directory since the first time:
  
  populate a file name database with the additional FIDs of each immediate child directory and immediate child file in the current directory;
  
  add the additional FIDs of each immediate child directory of the current directory to the queue; and
  
  remove the next FID from the queue.

Current Assignee: CommVault Systems Incorporated
Original Assignee: CommVault Systems Incorporated
Inventors: Erofeev, Andrei
Primary Examiner(s): LEWIS, CHERYL RENEA
Application Number: US14/038,540
Publication Number: US 20140032495A1
Time in Patent Office: 250 Days
Field of Search: 707/615, 707/634, 707/635, 707/664, 707/692, 707/705, 707/821, 707/828
US Class Current: 707/828
CPC Class Codes:
G06F 11/1435   using file system or storag...
G06F 11/1458   Management of the backup or...
G06F 11/1471   involving logging of persis...
G06F 11/1662   the resynchronized componen...
G06F 11/2094   Redundant storage or storag...
G06F 16/1734   Details of monitoring file ...
G06F 16/1844   Management specifically ada...
G06F 2201/84   Using snapshots, i.e. a log...
H04L 67/1095   Replication or mirroring of...
