Multi stream deduplicated backup of collaboration server data

US 9,165,001 B1
Filed: 12/19/2012
Issued: 10/20/2015
Est. Priority Date: 12/19/2012
Status: Active Grant

Abstract

Techniques to back up collaboration server data are disclosed. An indication to begin backup of a collaboration server dataset is received. An associated directory is walked in a prescribed order to divide the dataset into a prescribed number of approximately equal-sized subsets. A separate subset-specific thread is used to back up the subsets in parallel. In some embodiments in which the collaboration data is stored in multiple volumes, a volume-based approach is used to back up the volumes in parallel, e.g., one volume per thread. In some embodiments, transaction logs are backed up in parallel with volumes of collaboration data.
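The directory walk and parallel backup described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`walk_in_order`, `partition`, `backup_in_parallel`), the use of a sorted traversal as the "prescribed order", and the byte-offset rule for splitting are all assumptions chosen for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def walk_in_order(root: str) -> List[Tuple[str, int]]:
    """Return (path, size) pairs in a deterministic, sorted traversal order."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort fixes the order os.walk descends in
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            entries.append((path, os.path.getsize(path)))
    return entries


def partition(entries: List[Tuple[str, int]], n_subsets: int) -> List[List[str]]:
    """Split ordered entries into n contiguous subsets of roughly equal bytes."""
    total = sum(size for _, size in entries)
    subsets: List[List[str]] = [[] for _ in range(n_subsets)]
    running = 0  # bytes assigned so far
    for path, size in entries:
        # Assumed rule: a file goes to the subset its starting offset falls in.
        idx = min(running * n_subsets // total, n_subsets - 1) if total else 0
        subsets[idx].append(path)
        running += size
    return subsets


def backup_in_parallel(subsets: List[List[str]],
                       backup_subset: Callable[[int, List[str]], object]) -> list:
    """Run one worker thread per subset; backup_subset is a hypothetical callback."""
    with ThreadPoolExecutor(max_workers=len(subsets)) as pool:
        return list(pool.map(backup_subset, range(len(subsets)), subsets))
```

Because the traversal order is fixed, running the same walk at a later time tends to place unchanged files in the same subset index, which is what lets each thread compare its subset against the corresponding subset of a prior backup.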

26 Citations

19 Claims

1. A method of backing up data, comprising:
- receiving an indication to begin backup of a collaboration server dataset;
- walking an associated directory in a prescribed order to divide the dataset into a prescribed number of approximately equal-sized subsets, wherein the directory comprises a plurality of files; and
- using a separate subset-specific thread to back up the subsets in parallel;
- wherein each subset-specific backup thread is configured to provide data included in that subset to a backup-thread specific de-duplicating backup process instance configured to perform de-duplication processing with respect to the subset and a corresponding subset associated with a prior backup; and
- wherein the corresponding subset was determined by walking the associated directory in the prescribed order at a prior time with which the prior backup is associated.

2. The method of claim 1, wherein the prescribed number of subsets corresponds to a configured maximum number of concurrent backup threads.

3. The method of claim 1, wherein the associated directory is walked in the same prescribed order in a subsequent backup operation.

4. The method of claim 1, further comprising generating for each subset a corresponding list of files.

5. The method of claim 4, further comprising streaming each corresponding list of files to a corresponding subset-specific backup thread.

6. The method of claim 1, wherein each backup-thread specific de-duplicating backup process instance has associated therewith a corresponding de-duplicating backup process instance-specific local cache.

7. The method of claim 6, wherein each backup-thread specific de-duplicating backup process instance is configured to store in its associated corresponding de-duplicating backup process instance-specific local cache a hash value representative of data that has been transferred by that backup-thread specific de-duplicating backup process instance to a remote backup server.

8. The method of claim 1, wherein each backup-thread specific de-duplicating backup process uses a corresponding local cache located at the collaboration server.

9. The method of claim 8, wherein each backup-thread specific de-duplicating backup process determines whether the corresponding local cache stores a file included in the subset corresponding to the subset-specific backup thread.
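
The instance-specific local cache recited in claims 6 through 9 can be pictured with a short Python sketch. The class name `DedupInstance`, the choice of SHA-256, and the in-memory `sent` list standing in for the remote backup server are assumptions made for illustration; the claims do not name a hash function or a transfer mechanism.

```python
import hashlib
from typing import Iterable, List, Optional, Set


class DedupInstance:
    """One de-duplicating backup process instance with its own local cache."""

    def __init__(self, subset_index: int, cache: Optional[Iterable[str]] = None):
        self.subset_index = subset_index
        # Hashes of data this instance has already transferred (assumed format).
        self.cache: Set[str] = set(cache or ())
        # Stand-in for the remote backup server (hypothetical).
        self.sent: List[bytes] = []

    def backup_chunk(self, data: bytes) -> bool:
        """Transfer data only if its hash is not cached; return True if sent."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.cache:
            return False  # already backed up by this instance, skip transfer
        self.sent.append(data)
        self.cache.add(digest)
        return True
```

Passing the cache from one run into the instance that handles the same subset index in the next run is what lets unchanged data be skipped on a subsequent backup.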

10. A system to back up collaboration server data, comprising:
- a memory or other storage device configured to store a directory associated with a collaboration server dataset; and
- a processor configured to:
  - receive an indication to begin backup of the collaboration server dataset;
  - walk the directory in a prescribed order to divide the dataset into a prescribed number of approximately equal-sized subsets, wherein the directory comprises a plurality of files; and
  - use a separate subset-specific thread to back up the subsets in parallel;
- wherein each subset-specific backup thread is configured to provide data included in that subset to a backup-thread specific de-duplicating backup process instance configured to perform de-duplication processing with respect to the subset and a corresponding subset associated with a prior backup; and
- wherein the corresponding subset was determined by walking the associated directory in the prescribed order at a prior time with which the prior backup is associated.

11. The system of claim 10, wherein the prescribed number of subsets corresponds to a configured maximum number of concurrent backup threads.

12. The system of claim 10, wherein the processor is further configured to generate for each subset a corresponding list of files and to stream each corresponding list of files to a corresponding subset-specific backup thread with which that list of files is associated.

13. A computer program product to back up data, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
- receiving an indication to begin backup of a collaboration server dataset;
- walking an associated directory in a prescribed order to divide the dataset into a prescribed number of approximately equal-sized subsets, wherein the directory comprises a plurality of files; and
- using a separate subset-specific thread to back up the subsets in parallel;
- wherein each subset-specific backup thread is configured to provide data included in that subset to a backup-thread specific de-duplicating backup process instance configured to perform de-duplication processing with respect to the subset and a corresponding subset associated with a prior backup; and
- wherein the corresponding subset was determined by walking the associated directory in the prescribed order at a prior time with which the prior backup is associated.

14. A method of backing up data, comprising:
- receiving an indication to begin backup of a collaboration server dataset that includes two or more volumes of user data;
- using a separate volume-specific thread to back up each of at least a subset of the volumes in parallel, including by providing for each volume-specific thread a corresponding volume-specific local cache of hash values associated with data that has been backed up previously by a process associated with that volume-specific thread, wherein the volume-specific local cache is located at the collaboration server;
- wherein a filename of the volume-specific local cache includes volume identifying information that identifies a volume with which it is associated; and
- wherein the volume identifying information included in the filename of the volume-specific local cache is used to ensure that in a subsequent backup of the volume the same local cache is associated with the de-duplicating backup process instance used to back up that volume.

15. The method of claim 14, wherein a separate volume-specific thread is spawned to back up each volume up to a configured maximum number of concurrent backup threads.

16. The method of claim 14, further comprising backing up one or more transaction logs in parallel with the backup of said at least a subset of the volumes.

17. The method of claim 16, wherein backing up said one or more transaction logs includes:
- performing a first pass backup of said one or more transaction logs during backup of said at least a subset of the volumes;
- receiving an indication that the first pass and the backup of said at least a subset of the volumes have been completed; and
- based at least in part on the indication, performing a second pass backup of said one or more transaction logs.
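
The two-pass transaction log backup of claims 16 and 17 can be sketched as follows. This is an illustrative reading, not the patented implementation: `backup_with_logs` and its callback parameters (`backup_volume`, `list_logs`, `backup_log`) are hypothetical names, and the sketch assumes the second pass only needs to pick up logs that were not captured by the first pass.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def backup_with_logs(volumes: List[str],
                     backup_volume: Callable[[str], None],
                     list_logs: Callable[[], List[str]],
                     backup_log: Callable[[str], None],
                     max_threads: int = 4) -> List[str]:
    """Back up volumes in parallel with a first log pass, then run a second pass."""
    backed_up: List[str] = []  # log names in the order they were backed up

    def first_pass() -> None:
        for log in list_logs():
            backup_log(log)
            backed_up.append(log)

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # In this sketch the first log pass shares the thread pool with the volumes.
        log_future = pool.submit(first_pass)
        volume_futures = [pool.submit(backup_volume, v) for v in volumes]
        for future in volume_futures:
            future.result()
        log_future.result()

    # Reaching this point is the "indication" that both the first pass and the
    # volume backups are complete, so the second pass can capture newer logs.
    for log in list_logs():
        if log not in backed_up:
            backup_log(log)
            backed_up.append(log)
    return backed_up
```

The second pass exists because the collaboration server may keep writing transaction logs while the volumes are being copied; a single pass taken in parallel could miss entries created near the end of the volume backup.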

18. A system to back up collaboration data, comprising:
- a processor configured to:
  - receive an indication to begin backup of a collaboration server dataset that includes two or more volumes of user data; and
  - use a separate volume-specific thread to back up each of at least a subset of the volumes in parallel, including by providing for each volume-specific thread a corresponding volume-specific local cache of hash values associated with data that has been backed up previously by a process associated with that volume-specific thread, wherein the volume-specific local cache is located at the collaboration server; and
- a memory or other storage device coupled to the processor and configured to store said corresponding volume-specific local caches;
- wherein a filename of the volume-specific local cache includes volume identifying information that identifies a volume with which it is associated; and
- wherein the volume identifying information included in the filename of the volume-specific local cache is used to ensure that in a subsequent backup of the volume the same local cache is associated with the de-duplicating backup process instance used to back up that volume.

19. A computer program product to back up data, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
- receiving an indication to begin backup of a collaboration server dataset that includes two or more volumes of user data;
- using a separate volume-specific thread to back up each of at least a subset of the volumes in parallel, including by providing for each volume-specific thread a corresponding volume-specific local cache of hash values associated with data that has been backed up previously by a process associated with that volume-specific thread, wherein the volume-specific local cache is located at the collaboration server;
- wherein a filename of the volume-specific local cache includes volume identifying information that identifies a volume with which it is associated; and
- wherein the volume identifying information included in the filename of the volume-specific local cache is used to ensure that in a subsequent backup of the volume the same local cache is associated with the de-duplicating backup process instance used to back up that volume.
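
Claims 14, 18, and 19 tie each local cache to its volume through the cache filename. A minimal Python sketch of that idea follows. The filename pattern `dedup_cache_<volume>.txt`, the helper names, and the one-hash-per-line file format are assumptions for illustration and are not taken from the patent.

```python
import re
from pathlib import Path
from typing import Set


def cache_path_for_volume(cache_dir: str, volume_id: str) -> Path:
    """Build a cache filename that embeds the volume identifier."""
    # Assumed sanitization; a real scheme would need to guarantee that two
    # different volume identifiers never map to the same filename.
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", volume_id)
    return Path(cache_dir) / f"dedup_cache_{safe}.txt"


def load_cache(path: Path) -> Set[str]:
    """Load previously stored hash values, or an empty set on a first backup."""
    return set(path.read_text().split()) if path.exists() else set()


def save_cache(path: Path, hashes: Set[str]) -> None:
    """Persist hash values, one per line, for the next backup of this volume."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(sorted(hashes)))
```

Because the path is derived only from the volume identifier, a later backup of the same volume resolves to the same cache file regardless of which thread happens to process that volume.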

Current Assignee: EMC IP Holding Company LLC (Dell Technologies Inc.)
Original Assignee: EMC Corporation (Dell Technologies Inc.)
Inventors: Upadhyay, Navneet; Tadahal, Manjunath
Primary Examiner(s): Perveen, Rehana
Assistant Examiner(s): Skhoun, Hicham
Application Number: US13/720,814
Time in Patent Office: 1,035 Days
Field of Search: 707/652
US Class Current: 1/1

CPC Class Codes
- G06F 11/1453   using de-duplication of the...
- G06F 11/1458   Management of the backup or...
- G06F 11/1464   for networked environments
- G06F 16/119   Details of migration of fil...
- G06F 16/172   Caching, prefetching or hoa...
- G06F 16/1748   De-duplication implemented ...
- G06F 16/27   Replication, distribution o...
- G06F 2201/84   Using snapshots, i.e. a log...
- G06F 2201/845   Systems in which the redund...
