Multi stream deduplicated backup of collaboration server data
Abstract
Techniques to back up collaboration server data are disclosed. An indication to begin backup of a collaboration server dataset is received. An associated directory is walked in a prescribed order to divide the dataset into a prescribed number of approximately equal-sized subsets. A separate subset-specific thread is used to back up the subsets in parallel. In some embodiments in which the collaboration data is stored in multiple volumes, a volume-based approach is used to back up the volumes in parallel, e.g., one volume per thread. In some embodiments, transaction logs are backed up in parallel with volumes of collaboration data.
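The partitioning step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `walk_sorted` and `partition`, the choice of lexicographic order as the "prescribed order", and the use of file size in bytes as the balancing measure are all assumptions made for illustration.

```python
import os


def walk_sorted(root):
    """Yield (path, size) for every file under root in a fixed, repeatable order.

    Assumption: lexicographic order stands in for the "prescribed order".
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk descend deterministically
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            yield path, os.path.getsize(path)


def partition(files, n):
    """Split ordered (path, size) pairs into n contiguous subsets of similar total size."""
    files = list(files)
    total = sum(size for _, size in files)
    subsets = [[] for _ in range(n)]
    running = 0  # bytes assigned so far
    for path, size in files:
        # A file goes to the subset in which its starting offset falls.
        idx = min(n - 1, running * n // total) if total else 0
        subsets[idx].append(path)
        running += size
    return subsets
```

Because the walk order is fixed, running `partition(walk_sorted(root), n)` on an unchanged directory yields the same subsets every time, which is the property the claims rely on when they refer to a corresponding subset of a prior backup.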
19 Claims
1. A method of backing up data, comprising:
receiving an indication to begin backup of a collaboration server dataset;
walking an associated directory in a prescribed order to divide the dataset into a prescribed number of approximately equal-sized subsets, wherein the directory comprises a plurality of files; and
using a separate subset-specific thread to back up the subsets in parallel;
wherein each subset-specific backup thread is configured to provide data included in that subset to a backup-thread specific de-duplicating backup process instance configured to perform de-duplication processing with respect to the subset and a corresponding subset associated with a prior backup; and
wherein the corresponding subset was determined by walking the associated directory in the prescribed order at a prior time with which the prior backup is associated.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.

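The per-subset threading and de-duplication recited in claim 1 might look roughly like the following Python sketch. Everything in it is an illustrative assumption rather than the claimed implementation: `backup_subset` and `backup_parallel` are hypothetical names, SHA-256 over whole items stands in for whatever hashing the de-duplicating process uses, and in-memory dictionaries stand in for the dataset and the backup target.

```python
import hashlib
import threading


def backup_subset(index, items, prior_hashes, results):
    """De-duplicate one subset against the same-numbered subset of the prior backup.

    items maps an item name to its bytes; prior_hashes is the set of hashes
    recorded for this subset last time (empty on a first backup).
    """
    seen = set(prior_hashes)  # private to this thread, so no locking is needed
    sent = []
    for name in sorted(items):
        digest = hashlib.sha256(items[name]).hexdigest()
        if digest not in seen:  # only data not already backed up is sent
            seen.add(digest)
            sent.append(name)
    results[index] = (sent, seen)  # each thread writes only its own key


def backup_parallel(subsets, prior):
    """Run one dedicated thread per subset and collect (sent, hashes) per subset."""
    results = {}
    threads = [
        threading.Thread(target=backup_subset, args=(i, s, prior.get(i, set()), results))
        for i, s in enumerate(subsets)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Feeding the hash sets returned by one run into the next run as `prior` shows why a stable subset assignment matters: unchanged items are skipped only if they land in the same subset, and hence in front of the same hash set, as before.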
10. A system to back up collaboration server data, comprising:
a memory or other storage device configured to store a directory associated with a collaboration server dataset; and
a processor configured to:
receive an indication to begin backup of the collaboration server dataset;
walk the directory in a prescribed order to divide the dataset into a prescribed number of approximately equal-sized subsets, wherein the directory comprises a plurality of files; and
use a separate subset-specific thread to back up the subsets in parallel;
wherein each subset-specific backup thread is configured to provide data included in that subset to a backup-thread specific de-duplicating backup process instance configured to perform de-duplication processing with respect to the subset and a corresponding subset associated with a prior backup; and
wherein the corresponding subset was determined by walking the associated directory in the prescribed order at a prior time with which the prior backup is associated.
Dependent claims: 11, 12.

13. A computer program product to back up data, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
receiving an indication to begin backup of a collaboration server dataset;
walking an associated directory in a prescribed order to divide the dataset into a prescribed number of approximately equal-sized subsets, wherein the directory comprises a plurality of files; and
using a separate subset-specific thread to back up the subsets in parallel;
wherein each subset-specific backup thread is configured to provide data included in that subset to a backup-thread specific de-duplicating backup process instance configured to perform de-duplication processing with respect to the subset and a corresponding subset associated with a prior backup; and
wherein the corresponding subset was determined by walking the associated directory in the prescribed order at a prior time with which the prior backup is associated.

14. A method of backing up data, comprising:
receiving an indication to begin backup of a collaboration server dataset that includes two or more volumes of user data;
using a separate volume-specific thread to back up each of at least a subset of the volumes in parallel, including by providing for each volume-specific thread a corresponding volume-specific local cache of hash values associated with data that has been backed up previously by a process associated with that volume-specific thread, wherein the volume-specific local cache is located at the collaboration server;
wherein a filename of the volume-specific local cache includes volume identifying information that identifies a volume with which it is associated; and
wherein the volume identifying information included in the filename of the volume-specific local cache is used to ensure that in a subsequent backup of the volume the same local cache is associated with the de-duplicating backup process instance used to back up that volume.
Dependent claims: 15, 16, 17.

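The volume-specific cache naming recited in claim 14 can be sketched as follows. This is a hedged illustration only: the `hash_cache_<volume>.txt` naming scheme, the one-hash-per-line file format, and the helper names `cache_path`, `load_cache`, and `save_cache` are all hypothetical and are not taken from the patent.

```python
import os
import re


def cache_path(cache_dir, volume_id):
    """Build a cache filename that embeds the volume identifier.

    Characters that are unsafe in filenames are replaced with "_"; a real
    scheme would also need to keep distinct volume IDs from colliding.
    """
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", volume_id)
    return os.path.join(cache_dir, f"hash_cache_{safe}.txt")


def load_cache(path):
    """Return the set of hashes stored at path, or an empty set on a first backup."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="ascii") as fh:
        return {line.strip() for line in fh if line.strip()}


def save_cache(path, hashes):
    """Persist the hashes, one per line, for the next backup of the same volume."""
    with open(path, "w", encoding="ascii") as fh:
        for digest in sorted(hashes):
            fh.write(digest + "\n")
```

Because the path is derived purely from the volume identifier, whichever thread handles a given volume in a later backup reopens the same cache file, regardless of the order in which threads are started.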
18. A system to back up collaboration data, comprising:
a processor configured to:
receive an indication to begin backup of a collaboration server dataset that includes two or more volumes of user data; and
use a separate volume-specific thread to back up each of at least a subset of the volumes in parallel, including by providing for each volume-specific thread a corresponding volume-specific local cache of hash values associated with data that has been backed up previously by a process associated with that volume-specific thread, wherein the volume-specific local cache is located at the collaboration server; and
a memory or other storage device coupled to the processor and configured to store said corresponding volume-specific local caches;
wherein a filename of the volume-specific local cache includes volume identifying information that identifies a volume with which it is associated; and
wherein the volume identifying information included in the filename of the volume-specific local cache is used to ensure that in a subsequent backup of the volume the same local cache is associated with the de-duplicating backup process instance used to back up that volume.

19. A computer program product to back up data, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
receiving an indication to begin backup of a collaboration server dataset that includes two or more volumes of user data;
using a separate volume-specific thread to back up each of at least a subset of the volumes in parallel, including by providing for each volume-specific thread a corresponding volume-specific local cache of hash values associated with data that has been backed up previously by a process associated with that volume-specific thread, wherein the volume-specific local cache is located at the collaboration server;
wherein a filename of the volume-specific local cache includes volume identifying information that identifies a volume with which it is associated; and
wherein the volume identifying information included in the filename of the volume-specific local cache is used to ensure that in a subsequent backup of the volume the same local cache is associated with the de-duplicating backup process instance used to back up that volume.

Specification