Method and apparatus to recover from interrupted data streams in a deduplication system
Abstract
Detection and proper deduplication of a re-started data stream in a segmentation analysis-based deduplication system are provided by retaining information about a previous data stream and using that information when performing segmentation of the re-started data stream. Information such as a segment size associated with a last data object received in the previous data stream and a record of how much data was present in the last segment associated with the previous data stream is retained. The retained segment size information is used to set a first data object segment size of the re-started data stream, and the size of last segment information is used to determine how much information should be put in the first segment associated with the re-started data stream in order to maintain proper alignment of the remainder of the segments for the first data object in the re-started data stream for deduplication.
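The alignment step described in the abstract can be illustrated with a little arithmetic. The following Python sketch is only one possible reading: it assumes the re-started stream resumes where the previous stream stopped, so the first segment holds just the bytes needed to complete the interrupted segment. The function names `first_segment_fill` and `segment_lengths` are hypothetical and do not come from the patent.

```python
from typing import List


def first_segment_fill(segment_size: int, last_segment_fill: int) -> int:
    """Bytes to place in the first segment of a re-started stream.

    Assumption: the previous stream stopped after writing
    `last_segment_fill` bytes into a segment of `segment_size` bytes,
    and the re-started stream resumes from that point.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if not 0 <= last_segment_fill <= segment_size:
        raise ValueError("last_segment_fill must be within the segment")
    remainder = segment_size - last_segment_fill
    # An empty or completely full last segment needs no correction.
    return remainder if remainder > 0 else segment_size


def segment_lengths(total_bytes: int, segment_size: int,
                    last_segment_fill: int) -> List[int]:
    """Split `total_bytes` of resumed data into segment lengths."""
    lengths: List[int] = []
    remaining = total_bytes
    next_len = first_segment_fill(segment_size, last_segment_fill)
    while remaining > 0:
        take = min(next_len, remaining)
        lengths.append(take)
        remaining -= take
        # Every segment after the first uses the retained full size.
        next_len = segment_size
    return lengths
```

Under these assumptions, a previous stream that left 100 bytes in a 128-byte segment leads to a 28-byte first segment, after which the remaining segments fall on the same 128-byte boundaries they would have had without the interruption.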
20 Claims
1. A method comprising:
receiving a current data stream having an associated unique identifier;
determining whether the current data stream is a first data stream associated with the unique identifier; and
allocating a segment of memory to contain a first amount of data from the current data stream, wherein
a size of the segment is determined using an identification of a type of first data object received in the current data stream, and
the size of the segment is further determined using a size of a last segment of a previous data stream associated with the unique identifier, if the current data stream is not the first data stream associated with the unique identifier.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer-readable storage medium storing instructions executable by a processor, said instructions comprising:
a first set of instructions configured to determine whether a current data stream is a first data stream associated with a unique identifier associated with the current data stream, wherein the current data stream is received by a network interface coupled to the processor; and
a second set of instructions configured to allocate a segment of memory to contain a first amount of data from the current data stream, wherein the second set of instructions comprises instructions further configured to
determine a size of the segment using an identification of a type of first data object received in the current data stream, and
if the current data stream is not the first data stream associated with the unique identifier, further determine the size of the segment using a size of a last segment of a previous data stream associated with the unique identifier.
Dependent claims: 11, 12, 13, 14, 15, 16
17. An apparatus comprising:
a network interface configured to receive from a remote node a current data stream having an associated unique identifier;
a segment buffer memory, coupled to the network interface, and comprising memory allocatable to form one or more buffers of corresponding selected sizes; and
a processor, coupled to the network interface and the segment buffer memory, and configured to
determine whether the current data stream is a first data stream associated with the unique identifier, and
allocate a segment of the segment buffer memory to contain a first amount of data from the current data stream, wherein
a size of the segment is determined using an identification of a type of first data object received in the current data stream, and
the size of the segment is further determined using a size of a last segment of a previous data stream associated with the unique identifier, if the current data stream is not the first data stream associated with the unique identifier.
Dependent claims: 18, 19, 20
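The decision flow shared by the independent claims (check whether a stream is the first one for its identifier, then size the first buffer accordingly) might be sketched as follows. This is an illustrative Python sketch, not the claimed implementation: the class `SegmentAllocator`, the record type `StreamRecord`, the size table `SEGMENT_SIZE_BY_TYPE`, and the object-type names and byte values in it are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical mapping from data object type to segment size in bytes.
SEGMENT_SIZE_BY_TYPE: Dict[str, int] = {"database": 8192, "file": 131072}


@dataclass
class StreamRecord:
    """State retained about the previous stream for one identifier."""
    segment_size: int       # segment size used for the last data object
    last_segment_fill: int  # bytes present in the last segment


class SegmentAllocator:
    def __init__(self) -> None:
        # Retained records, keyed by the stream's unique identifier.
        self._history: Dict[str, StreamRecord] = {}

    def is_first_stream(self, stream_id: str) -> bool:
        return stream_id not in self._history

    def record_interruption(self, stream_id: str, segment_size: int,
                            last_segment_fill: int) -> None:
        self._history[stream_id] = StreamRecord(segment_size,
                                                last_segment_fill)

    def allocate_first_segment(self, stream_id: str,
                               object_type: str) -> bytearray:
        # The size starts from the type of the first data object.
        size = SEGMENT_SIZE_BY_TYPE[object_type]
        previous = self._history.get(stream_id)
        if previous is not None:
            # Not the first stream: adjust using the retained sizes
            # (assumed reading: fill only the rest of the last segment).
            remainder = previous.segment_size - previous.last_segment_fill
            size = remainder if remainder > 0 else previous.segment_size
        return bytearray(size)
```

In this sketch, a first stream receives a buffer sized purely by object type, while a re-started stream with a retained record receives a shorter first buffer.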
Specification