METHOD FOR INCREASING DEDUPLICATION SPEED ON DATA STREAMS FRAGMENTED BY SHUFFLING
Abstract
A computer-implemented method for deduplicating an incoming data sequence can include the steps of storing signature values for a plurality of data blocklets of a parent data sequence in a deduplication index, sequentially storing signature values for at least some of the plurality of data blocklets of the parent data sequence in a first storage location outside of the deduplication index, determining that a first data blocklet in the incoming data sequence is absent from the parent data sequence, storing a signature value for the first data blocklet in a second storage location outside of the deduplication index, storing a guarded link linking the first data blocklet to the second data blocklet into the second storage location, determining that a second data blocklet that follows the first data blocklet in the incoming data sequence is present in the parent data sequence, the second data blocklet having a signature value that is stored in the first storage location, and copying at least a portion of the contents of the first storage location and the second storage location into a cache to expedite access during deduplication of the incoming data sequence.
26 Claims
1. A computer-implemented method for deduplicating an incoming data sequence, the method comprising the steps of:
storing signature values for a plurality of data blocklets of a parent data sequence in a deduplication index;
sequentially storing signature values for at least some of the plurality of data blocklets of the parent data sequence in a first storage location outside of the deduplication index;
determining that a first data blocklet in the incoming data sequence is absent from the parent data sequence;
storing a signature value for the first data blocklet in a second storage location outside of the deduplication index;
determining that a second data blocklet that follows the first data blocklet in the incoming data sequence is present in the parent data sequence, the second data blocklet having a signature value that is stored in the first storage location; and
copying at least a portion of the contents of the second storage location into a cache to expedite access during deduplication of the incoming data sequence.
Dependent claims: 2-13.
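The steps of claim 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: the function names `signature` and `deduplicate`, the use of SHA-256 as the signature, and the choice of a Python set, list, and set for the index, storage locations, and cache are all assumptions made for the example.

```python
import hashlib


def signature(blocklet: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever signature function is used.
    return hashlib.sha256(blocklet).hexdigest()


def deduplicate(parent, incoming):
    """Walk the incoming sequence and return (second_location, cache)."""
    # Deduplication index holding signatures of the parent blocklets.
    index = {signature(b) for b in parent}
    # First storage location: parent signatures kept in sequence order,
    # outside the index.
    first_location = [signature(b) for b in parent]
    # Second storage location: signatures of blocklets absent from the parent.
    second_location = []
    cache = set()

    previous_was_new = False
    for blocklet in incoming:
        sig = signature(blocklet)
        if sig not in index:
            # First data blocklet: absent from the parent sequence.
            second_location.append(sig)
            previous_was_new = True
        else:
            # Second data blocklet: follows a new blocklet, is present in the
            # parent, and has its signature in the first storage location.
            if previous_was_new and sig in first_location:
                cache.update(second_location)
            previous_was_new = False
    return second_location, cache
```

With a parent of `[b"A", b"B", b"C"]` and an incoming sequence of `[b"A", b"X", b"B", b"Y"]`, the signature of `X` reaches the cache when `B` is seen, while the signature of `Y` stays only in the second storage location because no known blocklet follows it.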
14. A computer-implemented method for deduplicating an incoming data sequence, the method comprising the steps of:
storing signature values for a plurality of data blocklets of a parent data sequence in a deduplication index;
sequentially storing signature values for at least some of the plurality of data blocklets of the parent data sequence in a first storage location outside of the deduplication index;
locating a transition data blocklet that is absent from the parent data sequence;
sequentially storing a signature value for the transition data blocklet into a second storage location outside the deduplication index;
determining that a signature value for a data blocklet that follows the transition data blocklet is included in the first storage location; and
copying at least a portion of the contents of the second storage location into a cache to expedite access during deduplication of the incoming data sequence.
Dependent claims: 15-25.
26. A computer-implemented method for deduplicating an incoming data sequence, the method comprising the steps of:
storing signature values for a plurality of data blocklets of a parent data sequence in a deduplication index;
sequentially storing signature values for at least some of the plurality of data blocklets of the parent data sequence in a first cluster header outside of the deduplication index;
locating a transition data blocklet that is absent from the parent data sequence;
determining that a data blocklet that immediately precedes the transition data blocklet in the incoming data sequence is present in the parent data sequence;
determining that a signature value for a data blocklet that immediately follows the transition data blocklet is included in the first cluster header;
storing a signature value for the transition data blocklet into a second cluster header that is designated to receive only transition data blocklets that are absent from the parent data sequence;
storing a link in the first cluster header, the link linking one of the data blocklets of the parent data sequence to the transition data blocklet;
storing a guarded link in the second cluster header, the guarded link linking the transition data blocklet to the data blocklet that follows the transition data blocklet; and
copying the contents of the second cluster header into a cache that is embedded in a computer usable volatile memory to expedite access during deduplication of the incoming data sequence.
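The link bookkeeping in claim 26 can be illustrated with a small Python sketch. Everything named here is hypothetical: the `ClusterHeader` class, the `find_transitions` function, the SHA-256 signature, and the representation of a link as a `(source, target, guarded)` tuple are assumptions for the example. The claims shown here do not define what makes a link "guarded", so the sketch only tags it with a boolean.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ClusterHeader:
    # Signatures stored sequentially, plus links as (source, target, guarded).
    signatures: list = field(default_factory=list)
    links: list = field(default_factory=list)


def sig(blocklet: bytes) -> str:
    # Assumption: SHA-256 stands in for the signature function.
    return hashlib.sha256(blocklet).hexdigest()


def find_transitions(parent, incoming):
    """Return (first_header, second_header, cache) after one pass."""
    index = {sig(b) for b in parent}                      # deduplication index
    first = ClusterHeader(signatures=[sig(b) for b in parent])
    second = ClusterHeader()                              # transition blocklets only
    cache = {}                                            # in-memory copy of `second`

    sigs = [sig(b) for b in incoming]
    for i in range(1, len(sigs) - 1):
        prev, cur, nxt = sigs[i - 1], sigs[i], sigs[i + 1]
        # A transition blocklet is new, immediately preceded by a known
        # blocklet, and immediately followed by one in the first header.
        if cur in index or prev not in index or nxt not in first.signatures:
            continue
        second.signatures.append(cur)
        first.links.append((prev, cur, False))    # parent blocklet -> transition
        second.links.append((cur, nxt, True))     # guarded: transition -> follower
        cache = {
            "signatures": list(second.signatures),
            "links": list(second.links),
        }
    return first, second, cache
```

For a parent of `[b"A", b"B", b"C"]` and an incoming sequence of `[b"A", b"X", b"C", b"B"]`, the blocklet `X` is the only transition: it is linked from `A` in the first header and linked by a guarded link to `C` in the second header, and the second header's contents are then copied into the cache.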
Specification