Vector processing for segmentation hash values calculation
First Claim
1. A system for block deduplication, comprising:
a non-transitory memory comprising instructions;
a vector processor in communication with the memory, wherein the vector processor is configured to execute the instructions to create, from an input data stream, a segmented data stream comprising a plurality of variable size segments, wherein defining a segment of the plurality of variable size segments comprises:
applying a rolling sequence over a sequence of consecutive data items of the input data stream, the rolling sequence including a subset of consecutive data items of the sequence;
calculating a plurality of partial hash values, wherein the plurality of partial hash values are calculated concurrently by a plurality of vector processing pipelines of the vector processor, wherein each partial hash value corresponds to a respective partial rolling sequence out of a plurality of partial rolling sequences, and wherein each partial rolling sequence is comprised of a plurality of evenly spaced data items of the subset;
determining compliance of each of the plurality of partial hash values with at least one respective partial segmentation criterion; and
designating a cut in the sequence which defines the segment of the sequence in response to at least one of the plurality of partial hash values complying with the at least one respective partial segmentation criterion; and
a deduplication application configured to receive the segmented data stream and perform block deduplication on the segmented data stream;
wherein the at least one respective partial segmentation criterion defines a data pattern; and
wherein determining compliance of each of the plurality of partial hash values with at least one respective partial segmentation criterion comprises:
checking whether a portion of each partial hash value equals a predefined value; or
checking whether each partial hash value is larger than a predefined value.
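The two compliance checks named at the end of the claim can be sketched as follows. This is a minimal illustration, not the patented implementation: the constants `MASK`, `TARGET`, and `THRESHOLD` and all function names are hypothetical, and the claim does not say which portion of the hash is compared or what the predefined values are.

```python
# Hypothetical constants; the claim does not specify any of these values.
MASK = 0x1FFF           # assumed "portion" of the hash: its low 13 bits
TARGET = 0x0000         # assumed predefined value for the equality check
THRESHOLD = 0xFFF80000  # assumed predefined value for the magnitude check


def complies_by_pattern(h: int, mask: int = MASK, target: int = TARGET) -> bool:
    """Check whether a portion of the partial hash equals a predefined value."""
    return (h & mask) == target


def complies_by_threshold(h: int, threshold: int = THRESHOLD) -> bool:
    """Check whether the partial hash is larger than a predefined value."""
    return h > threshold


def any_lane_complies(partial_hashes, check) -> bool:
    """A cut is designated when at least one partial hash complies."""
    return any(check(h) for h in partial_hashes)
```

Either check can be passed to `any_lane_complies`, which mirrors the claim's "at least one of the plurality of partial hash values" wording.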
Abstract
A system for segmenting an input data stream using vector processing, comprising a processor adapted to repeat the following steps throughout an input data stream to create a segmented data stream consisting of a plurality of segments: apply a rolling sequence over a sequence of consecutive data items of the input data stream, the rolling sequence including a subset of consecutive data items of the sequence; calculate concurrently a plurality of partial hash values, each by one of a plurality of processing pipelines of the processor and each for a respective one of a plurality of partial rolling sequences, each partial rolling sequence including evenly spaced data items of the subset; determine compliance of each of the plurality of partial hash values with one or more respective partial segmentation criteria; and designate the sequence as a variable size segment when at least some of the partial hash values comply with the respective partial segmentation criteria.
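The steps in the abstract can be sketched in plain Python. This is a simplified illustration under several assumptions: the lanes are evaluated one after another rather than concurrently on vector pipelines, each window is rehashed from scratch instead of being updated incrementally, and the polynomial hash, the constants `WINDOW`, `LANES`, `MASK`, `BASE`, `MOD`, and the rule that a segment is at least one window long are all hypothetical choices not taken from the patent.

```python
WINDOW = 16     # assumed size of the rolling sequence (subset of items)
LANES = 4       # assumed number of partial rolling sequences
MASK = 0x3F     # assumed criterion: low 6 bits of a partial hash are zero
BASE = 257      # assumed polynomial hash base
MOD = 1 << 32   # keep hashes within 32 bits


def partial_hashes(window: bytes, lanes: int = LANES) -> list:
    """Return one hash per lane; lane k covers items k, k+lanes, k+2*lanes, ..."""
    hashes = []
    for k in range(lanes):
        h = 0
        for item in window[k::lanes]:  # evenly spaced data items
            h = (h * BASE + item) % MOD
        hashes.append(h)
    return hashes


def segment(data: bytes, window: int = WINDOW, lanes: int = LANES,
            mask: int = MASK) -> list:
    """Split data into variable size segments at content-defined cuts."""
    segments = []
    start = 0
    for end in range(window, len(data) + 1):
        if end - start < window:
            continue  # assumed minimum segment size of one window
        hashes = partial_hashes(data[end - window:end], lanes)
        # Designate a cut when at least one partial hash complies.
        if any((h & mask) == 0 for h in hashes):
            segments.append(data[start:end])
            start = end
    if start < len(data):
        segments.append(data[start:])  # trailing remainder
    return segments
```

On a real vector processor the inner loop of `partial_hashes` would run across lanes in parallel; the sequential loop here only shows which data items each lane sees.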
87 Citations
12 Claims
1. A system for block deduplication, comprising:
a non-transitory memory comprising instructions;
a vector processor in communication with the memory, wherein the vector processor is configured to execute the instructions to create, from an input data stream, a segmented data stream comprising a plurality of variable size segments, wherein defining a segment of the plurality of variable size segments comprises:
applying a rolling sequence over a sequence of consecutive data items of the input data stream, the rolling sequence including a subset of consecutive data items of the sequence;
calculating a plurality of partial hash values, wherein the plurality of partial hash values are calculated concurrently by a plurality of vector processing pipelines of the vector processor, wherein each partial hash value corresponds to a respective partial rolling sequence out of a plurality of partial rolling sequences, and wherein each partial rolling sequence is comprised of a plurality of evenly spaced data items of the subset;
determining compliance of each of the plurality of partial hash values with at least one respective partial segmentation criterion; and
designating a cut in the sequence which defines the segment of the sequence in response to at least one of the plurality of partial hash values complying with the at least one respective partial segmentation criterion; and
a deduplication application configured to receive the segmented data stream and perform block deduplication on the segmented data stream;
wherein the at least one respective partial segmentation criterion defines a data pattern; and
wherein determining compliance of each of the plurality of partial hash values with at least one respective partial segmentation criterion comprises:
checking whether a portion of each partial hash value equals a predefined value; or
checking whether each partial hash value is larger than a predefined value.
Dependent claims: 2, 3, 4, 5, 6.
7. A method for block deduplication, the method comprising:
creating, from an input data stream, a segmented data stream comprising a plurality of variable size segments, wherein defining a segment of the plurality of variable size segments comprises:
applying a rolling sequence over a sequence of consecutive data items of the input data stream, the rolling sequence includes a subset of consecutive data items of the sequence;
calculating a plurality of partial hash values, wherein the plurality of partial hash values are calculated concurrently by a plurality of vector processing pipelines of the vector processor, wherein each partial hash value corresponds to a respective partial rolling sequence out of a plurality of partial rolling sequences, and wherein each partial rolling sequence is comprised of a plurality of evenly spaced data items of the subset;
determining compliance of each of the plurality of partial hash values with at least one respective partial segmentation criterion; and
designating a cut in the sequence which defines the segment of the sequence in response to at least one of the plurality of partial hash values complying with the at least one respective partial segmentation criterion; and
performing block deduplication on the segmented data stream;
wherein the at least one respective partial segmentation criterion defines a data pattern; and
wherein determining compliance of each of the plurality of partial hash values with at least one respective partial segmentation criterion comprises:
checking whether a portion of each partial hash value equals a predefined value; or
checking whether each partial hash value is larger than a predefined value.
Dependent claims: 8, 9, 10, 11.
12. A non-transitory computer-readable medium having instructions stored thereon, wherein the instructions, when executed, facilitate:
creating, from an input data stream, a segmented data stream comprising a plurality of variable size segments, wherein defining a segment of the plurality of variable size segments comprises:
applying a rolling sequence over a sequence of consecutive data items of the input data stream, the rolling sequence includes a subset of consecutive data items of the sequence;
calculating a plurality of partial hash values, wherein the plurality of partial hash values are calculated concurrently by a plurality of vector processing pipelines of the vector processor, wherein each partial hash value corresponds to a respective partial rolling sequence out of a plurality of partial rolling sequences, and wherein each partial rolling sequence is comprised of a plurality of evenly spaced data items of the subset;
determining compliance of each of the plurality of partial hash values with at least one respective partial segmentation criterion; and
designating a cut in the sequence which defines the segment of the sequence in response to at least one of the plurality of partial hash values complying with the at least one respective partial segmentation criterion; and
performing block deduplication on the segmented data stream;
wherein the at least one respective partial segmentation criterion defines a data pattern; and
wherein determining compliance of each of the plurality of partial hash values with at least one respective partial segmentation criterion comprises:
checking whether a portion of each partial hash value equals a predefined value; or
checking whether each partial hash value is larger than a predefined value.
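The claims end with block deduplication on the segmented data stream but do not say how it is carried out. One common approach, shown here purely as an assumed illustration, is to fingerprint each segment with a cryptographic digest and store each distinct segment once. The names `deduplicate`, `restore`, `store`, and `recipe` are hypothetical, and SHA-256 is an arbitrary choice of fingerprint.

```python
import hashlib


def deduplicate(segments):
    """Store each distinct segment once, keyed by its SHA-256 digest.

    Returns (store, recipe): store maps digest -> segment bytes, and
    recipe lists the digests in stream order so the input can be rebuilt.
    """
    store = {}
    recipe = []
    for seg in segments:
        digest = hashlib.sha256(seg).hexdigest()
        store.setdefault(digest, seg)  # keep only the first copy
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original stream from the store and the recipe."""
    return b"".join(store[d] for d in recipe)
```

Because cuts are derived from the content itself, identical runs of data tend to produce identical segments even when they appear at different offsets, which is what lets a digest lookup of this kind find the repeats.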
Specification