System and method for sharing storage resources between multiple files
First Claim
1. A processor-based method for creating one or more chunks in a machine readable medium from a sequence of values in the machine readable medium, each chunk representing a portion of the sequence of values between at least two breakpoints, the method comprising:
- obtaining a fingerprint value for a position in the sequence of values in the machine readable medium;
designating positions within the sequence of values as breakpoints where each breakpoint is the beginning or end of a chunk in the machine readable medium and a position is designated as a breakpoint if at least one of the following breakpoint conditions is true;
(a) the fingerprint value at the position is a D1-match(b) the fingerprint value at the position is a D2-match, and the distance between the prior breakpoint, if any, and any subsequent D1-match breakpoint is greater than a pre-determined maximum value M1;
(c) the distance between the prior breakpoint, if any, and the position is equal to the maximum value M1, whereina value is a D1-match if a function D1Match returns ‘
true’
when applied to the fingerprint value;
a value is a D2-match if a function D2Match returns ‘
true’
when applied to the fingerprint value;
D1Match and D2Match are Boolean functions mapping integers to Boolean values; and
when applied to a random integer, a probability that D1Match returns ‘
true’
is smaller than a probability that D2Match returns ‘
true’
.
2 Assignments
0 Petitions
Accused Products
Abstract
An improved sliding window chunking apparatus and method comprising comparing a fingerprint value of each position in a data set to a second set of criteria, at least in instances when it doesn'"'"'t satisfy a first set of criteria, and, if the value satisfies the second set of criteria, identifying the position as a potential breakpoint. Subsequently, if a fingerprint value that satisfies the first set of criteria is not found before a maximum chunk size is reached, the potential breakpoint can be designated as a breakpoint. Further improvement is possible by imposing minimum and maximum sizes on chunks. In some instances, more than two sets of criteria may be used to identify additional potential chunks to be used should subsets having fingerprint values satisfying either of the first two sets of criteria not be found.
-
Citations
18 Claims
-
1. A processor-based method for creating one or more chunks in a machine readable medium from a sequence of values in the machine readable medium, each chunk representing a portion of the sequence of values between at least two breakpoints, the method comprising:
-
obtaining a fingerprint value for a position in the sequence of values in the machine readable medium; designating positions within the sequence of values as breakpoints where each breakpoint is the beginning or end of a chunk in the machine readable medium and a position is designated as a breakpoint if at least one of the following breakpoint conditions is true; (a) the fingerprint value at the position is a D1-match (b) the fingerprint value at the position is a D2-match, and the distance between the prior breakpoint, if any, and any subsequent D1-match breakpoint is greater than a pre-determined maximum value M1; (c) the distance between the prior breakpoint, if any, and the position is equal to the maximum value M1, wherein a value is a D1-match if a function D1Match returns ‘
true’
when applied to the fingerprint value;a value is a D2-match if a function D2Match returns ‘
true’
when applied to the fingerprint value;D1Match and D2Match are Boolean functions mapping integers to Boolean values; and when applied to a random integer, a probability that D1Match returns ‘
true’
is smaller than a probability that D2Match returns ‘
true’
. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. An apparatus comprising;
-
a machine readable medium containing instructions which, when executed by a machine, causes the machine to perform operations for creating a plurality of chunks in a machine readable medium from a sequence of values in the machine readable medium where each chunk is a smaller sequence of values between at least two breakpoints, the operations comprising; obtaining a fingerprint value for a position in the sequence of values in the machine readable medium; grouping the sequence into a plurality of adjacent, non-overlapping chunks in the machine readable medium wherein each chunk begins or ends on a breakpoint, the breakpoint satisfying at least one of the following conditions; (a) the fingerprint value at the position is a D2-match, and the distance between the prior breakpoint, if any, and any subsequent D1-match breakpoint is greater than a pre-determined maximum value M1; (b) the distance between the prior breakpoint if any, and the position is equal to the maximum value M1, wherein a value is a D1-match if a function D1Match returns ‘
true’
when applied to the fingerprint value;
p1 a value is a D2-match if a function D2Match returns ‘
true’
when applied to the fingerprint value;D1Match and D2Match are pre-determined Boolean functions mapping integers to Boolean values; and when applied to a random integer, the probability that D1 Match returns ‘
true’
is smaller than the probability that D2Match returns ‘
true’
. - View Dependent Claims (13, 14, 15)
-
-
16. An apparatus comprising:
-
a machine readable medium for storing a fingerprint value representing a position in a sequence of values, the a sequence of values organized into a plurality of chunks in the machine readable medium wherein each chunk begins or ends on a position in the machine readable medium that satisfies at least one of the following breakpoint conditions; the fingerprint value at the position is a D1-match; the fingerprint value at the position is a D2-match, and the distance between a prior breakpoint, if any, and any subsequent D1-match breakpoint is greater than a pre-determined maximum value M1; and the distance between the prior breakpoint, if any, and the position is equal to the maximum value M1;
whereina value is a D1-match if the value divided by a first divisor D1 results in a remainder having a value R1; a value is a D2-match if the value divided by a second divisor D2 results in a remainder having a value R2; D1 is not equal to D2; and D1, D2, R1, and R2 have specific values. - View Dependent Claims (17, 18)
-
Specification