Landmark chunking of landmarkless regions
Abstract
A computer-executed method for forming data chunks from a sequence of data values comprises determining whether processing of the sequence of data values has entered a landmark-free region. If processing has entered a landmark-free region, a data chunk is produced using a specialized landmark chunking technique that is specialized for landmark-free regions. Otherwise, the method comprises producing a data chunk using a standard-data landmark chunking technique.
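The dispatch described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the landmark test (`is_landmark`), the size limits, and the heuristic that a landmark-free region looks like a run of one repeated byte (`looks_landmark_free`) are all hypothetical choices made for the example.

```python
from typing import Iterator

MIN_LEN, MAX_LEN = 4, 16   # hypothetical chunk-size limits
WINDOW, DIVISOR = 3, 5     # hypothetical landmark parameters


def is_landmark(data: bytes, end: int) -> bool:
    """Assumed landmark test on the WINDOW bytes that end at offset `end`."""
    return end >= WINDOW and sum(data[end - WINDOW:end]) % DIVISOR == DIVISOR - 1


def looks_landmark_free(data: bytes, start: int) -> bool:
    """Assumed heuristic: the next MAX_LEN bytes are one repeated byte value."""
    block = data[start:start + MAX_LEN]
    return len(block) == MAX_LEN and block == block[:1] * MAX_LEN


def standard_chunk(data: bytes, start: int) -> int:
    """Standard technique: cut at the first landmark, or at MAX_LEN if none is found."""
    limit = min(start + MAX_LEN, len(data))
    for end in range(start + MIN_LEN, limit):
        if is_landmark(data, end):
            return end
    return limit


def specialized_chunk(data: bytes, start: int) -> int:
    """Specialized technique: cut a maximum-length chunk without scanning for landmarks."""
    return min(start + MAX_LEN, len(data))


def chunks(data: bytes) -> Iterator[bytes]:
    """Pick a chunking technique per chunk, depending on the region being processed."""
    start = 0
    while start < len(data):
        if looks_landmark_free(data, start):
            end = specialized_chunk(data, start)
        else:
            end = standard_chunk(data, start)
        yield data[start:end]
        start = end
```

For example, `[len(c) for c in chunks(bytes(40))]` gives two 16-byte chunks from the specialized path, followed by an 8-byte tail from the standard path.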
19 Claims
1. A computer-executed method for forming data chunks from a sequence of data values comprising:
- determining, by a computer, whether processing of the sequence of data values has entered a region that is landmark-free, wherein the landmark-free region is devoid of any landmarks that provide boundaries of the data chunks;
- producing, by the computer, a data chunk using a specialized landmark chunking technique that is specialized for landmark-free regions in response to determining that the processing of the sequence of data values has entered a landmark-free region; and
- producing, by the computer, a data chunk using a first standard-data landmark chunking technique in response to determining that the processing of the sequence of data values has not entered a landmark-free region.
Dependent claims: 2, 3, 4, 5
6. A computer-executed method for forming data chunks from a sequence of data values comprising:
- determining, by a computer, whether processing of the sequence of data values has entered a region that is landmark-free;
- producing, by the computer, a data chunk using a specialized landmark chunking technique that is specialized for landmark-free regions if determined that the processing of the sequence of data values has entered a landmark-free region; and
- producing, by the computer, a data chunk using a first standard-data landmark chunking technique,
- wherein producing a data chunk using the specialized landmark chunking technique that is specialized for landmark-free regions comprises using a technique selected from a group consisting of:
  - a first technique comprising:
    - producing a selected number of consecutive chunks as maximum-length chunks without inspecting underlying data in the sequence of data values;
    - producing a first chunk following the maximum-length chunks using a second standard-data landmark chunking technique;
    - determining whether the first chunk has a length equal to a predetermined maximum length;
    - if the first chunk length is equal to the predetermined maximum length, looping to producing the selected number of consecutive chunks as maximum-length chunks without inspecting the underlying data in the sequence of data values;
  - a second technique comprising:
    - producing a selected number of consecutive chunks as maximum-length chunks without inspecting underlying data in the sequence of data values; and
  - a third technique comprising:
    - producing one chunk as a maximum-length chunk without inspecting the underlying data in the sequence of data values;
    - checking data of a predetermined maximum length immediately following the produced one chunk for characteristics of landmark-free regions;
    - if the checked data has characteristics of landmark-free regions, looping to producing one maximum-length chunk.
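The three alternative techniques recited in claim 6 can be sketched as below. This is a hedged illustration only: `MAX_LEN`, `N_BLIND`, the newline-based `newline_chunker` (standing in for the "second standard-data landmark chunking technique"), and the repeated-byte test `looks_flat` (standing in for "characteristics of landmark-free regions") are hypothetical names and choices that do not come from the claim. Each function returns the end offsets of the chunks it produces.

```python
from typing import Callable, List

MAX_LEN = 8   # hypothetical predetermined maximum chunk length
N_BLIND = 2   # hypothetical "selected number" of blind chunks


def newline_chunker(data: bytes, start: int) -> int:
    """Stand-in standard technique: cut just after a newline, else at MAX_LEN."""
    limit = min(start + MAX_LEN, len(data))
    hit = data.find(b"\n", start, limit)
    return hit + 1 if hit != -1 else limit


def looks_flat(block: bytes) -> bool:
    """Stand-in landmark-free test: a full block of one repeated byte value."""
    return len(block) == MAX_LEN and block == block[:1] * MAX_LEN


def blind_chunks(data: bytes, start: int, count: int = N_BLIND) -> List[int]:
    """Second technique: cut `count` maximum-length chunks without reading the data."""
    ends: List[int] = []
    while count > 0 and start < len(data):
        start = min(start + MAX_LEN, len(data))
        ends.append(start)
        count -= 1
    return ends


def blind_then_probe(data: bytes, start: int,
                     standard: Callable[[bytes, int], int]) -> List[int]:
    """First technique: blind chunks, then one standard chunk; loop while it is maximum length."""
    ends: List[int] = []
    while start < len(data):
        ends += blind_chunks(data, start)
        start = ends[-1]
        if start >= len(data):
            break
        end = standard(data, start)
        ends.append(end)
        if end - start != MAX_LEN:
            break  # a landmark was found, so the landmark-free region has ended
        start = end
    return ends


def blind_while_flat(data: bytes, start: int,
                     landmark_free: Callable[[bytes], bool]) -> List[int]:
    """Third technique: one blind chunk, then check the following MAX_LEN bytes; loop if still flat."""
    ends: List[int] = []
    while start < len(data):
        start = min(start + MAX_LEN, len(data))
        ends.append(start)
        if not landmark_free(data[start:start + MAX_LEN]):
            break
    return ends
```

With `data = bytes(40) + b"ab\ncd"`, `blind_then_probe(data, 0, newline_chunker)` returns `[8, 16, 24, 32, 40, 43]`: the probe at offset 16 comes back at maximum length, so the loop repeats, while the probe at offset 40 finds the newline and stops.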
7. A computer-executed method for forming data chunks from a sequence of data values comprising:
- determining, by a computer, whether processing of the sequence of data values has entered a region that is landmark-free;
- producing, by the computer, a data chunk using a specialized landmark chunking technique that is specialized for landmark-free regions if determined that the processing of the sequence of data values has entered a landmark-free region; and
- producing, by the computer, a data chunk using a first standard-data landmark chunking technique,
- wherein producing a data chunk using the specialized landmark chunking technique that is specialized for landmark-free regions comprises:
  - computing fingerprint values for positions in the sequence of data values;
  - computing a first fingerprint value for a first window of bytes in the sequence of data values;
  - determining whether a second window of bytes is same as the first window of bytes; and
  - assigning the first fingerprint value to the second window of bytes without fingerprint computation on the bytes in the second window if the second window of bytes is the same as the first window of bytes, otherwise computing a second fingerprint value for the second window of bytes.
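The fingerprint reuse recited in claim 7 might look roughly like this. It is a minimal sketch under assumptions: the window size, the truncated SHA-1 used as a stand-in `fingerprint` function, and the direct byte comparison between consecutive windows are illustrative choices, not details taken from the claim.

```python
import hashlib
from typing import List, Tuple

WINDOW = 4  # hypothetical window size in bytes


def fingerprint(window: bytes) -> int:
    """Stand-in fingerprint (assumption): first 4 bytes of SHA-1 as an integer."""
    return int.from_bytes(hashlib.sha1(window).digest()[:4], "big")


def fingerprints(data: bytes) -> Tuple[List[int], int]:
    """Return one fingerprint per window position and how many were actually computed."""
    values: List[int] = []
    computed = 0
    prev_window = None
    for i in range(len(data) - WINDOW + 1):
        window = data[i:i + WINDOW]
        if window == prev_window:
            # Same bytes as the previous window: reuse its fingerprint value.
            values.append(values[-1])
        else:
            values.append(fingerprint(window))
            computed += 1
        prev_window = window
    return values, computed
```

On `b"ab" + bytes(10) + b"cd"` there are 11 window positions, but only 5 fingerprints are computed, because the windows inside the run of zero bytes are identical to their predecessors.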
8. A data processing apparatus comprising:
- a computer; and
- a logic executable in the computer to:
  - form data chunks from a sequence of data values including:
    - determining whether processing of the sequence of data values has entered a landmark-free region, wherein the landmark-free region is devoid of any landmarks that provide boundaries of the data chunks,
    - in response to the logic determining that the processing has entered a landmark-free region, produce a data chunk using a specialized landmark chunking technique that is specialized for landmark-free regions, and
    - in response to the logic determining that the processing has not entered a landmark-free region, produce a data chunk using a first standard-data landmark chunking technique.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16
17. A data processing apparatus comprising:
- a computer; and
- a logic executable in the computer to:
  - form data chunks from a sequence of data values including:
    - determining whether processing of the sequence of data values has entered a landmark-free region,
    - in response to the logic determining that the processing has entered a landmark-free region, produce a data chunk using a specialized landmark chunking technique that is specialized for landmark-free regions, and
    - in response to the logic determining that the processing has not entered a landmark-free region, produce a data chunk using a first standard-data landmark chunking technique,
  - wherein the specialized landmark chunking technique:
    - computes fingerprint values for positions in the sequence of data values including a first fingerprint value for a first window of bytes in the sequence of data values,
    - determines whether a second window of bytes is same as the first window of bytes, and
    - assigns the first fingerprint value to the second window of bytes without fingerprint computation on the bytes in the second window if the second window of bytes is the same as the first window of bytes, otherwise computing a second fingerprint value for the second window of bytes.
18. An article of manufacture comprising:
- a non-transitory computer-usable medium storing a computer readable program code for forming data chunks from a sequence of data values, the computer readable program code executable by a computer to cause the computer to:
  - determine whether processing of the sequence of data values has entered a landmark-free region, wherein the landmark-free region is devoid of any landmarks that provide boundaries of the data chunks;
  - produce a data chunk using a specialized landmark chunking technique that is specialized for landmark-free regions in response to determining that the processing has entered the landmark-free region; and
  - produce a data chunk using a standard-data landmark chunking technique in response to determining that the processing of the sequence of data values has not entered a landmark-free region.
Dependent claims: 19
Specification