Producing chunks from input data using a plurality of processing elements
First Claim
1. A method executed by a computer including a plurality of processing elements, comprising:
- dividing input data into a plurality of segments;
- processing the plurality of segments, in parallel, by the processing elements in the computer, wherein processing the plurality of segments produces a plurality of tentative sets of chunks defined by respective chunk boundaries, wherein at least one of the plurality of tentative sets includes a chunk defined by a particular chunk boundary that would not have been identified as a chunk boundary if the input data were sequentially processed by a single one of the processing elements; and
- stitching the plurality of tentative sets of chunks together to produce an output set of chunks, wherein the stitching comprises:
  - creating a combined set of chunks by combining the plurality of tentative sets of chunks; and
  - merging two adjacent or overlapping chunks from different tentative sets of chunks into a single chunk if a size of a resulting chunk from the merging would not exceed a predetermined maximum chunk size.
Abstract
Input data is divided into multiple segments that are processed by processing elements of a computer. The processing of the segments produces a plurality of tentative sets of chunks. The plurality of tentative sets of chunks are stitched together to produce an output set of chunks.
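The flow summarized in the abstract, together with the merge condition recited in claim 1, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the boundary rule (a byte value divisible by 8), the size limits `MIN_SIZE` and `MAX_SIZE`, and the function names `chunk_segment`, `stitch`, and `produce_chunks` are all hypothetical, and a thread pool merely stands in for the plurality of processing elements.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_SIZE, MAX_SIZE = 4, 16  # hypothetical chunk-size limits, in bytes


def chunk_segment(data, start, end):
    """Produce a tentative set of (begin, end) chunks covering data[start:end]."""
    chunks, begin = [], start
    for i in range(start, end):
        size = i + 1 - begin
        # Toy boundary rule (assumption): cut after a byte divisible by 8 once
        # the chunk has reached MIN_SIZE, or unconditionally at MAX_SIZE.
        if (size >= MIN_SIZE and data[i] % 8 == 0) or size >= MAX_SIZE:
            chunks.append((begin, i + 1))
            begin = i + 1
    if begin < end:
        # Boundary forced by the segment end; sequential processing of the
        # whole input would not necessarily place a boundary here.
        chunks.append((begin, end))
    return chunks


def stitch(tentative_sets, max_size=MAX_SIZE):
    """Combine tentative sets, merging adjacent chunks across set seams."""
    output = []
    for chunks in tentative_sets:
        for idx, (begin, end) in enumerate(chunks):
            # Only the first chunk of a set can touch a chunk of another set.
            at_seam = idx == 0 and bool(output) and output[-1][1] == begin
            if at_seam and end - output[-1][0] <= max_size:
                output[-1] = (output[-1][0], end)  # merged chunk fits the limit
            else:
                output.append((begin, end))
    return output


def produce_chunks(data, num_segments=4):
    """Divide data into segments, chunk them concurrently, then stitch."""
    seg_len = max(1, -(-len(data) // num_segments))  # ceiling division
    bounds = [(s, min(s + seg_len, len(data)))
              for s in range(0, len(data), seg_len)]
    with ThreadPoolExecutor() as pool:
        tentative = list(pool.map(lambda b: chunk_segment(data, *b), bounds))
    return stitch(tentative)
```

In this sketch the segments do not overlap, so only the "adjacent" case of the merge is exercised; the chunk forced at each segment end is the one that sequential processing might not have produced, and merging it with its neighbor removes that artificial boundary whenever the size limit allows.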
19 Claims
1. A method executed by a computer including a plurality of processing elements, comprising:
- dividing input data into a plurality of segments;
- processing the plurality of segments, in parallel, by the processing elements in the computer, wherein processing the plurality of segments produces a plurality of tentative sets of chunks defined by respective chunk boundaries, wherein at least one of the plurality of tentative sets includes a chunk defined by a particular chunk boundary that would not have been identified as a chunk boundary if the input data were sequentially processed by a single one of the processing elements; and
- stitching the plurality of tentative sets of chunks together to produce an output set of chunks, wherein the stitching comprises:
  - creating a combined set of chunks by combining the plurality of tentative sets of chunks; and
  - merging two adjacent or overlapping chunks from different tentative sets of chunks into a single chunk if a size of a resulting chunk from the merging would not exceed a predetermined maximum chunk size.
Dependent claims: 2, 3, 10.
4. A method executed by a computer including a plurality of processing elements, comprising:
- dividing input data into a plurality of segments;
- processing the plurality of segments, in parallel, by the processing elements in the computer, wherein processing the plurality of segments produces a plurality of tentative sets of chunks; and
- stitching the plurality of tentative sets of chunks together to produce an output set of chunks, wherein the stitching comprises:
  - extending a first of the tentative sets of chunks until the first tentative set of chunks reaches a synchronization point with respect to a second of the tentative sets, wherein the extended first tentative set of chunks is based on at least first and second ones of the plurality of segments, wherein the second tentative set is based on at least the second segment, and wherein stitching the plurality of tentative sets of chunks further comprises removing inconsistencies in chunks identified by the extended first tentative set and the second tentative set.
Dependent claims: 5, 6, 7, 8, 9, 11, 12.
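The extension step of claim 4 can be illustrated with a small sketch for two segments. It rests on assumptions that the claim itself does not state: the boundary rule is a toy one in which each chunk end depends only on the chunk's start and the bytes that follow, a synchronization point is taken to be an offset where an extended chunk ends exactly where a chunk of the second tentative set begins, and the names `next_boundary`, `chunk_from`, and `extend_until_sync` are hypothetical.

```python
MIN_SIZE, MAX_SIZE = 4, 16  # hypothetical chunk-size limits, in bytes


def next_boundary(data, begin, limit):
    """End offset of the chunk that starts at begin (toy rule, an assumption)."""
    for i in range(begin, limit):
        size = i + 1 - begin
        if (size >= MIN_SIZE and data[i] % 8 == 0) or size >= MAX_SIZE:
            return i + 1
    return limit


def chunk_from(data, begin, limit):
    """Chunk data[begin:limit] sequentially into (begin, end) pairs."""
    chunks = []
    while begin < limit:
        end = next_boundary(data, begin, limit)
        chunks.append((begin, end))
        begin = end
    return chunks


def extend_until_sync(data, first, second):
    """Extend the first tentative set into the second segment until it syncs.

    Both tentative sets are assumed to be non-empty lists of (begin, end).
    """
    second_starts = {b for b, _ in second}
    end_of_data = second[-1][1]
    extended = first[:-1]   # drop the chunk that was cut off by the segment end
    begin = first[-1][0]
    while begin < end_of_data:
        end = next_boundary(data, begin, end_of_data)
        extended.append((begin, end))
        begin = end
        if begin in second_starts:
            break           # synchronization point reached
    # Remove inconsistencies: chunks of the second set that start before the
    # synchronization point are discarded in favor of the extended first set.
    return extended + [c for c in second if c[0] >= begin]
```

Because the toy rule decides each chunk end from the chunk start alone, the two sets produce identical chunks once they share a boundary, so under these assumptions the stitched result matches what a single sequential pass would produce.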
13. A computer comprising:
- a storage device to store input data; and
- a plurality of processing elements to:
  - process segments of the input data to identify tentative chunk boundaries, wherein the processing produces at least a first tentative set of chunks defined by a first group of the tentative chunk boundaries that is based on a first one of the segments, and a second tentative set of chunks defined by a second group of the tentative chunk boundaries that is based on a second one of the segments;
  - extend the first tentative set until synchronization occurs, wherein the synchronization comprises identifying a point that is a chunk boundary of a chunk contained in the extended first tentative set and of a chunk contained in the second tentative set, wherein the extended first tentative set of chunks is based on at least the first and second segments; and
  - select a set of the chunking boundaries identified by the extended first tentative set or the second tentative set to provide as part of an output set of chunking boundaries, wherein the selecting comprises removing inconsistencies in chunks identified by the extended first tentative set and the second tentative set.
Dependent claims: 14, 15.
16. An article comprising at least one non-transitory computer-readable storage medium storing instructions that when executed cause a computer to:
- process segments, by a plurality of processing elements in parallel, of input data to identify chunk boundaries defining respective chunks, wherein the processing produces at least a first tentative set of the chunks that is based on at least first and second ones of the segments, and a second tentative set of the chunks that is based on at least the second segment, and wherein the processing comprises extending the first tentative set until the first tentative set reaches a synchronization point with respect to the second tentative set; and
- stitch at least the first and second tentative sets of chunks together to produce an output set of chunks, wherein the stitching comprises performing harmonization to resolve inconsistencies in the chunk boundaries identified by the first and second tentative sets.
Dependent claims: 17, 18, 19.
Specification