Multi-streamed method for optimizing data transfer through parallelized interlacing of data based upon sorted characteristics to minimize latencies inherent in the system
Abstract
A method that operates a plurality of threads in parallel on files of disparate sizes, ordered by an additional thread. Efficient backup of a heterogeneous non-volatile mass store to a network-attached server scalably distributes the computation of hashes and the elimination of duplicates. The method segments each file and object into a hierarchy of pieces of a plurality of types and avoids sending unnecessary pieces.
7 Claims
1. A multi-streamed method for parallelized interlacing of data for backup comprises:

receiving and storing a list of most frequently encountered type 2 pieces, each of which comprises a data hash determined from a type 3 piece, which comprises a file data shard having variable length up to a maximum size;

receiving and storing a list of object identities organized by file size, wherein objects comprise files;

operating on a plurality of files simultaneously using a plurality of parallel threads controlling one or more processors by determining a single type 1 piece for each file, which type 1 piece comprises file name, size, and date, determining at least one type 3 piece for each file, and determining a single type 2 piece for each type 3 piece;

within a first parallel thread, while there is any object identity remaining on said list of object identities organized by file size, removing a top file identity from a top of the list, reading a top file from non-volatile mass store, and operating on said top file to determine type 1, type 2, and type 3 pieces;

within a second parallel thread, while there is any object identity remaining on said list of object identities organized by file size, removing a bottom file identity from a bottom of the list, reading a bottom file from non-volatile mass store, and operating on said bottom file to determine type 1, type 2, and type 3 pieces;

until the piece store is full, in the following order, for each file operated on, writing the type 1 piece to a piece store, then writing one or more type 2 pieces to the piece store, and finally writing one or more type 3 pieces to the piece store; and

comparing each type 2 piece stored in the piece store with the list of most frequently encountered type 2 pieces, and removing from the piece store any type 3 piece which corresponds to a most frequently encountered type 2 piece. (Dependent claims: 2, 3.)
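The two-thread schedule in claim 1 can be sketched as follows. This is a minimal illustration in Python, not the patented implementation: all identifiers (`make_pieces`, `worker`, `SHARD_MAX`, the sample files and hashes) are assumptions, the piece-store capacity check is omitted, and the duplicate type 3 pieces are filtered at write time rather than removed afterward.

```python
import hashlib
import threading
from collections import deque

SHARD_MAX = 4  # maximum type 3 shard length; tiny here for illustration

def make_pieces(name, data):
    """One type 1 piece per file, one type 2 hash per type 3 shard."""
    type1 = ("T1", name, len(data))  # file name and size (date omitted)
    type3 = [data[i:i + SHARD_MAX] for i in range(0, len(data), SHARD_MAX)]
    type2 = [hashlib.sha256(s).hexdigest() for s in type3]
    return type1, type2, type3

def worker(order, lock, take_top, files, piece_store, frequent):
    """Drain the size-sorted deque from one end until it is empty."""
    while True:
        with lock:
            if not order:
                return
            # first thread removes from the top of the list, second from the bottom
            name = order.popleft() if take_top else order.pop()
        t1, t2, t3 = make_pieces(name, files[name])
        with lock:
            piece_store.append(t1)            # type 1 piece first,
            piece_store.extend(t2)            # then the type 2 hash pieces,
            for h, shard in zip(t2, t3):      # then only those type 3 shards
                if h not in frequent:         # whose hash is not already known
                    piece_store.append(shard)

files = {"big": b"abcdefgh", "mid": b"abcd", "small": b"ab"}
# object identities organized by file size, largest first
order = deque(sorted(files, key=lambda n: len(files[n]), reverse=True))
frequent = {hashlib.sha256(b"abcd").hexdigest()}  # most frequently encountered type 2 pieces
piece_store, lock = [], threading.Lock()
threads = [threading.Thread(target=worker,
                            args=(order, lock, top, files, piece_store, frequent))
           for top in (True, False)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert b"abcd" not in piece_store  # duplicated shard was not written
```

Pulling from both ends of the size-sorted list interlaces large and small files, so one thread is not starved behind a run of large reads.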
4. A method for operating each one of a plurality of heterogeneous user stations comprising:

within a sorted file list circuit, receiving an object request from a local area network attached apparatus, selecting files related to the requested object, sorting the selected files by size, and merging all identifiers of the sorted selected files into a sorted file list;

within each of a plurality of thread circuits, extracting a plurality of files at certain positions in the sorted file list, wherein the certain positions in the sorted file list comprise a beginning of the sorted file list, an end of the sorted file list, and a midpoint of the sorted file list; and, for each extracted file, determining a begin file piece, determining a first file data piece, determining a first file data hash piece, and reiterating the determination of file data and file data hash pieces;

within a piece store insertion circuit, until each piece store is full, receiving pieces from one of the plurality of thread circuits and writing into the piece store the following pieces, if available, in the following order: firstly, a begin file piece; secondly, a file data hash piece; thirdly, a file data piece; wherein a file data piece comprises a data shard of variable length and maximum size, a file data hash piece comprises a data hash of fixed length corresponding to a specific file data piece, and a begin file piece comprises file name, size, and date;

receiving and storing a list of file data hash pieces determined from file data pieces previously stored at the local area network attached apparatus; and

removing from the piece store any file data piece which was previously stored at the local area network attached apparatus. (Dependent claims: 5, 6.)
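The per-file write order claimed here (begin file piece, then each shard's hash piece, then the shard itself only if the apparatus does not already hold it) can be sketched as a generator. A minimal Python illustration under assumed names (`pieces_in_order`, `SHARD_MAX`, the sample data); the sorted-list and thread-position machinery is left out.

```python
import hashlib

SHARD_MAX = 4  # maximum data-shard length, for illustration

def pieces_in_order(name, data, known_hashes):
    """Yield one file's pieces in the claimed order: begin file piece,
    then for each shard its fixed-length hash piece, then the
    variable-length data piece -- unless its hash is already stored."""
    yield ("begin", name, len(data))          # file name and size (date omitted)
    for i in range(0, len(data), SHARD_MAX):
        shard = data[i:i + SHARD_MAX]
        h = hashlib.sha256(shard).hexdigest()
        yield ("hash", h)                     # file data hash piece
        if h not in known_hashes:
            yield ("data", shard)             # file data piece

known = {hashlib.sha256(b"abcd").hexdigest()}  # previously stored at the apparatus
store = list(pieces_in_order("f.txt", b"abcdwxyz", known))
# only the unseen shard's data piece is emitted
assert store == [("begin", "f.txt", 8),
                 ("hash", hashlib.sha256(b"abcd").hexdigest()),
                 ("hash", hashlib.sha256(b"wxyz").hexdigest()),
                 ("data", b"wxyz")]
```

Because every hash piece is written even when its data piece is suppressed, the receiving side can still reconstruct the full file from pieces it already holds.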
7. A method for scheduling file access to a non-volatile mass store by a plurality of parallel threads controlling a processor to convert each stored file into a hierarchy of pieces for efficient backup without duplication, comprising:

determining a list of files for processing;

sorting the list of files by an orderable characteristic;

assigning a certain position in the list of files to a first thread controlling a processor;

assigning another certain position in the list of files to a second thread controlling a processor; and

assigning at least one intermediate point of the list of files to one or more additional threads, wherein the certain positions comprise a first position at a top and a second position at a bottom of the list, and a third thread is assigned to receive files taken at a point halfway between the bottom of the list and a midpoint of the list;

wherein the orderable characteristic is file size, and wherein each thread controls a processor to perform: converting each file into a hierarchy of pieces and a plurality of piece types; writing a single begin file piece into the piece store for each file, the begin file piece comprising name, size, and date; writing at least one file data piece of variable length but maximum size, containing a data shard, into the piece store; and writing a single file data hash piece into the piece store, the file data hash piece comprising a data hash for each file data piece.
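The thread starting positions claimed above (top of the size-sorted list, bottom, and a point halfway between the bottom and the midpoint) reduce to simple index arithmetic. A short Python sketch; the function name and the choice of integer division are illustrative assumptions.

```python
def starting_indices(n):
    """Starting positions in a size-sorted list of n files:
    first thread at the top, second at the bottom, third at the
    point halfway between the bottom and the midpoint."""
    top = 0
    bottom = n - 1
    midpoint = n // 2
    third = (bottom + midpoint) // 2  # halfway between bottom and midpoint
    return top, bottom, third

assert starting_indices(8) == (0, 7, 5)
```

With the list sorted by size, the three threads begin on the largest files, the smallest files, and the smaller-middle region respectively, balancing total bytes read per thread.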
Specification