EFFICIENT FULL OR PARTIAL DUPLICATE FORK DETECTION AND ARCHIVING
Abstract
A method to efficiently detect, store, modify, and recreate fully or partially duplicate file forks is described. During archive creation or modification, sets of fully or partially duplicate forks are detected and a reduced number of transformed forks or fork segments are stored. During archive expansion, one or more forks are recreated from each full or partial copy.
19 Citations
30 Claims
1. A method of reducing redundancy and increasing processing throughput of an archiving process, including the steps of:

(a) detecting identical or substantially identical files and/or forks;

(b) compressing the first instance of such files and/or forks; and

(c) storing reference information relating to the first compressed copy and bypassing compression of the second and all subsequent occurrences of said identical files and/or forks.

Dependent claims: 2-17.
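The pipeline of claim 1 can be sketched as follows. This is an illustrative reading, not the patent's actual implementation: SHA-256 stands in for the detection step and zlib for the compression transform, and all names are assumptions.

```python
import hashlib
import zlib

def archive_forks(forks):
    """Compress each unique fork once; record references for duplicates.

    `forks` maps fork names to raw bytes. Returns (compressed, references),
    where every duplicate fork points at the first compressed copy instead
    of being recompressed. Illustrative sketch only.
    """
    compressed = {}   # digest -> compressed first instance
    references = {}   # fork name -> digest of the stored copy
    for name, data in forks.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in compressed:       # first occurrence: compress it
            compressed[digest] = zlib.compress(data)
        references[name] = digest          # later occurrences: reference only
    return compressed, references
```

Bypassing compression of duplicates is what raises throughput: the cost of a second identical fork drops from a full compression pass to a hash lookup.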
18. A method of detecting file and/or fork differences in which fork data is protected against the injection of duplicate or substantially duplicate forks, comprising the step of comparing fork segments with a cryptographically secure hashing algorithm.
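Segment comparison per claim 18 might look like the sketch below. The claim names no particular hash; SHA-256 is used here as one cryptographically secure choice, and the fixed segment size and helper names are assumptions.

```python
import hashlib

SEGMENT_SIZE = 64 * 1024  # illustrative segment length

def segment_digests(data, size=SEGMENT_SIZE):
    """Split fork data into fixed-size segments and hash each one."""
    return [hashlib.sha256(data[i:i + size]).digest()
            for i in range(0, len(data), size)]

def diff_points(fork_a, fork_b):
    """Return indices of equal-length segments whose digests differ.

    Because the hash is cryptographically secure, an attacker cannot
    feasibly craft a different segment that collides with a stored
    digest, which is the injection protection the claim describes.
    """
    a, b = segment_digests(fork_a), segment_digests(fork_b)
    return [i for i, (da, db) in enumerate(zip(a, b)) if da != db]
```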
27. A method of reducing redundancy and increasing processing throughput of a file archiving process, including the steps of:

(a) creating structural information that describes the sets of unique and duplicate fork segments produced by the archive creation process, reflecting the final lists of fork segments; the structural information includes overall pre- and post-transform fork sizes and/or locations of unique, transformed fork data in the archived data; it describes subsets of fully duplicate forks with identical size and location data for all forks in a subset, and further describes subsets of partially duplicate forks with sizes and/or locations for fork segments corresponding to difference points and lists of segments that, when concatenated in listed order, reconstitute the original forks; and

(b) updating the information created in step (a) as needed.
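The structural information of claim 27 can be modeled as two record types: a table of unique, transformed segments (with pre-/post-transform sizes and archive locations) and, per fork, an ordered list of indices into that table. These class and field names are hypothetical, not the patent's format.

```python
from dataclasses import dataclass

@dataclass
class SegmentEntry:
    """One unique, transformed fork segment stored in the archive."""
    pre_size: int    # size before the transform (e.g. compression)
    post_size: int   # size of the transformed data as stored
    offset: int      # location of the transformed data in the archive

@dataclass
class ForkEntry:
    """Ordered segment indices that, concatenated, reconstitute a fork."""
    segments: list   # indices into the archive's unique-segment table

# Fully duplicate forks share identical ForkEntry contents; partially
# duplicate forks mix shared indices with fork-specific ones at the
# difference points.
```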
28. A method of reducing redundancy in archived digital files, comprising the step of hierarchically structuring and/or encoding structural information with a source coder and/or a statistical model and/or an entropy coder.
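Claim 28 applies coding to the structural information itself, not just the fork data. A minimal sketch, with zlib standing in for the claimed source coder / statistical model / entropy coder pipeline (the claim names no specific algorithm):

```python
import json
import zlib

def encode_structural_info(info):
    """Serialize and entropy-code the archive's structural information."""
    return zlib.compress(json.dumps(info, sort_keys=True).encode())

def decode_structural_info(blob):
    """Inverse of encode_structural_info: decode, then deserialize."""
    return json.loads(zlib.decompress(blob).decode())
```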
29. A method of reducing redundancy in archived data when sequential whole-archive expansion is a desired property of the archive data, comprising the step of positioning all fork structural information prior to the fork data it describes.
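The layout of claim 29 can be illustrated with a length-prefixed header: all structural information is written first, so a reader can expand the archive in a single forward pass. The JSON header and length prefix are assumptions for the sketch, not the patent's wire format.

```python
import io
import json
import struct

def write_archive(structural_info, fork_blobs):
    """Write structural information ahead of the fork data it describes."""
    buf = io.BytesIO()
    header = json.dumps(structural_info).encode()
    buf.write(struct.pack(">I", len(header)))  # 4-byte header length prefix
    buf.write(header)                          # structural info first...
    for blob in fork_blobs:                    # ...then the fork data
        buf.write(blob)
    return buf.getvalue()

def read_archive(data):
    """Single sequential pass: header first, then the fork data payload."""
    (hlen,) = struct.unpack_from(">I", data, 0)
    info = json.loads(data[4:4 + hlen].decode())
    return info, data[4 + hlen:]
```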
-
30. A method of handling duplicate forks during archive expansion, where the structural information for individual forks must be located and interpreted during expansion, said method including at least one of the following steps:
-
(a) when sequential archive consumption is desired during expansion, processing pre-inverse transform data consisting of forks and segments by inverse transform or transforms, and routing and/or concatenating post-inverse transform data, in the form of fully or partially duplicate forks to form one or more forks consisting of one or more fork segments, by writing post-inverse transform data in parallel to multiple files, or by writing to one file corresponding to a full fork or a collection of fork segments, and making copies of the file'"'"'s contents after its corresponding full fork or fork segments have been fully reconstructed by the inverse transform(s); (b) when sequential fork creation is desired and non-sequential archive consumption is also possible or permitted, reconstituting duplicate forks independently by processing pre-inverse transform data consisting of forks and segments by an inverse transform or transforms, wherein segments that form partially duplicate forks are concatenated after the inverse transform application; (c) when only sequential archive consumption is possible or permitted, and sequential fork creation is desired, processing pre-inverse transform data using an inverse transform or transforms and retaining post-inverse transform data with a buffer before routing and concatenating it into output forks; and (d) when differences between forks or fork segments were encoded by a differencing algorithm, using a patch transformation to produce a new fork or fork segment.
-
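The core of the expansion step can be sketched as follows: each unique stored blob is inverse-transformed once, and every fork that references it receives a copy, mirroring step (a)'s "make copies after full reconstruction" strategy. The dict-based archive layout and zlib inverse transform are illustrative assumptions, not the patent's format.

```python
import zlib

def expand_duplicates(stored, references):
    """Recreate every output fork from a reduced set of stored copies.

    `stored` maps keys to transformed (here: compressed) blobs;
    `references` maps each fork name to the key of its stored copy.
    Each blob is inverse-transformed exactly once, then duplicated for
    every fork that references it.
    """
    reconstructed = {}   # key -> inverse-transformed fork data
    forks = {}           # fork name -> recreated contents
    for name, key in references.items():
        if key not in reconstructed:             # inverse transform once
            reconstructed[key] = zlib.decompress(stored[key])
        forks[name] = reconstructed[key]         # duplicates are copies
    return forks
```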
Specification