Method and system for data reduction
Abstract
A “forward” delta data management technique uses a “sparse” index associated with a delta file both to achieve delta management efficiency and to eliminate read latency when accessing history data. The invention may be implemented advantageously in a data management system that provides real-time data services to data sources associated with a set of application host servers. To facilitate a given data service, a host driver embedded in an application server connects an application and its data to a cluster. The host driver captures real-time data transactions, preferably in the form of an event journal that is provided to the data management system. In particular, the driver translates traditional file/database/block I/O into a continuous, application-aware output data stream. In an illustrative embodiment, a given application-aware data stream is processed through a multi-stage data reduction process to produce a compact data representation from which an “any point-in-time” reconstruction of the original data can be made.
22 Claims
1. A method of data management, comprising:
(a) generating a first data string by comparing contents of a first modified data file against contents of an original data file and characterizing an output of such comparison using a given syntax that defines given data insertions, deletions and replacements;
(b) generating a second data string by comparing the first data string against the contents of the original data file and characterizing an output of such comparison using the given syntax; and
(c) generating a byte range index, the byte range index for use in reconstructing the contents of the first modified data file by referencing the contents of the original data file and the second data string.
Dependent claims: 2, 3, 4, 5, 6, 7
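The two comparison passes in steps (a) and (b) can be sketched as follows. This is only one literal reading of the claim language, not the patented implementation: the "given syntax" is assumed here to be a serialized list of `I` (insert), `D` (delete) and `R` (replace) operations, the comparison is delegated to Python's `difflib.SequenceMatcher`, and the names `diff` and `patch` are hypothetical. The byte range index of step (c) is not modeled in this sketch.

```python
import ast
from difflib import SequenceMatcher


def diff(base: bytes, target: bytes) -> bytes:
    """Describe `target` relative to `base` as a serialized list of edits.

    Assumed syntax (not from the patent text):
      ("I", pos, data)          insert data at base offset pos
      ("D", pos, length)        delete length bytes starting at pos
      ("R", pos, length, data)  replace length bytes at pos with data
    """
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            ops.append(("I", i1, target[j1:j2]))
        elif tag == "delete":
            ops.append(("D", i1, i2 - i1))
        elif tag == "replace":
            ops.append(("R", i1, i2 - i1, target[j1:j2]))
    # repr() of ints, str and bytes is pure ASCII, so this round-trips.
    return repr(ops).encode("ascii")


def patch(base: bytes, delta: bytes) -> bytes:
    """Apply a delta produced by diff() to `base`."""
    out, cursor = bytearray(), 0
    for op in ast.literal_eval(delta.decode("ascii")):
        pos = op[1]
        out += base[cursor:pos]  # copy the unchanged run before this edit
        cursor = pos
        if op[0] == "I":
            out += op[2]
        elif op[0] == "D":
            cursor += op[2]
        else:  # "R"
            out += op[3]
            cursor += op[2]
    out += base[cursor:]  # copy whatever follows the last edit
    return bytes(out)


original = b"the quick brown fox jumps over the lazy dog"
modified = b"the quick red fox jumps over the lazy dog, then the quick brown fox naps"

first = diff(original, modified)   # step (a): modified file vs. original
second = diff(original, first)     # step (b): first data string vs. original

# Undo the two stages in reverse order.
recovered_first = patch(original, second)
recovered = patch(original, recovered_first)
```

In this reading, the second pass can only shrink the first data string where its literal payloads repeat byte runs that already exist in the original file; whether it pays off depends on the data.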
8. A data store in a distributed data management system, comprising:
a first file storing baseline data;
a second file storing data generated at least in part by applying a given differencing function to a given version of the baseline data; and
a set of one or more metadata objects, each metadata object being associated with a given version of the baseline data and including an index that references the first file and, optionally, the second file.
Dependent claims: 9, 10, 11, 12, 13, 14
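The layout recited in claim 8 can be pictured with a small in-memory model. This is a minimal sketch under stated assumptions: the two files are stood in for by `bytes` fields, and the names `IndexEntry`, `VersionMetadata`, `DataStore`, `add_version` and `uses_delta` are hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class IndexEntry:
    """One byte range of a version, located in one of the two files."""
    source: str   # "baseline" (first file) or "delta" (second file)
    offset: int   # starting offset within that file
    length: int   # number of bytes in the range


@dataclass
class VersionMetadata:
    """Metadata object associated with one version of the baseline data."""
    version: int
    index: List[IndexEntry]


@dataclass
class DataStore:
    baseline: bytes = b""   # stand-in for the first file
    delta: bytes = b""      # stand-in for the second file
    versions: Dict[int, VersionMetadata] = field(default_factory=dict)

    def add_version(self, version: int, index: List[IndexEntry]) -> None:
        self.versions[version] = VersionMetadata(version, index)

    def uses_delta(self, version: int) -> bool:
        """True if this version's index references the second file at all."""
        return any(e.source == "delta" for e in self.versions[version].index)


store = DataStore(baseline=b"hello world", delta=b"there")
# Version 0 is the baseline itself: its index references only the first file.
store.add_version(0, [IndexEntry("baseline", 0, 11)])
# Version 1 keeps "hello " from the baseline and takes "there" from the delta.
store.add_version(1, [IndexEntry("baseline", 0, 6), IndexEntry("delta", 0, 5)])
```

The `uses_delta` helper illustrates the "optionally" in the claim: an index may reference the first file alone.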
15. A method of managing baseline data in a distributed management system, comprising:
(a) storing a given data object, the given data object comprising:
(i) a first file storing baseline data;
(ii) a second file storing data generated at least in part by applying a given differencing function to a given version of the baseline data; and
(iii) a set of one or more metadata objects, each metadata object being associated with a given version of the baseline data and including an index that references the first file and, optionally, the second file; and
(b) in response to a given request, using the index associated with the given version to reconstruct the given version.
Dependent claims: 16, 17, 18, 19
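Step (b) can be illustrated with a short sketch. It assumes, purely for illustration, that each index entry is a `(source, offset, length)` tuple naming either the first ("baseline") or the second ("delta") file; the function names `reconstruct` and `read_range` are hypothetical. The second function shows how such an index could serve a partial read without first rebuilding the whole version.

```python
from typing import List, Tuple

# Assumed entry shape: (source, offset, length),
# where source is "baseline" or "delta".
Index = List[Tuple[str, int, int]]


def reconstruct(baseline: bytes, delta: bytes, index: Index) -> bytes:
    """Rebuild a whole version by concatenating the indexed byte ranges."""
    files = {"baseline": baseline, "delta": delta}
    return b"".join(files[src][off:off + n] for src, off, n in index)


def read_range(baseline: bytes, delta: bytes, index: Index,
               start: int, length: int) -> bytes:
    """Read bytes [start, start + length) of a version via the index."""
    files = {"baseline": baseline, "delta": delta}
    out = bytearray()
    pos = 0                  # offset of the current entry within the version
    end = start + length
    for src, off, n in index:
        lo, hi = max(start, pos), min(end, pos + n)
        if lo < hi:          # this entry overlaps the requested range
            out += files[src][off + (lo - pos):off + (hi - pos)]
        pos += n
    return bytes(out)


baseline = b"hello world"
delta = b"there"
index = [("baseline", 0, 6), ("delta", 0, 5)]

version = reconstruct(baseline, delta, index)        # b"hello there"
middle = read_range(baseline, delta, index, 4, 5)    # b"o the"
```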
20. A method of data management, comprising:
(a) generating a data string by comparing contents of a first modified data file against contents of an original data file and characterizing an output of such comparison using a given syntax that defines given data insertions, deletions and replacements; and
(b) generating a byte range index, the byte range index for use in reconstructing the contents of the first modified data file by referencing the contents of the original data file and the data string.
Dependent claims: 21, 22
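A byte range index of the kind recited in step (b) could be built during the comparison itself. The sketch below is a simplification under stated assumptions, not the patented method: the data string is reduced to the concatenated inserted and replacement bytes, deletions are left implicit (deleted ranges simply never appear in the index), the comparison uses Python's `difflib.SequenceMatcher`, and the name `build_delta_and_index` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Assumed entry shape: (source, offset, length),
# where source is "original" or "delta".
Index = List[Tuple[str, int, int]]


def build_delta_and_index(original: bytes, modified: bytes) -> Tuple[bytes, Index]:
    """Compare two byte strings; return (delta payload, byte range index)."""
    delta = bytearray()
    index: Index = []
    matcher = SequenceMatcher(None, original, modified, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: point back into the original file.
            index.append(("original", i1, i2 - i1))
        elif tag in ("insert", "replace"):
            # New bytes: append to the delta and point at them there.
            index.append(("delta", len(delta), j2 - j1))
            delta += modified[j1:j2]
        # "delete": the bytes are dropped, so nothing is indexed.
    return bytes(delta), index


def rebuild(original: bytes, delta: bytes, index: Index) -> bytes:
    """Reconstruct the modified contents from the two sources and the index."""
    files = {"original": original, "delta": delta}
    return b"".join(files[src][off:off + n] for src, off, n in index)


original = b"the quick brown fox jumps over the lazy dog"
modified = b"the quick red fox leaps over the dog"

delta, index = build_delta_and_index(original, modified)
rebuilt = rebuild(original, delta, index)
```

Because every index entry carries its own source and offset, a reader can locate any byte of the modified file directly instead of replaying edits from the start.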
Specification