Method and system for data reduction
First Claim
1. A method for capturing and storing a data history of a file to enable reconstruction of versions of the file, comprising:
storing a copy of a first version of a file at a remote location;
comparing with one or more computer processors, a second version of the file to the first version of the file, or the copy of the first version of the file, to generate one or more delta strings associated with the second version of the file;
storing the one or more delta strings associated with the second version of the file at the remote location;
generating a byte range index at the remote location that refers to bytes in the copy of the first version of the file and to bytes in the one or more delta strings associated with the second version of the file;
wherein the byte range index references entire contents of the second version of the file;
storing the byte range index at the remote location; and
using the byte range index to enable reconstruction of the second version of the file without having to apply to the copy of the first version of the file the one or more delta strings associated with the second version of the file.
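The steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation. The names `ByteRange`, `build_index`, and `reconstruct` are hypothetical, Python's `difflib.SequenceMatcher` stands in for whatever comparison the system actually performs, and in-memory byte strings stand in for storage at the remote location. The point shown is that the index covers the entire second version, so reconstruction is a series of range reads rather than a patch applied to the first version.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

BASE = "base"    # bytes read from the stored copy of the first version
DELTA = "delta"  # bytes read from the stored delta strings


@dataclass(frozen=True)
class ByteRange:
    """One entry of the (hypothetical) byte range index."""
    source: str  # BASE or DELTA
    offset: int  # start offset within that source
    length: int  # number of bytes to copy


def build_index(first: bytes, second: bytes):
    """Compare two versions; return (delta bytes, byte range index).

    The index entries, read in order, cover every byte of `second`.
    """
    delta = bytearray()
    index = []
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged bytes: point back into the first version.
            index.append(ByteRange(BASE, i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # New bytes: append them to the delta and point at them.
            index.append(ByteRange(DELTA, len(delta), j2 - j1))
            delta += second[j1:j2]
        # "delete": bytes dropped from the first version need no entry.
    return bytes(delta), index


def reconstruct(base: bytes, delta: bytes, index) -> bytes:
    """Rebuild the second version by reading the indexed ranges in order."""
    sources = {BASE: base, DELTA: delta}
    return b"".join(
        sources[r.source][r.offset:r.offset + r.length] for r in index
    )


if __name__ == "__main__":
    v1 = b"hello world, this is version one"
    v2 = b"hello brave world, this is version two"
    delta, index = build_index(v1, v2)
    assert reconstruct(v1, delta, index) == v2
```

In this sketch the delta holds only the bytes that are new in the second version, while the index records where each run of the second version lives. A reader that wants a sub-range of the second version could walk the index and fetch only the entries that overlap it, which is one plausible reading of how an index avoids the read latency of replaying deltas.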
Abstract
A “forward” delta data management technique uses a “sparse” index associated with a delta file both to achieve delta management efficiency and to eliminate read latency when accessing history data. The invention may be implemented advantageously in a data management system that provides real-time data services to data sources associated with a set of application host servers. A host driver embedded in an application server connects an application and its data to a cluster. The host driver captures real-time data transactions, preferably in the form of an event journal that is provided to the data management system. In particular, the driver translates traditional file/database/block I/O into a continuous, application-aware output data stream. A given application-aware data stream is processed through a multi-stage data reduction process to produce a compact data representation from which an “any point-in-time” reconstruction of the original data can be made.
12 Claims
Specification