Method and apparatus for efficiently merging, storing and retrieving incremental data
First Claim
1. A method comprising:
- receiving a request for a plurality of pages in a set of incrementally stored data as of a desired time;
wherein the set of incrementally stored data comprises a base archive and a plurality of snapshots;
wherein each snapshot corresponds to a distinct epoch of a plurality of epochs, each epoch comprising a time period after creation of the base archive;
wherein each snapshot comprises;
a single revised version of each page of data that was changed in the distinct epoch; and
an index that indicates, for each page that was changed in the distinct epoch, an identity of the page and a location of the single revised version of the page; and
going from most-recent epoch to least-recent epoch, searching the index of each snapshot corresponding to an epoch later than creation of the base archive and no later than the desired time to identify a location of a most recent version of each page of data that was changed during the epoch;
storing, in a hash table, each location of each most recent version identified in the searching step, wherein the hash table utilizes a hash function to assign each page number to a unique value, wherein the unique value determines data associated with each said page number;
searching the hash table for an entry corresponding to a desired page;
if an entry corresponding to the desired page is found, retrieving the data from the snapshot corresponding to the epoch identified by the entry;
if an entry is not found in the hash table, retrieving the data from the base archive; and
wherein the desired time is later than the creation of the base archive and wherein data in each revised version of each page in a snapshot is later than the data in each corresponding page of the base archive.
21 Assignments
0 Petitions
Accused Products
Abstract
In a method and apparatus for retrieving data from a snapshot data storage system, for each epoch, a snapshot including (i) all changed data, and (ii) an index is created. The index includes an entry for each page that has changed during the epoch. For rapidly retrieving the data as of any given time, the method creates a hash table that includes an entry for each data page that has changed since the baseline was created. The hash table entry indicates the epoch in which the data most recently changed and an offset corresponding to the location of the changed data in the corresponding snapshot. The hash table is created by inserting an entry for each page in the most recent index, and then examining the remaining indices for all other snapshots from the most recent to the oldest snapshot and adding any non-duplicate entries into the table.
58 Citations
14 Claims
-
1. A method comprising:
-
receiving a request for a plurality of pages in a set of incrementally stored data as of a desired time; wherein the set of incrementally stored data comprises a base archive and a plurality of snapshots; wherein each snapshot corresponds to a distinct epoch of a plurality of epochs, each epoch comprising a time period after creation of the base archive; wherein each snapshot comprises; a single revised version of each page of data that was changed in the distinct epoch; and an index that indicates, for each page that was changed in the distinct epoch, an identity of the page and a location of the single revised version of the page; and going from most-recent epoch to least-recent epoch, searching the index of each snapshot corresponding to an epoch later than creation of the base archive and no later than the desired time to identify a location of a most recent version of each page of data that was changed during the epoch; storing, in a hash table, each location of each most recent version identified in the searching step, wherein the hash table utilizes a hash function to assign each page number to a unique value, wherein the unique value determines data associated with each said page number; searching the hash table for an entry corresponding to a desired page; if an entry corresponding to the desired page is found, retrieving the data from the snapshot corresponding to the epoch identified by the entry; if an entry is not found in the hash table, retrieving the data from the base archive; and wherein the desired time is later than the creation of the base archive and wherein data in each revised version of each page in a snapshot is later than the data in each corresponding page of the base archive. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A device comprising:
-
a processor; a memory connected to the processor; and an interface connected to the processor, the interface being operable to; communicate on a storage device on which a set of incrementally stored data is stored; wherein the set of incrementally stored data comprises a base archive and a plurality of snapshots; wherein each snapshot corresponds to a distinct epoch of a plurality of epochs, each epoch comprising a time period after creation of the base archive; wherein each snapshot comprises; a single revised version of each page of data that was changed in the distinct epoch; and an index that indicates, for each page that was changed in the distinct epoch, an identity of the page and a location of the single revised version of the page; and wherein the processor is configured to perform the steps of; receiving a request for a plurality of pages in the set of incrementally stored data as of a desired time; going from most-recent epoch to least-recent epoch, searching the index of each snapshot corresponding to an epoch later than creation of the base archive and no later than the desired time to identify a location of a most recent version of each page of data that was changed during the epoch; storing, in a hash table, each location of each most recent version identified in the searching step in a memory, wherein the hash table utilizes a hash function to assign each page number to a unique value, wherein the unique value determines data associated with each said page number; searching the hash table for an entry corresponding to a desired page; if an entry corresponding to the desired page is found, retrieving the data from the snapshot corresponding to the epoch identified by the entry; if an entry is not found in the hash table, retrieving the data from the base archive; and wherein the desired time is later than the creation of the base archive and wherein data in each revised version of each page in a snapshot is later than the data in each corresponding page of the base archive. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
Specification