DATA REDUCTION INDEXING
Abstract
Example apparatus, methods, data structures, and computers control indexing to facilitate duplicate determinations. One example method includes indexing, in a global index, a unique chunk processed by a data de-duplicator. Indexing the unique chunk in the global index can include updating an expedited data structure associated with the global index. The example method can also include selectively indexing, in a temporal index, a relationship chunk processed by the data de-duplicator. The relationship chunk is a chunk that is related to another chunk processed by the data de-duplicator by sequence, storage location, and/or similarity hash value. Indexing the relationship chunk in the temporal index can also include updating one or more expedited data structures associated with the temporal index. The expedited data structures and indexes can then be consulted to resolve a duplicate determination being made by a data reducer.
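The flow summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `Deduplicator`, the SHA-256 fingerprint, and the use of a small recency-ordered dictionary as a stand-in for the temporal index are all assumptions made for illustration. The point shown is only that a duplicate determination can be resolved from the temporal index before the global index is consulted.

```python
import hashlib
from collections import OrderedDict


def fingerprint(chunk: bytes) -> str:
    """Hypothetical fingerprint hash; SHA-256 is an assumption."""
    return hashlib.sha256(chunk).hexdigest()


class Deduplicator:
    """Illustrative only: temporal index first, global index as fallback."""

    def __init__(self, temporal_capacity: int = 4):
        self.global_index = {}                # hash -> chunk location
        self.temporal_index = OrderedDict()   # recently seen hash -> location
        self.temporal_capacity = temporal_capacity
        self.global_lookups = 0               # counts global index searches

    def _remember(self, h: str, location: int) -> None:
        # Keep the temporal index small by evicting the oldest entries.
        self.temporal_index[h] = location
        self.temporal_index.move_to_end(h)
        while len(self.temporal_index) > self.temporal_capacity:
            self.temporal_index.popitem(last=False)

    def ingest(self, chunk: bytes, location: int):
        """Return (is_duplicate, location of the stored copy)."""
        h = fingerprint(chunk)
        if h in self.temporal_index:
            # Resolved without accessing the global index.
            stored = self.temporal_index[h]
            self._remember(h, stored)
            return True, stored
        self.global_lookups += 1
        if h in self.global_index:
            stored = self.global_index[h]
            self._remember(h, stored)
            return True, stored
        # Unique chunk: index it globally and remember it temporally.
        self.global_index[h] = location
        self._remember(h, location)
        return False, location
```

In this sketch, ingesting the same chunk twice in a row triggers only one search of the global index; the second determination is answered by the temporal index.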
18 Claims
1. An apparatus, comprising:
a processor;
a memory; and
an interface that connects the processor, the memory, and a set of logics, the set of logics comprising:
  a global index logic configured to store a set of location contexts in a global index and to search the global index for a hash to resolve a data reduction duplicate determination, where a location context comprises at least a fingerprint hash and a chunk location information; and
  a temporal index logic configured to store an ordered set of optimizations in a temporal index and to search the temporal index for the hash to resolve the data reduction duplicate determination without accessing the global index.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. An article of manufacture, comprising:
a computer readable medium storing computer executable instructions that when executed by a computer control the computer to perform a method, the method comprising:
  indexing, in a global index, a unique chunk processed by a data de-duplicator; and
  selectively indexing, in a temporal index, a relationship chunk processed by the data de-duplicator, where the relationship chunk is a chunk that is related to another chunk processed by the data de-duplicator by one or more of, sequence, storage location, and similarity hash value;
  where indexing the unique chunk comprises:
    storing a location context in a selected partition in the global index, the location context comprising a hash value computed from the unique chunk and information identifying where the unique chunk is stored in a storage system accessible to the de-duplicator; and
    updating an expedited data structure associated with the selected partition in the global index to include the hash value computed from the unique chunk; and
  where selectively indexing the relationship chunk comprises:
    storing a hash value computed from the relationship chunk in one or more of, a partitioned sequence index, a partitioned location index, and a partitioned relation index; and
    updating one or more expedited data structures associated with the one or more partitions in the one or more indexes into which the hash value is stored to include the hash value computed from the relationship chunk.
Dependent claims: 17
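The two indexing steps recited for the unique chunk in claim 16 (store a location context in a selected partition, then update the expedited data structure for that partition) can be sketched as follows. The claim does not say what the expedited data structure is; a Bloom filter is assumed here purely as one plausible choice. The names `BloomFilter`, `PartitionedGlobalIndex`, the partition-selection rule, and the location string format are likewise hypothetical.

```python
import hashlib


class BloomFilter:
    """Assumed stand-in for the claim's 'expedited data structure'."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: bytes):
        # Derive several bit positions by salting the key with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))


class PartitionedGlobalIndex:
    def __init__(self, num_partitions: int = 4):
        self.partitions = [dict() for _ in range(num_partitions)]
        self.filters = [BloomFilter() for _ in range(num_partitions)]

    def _select(self, h: bytes) -> int:
        # Hypothetical partition rule: first byte of the hash.
        return h[0] % len(self.partitions)

    def index_unique_chunk(self, chunk: bytes, location: str) -> bytes:
        h = hashlib.sha256(chunk).digest()
        p = self._select(h)
        self.partitions[p][h] = location   # location context: hash + location
        self.filters[p].add(h)             # update the expedited structure
        return h

    def lookup(self, h: bytes):
        p = self._select(h)
        if not self.filters[p].might_contain(h):
            return None                    # skip the partition search entirely
        return self.partitions[p].get(h)
```

The design point of a filter like this is that a negative answer is certain, so most lookups for never-seen hashes avoid touching the partition at all. The same pattern would apply to the partitioned sequence, location, and relation indexes.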
18. A memory for storing data for access by an application being executed on a data processing system, comprising:
a data structure stored in the memory, the data structure storing information used by the application and including:
  a global index storing a complete set of location contexts associated with chunks processed by a data de-duplicator; and
  a temporal index storing an incomplete set of location contexts associated with chunks processed by the data de-duplicator;
  where the global index is a hash tree whose primary key is a hash field, where the global index is partitioned, and where the global index is protected by at least one expedited data filter; and
  where the temporal index comprises:
    a partitioned sequence index arranged as a hash tree and protected by an expedited data filter;
    a partitioned location index arranged as a hash tree and protected by an expedited data filter; and
    a partitioned relation index arranged as a hash tree and protected by an expedited data filter,
  and where the temporal index stores an ordered set of optimizations configured to facilitate resolving a duplicate determination for the data de-duplicator without accessing the global index.
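The layout recited in claim 18 can be pictured with a rough type sketch. Everything here is an assumption for illustration: plain dictionaries stand in for the hash trees, exact sets stand in for the expedited data filters, and the names `LocationContext`, `ProtectedIndex`, `TemporalIndex`, and `DedupIndexes` are hypothetical. The order in which the three temporal sub-indexes are tried is also arbitrary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class LocationContext:
    fingerprint: str   # hash field used as the primary key
    location: str      # where the chunk is stored


@dataclass
class ProtectedIndex:
    """A partitioned index guarded by one filter per partition."""
    num_partitions: int = 2
    partitions: List[Dict[str, LocationContext]] = field(default_factory=list)
    filters: List[Set[str]] = field(default_factory=list)

    def __post_init__(self):
        self.partitions = [{} for _ in range(self.num_partitions)]
        self.filters = [set() for _ in range(self.num_partitions)]

    def _part(self, fp: str) -> int:
        # Hypothetical partition rule based on the fingerprint bytes.
        return sum(fp.encode()) % self.num_partitions

    def put(self, ctx: LocationContext) -> None:
        p = self._part(ctx.fingerprint)
        self.partitions[p][ctx.fingerprint] = ctx
        self.filters[p].add(ctx.fingerprint)

    def get(self, fp: str) -> Optional[LocationContext]:
        p = self._part(fp)
        if fp not in self.filters[p]:   # filter consulted before the index
            return None
        return self.partitions[p].get(fp)


@dataclass
class TemporalIndex:
    sequence: ProtectedIndex = field(default_factory=ProtectedIndex)
    location: ProtectedIndex = field(default_factory=ProtectedIndex)
    relation: ProtectedIndex = field(default_factory=ProtectedIndex)

    def get(self, fp: str) -> Optional[LocationContext]:
        # Try each sub-index in a fixed order.
        for index in (self.sequence, self.location, self.relation):
            ctx = index.get(fp)
            if ctx is not None:
                return ctx
        return None


@dataclass
class DedupIndexes:
    global_index: ProtectedIndex = field(default_factory=ProtectedIndex)
    temporal_index: TemporalIndex = field(default_factory=TemporalIndex)

    def resolve(self, fp: str) -> Optional[LocationContext]:
        # The global index is searched only if the temporal index misses.
        return self.temporal_index.get(fp) or self.global_index.get(fp)
```

A real expedited data filter would likely be probabilistic and far more compact than a set; the set is used here only so the control flow stays easy to follow.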
Specification