System and method for multi-scale navigation of data
Abstract
A system configured to generate a macro-fingerprint from at least one predefined set of summaries is provided. The system includes data storage storing a first predefined set of summaries associated with a first region of data, each member of the first predefined set of summaries characterizing data within the first region of data; and at least one processor coupled to the data storage and configured to: read the first predefined set of summaries; select at least one first member from the first predefined set of summaries based on a value of the at least one first member; and store the at least one first member within a first macro-fingerprint. The first region of data may have a first size indicative of a quantity of data included in the first region of data. The macro-fingerprints are created from previously created, smaller micro-fingerprints without having to reread the data.
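The selection and comparison steps summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: the source says only that members are selected "based on a value", so the rule used here (keep the `k` smallest distinct values) is an assumption, as are the names `macro_fingerprint`, `shared_members`, and the parameter `k`. The micro-fingerprints are taken as already-computed integers, reflecting the point that the underlying data is not reread.

```python
from typing import Iterable


def macro_fingerprint(micro_fingerprints: Iterable[int], k: int = 3) -> frozenset:
    """Build a macro-fingerprint from existing micro-fingerprint values.

    Assumed selection rule: keep the k smallest distinct values. Selection
    depends only on the values themselves, so two regions that share many
    micro-fingerprints tend to select the same members.
    """
    return frozenset(sorted(set(micro_fingerprints))[:k])


def shared_members(first: frozenset, second: frozenset) -> int:
    """Compare two macro-fingerprints by counting the members they share."""
    return len(first & second)


# Hypothetical micro-fingerprint values for a region and for a set of data.
region_micros = [50, 7, 93, 21, 64, 12]
dataset_micros = [12, 88, 7, 40, 21, 5]

first_macro = macro_fingerprint(region_micros)    # selects 7, 12, 21
second_macro = macro_fingerprint(dataset_micros)  # selects 5, 7, 12

print(shared_members(first_macro, second_macro))  # prints 2
```

A nonzero overlap only suggests that the two inputs may contain duplicate data; under this sketch, a finer comparison of the underlying micro-fingerprints would still be needed to confirm it.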
20 Claims
1. A method of determining duplicate data for de-duplicating data in a computer system, the method comprising:
reading a first predefined set of multiple summaries associated with a first region of data in a storage of the computer system, each member of the first predefined set of multiple summaries being a micro-fingerprint value characterizing a portion of data within the first region of data;
selecting a first member from the first predefined set of multiple summaries based on a value of the micro-fingerprint value of the first member;
generating, at least in part, a first macro-fingerprint associated with the first region of data by storing the first member within the first macro-fingerprint;
reading a second predefined set of multiple summaries associated with a set of data, each member of the second predefined set of multiple summaries being a micro-fingerprint value characterizing a portion of data within the set of data;
selecting a particular member from the second predefined set of multiple summaries based on a value of the micro-fingerprint value of the particular member;
generating, at least in part, a second macro-fingerprint associated with the set of data by storing the second member within the second macro-fingerprint; and
comparing the first macro-fingerprint associated with the first region with the second macro-fingerprint associated with the set of data to determine, at least in part, the duplicate data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system configured to determine duplicate data for de-duplicating data in a computer system, the system comprising:
data storage storing a first predefined set of multiple summaries associated with a first region of data, each member of the first predefined set of multiple summaries being a micro-fingerprint value characterizing a portion of data within the first region of data; and
at least one processor coupled to the data storage and programmed to:
read the first predefined set of multiple summaries;
select a first member from the first predefined set of multiple summaries based on a value of the micro-fingerprint value of the first member;
generate, at least in part, a first macro-fingerprint associated with the first region of data by storing the first member within the first macro-fingerprint;
read a second predefined set of multiple summaries associated with a set of data, each member of the second predefined set of multiple summaries being a micro-fingerprint value characterizing a portion of data within the set of data;
select a particular member from the second predefined set of multiple summaries based on a value of the micro-fingerprint value of the particular member;
generate, at least in part, a second macro-fingerprint associated with the set of data by storing the second member within the second macro-fingerprint; and
compare the first macro-fingerprint associated with the first region with the second macro-fingerprint associated with the set of data to determine, at least in part, the duplicate data.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A non-transitory computer readable medium storing computer readable instructions that, when executed by at least one processor, program the at least one processor to perform operations for determining duplicate data for de-duplicating data in a computer system, the operations comprising:
reading a first predefined set of multiple summaries associated with a first region of data in a storage of the computer system, each member of the first predefined set of multiple summaries being a micro-fingerprint value characterizing a portion of data within the first region of data;
selecting a first member from the first predefined set of multiple summaries based on a value of the micro-fingerprint value of the first member;
generating, at least in part, a first macro-fingerprint associated with the first region of data by storing the first member within the first macro-fingerprint;
reading a second predefined set of multiple summaries associated with a set of data, each member of the second predefined set of multiple summaries being a micro-fingerprint value characterizing a portion of data within the set of data;
selecting a particular member from the second predefined set of multiple summaries based on a value of the micro-fingerprint value of the particular member;
generating, at least in part, a second macro-fingerprint associated with the set of data by storing the second member within the second macro-fingerprint; and
comparing the first macro-fingerprint associated with the first region with the second macro-fingerprint associated with the set of data to determine, at least in part, the duplicate data.
Dependent claims: 20
Specification