System and method for summarizing data
First Claim
1. A computer implemented method of characterizing data associated with a plurality of location identifiers, each location identifier of the plurality of location identifiers identifying a location within the data where a particular pattern of data is stored, the method comprising:
identifying a first portion of the data based on a location of the first portion relative to a location identified by at least one first location identifier of the plurality of location identifiers;
identifying a second portion of the data based on a location of the second portion relative to a location identified by at least one second location identifier of the plurality of location identifiers;
determining one or more first summaries associated with the at least one first location identifier, at least one summary of the one or more first summaries indicating a pattern of stored data included in the first portion;
determining one or more second summaries associated with the at least one second location identifier, at least one summary of the one or more second summaries indicating another pattern of stored data included in the second portion;
identifying at least one fingerprint summary for the data based on values of summaries included in a set of summaries comprising the one or more first summaries and the one or more second summaries; and
storing an association between the at least one fingerprint summary and the data.
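Claim 1 leaves several choices open: how a portion is located relative to an identifier, what function produces a summary, and how the fingerprint summary is chosen from the set of summary values. The sketch below is one hedged illustration of the claimed steps, not the patentee's implementation. It rests on these assumptions: the location identifiers are the offsets where a given byte pattern begins; each portion is a fixed 16-byte window starting at its identifier (`WINDOW` is a hypothetical parameter); each summary is a truncated SHA-256 digest; and the fingerprint is the k smallest distinct summary values, a bottom-k selection swapped in for illustration. All function names and the `associations` store are hypothetical.

```python
import hashlib

WINDOW = 16  # hypothetical: length in bytes of the portion taken at each location identifier

# Hypothetical store for "an association between the fingerprint summary and the data".
associations: dict[tuple[int, ...], set[str]] = {}


def find_locations(data: bytes, pattern: bytes) -> list[int]:
    """Return every offset at which `pattern` begins; these act as the location identifiers."""
    locations = []
    pos = data.find(pattern)
    while pos != -1:
        locations.append(pos)
        pos = data.find(pattern, pos + 1)
    return locations


def portion_at(data: bytes, location: int, window: int = WINDOW) -> bytes:
    """Portion located relative to an identifier: here, the `window` bytes starting at it."""
    return data[location:location + window]


def summarize(portion: bytes) -> int:
    """Summary of a portion: the first 8 bytes of its SHA-256 digest as an unsigned integer."""
    return int.from_bytes(hashlib.sha256(portion).digest()[:8], "big")


def fingerprint(data: bytes, locations: list[int], k: int = 2) -> tuple[int, ...]:
    """Choose the k smallest distinct summary values as the fingerprint summaries (bottom-k)."""
    summaries = {summarize(portion_at(data, loc)) for loc in locations}
    return tuple(sorted(summaries)[:k])


def characterize(data_id: str, data: bytes, pattern: bytes, k: int = 2) -> tuple[int, ...]:
    """Locate portions, summarize them, pick the fingerprint, and record its association."""
    fp = fingerprint(data, find_locations(data, pattern), k)
    associations.setdefault(fp, set()).add(data_id)
    return fp
```

Because the fingerprint depends only on the bytes found at the identified locations, two data items such as `b"xxMARKalpha-content-0001MARKbeta-content-00002"` and the same content behind a longer prefix produce the same fingerprint, and both identifiers end up associated with it.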
6 Assignments
0 Petitions
Abstract
Described are computer-based methods and apparatuses, including computer program products, for removing redundant data from a storage system. In one example, a data delineation process delineates data targeted for de-duplication into regions using a plurality of markers. The de-duplication system determines which of these regions should be subject to further de-duplication processing by comparing metadata representing the regions to metadata representing regions of a reference data set. The de-duplication system identifies an area of data that incorporates the regions that should be subject to further de-duplication processing and de-duplicates this area with reference to a corresponding area within the reference data set.
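The de-duplication flow in the abstract (delineate into regions at markers, compare region metadata with a reference data set, then de-duplicate the area covering the matching regions) can be sketched as below. This is a minimal illustration under stated assumptions, not the described system: markers are simply occurrences of a fixed byte string (production systems often use content-defined chunking with a rolling hash instead), region metadata is a SHA-256 digest, an "area" is taken to be a maximal run of consecutive matching regions, and the `("ref", offset, length)` / `("raw", bytes)` encoding is hypothetical. All names are illustrative.

```python
import hashlib

Region = tuple[int, int]  # (start, end) byte offsets, end exclusive


def delineate(data: bytes, marker: bytes) -> list[Region]:
    """Cut `data` into regions, starting a new region at each marker occurrence after offset 0."""
    cuts = [0]
    pos = data.find(marker, 1)
    while pos != -1:
        cuts.append(pos)
        pos = data.find(marker, pos + 1)
    cuts.append(len(data))
    return [(s, e) for s, e in zip(cuts, cuts[1:]) if e > s]


def digest(data: bytes, region: Region) -> bytes:
    """Metadata for a region: the SHA-256 digest of its bytes."""
    start, end = region
    return hashlib.sha256(data[start:end]).digest()


def deduplicate(target: bytes, reference: bytes, marker: bytes) -> list[tuple]:
    """Encode `target` as raw bytes plus ("ref", offset, length) pointers into `reference`."""
    ref_index = {digest(reference, r): r for r in delineate(reference, marker)}
    out: list[tuple] = []
    for start, end in delineate(target, marker):
        hit = ref_index.get(digest(target, (start, end)))
        # A metadata match is only a candidate; confirm the bytes before de-duplicating.
        if hit is not None and reference[hit[0]:hit[1]] == target[start:end]:
            prev = out[-1] if out else None
            if prev is not None and prev[0] == "ref" and prev[1] + prev[2] == hit[0]:
                # Consecutive matching regions grow one area mapped to one reference area.
                out[-1] = ("ref", prev[1], prev[2] + (end - start))
            else:
                out.append(("ref", hit[0], end - start))
        else:
            out.append(("raw", target[start:end]))
    return out


def restore(pieces: list[tuple], reference: bytes) -> bytes:
    """Rebuild the original target bytes from the encoded pieces."""
    return b"".join(
        reference[p[1]:p[1] + p[2]] if p[0] == "ref" else p[1] for p in pieces
    )
```

With reference `b"#one#two#three"` and target `b"new#one#two#four"` split at `b"#"`, the regions `#one` and `#two` match adjacent reference regions, so they merge into a single area stored as one pointer `("ref", 0, 8)`, while `new` and `#four` stay raw. Checking bytes after the digest match keeps the sketch correct even if two different regions were to share metadata.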
20 Claims
1. A computer implemented method of characterizing data associated with a plurality of location identifiers, each location identifier of the plurality of location identifiers identifying a location within the data where a particular pattern of data is stored, the method comprising:
identifying a first portion of the data based on a location of the first portion relative to a location identified by at least one first location identifier of the plurality of location identifiers;
identifying a second portion of the data based on a location of the second portion relative to a location identified by at least one second location identifier of the plurality of location identifiers;
determining one or more first summaries associated with the at least one first location identifier, at least one summary of the one or more first summaries indicating a pattern of stored data included in the first portion;
determining one or more second summaries associated with the at least one second location identifier, at least one summary of the one or more second summaries indicating another pattern of stored data included in the second portion;
identifying at least one fingerprint summary for the data based on values of summaries included in a set of summaries comprising the one or more first summaries and the one or more second summaries; and
storing an association between the at least one fingerprint summary and the data.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A system configured to characterize data, the system comprising:
data storage storing the data and a plurality of location identifiers, each of the plurality of location identifiers identifying a location within the data where a particular pattern of data is stored; and at least one processor coupled to the data storage and configured to:
identify a first portion of the data based on a location of the first portion relative to a location identified by at least one first location identifier of the plurality of location identifiers;
identify a second portion of the data based on a location of the second portion relative to a location identified by at least one second location identifier of the plurality of location identifiers;
determine one or more first summaries associated with the at least one first location identifier, at least one summary of the one or more first summaries indicating a pattern of stored data included in the first portion;
determine one or more second summaries associated with the at least one second location identifier, at least one summary of the one or more second summaries indicating another pattern of stored data included in the second portion;
identify at least one fingerprint summary for the data based on values of summaries included in a set of summaries comprising the one or more first summaries and the one or more second summaries; and
store an association between the at least one fingerprint summary and the data.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A non-transitory computer readable medium storing computer readable instructions that, when executed by at least one processor, instruct the at least one processor to perform a method of characterizing data associated with a plurality of location identifiers, each location identifier of the plurality of location identifiers identifying a location within the data where a particular pattern of data is stored, the method comprising:
identifying a first portion of the data based on a location of the first portion relative to a location identified by at least one first location identifier of the plurality of location identifiers;
identifying a second portion of the data based on a location of the second portion relative to a location identified by at least one second location identifier of the plurality of location identifiers;
determining one or more first summaries associated with the at least one first location identifier, at least one summary of the one or more first summaries indicating a pattern of stored data included in the first portion;
determining one or more second summaries associated with the at least one second location identifier, at least one summary of the one or more second summaries indicating another pattern of stored data included in the second portion;
identifying at least one fingerprint summary for the data based on values of summaries included in a set of summaries comprising the one or more first summaries and the one or more second summaries; and
storing an association between the at least one fingerprint summary and the data.
Dependent claims: 16, 17, 18, 19, 20.
Specification