System and method for identifying locations within data
First Claim
1. A computer implemented method of marking data for processing, the method comprising:
determining a rolling summary that identifies a particular pattern of stored data included in each respective region of a plurality of overlapping regions;
comparing at least one proper subset of the rolling summary to a predetermined value;
recording a location identifier that identifies a location within the data where the at least one proper subset equals the predetermined value;
determining a metric that indicates a frequency with which location identifiers are recorded for the data;
comparing the metric to a predetermined threshold; and
adjusting, responsive to the metric transgressing the predetermined threshold, a characteristic of the at least one proper subset.
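The steps of this claim can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the rolling summary is a polynomial rolling hash over a fixed-size sliding window, that the "proper subset" is the low-order `mask_bits` bits of that hash, that the metric is the fraction of window positions marked within each `interval`, and that the adjusted characteristic is the number of bits compared. The names `mark_locations`, `BASE`, `MOD`, `window`, `mask_bits`, `target`, `interval`, and `max_rate` are all hypothetical.

```python
BASE = 257               # assumed polynomial base for the rolling hash
MOD = (1 << 61) - 1      # assumed modulus (a Mersenne prime)


def mark_locations(data, window=16, mask_bits=4, target=0,
                   interval=256, max_rate=1 / 32):
    """Return (markers, final_mask_bits) for a bytes-like object.

    A marker is the index one past the end of a window whose hash,
    restricted to its low mask_bits bits, equals target.
    """
    markers = []
    if len(data) < window:
        return markers, mask_bits

    # Weight of the byte that leaves the window on each slide.
    top = pow(BASE, window - 1, MOD)

    # Hash of the first window, computed directly.
    h = 0
    for b in data[:window]:
        h = (h * BASE + b) % MOD

    recent = 0  # markers recorded in the current interval
    for end in range(window, len(data) + 1):
        if end > window:
            # Slide the window by one byte: drop the oldest, add the newest.
            out_b = data[end - window - 1]
            in_b = data[end - 1]
            h = ((h - out_b * top) * BASE + in_b) % MOD

        # Compare a proper subset of the summary (its low bits) to the target.
        if (h & ((1 << mask_bits) - 1)) == target:
            markers.append(end)
            recent += 1

        # Once per interval, compare the marking frequency to the threshold
        # and, if it is exceeded, widen the compared subset by one bit.
        if (end - window + 1) % interval == 0:
            if recent / interval > max_rate:
                mask_bits += 1
            recent = 0

    return markers, mask_bits
```

In this sketch, widening the compared subset by one bit roughly halves the chance that any given window matches, so markers become sparser when the data produces them too often. A fuller version would presumably also narrow the subset when markers are too rare.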
Abstract
Described are computer-based methods and apparatuses, including computer program products, for removing redundant data from a storage system. In one example, a data delineation process delineates data targeted for de-duplication into regions using a plurality of markers. The de-duplication system determines which of these regions should be subject to further de-duplication processing by comparing metadata representing the regions to metadata representing regions of a reference data set. The de-duplication system identifies an area of data that incorporates the regions that should be subject to further de-duplication processing and de-duplicates this area with reference to a corresponding area within the reference data set.
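The region-matching step described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the method of the patent: it assumes the "metadata representing the regions" is a SHA-256 digest of each region, that the reference data set is available as a set of such digests, and that an "area" is a run of adjacent matching regions. The helper names `fingerprint`, `split_at`, and `candidate_areas` are hypothetical.

```python
import hashlib


def fingerprint(region):
    """Assumed region metadata: the SHA-256 hex digest of the region bytes."""
    return hashlib.sha256(region).hexdigest()


def split_at(data, markers):
    """Delineate data into (start, end) regions at the given marker offsets."""
    regions = []
    start = 0
    for m in sorted(set(markers)):
        if start < m < len(data):
            regions.append((start, m))
            start = m
    regions.append((start, len(data)))
    return regions


def candidate_areas(data, markers, reference_index):
    """Return (start, end) areas made of adjacent regions whose
    fingerprints appear in reference_index (a set of digests)."""
    areas = []
    for start, end in split_at(data, markers):
        if fingerprint(data[start:end]) in reference_index:
            if areas and areas[-1][1] == start:
                # Extend the previous area when the regions are contiguous.
                areas[-1] = (areas[-1][0], end)
            else:
                areas.append((start, end))
    return areas
```

Each returned area would then be the span handed to the more expensive de-duplication pass against the corresponding area of the reference data set; that pass is outside the scope of this sketch.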
20 Claims
1. A computer implemented method of marking data for processing, the method comprising:
determining a rolling summary that identifies a particular pattern of stored data included in each respective region of a plurality of overlapping regions;
comparing at least one proper subset of the rolling summary to a predetermined value;
recording a location identifier that identifies a location within the data where the at least one proper subset equals the predetermined value;
determining a metric that indicates a frequency with which location identifiers are recorded for the data;
comparing the metric to a predetermined threshold; and
adjusting, responsive to the metric transgressing the predetermined threshold, a characteristic of the at least one proper subset.

9. A system configured to mark data for processing, the system comprising:
data storage storing the data, the data including a plurality of overlapping regions; and
a processor coupled to the data storage and configured to:
determine a rolling summary of each respective region of the plurality of overlapping regions based on stored data included in each respective region, the rolling summary identifying a particular pattern of the stored data;
compare at least one proper subset of the rolling summary to a predetermined value;
record a location identifier that identifies a location within the data where the at least one proper subset equals the predetermined value;
determine a metric that indicates a frequency with which location identifiers are recorded for the data;
compare the metric to a predetermined threshold; and
adjust, responsive to the metric transgressing the predetermined threshold, a characteristic of the at least one proper subset.

10. The system according to claim, wherein the processor is configured to determine the rolling summary by calculating a hash value from the stored data.

17. A non-transitory computer readable medium storing computer readable instructions that, when executed by at least one processor, instruct the at least one processor to perform a method of marking data for processing, the method comprising:
determining a rolling summary that identifies a particular pattern of stored data included in each respective region of a plurality of overlapping regions;
comparing at least one proper subset of the rolling summary to a predetermined value;
recording a location identifier that identifies a location within the data where the at least one proper subset equals the predetermined value;
determining a metric that indicates a frequency with which location identifiers are recorded for the data;
comparing the metric to a predetermined threshold; and
adjusting, responsive to the metric transgressing the predetermined threshold, a characteristic of the at least one proper subset.

Specification