System and method for navigating data
First Claim
1. A computer implemented method of identifying reference data likely to match target data, the method comprising:
reading a reference set of summaries of data included in a reference data set, each member of the reference set of summaries including a plurality of summaries that indicate particular patterns of the reference data within the reference data set;
comparing the reference set of summaries to a target set of summaries associated with at least one target area of a plurality of target areas, each member of the target set of summaries including a plurality of summaries that indicate particular patterns of the target data included in the at least one target area, the plurality of target areas being included in a target data set; and
associating the at least one target area with the reference data set when a threshold number of members of the target set of summaries associated with the at least one target area match members of the reference set of summaries.
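In one illustrative example, the comparing and associating steps recited above may be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Member`, `associate_target_areas`, and `threshold` are hypothetical, each member is modeled as a frozen set of integer summaries, and "match" is assumed to mean exact equality between a target member and a reference member.

```python
from typing import Dict, FrozenSet, List, Set

# Assumption: a "member" of a set of summaries is itself a group of
# summaries (for example, hashes), modeled here as a frozenset of ints.
Member = FrozenSet[int]


def associate_target_areas(
    reference: Set[Member],
    target_areas: Dict[str, List[Member]],
    threshold: int,
) -> List[str]:
    """Return the names of target areas to associate with the reference data set.

    A target area is associated when at least `threshold` of its members
    also appear in the reference set (exact equality is an assumption).
    """
    associated = []
    for name, members in target_areas.items():
        hits = sum(1 for member in members if member in reference)
        if hits >= threshold:
            associated.append(name)
    return associated


# Hypothetical data: area "a" has two matching members, area "b" has one.
reference = {frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})}
target_areas = {
    "a": [frozenset({1, 2}), frozenset({3, 4}), frozenset({9})],
    "b": [frozenset({5, 6}), frozenset({7})],
}
print(associate_target_areas(reference, target_areas, threshold=2))  # ['a']
```

Storing the reference members in a hash set keeps each lookup constant time on average, so the cost of the comparison grows with the number of target members rather than with the size of the reference set.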
Abstract
Described are computer-based methods and apparatuses, including computer program products, for removing redundant data from a storage system. In one example, a data delineation process delineates data targeted for de-duplication into regions using a plurality of markers. The de-duplication system determines which of these regions should be subject to further de-duplication processing by comparing metadata representing the regions to metadata representing regions of a reference data set. The de-duplication system identifies an area of data that incorporates the regions that should be subject to further de-duplication processing and de-duplicates this area with reference to a corresponding area within the reference data set.
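By way of a hedged illustration, the delineation and metadata comparison described in the abstract may be sketched as follows. The abstract does not specify how markers are chosen or what the metadata contains, so everything concrete here is an assumption: a marker is placed wherever the sum of the last `WINDOW` bytes is divisible by `DIVISOR`, the metadata for a region is its SHA-256 digest, and the function names are hypothetical.

```python
import hashlib
from typing import List

WINDOW = 4   # assumed: number of trailing bytes inspected at each position
DIVISOR = 8  # assumed: a position is a marker when the window sum % DIVISOR == 0


def delineate(data: bytes) -> List[bytes]:
    """Split data into regions that end at content-derived markers."""
    regions, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if sum(data[end - WINDOW:end]) % DIVISOR == 0:  # marker found
            regions.append(data[start:end])
            start = end
    if start < len(data):  # trailing bytes after the last marker
        regions.append(data[start:])
    return regions


def region_metadata(regions: List[bytes]) -> List[str]:
    """Represent each region by a digest (an assumed form of metadata)."""
    return [hashlib.sha256(region).hexdigest() for region in regions]


def regions_to_deduplicate(target: bytes, reference: bytes) -> List[int]:
    """Return indices of target regions whose metadata appears in the reference."""
    reference_meta = set(region_metadata(delineate(reference)))
    target_meta = region_metadata(delineate(target))
    return [i for i, meta in enumerate(target_meta) if meta in reference_meta]
```

Because the markers depend only on nearby content, regions in shifted data tend to realign after an insertion, which is why content-derived markers are a common choice over fixed-size blocks. Contiguous runs of the returned indices would then indicate an area worth de-duplicating against the corresponding area of the reference data set.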
20 Claims
1. A computer implemented method of identifying reference data likely to match target data, the method comprising:
reading a reference set of summaries of data included in a reference data set, each member of the reference set of summaries including a plurality of summaries that indicate particular patterns of the reference data within the reference data set;
comparing the reference set of summaries to a target set of summaries associated with at least one target area of a plurality of target areas, each member of the target set of summaries including a plurality of summaries that indicate particular patterns of the target data included in the at least one target area, the plurality of target areas being included in a target data set; and
associating the at least one target area with the reference data set when a threshold number of members of the target set of summaries associated with the at least one target area match members of the reference set of summaries.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system configured to identify reference data likely to match target data, the system comprising:
data storage storing a target data set; and
a processor coupled to the data storage and configured to:
read a reference set of summaries of data included in a reference data set, each member of the reference set of summaries including a plurality of summaries that indicate particular patterns of the reference data within the reference data set;
compare the reference set of summaries to a target set of summaries associated with at least one target area of a plurality of target areas, each member of the target set of summaries including a plurality of summaries that indicate particular patterns of the target data included in the at least one target area, the plurality of target areas being included in the target data set; and
associate the at least one target area with the reference data set when a threshold number of members of the target set of summaries associated with the at least one target area match members of the reference set of summaries.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer readable medium storing computer readable instructions that, when executed by at least one processor, instruct the at least one processor to perform a method of identifying reference data likely to match target data, the method comprising:
reading a reference set of summaries of data included in a reference data set, each member of the reference set of summaries including a plurality of summaries that indicate particular patterns of the reference data within the reference data set;
comparing the reference set of summaries to a target set of summaries associated with at least one target area of a plurality of target areas, each member of the target set of summaries including a plurality of summaries that indicate particular patterns of the target data included in the at least one target area, the plurality of target areas being included in a target data set; and
associating the at least one target area with the reference data set when a threshold number of members of the target set of summaries associated with the at least one target area match members of the reference set of summaries.
Dependent claims: 16, 17, 18, 19, 20
Specification