Out-of core similarity matching

  • US 9,727,573 B1
  • Filed: 09/10/2014
  • Issued: 08/08/2017
  • Est. Priority Date: 12/22/2011
  • Status: Active Grant
First Claim

1. A computer-implemented method for data reduction, the method comprising:

    in response to a request for compressing data in a data storage system, partitioning the data into a plurality of data chunks, including a target data chunk and a base data chunk;

    generating representative data for the target data chunk and the base data chunk by applying a predetermined algorithm to the target data chunk and the base data chunk, the representative data including fingerprints of the target data chunk and the base data chunk and a plurality of features extracted from the target data chunk and the base data chunk, wherein each of the plurality of features is a value having a property that a probability of the target data chunk having same representative value as the base data chunk is proportional to data similarity of the target data chunk and the base data chunk;

    sorting the representative data for the target data chunk and the base data chunk to form a sorted representative data list based on a first feature defined in the representative data for the target data chunk and the base data chunk; and

    generating a delta data chunk as the difference between the target data chunk and the base data chunk where the representative data of the target chunk is proximate to the representative data of the base data chunk in the sorted representative data list.
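
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices are assumptions made only for illustration: fixed-size chunking, SHA-1 fingerprints, min-hash values over sliding byte windows as the similarity features (two chunks share a min-hash value with probability equal to the Jaccard similarity of their window sets), and a simple copy/insert delta built with `difflib`. All function names and constants (`partition`, `features`, `reduce_chunks`, `CHUNK_SIZE`, and so on) are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher

CHUNK_SIZE = 64  # hypothetical fixed chunk size, in bytes


def partition(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split the data into fixed-size chunks (an assumed chunking policy)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk: bytes) -> str:
    """Exact-match fingerprint of a chunk (SHA-1 is an assumption)."""
    return hashlib.sha1(chunk).hexdigest()


def features(chunk: bytes, num_features: int = 2, window: int = 8) -> tuple[int, ...]:
    """Min-hash features: for each salt, the smallest hash over all windows."""
    last_start = max(1, len(chunk) - window + 1)
    feats = []
    for salt in range(num_features):
        feats.append(min(
            int.from_bytes(
                hashlib.blake2b(chunk[i:i + window], digest_size=8,
                                salt=bytes([salt])).digest(), "big")
            for i in range(last_start)
        ))
    return tuple(feats)


def make_delta(base: bytes, target: bytes) -> list[tuple]:
    """Encode target as copy-from-base and insert-literal operations."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": emit the new bytes
            ops.append(("insert", target[j1:j2]))
    return ops


def apply_delta(base: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the target chunk from the base chunk and its delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def reduce_chunks(chunks: list[bytes]) -> dict[int, tuple[int, list[tuple]]]:
    """Sort representative data by the first feature and delta-encode
    neighbours whose first feature matches but whose fingerprints differ."""
    reps = [(features(c), fingerprint(c), idx) for idx, c in enumerate(chunks)]
    reps.sort(key=lambda r: r[0][0])  # sorted representative data list
    deltas = {}
    for prev, cur in zip(reps, reps[1:]):
        if prev[0][0] == cur[0][0] and prev[1] != cur[1]:
            base_idx, target_idx = prev[2], cur[2]
            deltas[target_idx] = (
                base_idx, make_delta(chunks[base_idx], chunks[target_idx]))
    return deltas
```

In this sketch "proximate" is read narrowly as adjacent entries in the sorted list with an equal first feature; a practical system would likely examine a wider neighbourhood and further features. Sorting, rather than building an in-memory index, is what allows the representative data to be processed out of core, since a sort can be performed externally on data larger than memory.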

  • 6 Assignments