Data duplication detection system and method for controlling data duplication detection system

  • US 9,239,844 B2
  • Filed: 03/29/2013
  • Issued: 01/19/2016
  • Est. Priority Date: 03/29/2013
  • Status: Expired due to Fees
First Claim

1. A data duplication detection system for detecting a duplication of data, the data duplication detection system comprising:

  • a data duplication determination part configured to determine whether each of a plurality of pieces of chunk data formed by dividing a received data is a duplicate of chunk data that has already been stored;

    a storage part configured to store chunk data that has been determined not to be duplicative by the data duplication determination part;

    a first management table configured to manage, for each piece of chunk data stored in the storage part, identity guarantee data that indicates a data identity, with the identity guarantee data being associated with storage-destination information that indicates a data storage destination;

    a second management table created on the basis of the identity guarantee data for each piece of chunk data stored in the storage part, the second management table being configured to indicate with prescribed reliability that a piece of chunk data is stored in the storage part, the prescribed reliability being a probability equal to or greater than a probability threshold value, the threshold value calculated based on a number of prescribed hash functions used to determine hash values for the piece of chunk data and a number of bits of a bit string that indicates the hash values at positions of the bit string corresponding to the hash values; and

    a third management table configured to manage a plurality of chunk data sets formed by grouping together the pieces of chunk data stored in the storage part, the third management table being configured to manage the identity guarantee data for prescribed chunk data that represents each of the plurality of chunk data sets, wherein the data duplication determination part:

    in a case where the second management table indicates that a target chunk data included in the received data is stored in the storage part, and, in addition, in a case where determination is made that the identity guarantee data for the target chunk data is not stored in the third management table, temporarily stores the target chunk data in a temporary storage part;

    in a case where the second management table indicates that a second target chunk data that differs from the target chunk data is stored in the storage part, and, in addition, in a case where determination is made that the identity guarantee data for the second target chunk data is stored in the third management table, determines whether the identity guarantee data for the target chunk data stored in the temporary storage part is stored in the first management table;

    in a case where determination is made that the identity guarantee data for the target chunk data stored in the temporary storage part is stored in the first management table, determines that the target chunk data stored in the temporary storage part is already stored in the storage part; and

    in a case where determination is made that the identity guarantee data for the target chunk data stored in the temporary storage part is not stored in the first management table, determines that the target chunk data stored in the temporary storage part is not stored in the storage part.
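
The determination flow recited in claim 1 can be illustrated with a minimal Python sketch. It is one reading of the claim under stated assumptions, not the patented implementation. The names `fingerprint`, `BloomFilter`, `Deduplicator`, `set_size`, and `flush` are hypothetical. The sketch assumes SHA-256 digests as the identity guarantee data, a Bloom filter as the second management table, and that the first chunk stored in each group of `set_size` chunks represents its chunk data set. The claim does not say what happens to chunks still held in the temporary storage part when the input ends, so `flush` is an added assumption.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Identity guarantee data; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(chunk).hexdigest()


class BloomFilter:
    """Stand-in for the second management table (assumed to be a Bloom filter)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, fp: str):
        # Derive num_hashes bit positions by salting the fingerprint.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{fp}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, fp: str) -> None:
        for pos in self._positions(fp):
            self.bits[pos] = 1

    def might_contain(self, fp: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(fp))


class Deduplicator:
    def __init__(self, set_size: int = 2):
        self.set_size = set_size           # hypothetical grouping rule
        self.storage = []                  # storage part
        self.first_table = {}              # fingerprint -> index in storage
        self.second_table = BloomFilter()  # probabilistic membership
        self.third_table = set()           # fingerprints of representative chunks
        self.temporary = []                # temporary storage part
        self.decisions = []                # (chunk, is_duplicate), in decision order

    def _store(self, chunk: bytes, fp: str) -> None:
        index = len(self.storage)
        self.storage.append(chunk)
        self.first_table[fp] = index
        self.second_table.add(fp)
        if index % self.set_size == 0:     # assumed: first chunk of a set represents it
            self.third_table.add(fp)
        self.decisions.append((chunk, False))

    def _resolve_temporary(self) -> None:
        # Settle every deferred chunk against the authoritative first table.
        pending, self.temporary = self.temporary, []
        for fp, chunk in pending:
            if fp in self.first_table:
                self.decisions.append((chunk, True))
            else:
                self._store(chunk, fp)

    def process(self, chunk: bytes) -> None:
        fp = fingerprint(chunk)
        if not self.second_table.might_contain(fp):
            self._store(chunk, fp)                # filter reports absent: certainly new
        elif fp not in self.third_table:
            self.temporary.append((fp, chunk))    # defer the decision
        else:
            self.decisions.append((chunk, True))  # representative is already stored
            self._resolve_temporary()

    def flush(self) -> None:
        """End-of-stream handling; not recited in the claim, assumed here."""
        self._resolve_temporary()


dedup = Deduplicator(set_size=2)
for piece in (b"a", b"b", b"c", b"d"):  # first pass: every chunk is new
    dedup.process(piece)
dedup.process(b"b")  # filter hit, not a representative: held temporarily
dedup.process(b"a")  # representative hit: triggers the first-table check for b"b"
```

In this sketch the first table is an in-memory dictionary, so deferring the lookup saves nothing. The deferral only pays off if consulting the first management table is costly compared with the second and third tables, which is an assumption about the intended deployment rather than something the claim states. For a standard Bloom filter with m bits, k hash functions, and n inserted items, the false-positive rate is approximately (1 - e^(-kn/m))^k. The claim says only that the threshold depends on the number of hash functions and the number of bits, so whether it uses this estimate is not confirmed by the text.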
