
Data hashing method, data processing method, and data processing system using similarity-based hashing algorithm

  • US 7,617,231 B2
  • Filed: 12/06/2006
  • Issued: 11/10/2009
  • Est. Priority Date: 12/07/2005
  • Status: Expired due to Fees
First Claim

1. A data hashing method using a similarity-based hashing (SBH) algorithm, the data hashing method comprising:

  • receiving computerized data; and

    generating a hash value of the computerized data using the SBH algorithm in which two data are the same if calculated hash values are the same and two data are similar if the difference of calculated hash values is small, wherein the hash value has at least two variable values that allow for a quick search of the computerized data for determining if the two data are similar, wherein the generating of the hash value of the computerized data using the SBH algorithm comprises:

    calculating a fingerprint value from the content of the computerized data;

    changing a component value of an Nth-order hash vector to correspond to the fingerprint value according to a predetermined rule;

    determining whether the entire amount of the content of the computerized data has been processed; and

    if it is determined that the entire amount of the content of the computerized data has been processed, converting the Nth-order hash vector to the hash value, and wherein the calculating of the fingerprint value comprises:

    extracting a shingle, which is a continuous or discontinuous byte-string having a predetermined length, from the computerized data; and

    generating a fingerprint value using a data hashing algorithm which satisfies uniformity and randomness criteria for the shingle and has a low possibility of collision.
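The claim does not specify the "predetermined rule," the shingle length, or the order N of the hash vector. The Python sketch below fills those gaps with assumptions borrowed from the SimHash style of similarity hashing, so it illustrates the claimed steps rather than reproducing the patented implementation. Under these assumptions, continuous 4-byte shingles are extracted, each shingle is fingerprinted with BLAKE2b (standing in for "a data hashing algorithm which satisfies uniformity and randomness criteria"), and every fingerprint bit adds +1 or -1 to the matching component of a 64-component vector. After all content has been processed, the vector is converted to a hash value by keeping the sign of each component. The function names `shingles`, `fingerprint`, `sbh_hash`, and `hash_distance`, the constants `SHINGLE_LEN` and `N`, and the use of Hamming distance as the "difference of calculated hash values" are all hypothetical.

```python
import hashlib

SHINGLE_LEN = 4  # hypothetical "predetermined length" of a shingle, in bytes
N = 64           # hypothetical order of the hash vector (one component per output bit)


def shingles(data: bytes, k: int = SHINGLE_LEN):
    """Yield continuous k-byte shingles with a sliding window.

    The claim also allows discontinuous shingles; only continuous ones are shown.
    Inputs shorter than k are treated as a single shingle.
    """
    if len(data) < k:
        if data:
            yield data
        return
    for i in range(len(data) - k + 1):
        yield data[i:i + k]


def fingerprint(shingle: bytes) -> int:
    """64-bit fingerprint; BLAKE2b is an assumed stand-in for the claim's hash."""
    return int.from_bytes(hashlib.blake2b(shingle, digest_size=8).digest(), "big")


def sbh_hash(data: bytes, n: int = N) -> int:
    """Similarity-based hash of data under a SimHash-style rule (an assumption)."""
    vector = [0] * n                      # the Nth-order hash vector
    for sh in shingles(data):             # repeat until the entire content is processed
        fp = fingerprint(sh)
        for i in range(n):                # assumed rule: +1 if bit i is set, else -1
            vector[i] += 1 if (fp >> i) & 1 else -1
    # Convert the vector to the hash value: bit i is set when component i is positive.
    return sum(1 << i for i, c in enumerate(vector) if c > 0)


def hash_distance(h1: int, h2: int) -> int:
    """Hamming distance, assumed here as the 'difference' of two hash values."""
    return bin(h1 ^ h2).count("1")


if __name__ == "__main__":
    a = sbh_hash(b"the quick brown fox jumps over the lazy dog")
    b = sbh_hash(b"the quick brown fox jumped over the lazy dog")
    c = sbh_hash(b"completely unrelated content, nothing in common")
    print(hash_distance(a, b), hash_distance(a, c))
```

Each shingle votes on every bit of the result. Inputs that share most of their shingles therefore tend to produce hash values with a small Hamming distance, while identical inputs always produce identical hash values. These are the two properties the claim relies on. The rule actually used in the patent may differ from this one.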
