Method for clustering closely resembling data objects

  • US 6,349,296 B1
  • Filed: 08/21/2000
  • Issued: 02/19/2002
  • Est. Priority Date: 03/26/1998
  • Status: Expired due to Term
First Claim

1. A method for determining the resemblance of a plurality of data objects, the method comprising the steps of:

    parsing each data object into a canonical sequence of tokens;

    grouping contiguous sequences in the canonical sequence of tokens of each data object into shingles;

    assigning a unique identification element to each shingle;

    subjecting the unique identification elements to a plurality of permutations to provide a corresponding plurality of images;

    selecting a smallest unique identification element from each of the plurality of images to provide a sketch of each corresponding data object; and

    comparing data object sketches so as to determine common smallest unique identification elements between data object sketches and thereby determine data object resemblance.
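
The steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation. Several choices are assumptions made only for illustration: lowercased word tokens as the canonical sequence, a shingle width of 3, a truncated SHA-1 digest as the identification element (unique only up to hash collisions), and seeded linear maps modulo a prime standing in for the permutations. All function names are hypothetical.

```python
import hashlib
import random
import re

# Assumption: a Mersenne prime modulus, so x -> (a*x + b) % PRIME with a != 0
# is a permutation of the integers 0..PRIME-1.
PRIME = (1 << 61) - 1


def tokenize(text):
    """Parse a data object into a canonical token sequence (lowercased words)."""
    return re.findall(r"\w+", text.lower())


def shingles(tokens, w=3):
    """Group contiguous runs of w tokens into a set of shingles."""
    count = max(1, len(tokens) - w + 1)  # short inputs yield one short shingle
    return {tuple(tokens[i:i + w]) for i in range(count)}


def shingle_id(shingle):
    """Assign an identification element to a shingle (truncated SHA-1, reduced mod PRIME)."""
    digest = hashlib.sha1(" ".join(shingle).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PRIME


def make_permutations(k, seed=0):
    """Draw k (a, b) pairs, each defining one linear permutation of the id space."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def sketch(text, perms, w=3):
    """Keep the smallest permuted id from each permutation's image."""
    ids = {shingle_id(s) for s in shingles(tokenize(text), w)}
    return [min((a * x + b) % PRIME for x in ids) for a, b in perms]


def resemblance(sketch_a, sketch_b):
    """Estimate resemblance as the fraction of sketch positions that agree."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)


perms = make_permutations(200)
doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "the quick brown fox jumps over the lazy dog near the river shore"
doc_c = "completely different text about patent claims and filing dates in general"

sk_a, sk_b, sk_c = (sketch(d, perms) for d in (doc_a, doc_b, doc_c))
print(resemblance(sk_a, sk_b))  # near-duplicate pair: high estimate
print(resemblance(sk_a, sk_c))  # unrelated pair: low estimate
```

Both sketches must be built from the same list of permutations, since only minima taken under the same permutation are comparable. A larger number of permutations gives a steadier estimate at the cost of a longer sketch.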
