
Method for clustering closely resembling data objects

  • US 6,119,124 A
  • Filed: 03/26/1998
  • Issued: 09/12/2000
  • Est. Priority Date: 03/26/1998
  • Status: Expired due to Term
First Claim

1. A computer-implemented method of determining the resemblance of a plurality of data objects, comprising the steps of:

  • parsing each data object into a canonical sequence of tokens;
  • grouping overlapping sequences of the tokens of each data object into shingles;
  • assigning a unique identification element to each shingle;
  • permuting the elements of the data objects to form image sets;
  • selecting a predetermined number of minimum elements from each image to form a sketch;
  • partitioning the selected elements of each sketch into a plurality of groups; and
  • assigning another unique identification to each group to generate the features of each data object to determine a level of resemblance of the plurality of data objects.
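
The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the word-level tokenizer, the BLAKE2b hash used as the "unique identification element", the affine map standing in for the permutation, and every name and parameter (`tokenize`, `shingles`, `fingerprint`, `sketch`, `features`, `k`, `size`, `group_size`, `A`, `B`) are assumptions chosen only for illustration.

```python
import hashlib
import re

MASK = (1 << 64) - 1
# Hypothetical fixed parameters. A is odd, so x -> (A*x + B) mod 2**64 is a
# bijection on 64-bit values and serves as a simple stand-in permutation.
A = 0x9E3779B97F4A7C15
B = 0x7F4A7C159E3779B9


def tokenize(text):
    """Parse a data object into a canonical token sequence (lowercase words)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(tokens, k=4):
    """Group overlapping runs of k consecutive tokens into shingles."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def fingerprint(data):
    """Map bytes to a 64-bit identifier (a hash, so unique only in practice)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(text, k=4, size=12):
    """Permute the shingle identifiers and keep the `size` smallest images."""
    ids = {fingerprint(" ".join(s).encode()) for s in shingles(tokenize(text), k)}
    image = {(A * x + B) & MASK for x in ids}
    return sorted(image)[:size]


def features(sk, group_size=3):
    """Partition a sketch into groups and assign each group its own identifier."""
    groups = [sk[i:i + group_size] for i in range(0, len(sk), group_size)]
    return {fingerprint(b"".join(x.to_bytes(8, "big") for x in g)) for g in groups}
```

Under these assumptions, two data objects whose feature sets overlap are candidates for close resemblance, and the number of shared features gives a rough level of resemblance. Changing `size` or `group_size` trades sensitivity against the number of features stored per object.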
