
Generating sketches sensitive to high-overlap estimation

  • US 8,572,092 B2
  • Filed: 12/16/2011
  • Issued: 10/29/2013
  • Est. Priority Date: 12/16/2011
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

  • dividing, by a computer, a first collection of data objects into m groups of average size s, wherein a data object of the first collection is assigned to one or more of the m groups;

    computing a combined hash result for all members of a respective group, for each hash function in n hash functions;

    constructing a first sketch vector with n elements, wherein a respective element is selected, using a selection function, from the combined hash results computed with the hash function corresponding to the element's index;

    receiving a second sketch vector for a second collection of data objects;

    determining a sketch-vector overlap between the first and second sketch vectors; and

    computing a data-object overlap between the first and second collections of data objects based on the sketch-vector overlap, wherein computing the data-object overlap comprises entering the sketch-vector overlap into a conversion function:


    data-object overlap=(sketch-vector overlap)1/s;

    wherein s indicates an average number of data objects per group.
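The claimed steps can be illustrated with a short Python sketch. The claim leaves several choices open, so the following are assumptions made only for illustration: data objects are strings; the m groups are formed by a hypothetical rule that places each object in bucket keyed_hash("group", x) mod m, so both collections must use the same m; the combined hash of a group is a keyed BLAKE2b digest over its sorted members; and the selection function is the minimum, as in MinHash. All helper names (keyed_hash, divide_into_groups, combined_hash, build_sketch, sketch_overlap, data_object_overlap) are hypothetical.

```python
import hashlib
from typing import Iterable, List, Sequence


def keyed_hash(tag: str, data: bytes) -> int:
    """64-bit hash; `tag` selects one member of the hash family (keyed BLAKE2b)."""
    digest = hashlib.blake2b(data, digest_size=8, key=tag.encode()).digest()
    return int.from_bytes(digest, "big")


def divide_into_groups(objects: Iterable[str], m: int) -> List[List[str]]:
    """Hypothetical grouping: each object goes to bucket keyed_hash("group", x) mod m.

    Both collections must use the same m so that corresponding groups line up.
    Empty buckets are dropped; members are sorted so the combined hash is order-free.
    """
    buckets: List[List[str]] = [[] for _ in range(m)]
    for x in set(objects):
        buckets[keyed_hash("group", x.encode()) % m].append(x)
    return [sorted(b) for b in buckets if b]


def combined_hash(i: int, group: Sequence[str]) -> int:
    """Assumed combined hash of all members of a group under hash function i."""
    return keyed_hash(f"h{i}", "\x00".join(group).encode())


def build_sketch(objects: Iterable[str], m: int, n: int) -> List[int]:
    """Sketch element i = min (the assumed selection function) over all groups
    of the combined hash computed with hash function i."""
    groups = divide_into_groups(objects, m)
    if not groups:
        raise ValueError("collection must contain at least one object")
    return [min(combined_hash(i, g) for g in groups) for i in range(n)]


def sketch_overlap(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which the two sketch vectors agree."""
    if len(a) != len(b):
        raise ValueError("sketch vectors must have the same length n")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def data_object_overlap(sketch_ovl: float, s: float) -> float:
    """Conversion function from the claim: (sketch-vector overlap) ** (1/s)."""
    return sketch_ovl ** (1.0 / s)


if __name__ == "__main__":
    a = [f"doc{k}" for k in range(1000)]
    b = a[:990] + [f"other{k}" for k in range(10)]
    m, n = 125, 256          # illustrative values: about 8 objects per group
    s = len(set(a)) / m      # average group size of the first collection
    sa, sb = build_sketch(a, m, n), build_sketch(b, m, n)
    print(data_object_overlap(sketch_overlap(sa, sa), s))  # 1.0
    print(data_object_overlap(sketch_overlap(sa, sb), s))
```

With min as the selection function, the fraction of matching sketch elements estimates the Jaccard similarity of the two sets of groups, and a group matches only if none of its members changed; this is what amplifies small object-level differences when the collections overlap heavily. The last step simply applies the claimed conversion function, so under this particular grouping the result should be read as an approximation of object-level overlap rather than an exact value.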

  • 4 Assignments