×

Ranking system

  • US 8,478,762 B2
  • Filed: 05/01/2009
  • Issued: 07/02/2013
  • Est. Priority Date: 05/01/2009
  • Status: Active Grant
First Claim
Patent Images

1. A computer-implemented method of identifying a top k most frequently occurring items in a data set, the method comprising:

  • arranging a processor to access a value of a confidence measure and a value of an error tolerance parameter;

    arranging the processor to sample items from the data set at least as many times as prescribed by a function of the confidence measure, the error tolerance parameter and other parameters which exclude a parameter m being a number of distinct items in the data set, and to continue sampling items until a stopping rule is met, the stopping rule being based in part on a posterior probability of error conditional on an observed empirical frequency of items in the data set;

    arranging a memory to form a sample sketch from the sampled items; and

    arranging the processor to identify the top k items from the sample sketch;

    wherein the stopping rule is selected from one of;

    stopping sampling when a sample counter exceeds the maximum value of the sum of an empirical frequency of a k-th largest frequency item in a sample set and a largest frequency that is smaller than a threshold or relative threshold for frequencies of false items, divided by the empirical frequency of the k-th largest frequency item in the sample set minus the largest frequency that is smaller than the threshold or relative threshold for frequencies of false items, multiplied by a Δ

    quantile of a normal distribution; and

    stopping sampling when a sample counter exceeds a sum of an empirical frequency of a k-th largest frequency item in a sample set and a largest frequency that is smaller than a threshold or relative threshold for frequencies of false items, divided by the empirical frequency of the k-th largest frequency item in the sample set minus the largest frequency that is smaller than the threshold or relative threshold for frequencies of false items squared, and multiplied by a Δ

    quantile of a normal distribution.

View all claims
  • 2 Assignments
Timeline View
Assignment View
    ×
    ×