EFFICIENT WEIGHTED CONSISTENT SAMPLING
First Claim
1. A machine-implemented method for performing weighted consistent sampling, the machine-implemented method comprising:
- transforming each respective consistent uniformly distributed non-negative random number, representing a corresponding element of a plurality of sets, to a transformed value by determining a wth root of a value based on the respective consistent uniformly distributed random number, such that w is based on a weight assigned to the corresponding element;
determining a respective minimum transformed value for each of the plurality of sets or a respective maximum transformed value for each of the plurality of sets; and
determining which of the plurality of sets are at least similar based on the respective determined minimum transformed value for each of the plurality of sets or the respective determined maximum transformed value for each of the plurality of sets.
2 Assignments
0 Petitions
Accused Products
Abstract
A method and a processing device may be provided for performing efficient weighted consistent sampling. A group of sets having multiple elements with associated weights may be provided. A single hash function may be applied to each of the elements of the group of sets to produce consistent uniformly distributed non-negative random numbers. Transformed values corresponding to each of the elements may be produced by determining a wth root of a value based on applying the hash function to a respective element, where w may be based on a weight associated with the respective element. A minimum transformed value or a maximum transformed value may be determined for each of the sets. Sets having matching ones of the minimum transformed value or the maximum transformed value may be determined. The determined sets may be considered to be similar.
-
Citations
20 Claims
-
1. A machine-implemented method for performing weighted consistent sampling, the machine-implemented method comprising:
-
transforming each respective consistent uniformly distributed non-negative random number, representing a corresponding element of a plurality of sets, to a transformed value by determining a wth root of a value based on the respective consistent uniformly distributed random number, such that w is based on a weight assigned to the corresponding element; determining a respective minimum transformed value for each of the plurality of sets or a respective maximum transformed value for each of the plurality of sets; and determining which of the plurality of sets are at least similar based on the respective determined minimum transformed value for each of the plurality of sets or the respective determined maximum transformed value for each of the plurality of sets. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A processing device comprising:
-
at least one processor; and a memory connected to the at least one processor, the memory further comprising; instructions for transforming, in only a single pass through original elements of a plurality of sets, respective consistent uniformly distributed non-negative random numbers, representing the original elements of the plurality of sets, to transformed values according to a formula
where h(x) is a consistent uniformly distributed non-negative random number representing an original element x of a set, a and b are constants, and w(x) is based on a weight assigned to the original element x, andinstructions for determining a similarity among a plurality of the sets based on a respective minimum transformed value of each of the plurality of sets or based on a respective maximum transformed value of each of the plurality of sets. - View Dependent Claims (10, 11, 12, 13)
-
-
14. A machine-implemented method for performing weighted consistent sampling, the machine-implemented method comprising:
-
providing a consistent uniformly distributed non-negative random number for each of a plurality of original elements of a plurality of sets; providing a respective weight to each of the plurality of original elements, the respective weight being indicative of a level of relevance; applying a formula, where h(x) is one of the provided uniformly distributed non-negative random number representing an element x of a set, a and b are constants, w(x) is based on a weight associated with the element x, and hx is a transformed value; and determining similar ones of the plurality of sets based on at least one of the transformed values of each of the plurality of sets. - View Dependent Claims (15, 16, 17, 18, 19, 20)
to each of the original elements of the plurality of sets only once, where h(x) is the one hash function.
-
Specification