METHOD FOR MAINTAINING A SAMPLE SYNOPSIS UNDER ARBITRARY INSERTIONS AND DELETIONS
First Claim
1. A method of incrementally maintaining a stable, bounded, uniform random sample S from a dataset R, in the presence of arbitrary insertions and deletions of an item t to the dataset R, and without accesses to the dataset R, the sample S having a maximum size of M items, the method comprising the steps of:
- initializing to zero a counter c1 for uncompensated deletions that have been applied to sample S;
initializing to zero a counter c2 for uncompensated deletions that have not been applied to sample S;
while transactions comprising insertions and deletions are applied to dataset R, each transaction having a transaction type selected from a group consisting of an insertion transaction and a deletion transaction, performing steps A through C;
A. receiving a transaction applied to dataset R for an item t;
B. if the received transaction is an insertion transaction, thenif the total number c1+c2 of uncompensated deletions is zero, then performing the steps ofincluding the item t into the sample S with probability equal to the value of M divided by the maximum of size |R| and M, wherein |R| is the size of the dataset R just after insertion of item t,adding item t to sample S if |S|<
M, andreplacing a randomly selected item in sample S by item t if |S|<
=Motherwise if the total number c1+c2 of uncompensated deletions is non-zero, then performing the steps ofinserting item t into the sample S with probability equal to the value of c1/(c1+c2),decrementing by 1 the value of counter c1 if the item t is inserted into sample S; and
decrementing by 1 the value of counter c2 if the item t is not inserted into sample S; and
C. if the received transaction is a deletion transaction, thenif the deleted item t is in the sample S, then performing the steps ofremoving the deleted item t from S andincreasing the value of the counter c1 by one, otherwise, performing the step ofincreasing the value of the counter c2 by one.
3 Assignments
0 Petitions
Accused Products
Abstract
A method of incrementally maintaining a stable, bounded, uniform random sample S from a dataset R, in the presence of arbitrary insertions and deletions to the dataset R, and without accesses to the dataset R, comprises a random pairing method in which deletions are uncompensated until compensated by a subsequent insertion (randomly paired to the deletion) by including the insertion'"'"'s item into S if and only if the uncompensated deletion'"'"'s item was removed from S (i.e., was in S so that it could be removed). A method for resizing a sample to a new uniform sample of increased size while maintaining a bound on the sample size and balancing cost between dataset accesses and transactions to the dataset is also disclosed. A method for maintaining uniform, bounded samples for a dataset in the presence of growth in size of the dataset is additionally disclosed.
23 Citations
2 Claims
-
1. A method of incrementally maintaining a stable, bounded, uniform random sample S from a dataset R, in the presence of arbitrary insertions and deletions of an item t to the dataset R, and without accesses to the dataset R, the sample S having a maximum size of M items, the method comprising the steps of:
-
initializing to zero a counter c1 for uncompensated deletions that have been applied to sample S; initializing to zero a counter c2 for uncompensated deletions that have not been applied to sample S; while transactions comprising insertions and deletions are applied to dataset R, each transaction having a transaction type selected from a group consisting of an insertion transaction and a deletion transaction, performing steps A through C; A. receiving a transaction applied to dataset R for an item t; B. if the received transaction is an insertion transaction, then if the total number c1+c2 of uncompensated deletions is zero, then performing the steps of including the item t into the sample S with probability equal to the value of M divided by the maximum of size |R| and M, wherein |R| is the size of the dataset R just after insertion of item t, adding item t to sample S if |S|<
M, andreplacing a randomly selected item in sample S by item t if |S|<
=Motherwise if the total number c1+c2 of uncompensated deletions is non-zero, then performing the steps of inserting item t into the sample S with probability equal to the value of c1/(c1+c2), decrementing by 1 the value of counter c1 if the item t is inserted into sample S; and decrementing by 1 the value of counter c2 if the item t is not inserted into sample S; and C. if the received transaction is a deletion transaction, then if the deleted item t is in the sample S, then performing the steps of removing the deleted item t from S and increasing the value of the counter c1 by one, otherwise, performing the step of increasing the value of the counter c2 by one.
-
-
2-19. -19. (canceled)
Specification