Sampling for database systems
Abstract
A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics, such as with replacement (WR), without replacement (WoR), or independent coin flips (CF). The database server may perform such sampling sequentially, both to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree, and to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
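As an illustration of a sample operator that takes the records and the sampling semantics as parameters, here is a minimal Python sketch. The function name `sample`, the `fraction` parameter, the string labels, and the optional `weight` callable are hypothetical names chosen for this example and do not come from the patent text. The sketch assumes the usual readings of the three semantics: WR draws a fixed number of records with repeats allowed, WoR draws a fixed number of distinct records, and CF includes each record independently with probability `fraction`. For brevity it works on a materialized list and supports a weight function only under WR.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample(
    records: Sequence[T],
    semantics: str,
    fraction: float,
    weight: Optional[Callable[[T], float]] = None,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Hypothetical sample operator: records and semantics are parameters."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = rng or random.Random()
    n = len(records)
    if n == 0:
        return []
    r = round(fraction * n)  # target sample size for WR and WoR

    if semantics == "WR":
        # With replacement: r independent draws, optionally weighted.
        weights = [weight(t) for t in records] if weight else None
        return rng.choices(records, weights=weights, k=r)

    if weight is not None:
        # Assumption of this sketch only: weights are handled for WR alone.
        raise NotImplementedError("weighted sampling is sketched for WR only")

    if semantics == "WoR":
        # Without replacement: r distinct positions, each subset equally likely.
        return rng.sample(records, r)

    if semantics == "CF":
        # Independent coin flips: the sample size itself is random.
        return [t for t in records if rng.random() < fraction]

    raise ValueError(f"unknown sampling semantics: {semantics!r}")
```

For example, `sample(rows, "WoR", 0.1)` returns 10 percent of `rows` with no position chosen twice, while `sample(rows, "CF", 0.1)` returns a subset whose size varies from call to call.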
88 Claims
- 1. A sample operator for obtaining a sample of a plurality of records in a database system, the sample operator having the plurality of records and sampling semantics as parameters.
- 7. A sample operator for obtaining a sample of a plurality of records in a database system, the sample operator having the plurality of records as a parameter and having a weight function as a parameter to specify a sampling weight for each record.
- 11. A method for obtaining a sample from a plurality of records in a database system, the method comprising the steps of:
  (a) identifying the plurality of records and sampling semantics from parameters of a sample operator; and
  (b) obtaining a sample from the identified plurality of records using the identified sampling semantics.
- 18. A method for obtaining a sample from a plurality of records in a database system, the method comprising the steps of:
  (a) identifying the plurality of records and a weight function from parameters of a sample operator, wherein the weight function specifies a weight for each record; and
  (b) obtaining a sample from the identified plurality of records based on the specified weight of each record.
- 23. A method for performing a sequential sampling of records in one pass in a database system, the method comprising the steps of:
  (a) obtaining one record from a plurality of records;
  (b) selectively outputting the one record obtained in step (a) one or more times based on a probability; and
  (c) repeating steps (a) and (b) for one or more other records of the plurality of records to form a sample of the plurality of records, wherein at least one record obtained in step (a) may be output more than one time for step (b).
- 37. A method for performing a sequential sampling of records in one pass in a database system, the method comprising the steps of:
  (a) obtaining one record from a plurality of records;
  (b) selectively resetting one or more records of a reservoir to be the one record obtained in step (a) based on a probability; and
  (c) repeating steps (a) and (b) for other records of the plurality of records such that the records of the reservoir form a sample of the plurality of records, wherein at least one record obtained in step (a) may be used to reset more than one record of the reservoir for step (b).
- 45. A computer readable medium having computer-executable instructions for performing a sequential sampling of records in one pass, the computer-executable instructions for performing the steps of:
  (a) obtaining one record from a plurality of records;
  (b) selectively outputting the one record obtained in step (a) one or more times based on a probability; and
  (c) repeating steps (a) and (b) for one or more other records of the plurality of records to form a sample of the plurality of records, wherein at least one record obtained in step (a) may be output more than one time for step (b).
- 59. A computer readable medium having computer-executable instructions for performing a sequential sampling of records in one pass, the computer-executable instructions for performing the steps of:
  (a) obtaining one record from a plurality of records;
  (b) selectively resetting one or more records of a reservoir to be the one record obtained in step (a) based on a probability; and
  (c) repeating steps (a) and (b) for other records of the plurality of records such that the records of the reservoir form a sample of the plurality of records, wherein at least one record obtained in step (a) may be used to reset more than one record of the reservoir for step (b).
- 67. A database system for performing a sequential sampling of records in one pass in the database system, the database system comprising:
  means for performing the step of (a) obtaining one record from a plurality of records;
  means for performing the step of (b) selectively outputting the one record one or more times based on a probability; and
  means for repeating steps (a) and (b) for one or more other records of the plurality of records to form a sample of the plurality of records, wherein at least one record obtained in step (a) may be output more than one time for step (b).
- 81. A database system for performing a sequential sampling of records in one pass in the database system, the database system comprising:
  means for performing the step of (a) obtaining one record from a plurality of records;
  means for performing the step of (b) selectively resetting one or more records of a reservoir to be the one record obtained in step (a) based on a probability; and
  means for performing the step of (c) repeating steps (a) and (b) for other records of the plurality of records such that the records of the reservoir form a sample of the plurality of records, wherein at least one record obtained in step (a) may be used to reset more than one record of the reservoir for step (b).
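The one-pass sequential claims above describe two shapes of algorithm: one that outputs each incoming record zero or more times, and one that resets slots of a reservoir. The Python sketch below shows one plausible instance of each, under assumptions that the claims themselves do not state. In `sequential_wr`, the number of copies of each record is drawn from a binomial distribution over the sample slots still unfilled, which requires knowing the record count `n` in advance. In `reservoir_wr`, each of the `r` reservoir slots is independently reset to the i-th record with probability 1/i, so `n` need not be known. The function and parameter names are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sequential_wr(
    records: Iterable[T], n: int, r: int, rng: random.Random
) -> Iterator[T]:
    """Stream a with-replacement sample of size r from n records in one pass."""
    remaining = r  # sample slots not yet filled
    for i, t in enumerate(records):
        if remaining == 0:
            break
        # (b) Each unfilled slot picks this record with probability
        # 1 / (records not yet passed), so copies ~ Binomial(remaining, p).
        p = 1.0 / (n - i)
        copies = sum(rng.random() < p for _ in range(remaining))
        remaining -= copies
        for _ in range(copies):
            yield t  # the same record may be output more than once


def reservoir_wr(records: Iterable[T], r: int, rng: random.Random) -> List[T]:
    """Keep a with-replacement sample of size r without knowing n up front."""
    reservoir: List[T] = []
    for i, t in enumerate(records, start=1):
        if i == 1:
            reservoir = [t] * r  # the first record fills every slot
            continue
        for j in range(r):
            # (b) Reset slot j to the i-th record with probability 1/i;
            # one record may reset several slots.
            if rng.random() < 1.0 / i:
                reservoir[j] = t
    return reservoir
```

The streaming variant emits sample records as soon as they are decided, which suits a pipelined query tree; the reservoir variant only has a valid sample once the input is exhausted, but works when the input size is unknown.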
Specification