Stratified sampling of data in a database system
Abstract
A stratified sampling mechanism is provided in a database system. The stratified sampling mechanism includes defining a clause in a query that indicates stratified sampling is desired. Data from a source table is stratified into different subgroups based on stratification conditions in the query. Sampling is performed within each subgroup.
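The mechanism summarized above can be illustrated with a minimal in-memory sketch. This is not the patented implementation: the function name `stratified_sample`, the representation of stratification conditions as Python predicates, the first-match rule for rows that satisfy more than one condition, and the fixed per-stratum sample sizes are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(rows, strata, seed=None):
    """Split rows into strata, then sample at random within each stratum.

    rows   : iterable of dicts standing in for rows of a source table
    strata : list of (name, predicate, sample_size) tuples (hypothetical format)
    A row joins the first stratum whose predicate it satisfies (an assumption);
    rows that satisfy no condition are left out of the sample.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        for name, predicate, _ in strata:
            if predicate(row):
                groups[name].append(row)
                break
    sample = {}
    for name, _, size in strata:
        members = groups[name]
        # Never ask for more rows than the stratum holds.
        sample[name] = rng.sample(members, min(size, len(members)))
    return sample

# Toy source table: 600 rows with ages cycling through 18..77.
rows = [{"id": i, "age": 18 + i % 60} for i in range(600)]
strata = [
    ("young", lambda r: r["age"] < 30, 5),
    ("middle", lambda r: 30 <= r["age"] < 60, 10),
    ("senior", lambda r: r["age"] >= 60, 3),
]
result = stratified_sample(rows, strata, seed=0)
```

Each stratum is sampled independently, so a small subgroup such as "senior" still contributes rows even when it is a minority of the table.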
54 Citations
21 Claims
1. A method of performing stratified sampling in a database system, comprising:
- receiving a query containing a clause indicating stratified sampling of a source table is to be performed, the clause containing plural stratification conditions; and
- generating one or more commands to send to a processing module, the one or more commands containing instructions to evaluate the stratification conditions and to perform sampling of data from the source table.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. An article comprising at least one storage medium containing instructions that when executed cause a database system to:
- generate one or more commands to perform stratified sampling; and
- send the one or more commands to plural access modules of the database system to cause each of the plural access modules to perform the stratified sampling in parallel.
Dependent claims: 11, 12, 13, 14, 16, 17, 18, 19, 20
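The parallel arrangement recited in claim 10 can be sketched as follows. This is only an illustration under stated assumptions: threads stand in for access modules, each thread owns one partition of the table, and each stratum is sampled by a fixed fraction so that every partition can sample independently. The names `sample_partition` and `parallel_stratified_sample` and the fraction-based sizing are hypothetical; the claim itself does not say how sample sizes are divided among access modules.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def sample_partition(partition, strata, seed):
    """Work done by one stand-in access module on its own partition."""
    rng = random.Random(seed)
    buckets = {name: [] for name, _, _ in strata}
    for row in partition:
        for name, predicate, _ in strata:
            if predicate(row):
                buckets[name].append(row)
                break
    # Sample the same fraction of each local stratum (assumed sizing rule).
    return {
        name: rng.sample(buckets[name], round(fraction * len(buckets[name])))
        for name, _, fraction in strata
    }

def parallel_stratified_sample(partitions, strata, seed=0):
    """Send the same sampling task to every partition, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [
            pool.submit(sample_partition, part, strata, seed + i)
            for i, part in enumerate(partitions)
        ]
        partials = [f.result() for f in futures]
    merged = {name: [] for name, _, _ in strata}
    for partial in partials:
        for name, rows in partial.items():
            merged[name].extend(rows)
    return merged

# 1000 integer "rows" spread round-robin over 4 partitions.
rows = list(range(1000))
partitions = [rows[i::4] for i in range(4)]
strata = [("low", lambda v: v < 500, 0.2), ("high", lambda v: v >= 500, 0.04)]
merged = parallel_stratified_sample(partitions, strata)
```

In CPython the threads mostly illustrate the control flow rather than deliver a real speedup; the point is that no partition needs data from another one to draw its share of the sample.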
15. A database system comprising:
- a storage to store a base table; and
- a controller adapted to receive a request containing plural stratification conditions to divide data in the base table into corresponding plural strata, the controller adapted to perform random sampling, in response to the request, of data in each stratum.
21. A database system comprising:
- a plurality of storage modules;
- a plurality of access modules to manage respective storage modules; and
- a parsing engine to receive a stratified sampling query specifying plural stratification conditions, the parsing engine to generate one or more commands to indicate performance of the stratified sampling, the parsing engine to send the one or more commands to the access modules, in response to the one or more commands, each access module to generate plural input spool files corresponding to plural strata, the input spool files to store qualifying rows from a source table, the access module to selectively write a given row into one of the input spool files based on which stratification condition the given row satisfies, each access module to further perform random sampling of the rows in each input spool file.
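The spool-file flow that claim 21 attributes to each access module can be sketched roughly as below. Temporary files stand in for input spool files, rows are stored as JSON lines, and reservoir sampling is used as one possible way to draw a random sample from a file in a single pass. The claim only says "random sampling", so the reservoir technique, the first-match routing rule, and the names `build_spools` and `reservoir_sample` are assumptions of this sketch.

```python
import json
import random
import tempfile

def build_spools(rows, conditions):
    """Create one spool file per stratum and route each qualifying row to one of them."""
    spools = [tempfile.TemporaryFile(mode="w+") for _ in conditions]
    for row in rows:
        for spool, condition in zip(spools, conditions):
            if condition(row):
                # Write the row to the spool of the first condition it satisfies.
                spool.write(json.dumps(row) + "\n")
                break
    return spools

def reservoir_sample(spool, k, rng):
    """Draw up to k rows uniformly at random from a spool file in one pass."""
    spool.seek(0)
    reservoir = []
    for n, line in enumerate(spool):
        if n < k:
            reservoir.append(json.loads(line))
        else:
            # Keep the new row with probability k / (n + 1).
            j = rng.randint(0, n)
            if j < k:
                reservoir[j] = json.loads(line)
    return reservoir

# Toy source rows: 60 "east" rows and 30 "west" rows.
rows = [{"id": i, "region": "east" if i % 3 else "west"} for i in range(90)]
conditions = [
    lambda r: r["region"] == "east",
    lambda r: r["region"] == "west",
]
sizes = [6, 4]  # rows wanted from each stratum (hypothetical values)

rng = random.Random(1)
spools = build_spools(rows, conditions)
samples = [reservoir_sample(s, k, rng) for s, k in zip(spools, sizes)]
for s in spools:
    s.close()
```

Writing rows to per-stratum spools first means the sampling step never has to re-evaluate the stratification conditions; it only reads rows already known to qualify.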
Specification