Efficient sampling of a relational database
Abstract
A system, method and computer readable medium for sampling data from a relational database are disclosed, where an information processing system chooses rows from a table in a relational database for sampling, wherein data values are arranged into rows, rows are arranged into pages, and pages are arranged into tables. Pages are chosen for sampling according to a probability P and rows in a selected page are chosen for sampling according to a probability R, so that the overall probability of choosing a row for sampling is Q=PR. The probabilities P and R are based on the desired precision of estimates computed from a sample, as well as processing speed. The probabilities P and R are further based on either catalog statistics of the relational database or a pilot sample of rows from the relational database.
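The two-stage selection described in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory layout in which a table is a list of pages and each page is a list of rows; the function name `bilevel_sample` and its parameters are illustrative and do not come from the patent. Each page is kept with probability P, and each row on a kept page is kept with probability R, so any given row ends up in the sample with probability Q=PR.

```python
import random


def bilevel_sample(pages, p, r, rng=None):
    """Sample rows by page-level, then row-level, Bernoulli selection.

    pages: iterable of pages, each a sequence of rows (hypothetical layout)
    p:     probability P of selecting each page
    r:     probability R of selecting each row on a selected page
    rng:   optional random.Random instance, for reproducible runs
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= r <= 1.0):
        raise ValueError("p and r must be probabilities in [0, 1]")
    if rng is None:
        rng = random.Random()

    sample = []
    for page in pages:
        # Page-level decision: a rejected page is skipped entirely,
        # so none of its rows need to be examined.
        if rng.random() >= p:
            continue
        # Row-level decision on a selected page.
        for row in page:
            if rng.random() < r:
                sample.append(row)
    return sample
```

For example, `bilevel_sample(pages, 0.5, 0.4)` visits about half of the pages and keeps about 40% of the rows on each visited page, for an overall row rate of about 20%. Setting P=1 reduces the sketch to plain row-level sampling, and setting R=1 reduces it to plain page-level sampling.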
51 Citations
21 Claims
1. A method for sampling data from a relational database, comprising:
choosing rows from a relational database for sampling, wherein rows are arranged into pages and include column values, pages are arranged into tables and tables comprise rows and columns;
wherein pages are chosen for sampling according to a probability P and rows on each selected page are chosen for sampling according to a probability R, so that an overall probability of choosing a row for sampling is Q=PR; and
wherein P and R are based on desired processing speed and desired precision.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9

10. A computer readable medium including computer instructions for sampling data from a relational database, the computer instructions including instructions for:
choosing rows from a relational database for sampling, wherein rows are arranged into pages and include column values, pages are arranged into tables and tables comprise rows and columns;
wherein pages are chosen for sampling according to a probability P and rows on each selected page are chosen for sampling according to a probability R, so that an overall probability of choosing a row for sampling is Q=PR; and
wherein P and R are based on desired processing speed and desired precision.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18

19. A computer system for sampling data from a relational database, comprising:
a relational database including tables comprising rows and columns, wherein rows are arranged into pages and include column values, and pages are arranged into tables; and
a processor for choosing rows from a relational database for sampling, wherein rows are arranged into pages and include data values, pages are arranged into tables and tables comprise rows and columns;
wherein pages are chosen for sampling according to a probability P and rows on each selected page are chosen for sampling according to a probability R, so that an overall probability of choosing a row for sampling is Q=PR; and
wherein P and R are based on desired processing speed and desired precision.
Dependent claims: 20, 21

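The independent claims tie P and R to processing speed and precision without stating a formula at this point. As a rough illustration of the speed side only, the sketch below shows that many (P, R) pairs give the same overall rate Q=PR, while the expected number of pages read depends on P alone. The function names, the fixed `rows_per_page` parameter, and the candidate list of page probabilities are assumptions made for illustration; the sketch does not reproduce the patent's procedure for choosing P and R, and it does not model precision.

```python
def expected_cost(n_pages, rows_per_page, p, r):
    """Expected work and sample size for page probability p and row probability r.

    Assumes every page holds the same number of rows (a simplification).
    """
    q = p * r  # overall probability that any given row is sampled
    return {
        "q": q,
        "expected_pages_read": p * n_pages,
        "expected_sample_rows": q * n_pages * rows_per_page,
    }


def splits_for_rate(q, page_probs):
    """List the (P, R) pairs with R = q / P for a fixed overall rate q.

    Only page probabilities in [q, 1] are kept, since R = q / P must
    itself be a probability no larger than 1.
    """
    if not (0.0 < q <= 1.0):
        raise ValueError("q must be in (0, 1]")
    return [(p, q / p) for p in page_probs if q <= p <= 1.0]
```

With an overall rate of Q=0.01, for instance, `splits_for_rate(0.01, [0.01, 0.1, 1.0])` pairs P=0.01 with R near 1, P=0.1 with R near 0.1, and P=1.0 with R near 0.01. All three yield the same expected sample size, but the first reads about 1% of the pages and the last reads every page.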
Specification