Partition boundary determination using random sampling on very large databases
Abstract
A system and method utilizing random sampling for partition analysis on very large databases. The method utilizes a random sampling algorithm that provides results accurate to within a few percentage points for large homogeneous databases. The accuracy is not affected by the size of the database and is determined primarily by the size of the sample. The system and method for approximate partition analysis reduces the time required for an analysis to a fraction of the time required for an exact analysis. The reduction in time thereby permits more frequent and timely analyses of database partition sizes.
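As a rough illustration of why the accuracy depends on the sample size rather than the database size, assume simple random sampling from a database much larger than the sample. Under that assumption (which the abstract does not state explicitly), the estimated fraction \hat{p} of records that fall in a given partition has a standard error that involves only the true fraction p and the sample size S. The values S = 10,000 and p = 0.5 below are illustrative choices, not figures from the patent.

```latex
\[
\mathrm{SE}(\hat{p}) \approx \sqrt{\frac{p\,(1-p)}{S}},
\qquad
S = 10{,}000,\ p = 0.5
\;\Longrightarrow\;
\mathrm{SE}(\hat{p}) \approx \sqrt{\frac{0.25}{10{,}000}} = 0.005 .
\]
```

A 95% interval is then roughly 1.96 times 0.005, or about one percentage point either way, regardless of how many records the database holds.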
19 Citations
23 Claims
1. A method for database partition boundary determination, the method comprising the steps of:
providing a pre-configured number S defining a default sample size;
selectively receiving a particular number defining a desired sample size and setting said number S equal to said particular number;
providing a seed value for initializing a random number algorithm;
randomly sampling S records of the database using the random sampling algorithm, wherein said S records are different each time said method is utilized with different seed values, and wherein said S records are different for successive utilizations of said method if at least one record has been added to or deleted from said database between successive utilizations of said method;
storing statistics for each of said S records as stored statistics including a record key for each record; and
producing an approximation partition analysis based on said stored statistics, wherein said approximation partition analysis is not mathematically exact.
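The steps of claim 1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the patented implementation: the database is modeled as an in-memory list of dictionaries with a `key` field, and `DEFAULT_SAMPLE_SIZE`, `sample_record_keys`, and `approximate_boundaries` are hypothetical names. Choosing boundaries at evenly spaced sample quantiles is one plausible reading of "approximation partition analysis"; the claim itself does not prescribe it.

```python
import random

DEFAULT_SAMPLE_SIZE = 1000  # hypothetical pre-configured default for S


def sample_record_keys(records, seed, sample_size=None):
    """Randomly sample S records and store each sampled record's key.

    `records` is assumed to be a list of dicts with a "key" field.
    If `sample_size` is given, it overrides the default S.
    """
    s = DEFAULT_SAMPLE_SIZE if sample_size is None else sample_size
    s = min(s, len(records))      # cannot sample more records than exist
    rng = random.Random(seed)     # the seed initializes the generator
    sampled = rng.sample(records, s)
    return sorted(rec["key"] for rec in sampled)


def approximate_boundaries(sorted_keys, num_partitions):
    """Pick keys at evenly spaced sample quantiles as approximate boundaries."""
    n = len(sorted_keys)
    return [sorted_keys[(i * n) // num_partitions]
            for i in range(1, num_partitions)]


if __name__ == "__main__":
    records = [{"key": k} for k in range(100_000)]
    keys = sample_record_keys(records, seed=42, sample_size=2000)
    # Three split keys that approximately divide the key range into quarters.
    print(approximate_boundaries(keys, num_partitions=4))
```

The same seed reproduces the same sample against an unchanged record list, while a different seed selects different records, which mirrors the behavior the claim describes for differing seed values.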
13. A database partition boundary determination system comprising:
a first computer program routine having a random number generating algorithm;
a second computer program routine having a random sampling facility utilizing said first program routine to randomly read records from a database and store statistics for each record read including a record key, wherein said records read are different each time said second routine is utilized with different seed values, and wherein said records read are different for successive utilizations of said second routine if at least one record has been added to or deleted from said database between successive utilizations of said second routine; and
a third computer program routine for generating a partition boundary analysis based on said stored statistics, wherein said partition boundary analysis is an approximation and is not mathematically exact.
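The three-routine structure of claim 13 might be organized as follows. This is a hedged sketch, not the system the patent describes: the database is assumed to be a list of `(key, size_bytes)` tuples, the stored statistics are assumed to be the key and byte size of each record read, and the analysis shown (estimating each existing partition's share of the data from the sample) is one illustrative form of partition boundary analysis. The names `make_rng`, `sample_statistics`, and `estimate_partition_shares` are hypothetical.

```python
import random
from bisect import bisect_right


def make_rng(seed):
    """First routine: a seeded random number generator."""
    return random.Random(seed)


def sample_statistics(rng, database, sample_size):
    """Second routine: read records at random positions and store
    (key, size_bytes) for each record read."""
    count = min(sample_size, len(database))
    positions = rng.sample(range(len(database)), count)
    return [database[pos] for pos in positions]


def estimate_partition_shares(stats, boundaries):
    """Third routine: approximate each partition's share of the sampled bytes.

    `boundaries` is a sorted list of split keys. Partition i holds the keys
    in [boundaries[i-1], boundaries[i]); the first and last partitions are
    open-ended.
    """
    totals = [0] * (len(boundaries) + 1)
    for key, size in stats:
        totals[bisect_right(boundaries, key)] += size
    grand_total = sum(totals)
    if grand_total == 0:          # empty sample: nothing to estimate
        return [0.0] * len(totals)
    return [t / grand_total for t in totals]


if __name__ == "__main__":
    database = [(k, 100) for k in range(50_000)]
    stats = sample_statistics(make_rng(7), database, sample_size=2000)
    # Estimated shares for partitions split at keys 10000 and 30000.
    print(estimate_partition_shares(stats, [10_000, 30_000]))
```

Because the shares come from a sample, they are estimates rather than exact values, consistent with the claim's statement that the analysis is an approximation.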
Specification