MANAGING UNCERTAIN DATA USING MONTE CARLO TECHNIQUES
Abstract
According to one embodiment of the present invention, a method for managing uncertain data is provided. The method includes specifying data uncertainty using at least one variable generation (VG) function, wherein the VG function generates pseudorandom samples of uncertain data values. A random database based on the VG function is specified, and multiple Monte Carlo instantiations of the random database are generated. Using a Monte Carlo method, a query is repeatedly executed over the multiple Monte Carlo instantiations to output a Monte Carlo method result and associated query results. The Monte Carlo method result may then be used to estimate statistical properties of a probability distribution of the query result.
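As a rough illustration of the first steps in the abstract, the following sketch treats a VG function as a callable that draws one pseudorandom value from its parameters, and a random database as a list of rows holding those parameters. It assumes a Normal distribution and the hypothetical names `UncertainRow`, `Row`, `normal_vg`, and `generate_instantiations`; none of these come from the patent, and this is not the patented implementation.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UncertainRow:
    """Hypothetical schema: one customer whose sales figure is uncertain."""
    cust_id: int
    mu: float     # VG parameter (assumed): mean of the uncertain value
    sigma: float  # VG parameter (assumed): standard deviation of the uncertain value


@dataclass(frozen=True)
class Row:
    """One row of a single Monte Carlo instantiation."""
    cust_id: int
    sales: float  # one pseudorandom sample of the uncertain value


def normal_vg(mu: float, sigma: float, rng: random.Random) -> float:
    """Hypothetical VG function: draws one pseudorandom sample from Normal(mu, sigma)."""
    return rng.gauss(mu, sigma)


def instantiate(random_db: List[UncertainRow], rng: random.Random) -> List[Row]:
    """Produce one Monte Carlo instantiation (one possible database state)."""
    return [Row(r.cust_id, normal_vg(r.mu, r.sigma, rng)) for r in random_db]


def generate_instantiations(
    random_db: List[UncertainRow], n: int, seed: int = 0
) -> List[List[Row]]:
    """Generate N > 1 instantiations from a single seeded pseudorandom generator."""
    if n <= 1:
        raise ValueError("N must be greater than 1")
    rng = random.Random(seed)
    return [instantiate(random_db, rng) for _ in range(n)]
```

For example, `generate_instantiations([UncertainRow(1, 100.0, 10.0)], n=1000)` returns 1000 single-row instantiations; seeding one generator makes the whole set reproducible.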
20 Claims
1. A method comprising:
specifying data uncertainty using at least one variable generation (VG) function, wherein said VG function generates pseudorandom samples of uncertain data values;
specifying a random database based on said VG function;
generating a number N Monte Carlo instantiations of said random database, wherein N is a number greater than 1;
identifying a database tuple bundle t, wherein the database tuple bundle t is a data structure representing N instantiations of a tuple in the N Monte Carlo instantiations;
executing a query Q over the N Monte Carlo instantiations, wherein said executing comprises:
executing a query plan for the query Q once over the set of all database tuple bundles; and
outputting zero or more numerical values that are used to estimate statistical properties of the probability distribution of the result of the query Q.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
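Claim 1 differs from naive repetition by executing the query plan once over tuple bundles. A minimal sketch of that idea follows, assuming a bundle stores each deterministic attribute once, the N sampled values of the uncertain attribute as a list, and a per-instantiation presence flag. The `TupleBundle` layout, the operator names, and the example query `SELECT SUM(sales) FROM R WHERE sales > 50` are hypothetical illustrations, not the claimed data structure.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TupleBundle:
    """Hypothetical layout: one tuple's N instantiations stored together."""
    cust_id: int         # deterministic attribute, stored once per bundle
    sales: List[float]   # uncertain attribute: value in instantiation i at index i
    present: List[bool]  # present[i] is True if the tuple exists in instantiation i


def make_bundles(
    params: Sequence[Tuple[int, float, float]], n: int, seed: int = 0
) -> List[TupleBundle]:
    """Build one bundle per (cust_id, mu, sigma) row, sampling N values from Normal(mu, sigma)."""
    rng = random.Random(seed)
    return [
        TupleBundle(cid, [rng.gauss(mu, sigma) for _ in range(n)], [True] * n)
        for cid, mu, sigma in params
    ]


def select_sales_gt(bundles: List[TupleBundle], threshold: float) -> List[TupleBundle]:
    """Selection operator: each bundle is visited once, and the predicate
    is evaluated for every instantiation inside it."""
    out = []
    for b in bundles:
        mask = [p and v > threshold for p, v in zip(b.present, b.sales)]
        if any(mask):  # a bundle absent from every instantiation can be dropped
            out.append(TupleBundle(b.cust_id, b.sales, mask))
    return out


def sum_sales(bundles: List[TupleBundle], n: int) -> List[float]:
    """SUM aggregate over bundles: returns N totals, one per instantiation."""
    totals = [0.0] * n
    for b in bundles:
        for i, (p, v) in enumerate(zip(b.present, b.sales)):
            if p:
                totals[i] += v
    return totals
```

Running `sum_sales(select_sales_gt(bundles, 50.0), n)` passes over the bundles once per operator, yet yields the N numerical values from which the distribution of the query result can be estimated.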
12. A method comprising:
specifying data uncertainty using at least one variable generation (VG) function, wherein said VG function generates pseudorandom samples of uncertain data values;
specifying a random database based on said VG function;
generating multiple Monte Carlo instantiations of said random database;
using a Monte Carlo method, repeatedly executing a query over said multiple Monte Carlo instantiations to output a Monte Carlo method result and associated query-results; and
using said Monte Carlo method result, estimating statistical properties of a probability distribution of said query-result.
Dependent claims: 13, 14, 15, 16, 17, 18.
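Claim 12 describes the plain Monte Carlo loop: run the query once per instantiation, then estimate properties of the result distribution from the N outputs. The sketch below assumes each instantiation has been reduced to a list of floats and estimates the mean, the sample standard deviation, a normal-approximation 95% confidence interval for the mean, and a nearest-rank quantile. These particular estimators and the names `run_monte_carlo` and `estimate` are illustrative choices, not ones the claim specifies.

```python
import math
import statistics
from typing import Callable, Dict, List, Sequence

World = List[float]  # hypothetical: one instantiation reduced to its sales values


def run_monte_carlo(worlds: Sequence[World], query: Callable[[World], float]) -> List[float]:
    """Naive Monte Carlo: execute the query separately over each instantiation."""
    return [query(w) for w in worlds]


def estimate(results: Sequence[float], p: float = 0.95) -> Dict[str, object]:
    """Estimate properties of the query-result distribution from N >= 2 samples."""
    n = len(results)
    mean = statistics.fmean(results)
    sd = statistics.stdev(results)      # sample standard deviation
    half = 1.96 * sd / math.sqrt(n)     # normal-approximation 95% CI half-width
    ordered = sorted(results)
    quantile = ordered[max(0, math.ceil(p * n) - 1)]  # nearest-rank p-quantile
    return {
        "mean": mean,
        "stdev": sd,
        "mean_ci95": (mean - half, mean + half),
        "quantile": quantile,
    }
```

For instance, `estimate(run_monte_carlo(worlds, sum))` summarizes the distribution of a per-instantiation total; the confidence interval narrows roughly as one over the square root of N.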
19. A system comprising:
a database containing uncertain data values and zero or more parameter tables;
a variable generation (VG) function component that receives the results of SQL queries over said parameter tables as input and that outputs pseudorandom samples of said uncertain data values;
a random database comprising said pseudorandom samples;
a Monte Carlo processor generating multiple Monte Carlo instantiations of said random database;
a query execution component receiving a query and executing a query over said multiple Monte Carlo instantiations to output a Monte Carlo result and associated query-results; and
a statistical property estimator receiving said Monte Carlo result and estimating statistical properties of a probability distribution of said query result.
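Claim 19 has the VG function consume the results of SQL queries over parameter tables. One way to picture this, sketched with Python's built-in `sqlite3` module, is to read per-customer parameters with a SQL query, feed the rows to a Normal VG function, and store all N instantiations in one table tagged by an instantiation number, so that a single grouped query returns one result per instantiation. The table and column names (`cust_params`, `sales_mc`, `inst`), the function names, and the Normal distribution are assumptions for illustration only.

```python
import random
import sqlite3
from typing import Iterable, Iterator, List, Tuple


def normal_vg(
    param_rows: Iterable[Tuple[int, float, float]], rng: random.Random
) -> Iterator[Tuple[int, float]]:
    """Hypothetical VG function: consumes rows of a SQL query over a parameter
    table and outputs one pseudorandom sample per row."""
    for cust_id, mu, sigma in param_rows:
        yield cust_id, rng.gauss(mu, sigma)


def build_instantiations(conn: sqlite3.Connection, n: int, seed: int = 0) -> None:
    """Materialize N instantiations into one table, tagged by instantiation number."""
    rng = random.Random(seed)
    # SQL query over the (assumed) parameter table; its result is the VG input.
    params = conn.execute(
        "SELECT cust_id, mu, sigma FROM cust_params ORDER BY cust_id"
    ).fetchall()
    conn.execute("CREATE TABLE sales_mc (inst INTEGER, cust_id INTEGER, amount REAL)")
    for inst in range(n):
        conn.executemany(
            "INSERT INTO sales_mc VALUES (?, ?, ?)",
            ((inst, cid, amount) for cid, amount in normal_vg(params, rng)),
        )
    conn.commit()


def total_per_instantiation(conn: sqlite3.Connection) -> List[float]:
    """Evaluate SUM(amount) once per instantiation; returns N query results."""
    rows = conn.execute(
        "SELECT inst, SUM(amount) FROM sales_mc GROUP BY inst ORDER BY inst"
    ).fetchall()
    return [total for _, total in rows]
```

Before calling `build_instantiations`, create and fill `cust_params(cust_id, mu, sigma)`; `total_per_instantiation` then returns the N totals that a statistical property estimator would consume.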
20. A computer program product for managing uncertain data, said computer program product comprising:
a computer usable medium having computer usable program code embodied therewith, said computer usable program code comprising:
computer usable program code configured to:
specify data uncertainty using at least one variable generation (VG) function, wherein said VG function generates pseudorandom samples of uncertain data values;
specify a random database based on said VG function;
generate multiple Monte Carlo instantiations of said random database;
using a Monte Carlo method, repeatedly execute a query over said multiple Monte Carlo instantiations to output a Monte Carlo method result and associated query-results; and
using said Monte Carlo method result, estimate statistical properties of a probability distribution of said query-result.
Specification