Methods and apparatus for ranking uncertain data in a probabilistic database
First Claim
1. A method implemented by a database server to rank non-deterministic data stored in a database, the method comprising:
- storing a set of data tuples representing a plurality of possible instantiations of the non-deterministic data in a database memory, a first data tuple in the set of data tuples taking on a first one of a set of possible data tuple instantiations selectable according to a first probability representing a likelihood of occurrence of the first one of the set of possible data tuple instantiations, respective ones of the possible instantiations of the non-deterministic data being realized when the database server selects a respective combination of possible data tuple instantiations for a combination of data tuples in the set of data tuples, the respective ones of the possible instantiations of the non-deterministic data being associated with respective instantiation probabilities;
in response to a query requesting a first number of data tuples collectively exhibiting a highest ranking among the set of data tuples, determining, using a processor, a ranking of the set of data tuples for output by the database server by determining an expected rank for the first data tuple, the expected rank for the first data tuple corresponding to a combination of multiple weighted component ranks for the first data tuple across the plurality of possible instantiations of the non-deterministic data, respective ones of the weighted component ranks for the first data tuple comprising a component rank, which represents a respective rank of the first data tuple in a respective one of the possible instantiations of the non-deterministic data, weighted by the respective instantiation probability for the respective one of the possible instantiations of the non-deterministic data; and
outputting a first ranked subset of the data tuples containing the first number of data tuples determined to exhibit the highest ranking among the set of data tuples, wherein the first ranked subset of the data tuples exhibits all of an exactness property, a containment property, a unique ranking property, a value invariance property and a stability property.
1 Assignment
0 Petitions
Accused Products
Abstract
Methods and apparatus for ranking uncertain data in a probabilistic database are disclosed. An example method disclosed herein comprises using a set of data tuples representing a plurality of possible data set instantiations associated with a respective plurality of instantiation probabilities to store non-deterministic data in a database, each data tuple corresponding to a set of possible data tuple instantiations, each data set instantiation realizable by selecting a respective data tuple instantiation for at least some of the data tuples, the method further comprising determining an expected rank for each data tuple included in at least a subset of the set of data tuples, the expected rank for a particular data tuple representing a combination of weighted component ranks of the particular data tuple, each component rank representing a ranking of the data tuple in a corresponding data set instantiation, each component ranking weighted by a respective instantiation probability.
41 Citations
19 Claims
-
1. A method implemented by a database server to rank non-deterministic data stored in a database, the method comprising:
-
storing a set of data tuples representing a plurality of possible instantiations of the non-deterministic data in a database memory, a first data tuple in the set of data tuples taking on a first one of a set of possible data tuple instantiations selectable according to a first probability representing a likelihood of occurrence of the first one of the set of possible data tuple instantiations, respective ones of the possible instantiations of the non-deterministic data being realized when the database server selects a respective combination of possible data tuple instantiations for a combination of data tuples in the set of data tuples, the respective ones of the possible instantiations of the non-deterministic data being associated with respective instantiation probabilities; in response to a query requesting a first number of data tuples collectively exhibiting a highest ranking among the set of data tuples, determining, using a processor, a ranking of the set of data tuples for output by the database server by determining an expected rank for the first data tuple, the expected rank for the first data tuple corresponding to a combination of multiple weighted component ranks for the first data tuple across the plurality of possible instantiations of the non-deterministic data, respective ones of the weighted component ranks for the first data tuple comprising a component rank, which represents a respective rank of the first data tuple in a respective one of the possible instantiations of the non-deterministic data, weighted by the respective instantiation probability for the respective one of the possible instantiations of the non-deterministic data; and outputting a first ranked subset of the data tuples containing the first number of data tuples determined to exhibit the highest ranking among the set of data tuples, wherein the first ranked subset of the data tuples exhibits all of an exactness property, a containment property, a unique ranking property, a value invariance property and a stability property. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A tangible computer readable memory comprising computer-readable instructions, which, when executed, cause a database server to perform operations comprising:
-
storing a set of data tuples representing a plurality of possible instantiations of the non-deterministic data in a database memory, a first data tuple in the set of data tuples taking on a first one of a set of possible data tuple instantiations selectable according to a first probability representing a likelihood of occurrence of the first one of the set of possible data tuple instantiations, respective ones of the possible instantiations of the non-deterministic data being realized upon selection of a respective combination of possible data tuple instantiations for a combination of data tuples in the set of data tuples, the respective ones of the possible instantiations of the non-deterministic data being associated with respective instantiation probabilities; determining a ranking of the set of data tuples for output by the database server in response to a query requesting a first number of data tuples collectively exhibiting a highest ranking among the set of data tuples, wherein the determining the ranking comprises determining an expected rank for the first data tuple, the expected rank for the first data tuple corresponding to a combination of multiple weighted component ranks for the first data tuple across the plurality of possible instantiations of the non-deterministic data, respective ones of the weighted component ranks for the first data tuple comprising a component rank, which represents a respective rank of the first data tuple in a respective one of the possible instantiations of the non-deterministic data, weighted by the respective instantiation probability for the respective one of the possible instantiations of the non-deterministic data; and outputting a first ranked subset of the data tuples containing the first number of data tuples determined to exhibit the highest ranking among the set of data tuples, wherein the first ranked subset of the data tuples exhibits all of an exactness property, a containment property, a unique ranking property, a value invariance property and a stability property. - View Dependent Claims (13, 14, 15)
-
-
16. A database server for use in ranking non-deterministic data, the database server comprising:
-
a memory having machine readable instructions stored thereon; a probabilistic database to store a set of data tuples representing a plurality of possible instantiations of the non-deterministic data, a first data tuple in the set of data tuples taking on a first one of a set of possible data tuple instantiations selectable according to a first probability representing a likelihood of occurrence of the first one of the set of possible data tuple instantiations, respective ones of the possible instantiations of the non-deterministic data being realized upon selection of a respective combination of possible data tuple instantiations for a combination of data tuples in the set of data tuples, the respective ones of the possible instantiations of the non-deterministic data being associated with respective instantiation probabilities; and an expected ranking processor to execute the machine readable instructions to perform operations comprising; determining a ranking of the set of data tuples for output by the database server in response to a query requesting a first number of data tuples collectively exhibiting a highest ranking among the set of data tuples, wherein the determining the ranking comprises determining an expected rank for the first data tuple, the expected rank for the first data tuple corresponding to a combination of multiple weighted component ranks for the first data tuple across the plurality of possible instantiations of the non-deterministic data, respective ones of the weighted component ranks for the first data tuple comprising a component rank, which represents a respective rank of the first data tuple in a respective one of the possible instantiations of the non-deterministic data, weighted by the respective instantiation probability for the respective one of the possible instantiations of the non-deterministic data; and outputting a first ranked subset of the data tuples containing the first number of data tuples determined to exhibit the highest ranking among the set of data tuples, wherein the first ranked subset of the data tuples exhibits all of an exactness property, a containment property, a unique ranking property, a value invariance property and a stability property. - View Dependent Claims (17, 18, 19)
-
Specification