Ranking database query results using probabilistic models from information retrieval
First Claim
1. A method for ranking results of a present database query comprising:
- accessing a specified attribute value from the present database query;
accessing one or more unspecified attribute values from a workload, wherein the workload includes attribute values associated with one or more previous database queries including the specified attribute value;
calculating a global atomic quantity for each of the one or more unspecified attribute values in a database, each global atomic quantity representing an unconditional importance level of its respective unspecified attribute value;
calculating a conditional atomic quantity for each of the one or more unspecified attribute values in the database, each conditional atomic quantity representing a conditional importance level of an association between each of the one or more unspecified attribute values and the specified attribute value; and
ranking result tuples of the present database query based on the global atomic quantities and the conditional atomic quantities, wherein the ranking result tuples comprises;
calculating a conditional score for each tuple in the database;
calculating a global score for each tuple in the data base; and
using conditional scores and global scores, calculating a ranking score for each result tuple of the present database query that includes the specified attribute value;
wherein the calculating a conditional score comprises calculating a conditional score according to;
andwherein the calculating a global score comprises calculating a global score according to;
wherein t is a tuple that contains a specified attribute value from the present database query, x is a specified attribute value from the present database query, z is an unspecified attribute value from the present database query, W is a workload of the database, and D is data in the database.
2 Assignments
0 Petitions
Accused Products
Abstract
A system and methods rank results of database queries. An automated approach for ranking database query results is disclosed that leverages data and workload statistics and associations. Ranking functions are based upon the principles of probabilistic models from Information Retrieval that are adapted for structured data. The ranking functions are encoded into an intermediate knowledge representation layer. The system is generic, as the ranking functions can be further customized for different applications. Benefits of the disclosed system and methods include the use of adapted probabilistic information retrieval (PIR) techniques that leverage relational/structured data, such as columns, to provide natural groupings of data values. This permits the inference and use of pair-wise associations between data values across columns, which are usually not possible with text data.
-
Citations
27 Claims
-
1. A method for ranking results of a present database query comprising:
-
accessing a specified attribute value from the present database query; accessing one or more unspecified attribute values from a workload, wherein the workload includes attribute values associated with one or more previous database queries including the specified attribute value; calculating a global atomic quantity for each of the one or more unspecified attribute values in a database, each global atomic quantity representing an unconditional importance level of its respective unspecified attribute value; calculating a conditional atomic quantity for each of the one or more unspecified attribute values in the database, each conditional atomic quantity representing a conditional importance level of an association between each of the one or more unspecified attribute values and the specified attribute value; and ranking result tuples of the present database query based on the global atomic quantities and the conditional atomic quantities, wherein the ranking result tuples comprises; calculating a conditional score for each tuple in the database; calculating a global score for each tuple in the data base; and using conditional scores and global scores, calculating a ranking score for each result tuple of the present database query that includes the specified attribute value; wherein the calculating a conditional score comprises calculating a conditional score according to; and wherein the calculating a global score comprises calculating a global score according to; wherein t is a tuple that contains a specified attribute value from the present database query, x is a specified attribute value from the present database query, z is an unspecified attribute value from the present database query, W is a workload of the database, and D is data in the database. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
7. The method as recited in claim 4, further comprising:
-
maintaining the conditional list and the global list in database tables; and enabling retrieval of tuples from the database tables through indexes in the database tables.
-
-
8. The method as recited in claim 7, wherein the enabling retrieval includes enabling retrieval of tuples one-by-one in order of decreasing score and enabling retrieval of tuples by random access.
-
9. The method as recited in claim 4, wherein:
-
building a conditional list comprises creating a conditional list table that includes an attribute name column, an attribute value column, a tuple identification column, and a conditional score column; and
whereinbuilding a global list comprises creating a global list table that includes an attribute name column, an attribute value column, a tuple identification column, and a global score column.
-
-
10. One or more computer-readable storage media storing computer-executable instructions that, when executed on one or more processors, perform a method for ranking results of a present database query, the method comprising:
-
accessing a specified attribute value from the present database query; accessing one or more unspecified attribute values from a workload, wherein the workload includes attribute values associated with one or more previous database queries including the specified attribute value; calculating a global atomic quantity for each of the one or more unspecified attribute values in a database, each global atomic quantity representing an unconditional importance level of its respective unspecified attribute value; calculating a conditional atomic quantity for each of the one or more unspecified attribute values in the database, each conditional atomic quantity representing a conditional importance level of an association between each of the one or more unspecified attribute values and the specified attribute value; and ranking result tuples of the present database query based on the global atomic quantities and the conditional atomic quantities, wherein the ranking result tuples comprises; calculating a conditional score for each tuple in the database; calculating a global score for each tuple in the database; and using conditional scores and global scores, calculating a ranking score for each result tuple of the present database query that includes the specified attribute value; wherein the calculating a conditional score comprises calculating a conditional score according to; and wherein the calculating a global score comprises calculating a global score according to; wherein t is a tuple that contains a specified attribute value from the present database query, x is a specified attribute value from the present database query, z is an unspecified attribute value from the present database query, W is a workload of the database, and D is data in the database. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18, 22, 23, 25, 26, 27)
-
-
16. The one or more computer-readable storage media as recited in claim 13, wherein the one or more computer-readable storage media further store computer-executable instructions for:
-
maintaining the conditional list and the global list in database tables; and enabling retrieval of tuples from the database tables through indexes in the database tables.
-
-
17. The one or more computer-readable storage media as recited in claim 16, wherein the enabling retrieval includes enabling retrieval of tuples one-by-one in order of decreasing score and enabling retrieval of tuples by random access.
-
18. The one or more computer-readable storage media as recited in claim 13, wherein:
-
building a conditional list comprises creating a conditional list table that includes an attribute name column, an attribute value column, a tuple identification column, and a conditional score column; and
whereinbuilding a global list comprises creating a global list table that includes an attribute name column, an attribute value column, a tuple identification column, and a global score column.
-
-
22. The system as recited in claim 10, wherein the one or more computer-readable storage media further store computer-executable instructions for:
-
building a conditional list of all tuples in the database and ordering tuples in the conditional list by descending conditional scores; and building a global list of all tuples in the database and ordering tuples in the global list by descending global scores.
-
-
23. The system as recited in claim 22, wherein calculating a ranking score for a result tuple comprises:
-
retrieving a conditional score for the result tuple from the conditional list; retrieving a global score for the result tuple form the global list; and multiplying the global score and the conditional score.
-
-
25. The system as recited in claim 22, wherein the one or more computer-readable storage media further store computer-executable instructions for:
-
maintaining the conditional list and the global list in database tables; and enabling retrieval of tuples from the database tables through indexes in the database tables.
-
-
26. The system as recited in claim 25, wherein the enabling retrieval includes enabling retrieval of tuples one-by-one in order of decreasing score and enabling retrieval of tuples by random access.
-
27. The system as recited in claim 22, wherein:
-
building a conditional list comprises creating a conditional list table that includes an attribute name column, an attribute value column, a tuple identification column, and a conditional score column; and
whereinbuilding a global list comprises creating a global list table that includes an attribute name column, an attribute value column, a tuple identification column, and a global score column.
-
-
19. A system comprising:
-
one or more processors; and one or more computer-readable storage media storing computer-executable instructions that, when executed on the one or more processors, perform a method for ranking results of a present database query, the method comprising; accessing a specified attribute value from the present database query; accessing one or more unspecified attribute values from a workload, wherein the workload includes attribute values associated with one or more previous database queries including the specified attribute value; calculating a global atomic quantity for each of the one or more unspecified attribute values in a database, each global atomic quantity representing an unconditional importance level of its respective unspecified attribute value; calculating a conditional atomic quantity for each of the one or more unspecified attribute values in the database, each conditional atomic quantity representing a conditional importance level of an association between each of the one or more unspecified attribute values and the specified attribute value; and ranking result tuples of the present database query based on the global atomic quantities and the conditional atomic quantities, wherein the ranking result tuples comprises; calculating a conditional score for each tuple in the database; calculating a global score for each tuple in the database; and using conditional scores and global scores, calculating a ranking score for each result tuple of the present database query that includes the specified attribute value; wherein the calculating a conditional score comprises calculating a conditional score according to; and wherein the calculating a global score comprises calculating a global score according to; wherein t is a tuple that contains a specified attribute value from the present database query, x is a specified attribute value from the present database query, z is an unspecified attribute value from the present database query, W is a workload of the database, and D is data in the database. - View Dependent Claims (20, 21, 24)
-
Specification