Ranking database query results

US 20050289102A1
Filed: 06/29/2004
Published: 12/29/2005
Est. Priority Date: 06/29/2004
Status: Active Grant

First Claim

1. A method comprising:

calculating a global atomic quantity for each attribute value in a database, each global atomic quantity representing an unconditional importance level of its respective attribute value;

calculating a conditional atomic quantity for each attribute value in the database, each conditional atomic quantity representing a conditional importance level of an association between a pair of attribute values; and

ranking result tuples of a database query based on global atomic quantities and conditional atomic quantities.

Abstract

A system and methods rank results of database queries. An automated approach for ranking database query results is disclosed that leverages data and workload statistics and associations. Ranking functions are based upon the principles of probabilistic models from Information Retrieval that are adapted for structured data. The ranking functions are encoded into an intermediate knowledge representation layer. The system is generic, as the ranking functions can be further customized for different applications. Benefits of the disclosed system and methods include the use of adapted probabilistic information retrieval (PIR) techniques that leverage relational/structured data, such as columns, to provide natural groupings of data values. This permits the inference and use of pair-wise associations between data values across columns, which are usually not possible with text data.
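
As a rough illustration of the data statistics the abstract refers to, the sketch below estimates a frequency-based global quantity for each attribute value and a pair-wise conditional quantity across columns. The toy relation `ROWS`, the function names, and the use of plain relative frequencies are assumptions made for this example; the listing does not give the exact estimators.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy relation: each tuple maps attribute name -> attribute value.
ROWS = [
    {"city": "Seattle", "view": "water", "type": "condo"},
    {"city": "Seattle", "view": "water", "type": "house"},
    {"city": "Seattle", "view": "street", "type": "house"},
    {"city": "Bellevue", "view": "street", "type": "house"},
]


def global_quantities(rows):
    """Relative frequency of each (attribute, value) pair across all rows."""
    counts = Counter((a, v) for row in rows for a, v in row.items())
    return {key: n / len(rows) for key, n in counts.items()}


def conditional_quantities(rows):
    """Estimate p(x | y) for every ordered pair of values in different columns."""
    single = Counter((a, v) for row in rows for a, v in row.items())
    # permutations(..., 2) yields each ordered pair of distinct columns in a row.
    pair = Counter(
        (x, y) for row in rows for x, y in permutations(row.items(), 2)
    )
    return {(x, y): n / single[y] for (x, y), n in pair.items()}
```

Because every value belongs to a named column, the pair counts can be taken directly from co-occurrence within a row, which is the kind of association the abstract says is hard to obtain from unstructured text. The same functions could be applied to a workload log instead of the data, if queries were represented as attribute-value dictionaries.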

57 Citations

28 Claims

1. A method comprising:
- calculating a global atomic quantity for each attribute value in a database, each global atomic quantity representing an unconditional importance level of its respective attribute value;
  
  calculating a conditional atomic quantity for each attribute value in the database, each conditional atomic quantity representing a conditional importance level of an association between a pair of attribute values; and
  
  ranking result tuples of a database query based on global atomic quantities and conditional atomic quantities.
- - 2. A method as recited in claim 1, wherein the calculating a global atomic quantity is selected from the group comprising:
    - calculating a database global atomic quantity that represents a frequency of occurrence of an attribute value within the database; and
      
      calculating a workload global atomic quantity that represents a frequency of occurrence of an attribute value within a workload.
  - 3. A method as recited in claim 1, wherein the calculating a conditional atomic quantity is selected from the group comprising:
    - calculating a database conditional atomic quantity that represents a conditional importance level of an association between a pair of attribute values within the database; and
      
      calculating a workload conditional atomic quantity that represents a conditional importance level of an association between a pair of attribute values within a workload.
  - 4. A method as recited in claim 1, wherein the ranking result tuples comprises:
    - calculating a conditional score for each tuple in the database;
      
      calculating a global score for each tuple in the database; and
      
      using conditional scores and global scores, calculating a ranking score for each result tuple of a database query that includes an attribute value specified in the database query.
  - 5. A method as recited in claim 4, wherein the calculating a conditional score comprises calculating a conditional score according to: [equation not reproduced in this listing]
  - 6. A method as recited in claim 4, wherein the calculating a global score comprises calculating a global score according to: [equation not reproduced in this listing]
  - 7. A method as recited in claim 4, further comprising:
    - building a conditional list of all tuples in the database and ordering tuples in the conditional list by descending conditional scores; and
      
      building a global list of all tuples in the database and ordering tuples in the global list by descending global scores.
  - 8. A method as recited in claim 7, wherein calculating a ranking score for a result tuple comprises:
    - retrieving a conditional score for the result tuple from the conditional list;
      
      retrieving a global score for the result tuple from the global list; and
      
      multiplying the global score and the conditional score.
  - 9. A method as recited in claim 4, wherein the calculating a ranking score comprises calculating a ranking score according to: [equation not reproduced in this listing]
  - 10. A method as recited in claim 7, further comprising:
    - maintaining the conditional list and the global list in database tables;
      
      enabling retrieval of tuples from the database tables through indexes in the database tables.
  - 11. A method as recited in claim 10, wherein the enabling retrieval includes enabling retrieval of tuples one-by-one in order of decreasing score and enabling retrieval of tuples by random access.
  - 12. A method as recited in claim 7, wherein:
    - building a conditional list comprises creating a conditional list table that includes an attribute name column, an attribute value column, a tuple identification column, and a conditional score column; and
      
      wherein building a global list comprises creating a global list table that includes an attribute name column, an attribute value column, a tuple identification column, and a global score column.
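
The list-based ranking of claims 7, 8, and 12 can be sketched as follows. This is a minimal illustration, not the patented implementation: the per-tuple scores are hypothetical precomputed values (the scoring equations of claims 5, 6, and 9 are not reproduced in this listing), and `ListEntry`, `build_list`, and `rank` are names invented for the example.

```python
from typing import NamedTuple


class ListEntry(NamedTuple):
    """One row of a list table, with the four columns named in claim 12."""
    attr_name: str
    attr_value: str
    tuple_id: int
    score: float


def build_list(rows):
    """Build a list ordered by descending score (claim 7)."""
    return sorted((ListEntry(*row) for row in rows), key=lambda e: -e.score)


def rank(conditional_list, global_list, attr_name, attr_value):
    """Rank tuples with the query-specified value by global * conditional (claim 8)."""
    key = (attr_name, attr_value)
    cond = {e.tuple_id: e.score for e in conditional_list if e[:2] == key}
    glob = {e.tuple_id: e.score for e in global_list if e[:2] == key}
    final = {tid: cond[tid] * glob[tid] for tid in cond.keys() & glob.keys()}
    return sorted(final.items(), key=lambda item: item[1], reverse=True)


# Hypothetical precomputed scores, chosen only to make the arithmetic easy to follow.
CONDITIONAL = build_list([
    ("city", "Seattle", 1, 0.5),
    ("city", "Seattle", 2, 2.0),
    ("city", "Seattle", 3, 1.0),
    ("city", "Bellevue", 4, 3.0),
])
GLOBAL = build_list([
    ("city", "Seattle", 1, 4.0),
    ("city", "Seattle", 2, 0.5),
    ("city", "Seattle", 3, 3.0),
    ("city", "Bellevue", 4, 1.0),
])
```

With these values, `rank(CONDITIONAL, GLOBAL, "city", "Seattle")` returns `[(3, 3.0), (1, 2.0), (2, 1.0)]`: tuple 3 wins even though it tops neither list, because the ranking score is the product of the two scores.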

13. A processor-readable medium comprising processor-executable instructions configured for:
- computing atomic probabilities of attribute values in a database, the atomic probabilities computed according to p(y|W), p(y|D), p(x|y,W) and p(x|y,D), wherein x is a specified attribute value, y is an unspecified attribute value, W is a workload of the database, and D is data in the database; and
  
  storing the atomic probabilities as atomic probabilities tables in an intermediate knowledge representation layer of the database.
- - 14. A processor-readable medium as recited in claim 13, comprising further processor-executable instructions configured for:
    - creating a conditional list of tuple-ids ordered by descending conditional scores, the conditional scores calculated based on the atomic probabilities;
      
      creating a global list of tuple-ids ordered by descending global scores, the global scores based on the atomic probabilities; and
      
      storing the lists as global and conditional list tables in the database.
  - 15. A processor-readable medium as recited in claim 14, wherein the storing comprises storing the conditional list in a table having columns that include:
    - an attribute name column;
      
      an attribute value column;
      
      a tuple identification column; and
      
      a conditional score column.
  - 16. A processor-readable medium as recited in claim 14, wherein the storing comprises storing the global list in a table having columns that include:
    - an attribute name column;
      
      an attribute value column;
      
      a tuple identification column; and
      
      a global score column.
  - 17. A processor-readable medium as recited in claim 14, comprising further processor-executable instructions configured for:
    - receiving a query;
      
      randomly accessing the conditional and global lists to retrieve, respectively, conditional and global scores of tuple-ids for result tuples that satisfy the query;
      
      for each tuple-id that satisfies the query, multiplying scores to determine a final score for the tuple-id; and
      
      ranking the result tuples according to the final scores of their respective tuple-ids.
  - 18. A processor-readable medium as recited in claim 13, comprising further processor-executable instructions configured for:
    - receiving a query;
      
      scanning query result tuples; and
      
      computing a score for each query result tuple based on atomic probabilities stored in the intermediate knowledge representation layer.
  - 19. A computer comprising the processor-readable medium of claim 13.
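
One way to picture the list tables of claims 14 through 17 is as two indexed database tables that are probed by tuple-id at query time. The sketch below uses SQLite through Python's standard `sqlite3` module purely for illustration; the table names `cond_list` and `glob_list`, the column names, and the index layout are assumptions, and the sketch assumes every result tuple-id appears in both lists.

```python
import sqlite3


def create_list_tables(conn):
    """Create both list tables with the four columns of claims 15 and 16."""
    for name in ("cond_list", "glob_list"):
        conn.execute(
            f"CREATE TABLE {name} ("
            "attr_name TEXT, attr_value TEXT, tuple_id INTEGER, score REAL)"
        )
        # Index for random access by tuple-id within one attribute value.
        conn.execute(
            f"CREATE INDEX {name}_by_tid "
            f"ON {name} (attr_name, attr_value, tuple_id)"
        )
        # Index for reading one attribute value's entries in descending score order.
        conn.execute(
            f"CREATE INDEX {name}_by_score "
            f"ON {name} (attr_name, attr_value, score DESC)"
        )


def load(conn, table, rows):
    """Insert (attr_name, attr_value, tuple_id, score) rows into a list table."""
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rows)


def lookup(conn, table, attr_name, attr_value, tuple_id):
    """Random-access one score; assumes the entry exists."""
    row = conn.execute(
        f"SELECT score FROM {table} "
        "WHERE attr_name = ? AND attr_value = ? AND tuple_id = ?",
        (attr_name, attr_value, tuple_id),
    ).fetchone()
    return row[0]


def final_scores(conn, attr_name, attr_value, tuple_ids):
    """Multiply conditional and global scores per tuple-id, then rank (claim 17)."""
    scored = [
        (
            tid,
            lookup(conn, "cond_list", attr_name, attr_value, tid)
            * lookup(conn, "glob_list", attr_name, attr_value, tid),
        )
        for tid in tuple_ids
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Keeping the lists in ordinary tables means the database's own indexes supply both access paths, so no separate index structure has to be maintained outside the database.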

20. A computer comprising:
- a pre-processing component configured to compute a global atomic probability and a conditional atomic probability for each attribute value in a database; and
  
  a query processing component configured to analyze a database query and to rank result tuples of the query based on tuple scores calculated from the atomic probabilities.
- - 21. A computer as recited in claim 20, wherein the pre-processing component comprises:
    - an atomic probabilities module configured to compute the global atomic probabilities and the conditional atomic probabilities based on data and workload information in the database; and
      
      an index module configured to compute a conditional score and a global score for each tuple in the database based on the global atomic probabilities and the conditional atomic probabilities, build a conditional list of tuples ordered by descending conditional scores, and build a global list of tuples ordered by descending global scores.
  - 22. A computer as recited in claim 21, wherein the query processing component comprises a list merge algorithm configured to retrieve conditional scores and global scores for query result tuples, multiply the scores to determine a final score for each query result tuple, and rank k query result tuples according to their respective final scores, wherein k is an integer variable and query result tuples contain a query-specified attribute value.
  - 23. A computer as recited in claim 21, wherein the query processing component comprises a scan algorithm configured to scan tuples, compute a score for each tuple that contains a query-specified attribute, and return k highest-score tuples, where k is an integer variable.
  - 24. A computer as recited in claim 21, further comprising an intermediate knowledge representation layer, the atomic probabilities module further configured to store the atomic probabilities in the intermediate knowledge representation layer as atomic probabilities database tables.
  - 25. A computer as recited in claim 21, further comprising the database.
  - 26. A computer as recited in claim 25, wherein the database comprises:
    - an intermediate knowledge representation layer configured to store the atomic probabilities;
      
      a conditional list table configured to store conditional lists; and
      
      a global list table configured to store global lists.
  - 27. A computer as recited in claim 26, wherein the conditional list table comprises:
    - an attribute name column;
      
      an attribute value column;
      
      a tuple identification column; and
      
      a conditional score column.
  - 28. A computer as recited in claim 26, wherein the global list table comprises:
    - an attribute name column;
      
      an attribute value column;
      
      a tuple identification column; and
      
      a global score column.
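
The scan algorithm of claim 23 can be sketched as a single pass that scores each matching tuple and keeps the k best. The scoring function here is a stand-in: it multiplies hypothetical per-value weights for the attributes the query leaves unspecified, which is an assumption for illustration and not the scoring equation of the patent. `ROWS`, `WEIGHTS`, `make_score_fn`, and `scan_top_k` are names invented for the example.

```python
import heapq
from math import prod

# Hypothetical relation keyed by tuple-id.
ROWS = {
    1: {"city": "Seattle", "view": "water", "type": "condo"},
    2: {"city": "Seattle", "view": "street", "type": "house"},
    3: {"city": "Bellevue", "view": "water", "type": "condo"},
    4: {"city": "Seattle", "view": "water", "type": "house"},
}

# Hypothetical per-value weights standing in for stored atomic probabilities.
WEIGHTS = {
    ("view", "water"): 4.0,
    ("view", "street"): 0.5,
    ("type", "condo"): 2.0,
    ("type", "house"): 1.0,
}


def make_score_fn(weights, query):
    """Score a tuple by multiplying the weights of its unspecified attribute values."""
    def score(row):
        return prod(
            weights.get((a, v), 1.0) for a, v in row.items() if a not in query
        )
    return score


def scan_top_k(rows, query, score_fn, k):
    """Scan all tuples, score those that satisfy the query, return the k highest."""
    matches = (
        (score_fn(row), tid)
        for tid, row in rows.items()
        if all(row.get(a) == v for a, v in query.items())
    )
    return [(tid, s) for s, tid in heapq.nlargest(k, matches)]
```

For the query `{"city": "Seattle"}`, calling `scan_top_k(ROWS, query, make_score_fn(WEIGHTS, query), 2)` returns `[(1, 8.0), (4, 4.0)]`. A scan needs no precomputed lists, at the cost of touching every tuple on each query.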

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Chaudhuri, Surajit; Das, Gautam; Weikum, Gerhard; Hristidis, Vagelis

Granted Patent: US 7,383,262 B2
US Class Current: 1/1
CPC Class Codes: G06Q 30/0603 (Catalogue ordering); G06Q 50/16 (Real estate)
