Method and apparatus for predicting selectivity of database query join conditions using hypothetical query predicates having skewed value constants

  • US 7,987,200 B2
  • Filed: 10/31/2007
  • Issued: 07/26/2011
  • Est. Priority Date: 11/18/2004
  • Status: Expired due to Fees
First Claim

1. A method for predicting the selectivity of a set of at least one logical query condition for querying a database, said at least one logical query condition comprising at least one join condition for joining multiple tables of said database, comprising the steps of:

  • automatically identifying at least one skewed value for a first field of said database specified in said at least one join condition;

    for each skewed value identified by said step of automatically identifying at least one skewed value, automatically constructing a corresponding set of one or more hypothetical query predicates, each hypothetical query predicate of said set of hypothetical query predicates corresponding to a respective query condition of said set of at least one logical query condition, wherein each respective said hypothetical query predicate is constructed by replacing each occurrence of said first field in the corresponding query condition with a constant equal to the skewed value corresponding to the set of hypothetical query predicates containing the hypothetical query predicate;

    automatically predicting a respective selectivity for each said set of hypothetical query predicates; and

    automatically predicting a composite selectivity for said set of at least one logical query condition using the respective selectivity for each said set of hypothetical query predicates;

    wherein automatically identifying at least one skewed value for a first field comprises automatically accessing a frequent value list containing sampled values from said first field, and automatically identifying any sampled values exceeding at least one pre-determined threshold;

    wherein automatically identifying any sampled values exceeding at least one pre-determined threshold comprises automatically identifying any sampled values exceeding at least two pre-determined thresholds, a first of said pre-determined thresholds representing a number of occurrences of said sampled value, and a second of said pre-determined thresholds representing a proportion of records containing said sampled value among all records in a database table containing said first field.
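
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The `ColumnStats` structure, all function names, the string-substitution approach to building predicates, and in particular the weighted-sum formula used for the composite selectivity are assumptions made for illustration only; the claim does not specify how the per-value selectivities are combined.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical per-column statistics (names are assumptions)."""
    row_count: int          # records in the table containing the field
    distinct_count: int     # estimated number of distinct values
    frequent_values: dict   # sampled value -> number of occurrences


def find_skewed_values(stats, min_count, min_fraction):
    """Return sampled values exceeding BOTH thresholds from the claim:
    an occurrence count and a proportion of all records in the table."""
    return [
        value
        for value, count in stats.frequent_values.items()
        if count > min_count and count / stats.row_count > min_fraction
    ]


def hypothetical_predicate(condition, field, value):
    """Replace each occurrence of `field` in the condition text with the
    skewed constant. Plain string substitution is an illustrative shortcut."""
    return condition.replace(field, repr(value))


def equality_selectivity(stats, value):
    """Estimate the selectivity of `column = value` from the frequent
    value list, falling back to a uniform guess for unlisted values."""
    if value in stats.frequent_values:
        return stats.frequent_values[value] / stats.row_count
    return 1 / stats.distinct_count


def composite_join_selectivity(left, right, min_count, min_fraction):
    """Combine per-skewed-value selectivities into one estimate.

    Assumed formula: each skewed value contributes its share of the left
    table times the selectivity of its hypothetical predicate on the right
    table; the remaining rows use a uniform 1 / max(distinct) estimate.
    """
    skewed_part = 0.0
    skewed_weight = 0.0
    for value in find_skewed_values(left, min_count, min_fraction):
        weight = left.frequent_values[value] / left.row_count
        skewed_part += weight * equality_selectivity(right, value)
        skewed_weight += weight
    uniform = 1 / max(left.distinct_count, right.distinct_count)
    return skewed_part + (1 - skewed_weight) * uniform


# Example: orders.cust_id is heavily skewed toward customer 42.
orders = ColumnStats(row_count=1000, distinct_count=100,
                     frequent_values={42: 500, 7: 5})
customers = ColumnStats(row_count=200, distinct_count=50,
                        frequent_values={42: 100})

skewed = find_skewed_values(orders, min_count=10, min_fraction=0.1)
predicate = hypothetical_predicate(
    "orders.cust_id = customers.id", "orders.cust_id", skewed[0])
estimate = composite_join_selectivity(
    orders, customers, min_count=10, min_fraction=0.1)
```

In this example only the value 42 passes both thresholds, so a single hypothetical predicate (`42 = customers.id`) is built; the value 7 appears in the frequent value list but fails the occurrence-count threshold and is covered by the uniform fallback.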
