System and method for reducing compile time in a top down rule based system using rule heuristics based upon the predicted resulting data flow
First Claim
1. A computer based method for reducing a number of expressions to which rules are applied in a top down rule based system for query optimization, comprising the steps of:
- representing said query as a group of expressions;
identifying a first logical expression in a first group of said group of expressions; and
optimizing said first logical expression, including the steps of;
identifying a set of applicable rules;
identifying a set of promising rules as those rules from said set of applicable rules that match said first logical expression including the step of;
performing a first pruning heuristic based upon data flow if a first rule of said set of applicable rules satisfies said first pruning heuristic, said first heuristic identifying one or more eliminate rules; and
applying a first rule to said first logical expression if said first rule is not identified as said eliminate rule.
4 Assignments
0 Petitions
Accused Products
Abstract
The present invention applies one or more pruning heuristics to the expression, the binding, and/or the substitute during a database query optimization process. The heuristics identify certain rules that can be eliminated by either not applying the rules and/or not implementing the rules for a given expression and context (if any) based upon one or more flow rates of the expression. The pruning heuristics can eliminate the application of rules based upon the flow rates of the binding or substitute, for example. Examples include (1) not applying (cutting) a MergeJoin rule for a join expression when an inner table is small enough to be stored in a memory space that is allocated for a HashJoin; (2) not applying implementation rules on the substitute of a left-shift rule for an expression if the resulting input data flow rate from the left child of the join is significantly larger in the substitute than in the binding; (3) not applying the join to TSJ (tuple substitute join) rule if the data flow output of the join expression is significantly larger than the data flow input from the inner child of the join expression; or (4) not applying implementation rules on the substitute join expression of a left shift rule if the number of cross products increases and if the data flow rate from the left child is increases.
-
Citations
61 Claims
-
1. A computer based method for reducing a number of expressions to which rules are applied in a top down rule based system for query optimization, comprising the steps of:
-
representing said query as a group of expressions;
identifying a first logical expression in a first group of said group of expressions; and
optimizing said first logical expression, including the steps of;
identifying a set of applicable rules;
identifying a set of promising rules as those rules from said set of applicable rules that match said first logical expression including the step of;
performing a first pruning heuristic based upon data flow if a first rule of said set of applicable rules satisfies said first pruning heuristic, said first heuristic identifying one or more eliminate rules; and
applying a first rule to said first logical expression if said first rule is not identified as said eliminate rule. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31)
identifying said eliminate rule if said first rule satisfies said first pruning heuristic, said eliminate rule being one of said first rule and a second rule of said set of applicable rules, said eliminate rule being a rule that, when eliminated, reduces a query search space without significantly reducing a probability of selecting an optimal solution to said query.
-
-
4. The method of claim 3, wherein said eliminate rule is not applied to said first logical expression for a first context.
-
5. The method of claim 3, wherein said first pruning heuristic is based upon one or more flow rates of said first logical expression.
-
6. The method of claim 3, wherein said step of identifying said eliminate rule includes the step of:
identifying a MergeJoin rule as said eliminate rule if a first MergeJoin input can be stored in a first portion of memory and a sorting operation is not required.
-
7. The method of claim 6, wherein said first portion of memory is local memory that is allocated for a HashJoin rule.
-
8. The method of claim 6, wherein said first MergeJoin input is an inner input.
-
9. The method of claim 6, wherein said MergeJoin rule is said first rule.
-
10. The method of claim 6, wherein a HashJoin rule is said first rule.
-
11. The method of claim 3, wherein said step of identifying said eliminate rule includes the step of:
identifying all implementation rules for a substitute expression of a left-shift rule as said eliminate rules if a first data flow rate input received by a first parent join of said first logical expression is significantly less than a second data flow rate input received by said first parent join of a substitute expression of said left-shift rule.
-
12. The method of claim 11, wherein said substitute expression is an expression generated by applying said left-shift rule to said first logical expression, said left-shift rule being said first rule.
-
13. The method of claim 11, wherein said substitute expression is not implemented.
-
14. The method of claim 3, wherein said step of identifying said eliminate rule includes the step of:
identifying all implementation rules for a substitute expression of a left-shift rule as said eliminate rules for a substitute expression of said left-shift rule, if a first data flow rate input received by a first parent join of said first logical expression is significantly greater than a second data flow rate input received by said first parent join of said substitute expression of said left-shift rule.
-
15. The method of claim 14, wherein said substitute expression is an expression generated by applying said left-shift rule to said first logical expression, said left-shift rule being said first rule.
-
16. The method of claim 14, wherein said first logical expression is not implemented.
-
17. The method of claim 3, wherein said step of identifying said eliminate rule includes the step of:
identifying a tuple substitute join rule as said eliminate rule if a first output data flow rate from a first parent join exceeds a first input data flow rate received by said first parent join.
-
18. The method of claim 3, wherein said step of identifying said eliminate rule includes the step of:
identifying all implementation rules for a substitute expression of a left-shift rule as said eliminate rules for a substitute expression of said left-shift rule, if said substitute expression includes more cross-products than said first logical expression.
-
19. The method of claim 18, wherein said substitute expression is an expression generated by applying said left-shift rule to said first logical expression, said left-shift rule being said first rule.
-
20. The method of claim 18, wherein said first logical expression is not implemented.
-
21. The method of claim 3, wherein said step of identifying said eliminate rule includes the step of:
identifying all implementation rules for a substitute expression of a left-shift rule as said eliminate rule for said first logical expression of said left-shift rule, if said substitute expression includes less cross-products than said first logical expression.
-
22. The method of claim 21, wherein said substitute expression is an expression generated by applying said left-shift rule to said first logical expression, said left-shift rule being said first rule.
-
23. The method of claim 21, wherein said left-shift rule is not implemented for said substitute expression.
-
24. The method of claim 3, wherein said step of identifying said eliminate rule includes the step of:
identifying all implementation rules for a substitute expression of a left-shift rule as said eliminate rule for a substitute expression of said left-shift rule, if said substitute expression includes a same number cross-products as said first logical expression, and if one of said cross products is positioned at a higher node in said first logical expression than a position of a node in said substitute expression, and if a first data flow rate received by a parent join of said first logical expression is less than a second data flow rate received by a parent join of said substitute expression.
-
25. The method of claim 24, wherein said substitute expression is an expression generated by applying said left-shift rule to said first logical expression, said left-shift rule being said first rule.
-
26. The method of claim 24, wherein said first logical expression is not implemented.
-
27. The method of claim 3, wherein said step of identifying said eliminate rule includes the step of:
identifying all implementation rules for a substitute expression of a left-shift rule as said eliminate rule for said first logical expression, if a substitute expression of said left-shift rule includes a same number cross-products as said first logical expression, and if said cross product is positioned at a lower node in said first logical expression than a position of a node in said substitute expression, and if a first data flow rate received by a parent join of said first logical expression is greater than a second data flow rate received by a parent join of said substitute expression.
-
28. The method of claim 27, wherein said substitute expression is an expression generated by applying said left-shift rule to said first logical expression, said left-shift rule being said first rule.
-
29. The method of claim 27, wherein said substitute expression is not implemented.
-
30. The method of claim 1, wherein said step of apply a first rule includes one of applying said first rule and implementing said first rule.
-
31. A computer program embodied in a tangible medium and capable of being read by a computer, for performing the method of claim 1.
-
32. A system for reducing the number of expressions to which rules are applied in a top down rule based system for query optimization comprising:
-
a processor;
a storage device, disposed to receive signals from said processor;
query representation means, located in said storage device, for representing said query as a group of expressions;
first identifying means for identifying a first logical expression in a first group of said group of expressions; and
first optimizing means for optimizing said first logical expression, including;
second identifying means for identifying a set of applicable rules;
third identifying means for identifying a set of promising rules as those rules from said set of applicable rules that match said first logical expression including;
first heuristic means for performing a first pruning heuristic based upon data flow if a first rule of said set of applicable rules satisfies said first pruning heuristic, said first heuristic identifying one or more eliminate rules; and
rule application means for applying a first rule to said first logical expression if said first rule is not identified as an eliminate rule. - View Dependent Claims (33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61)
fourth identifying means for identifying said eliminate rule if said first rule satisfies said first pruning heuristic, said eliminate rule being one of said first rule and a second rule of said set of applicable rules, said eliminate rule being a rule that, when eliminated, reduces a query search space without significantly reducing a probability of selecting an optimal solution to said query.
-
-
35. The system of claim 34, wherein said eliminate rule is not applied to said first logical expression for a first context.
-
36. The system of claim 34, wherein said first pruning heuristic is based upon one or more flow rates of said first logical expression.
-
37. The system of claim 34, wherein said fourth identifying means includes:
fifth identifying means for identifying a MergeJoin rule as said eliminate rule if a first MergeJoin input can be stored in a first portion of memory in said storage device and a sorting operation is not required.
-
38. The system of claim 37, wherein said first portion of memory is local memory that is allocated for a HashJoin rule.
-
39. The system of claim 37, wherein said first MergeJoin input is an inner input.
-
40. The system of claim 37, wherein said MergeJoin rule is said first rule.
-
41. The system of claim 37, wherein a HashJoin rule is said first rule.
-
42. The system of claim 34, wherein said fourth identifying means includes:
fifth identifying means for identifying all implementation rules for a substitute expression of a left-shift rule as said eliminate rules if a first data flow rate input received by a first parent join of said first logical expression is significantly less than a second data flow rate input received by said first parent join of a substitute expression of said left-shift rule.
-
43. The system of claim 42, wherein said substitute expression is an expression generated by applying said left-shift rule to said first logical expression, said left-shift rule being said first rule.
-
44. The system of claim 42, wherein said substitute expression is not implemented.
-
45. The system of claim 34, wherein said fourth identifying means includes:
fifth identifying means for identifying all implementation rules for a substitute expression of a left-shift rule as said eliminate rules for a substitute expression of said left-shift rule, if a first data flow rate input received by a first parent join of said first logical expression is significantly greater than a second data flow rate input received by said first parent join of said substitute expression of said left-shift rule.
-
46. The system of claim 45, wherein said substitute expression is an expression generated by applying said left-shift rule to said first logical expression, said left-shift rule being said first rule.
-
47. The system of claim 45, wherein said first logical expression is not implemented.
-
48. The system of claim 34, wherein said fourth identifying means includes:
fifth identifying means for identifying a tuple substitute join rule as said eliminate rule if a first output data flow rate from a first parent join exceeds a first input data flow rate received by said first parent join.
-
49. The system of claim 34, wherein said fourth identifying means includes:
fifth identifying means for identifying all implementation rules for a substitute expression of a left-shift rule as said eliminate rules for a substitute expression of said left-shift rule, if said substitute expression includes more cross-products than said first logical expression.
-
50. The system of claim 49, wherein said substitute expression is an expression generated by applying said left-shift rule to said first logical expression, said left-shift rule being said first rule.
-
51. The system of claim 49, wherein said left-shift rule is not implemented for said first logical expression.
-
52. The system of claim 34, wherein said fourth identifying means includes:
fifth identifying means for identifying all implementation rules for a substitute expression of a left-shift rule as said eliminate rules for said first logical expression of said left-shift rule, if said substitute expression includes less cross-products than said first logical expression.
-
53. The system of claim 52, wherein said substitute expression is an expression generated by applying said left-shift rule to said first logical expression, said left-shift rule being said first rule.
-
54. The system of claim 52, wherein said substitute expression is not implemented.
-
55. The system of claim 34, wherein said fourth identifying means includes:
fifth identifying means for identifying all implementation rules for a substitute expression of a left-shift rule as said eliminate miles for a substitute expression of said left-shift rule, if said substitute expression includes a same number cross-products as said first logical expression, and if said cross product is positioned at a higher node in said first logical expression than a position of a node in said substitute expression, and if a first data flow rate received by a parent join of said first logical expression is less than a second data flow rate received by a parent join of said substitute expression.
-
56. The system of claim 55, wherein said substitute expression is an expression generated by applying said left-shift rule to said first logical expression, said left-shift rule being said first rule.
-
57. The system of claim 55, wherein said first logical expression is not implemented.
-
58. The system of claim 34, wherein said fourth identifying means includes:
fifth identifying means for identifying all implementation rules for a substitute expression of a left-shift rule as said eliminate rules for said first logical expression, if a substitute expression of said left-shift rule includes a same number cross-products as said first logical expression, and if said cross product is positioned at a lower node in said first logical expression than a position of a node in said substitute expression, and if a first data flow rate received by a parent join of said first logical expression is greater than a second data flow rate received by a parent join of said substitute expression.
-
59. The system of claim 58, wherein said substitute expression is an expression generated by applying said left-shift rule to said first logical expression, said left-shift rule being said first rule.
-
60. The system of claim 58, wherein said substitute expression is not implemented.
-
61. The system of claim 32, wherein said rule application means performs one of applying said first rule and implementing said first rule.
Specification