Automatic Vertical-Database Design
First Claim
1. A computer system configured as a database designer for specifying a layout, among a plurality of storage nodes, of a database whose logical design includes at least one table of which each is organized in rows and columns, the computer system being so configured as to:
- A) determine a physical design for the database by;
i) choosing a set of candidate sort orders; and
ii) in each of a plurality of projection-selection operations;
a) selecting a candidate sort order, from among the candidate sort orders not previously selected, in accordance with the cost of executing training queries on physical projections that have those candidate sort orders;
b) identifying the training queries for which addition of a projection having the selected sort order results in a cost-of-execution change that meets some cost-improvement criterion;
c) adding to at least one of the storage nodes of the physical design a first physical projection, thereby associated with that projection-selection operation and not similarly associated with a previous projection-selection operation, having the selected sort order and including vertically arranged versions of the logical columns that the training queries thus identified require; and
d) adding at least to one different one of the storage nodes of the physical design at least a further physical projection, whose sort order differs from that of the first physical projection, whose physical column versions are versions of the logical columns of which the first physical projection'"'"'s physical column versions are versions, such that all data in the first physical projection is replicated on more than one of the storage node; and
B) generating an output indicative of the resultant physical design.
9 Assignments
0 Petitions
Accused Products
Abstract
An automatic physical-layout designer for a database-management system determines the database'"'"'s physical layout from a set of training queries, the database'"'"'s logical design, and a parameter k that indicates how many storage nodes can be lost without losing access to any of the data. The designer lays the database out as a column store such that the stored columns constitute redundant projections on the system'"'"'s different storage nodes. It repeatedly identifies a projection, whose addition to the design will result in the greatest performance improvement for the training queries. In doing so, it takes into account the different compression formats to which the different projections lend themselves. When a projection has been identified as one to be added, it is added on one node, and k projections having the same columns are added to other nodes. The designer continues thus adding projections until a space budget has been reached.
72 Citations
33 Claims
-
1. A computer system configured as a database designer for specifying a layout, among a plurality of storage nodes, of a database whose logical design includes at least one table of which each is organized in rows and columns, the computer system being so configured as to:
-
A) determine a physical design for the database by; i) choosing a set of candidate sort orders; and ii) in each of a plurality of projection-selection operations; a) selecting a candidate sort order, from among the candidate sort orders not previously selected, in accordance with the cost of executing training queries on physical projections that have those candidate sort orders; b) identifying the training queries for which addition of a projection having the selected sort order results in a cost-of-execution change that meets some cost-improvement criterion; c) adding to at least one of the storage nodes of the physical design a first physical projection, thereby associated with that projection-selection operation and not similarly associated with a previous projection-selection operation, having the selected sort order and including vertically arranged versions of the logical columns that the training queries thus identified require; and d) adding at least to one different one of the storage nodes of the physical design at least a further physical projection, whose sort order differs from that of the first physical projection, whose physical column versions are versions of the logical columns of which the first physical projection'"'"'s physical column versions are versions, such that all data in the first physical projection is replicated on more than one of the storage node; and B) generating an output indicative of the resultant physical design. - View Dependent Claims (2, 3, 4, 5)
-
-
6. For employing a computer system to lay out among a plurality of storage nodes a database whose logical design includes at least one table of which each is organized in rows and columns, a method that comprises employing the computer system to:
-
A) determine a physical design for the database by; i) choosing a set of candidate sort orders; and ii) in each of a plurality of projection-selection operations; a) selecting a candidate sort order, from among the candidate sort orders not previously selected, in accordance with the cost of executing training queries on physical projections that have those candidate sort orders; b) identifying the training queries for which addition of a projection having the selected sort order results in a cost-of-execution change that meets some cost-improvement criterion; c) adding to at least one of the storage nodes of the physical design a first physical projection, thereby associated with that projection-selection operation and not similarly associated with a previous projection-selection operation, having the selected sort order and including vertically arranged versions of the logical columns that the training queries thus identified require; and d) adding at least to one different one of the storage nodes of the physical design at least a further physical projection, whose sort order differs from that of the first physical projection, whose physical column versions are versions of the logical columns of which the first physical projection'"'"'s physical column versions are versions, such that all data in the first physical projection is replicated on more than one of the storage node; and B) physically arrange the database'"'"'s data in at least one storage device in accordance with the design thus determined. - View Dependent Claims (7, 8, 9, 10)
-
-
11. A storage medium containing machine instructions readable by a computer system to configure the computer system as a database designer that specifies a layout, among a plurality of storage nodes, of a database whose logical design includes at least one table of which each is organized in rows and columns, by:
-
A) determining a physical design for the database by; i) choosing a set of candidate sort orders; and ii) in each of a plurality of projection-selection operations; a) selecting a candidate sort order, from among the candidate sort orders not previously selected, in accordance with the cost of executing training queries on physical projections that have those candidate sort orders; b) identifying the training queries for which addition of a projection having the selected sort order results in a cost-of-execution change that meets some cost-improvement criterion; c) adding to at least one of the storage nodes of the physical design a first physical projection, thereby associated with that projection-selection operation and not similarly associated with a previous projection-selection operation, having the selected sort order and including vertically arranged versions of the logical columns that the training queries thus identified require; and d) adding at least to one different one of the storage nodes of the physical design at least a further physical projection, whose sort order differs from that of the first physical projection, whose physical column versions are versions of the logical columns of which the first physical projection'"'"'s physical column versions are versions, such that all data in the first physical projection is replicated on more than one of the storage node; and B) generating an output indicative of the resultant physical design. - View Dependent Claims (12, 13, 14, 15, 18)
-
-
16. A computer system configured as an automatic database designer that, for a database whose logical design includes at least one table of which each is organized in rows and columns, determines a physical design by:
-
A) making a list that includes, for each of a plurality of the database'"'"'s logical columns, a plurality of vertically arranged candidate physical versions of which at least a plurality differ at least by compression format from one another; B) for different ones of the candidate physical versions of the same logical columns, making respective cost estimates of the different costs of executing training queries on the database by using those different candidate physical versions, wherein the cost estimate depends on the candidate physical versions'"'"' compression formats; C) selecting from among the candidate physical versions in accordance with those estimates; D) including the candidate versions thus selected in a physical design; and E) generating an output indicative of the resultant physical design. - View Dependent Claims (17)
-
-
19. For managing a database whose logical design includes at least one table of which each is organized in rows and columns, a method comprising:
-
A) determining a physical design for the database by; i) making a list that includes, for each of a plurality of the database'"'"'s logical columns, a plurality of vertically arranged candidate physical versions of which at least a plurality differ at least by compression format from one another; ii) for different ones of the candidate physical versions of the same logical columns, making respective cost estimates of the different costs of executing training queries on the database by using those different candidate physical versions, wherein the cost estimates depend on those candidate physical versions'"'"' compression formats; iii) selecting from among the candidate physical versions in accordance with those cost estimates; and iv) including the candidate column versions thus selected in a physical design; and
;B) physically arranging the database'"'"'s data in at least one storage device in accordance with the design thus determined. - View Dependent Claims (20, 21)
-
-
22. A storage medium containing instructions readable by a computer system to configure the computer system as an automatic database designer that, for a database whose logical design includes at least one table of which each is organized in rows and columns, determines a physical design by:
-
A) making a list that includes, for each of a plurality of the database'"'"'s logical columns, a plurality of vertically arranged candidate physical versions of which at least a plurality differ at least by compression format from one another; B) for different ones of the candidate physical versions of the same logical columns, making respective cost estimates of the different costs of executing training queries on the database by using those different candidate physical versions, wherein the cost estimate depends on the candidate physical versions'"'"' compression formats; C) selecting from among the candidate physical versions in accordance with those estimates; D) including the candidate versions thus selected in a physical design; and E) generating an output indicative of the resultant physical design. - View Dependent Claims (23, 24)
-
-
25. A computer system configured as an automatic database designer that, for a database whose logical design includes at least one table of which each is organized in rows and columns, specifies a physical design by:
-
A) making a list that includes, for each of a plurality of the database'"'"'s logical columns, a plurality of candidate physical versions of which at least a plurality differ at least by sort order from one another; B) for different ones of the candidate physical versions of the same logical columns, making respective cost estimates of the different costs of executing training queries on the database by using those different candidate physical versions, wherein the cost estimate depends on those different candidate physical versions'"'"' sort orders; C) selecting from among the candidate physical versions in accordance with those cost estimates; D) including the candidate physical versions thus selected in a physical design; and E) generating an output indicative of the resultant physical design. - View Dependent Claims (26)
-
-
27. For employing a computer system to lay out a database whose logical design includes at least one table of which each is organized in rows and columns, a method that comprises employing the computer system to:
-
A) make a list that includes, for each of a plurality of the database'"'"'s logical columns, a plurality of candidate physical versions of which at least a plurality differ at least by sort order from one another; B) for different ones of the candidate physical versions of the same logical columns, make respective cost estimates of the different costs of executing training queries on the database by using those different candidate physical versions, wherein the cost estimate depends on those different candidate physical versions'"'"' sort orders; C) select from among the candidate physical versions in accordance with those cost estimates; D) include the candidate physical versions thus selected in a physical design; and E) physically arranging the database'"'"'s data in at least one storage device in accordance with the design thus determined. - View Dependent Claims (28)
-
-
29. A storage medium containing machine instructions readable by a computer system to configure the computer system as an automatic database designer that, for a database whose logical design includes at least one table of which each is organized in rows and columns, determines a physical design by:
-
A) making a list that includes, for each of a plurality of the database'"'"'s logical columns, a plurality of candidate physical versions of which at least a plurality differ at least by sort order from one another; B) for different ones of the candidate physical versions of the same logical columns, making respective cost estimates of the different costs of executing training queries on the database by using those different candidate physical versions, wherein the cost estimate depends on those different candidate physical versions'"'"' sort orders; C) selecting from among the candidate physical versions in accordance with those cost estimates; D) including the candidate physical versions thus selected in a physical design; and E) generating an output indicative of the resultant physical design. - View Dependent Claims (30)
-
-
31. A computer system configured as an automatic database designer that, for a database whose logical design includes at least one table of which each is organized in rows and columns, specifies a layout thereof among a plurality of storage nodes by:
-
A) estimating the cost of executing a set of training queries and thereby, choosing a plurality of different vertically arranged physical projections of the logical columns that together contain at least k+1 copies of all the database'"'"'s data, where k is a positive integer; B) assigning the physical projections thus chosen to the storage nodes in such a manner that the set of storage nodes remaining after removal of any k of the nodes contains all of the database'"'"'s data; and C) generating an output representative of a physical design in which the chosen physical projections are thus assigned.
-
-
32. For employing a computer system to specify a physical design in accordance with which to lay out among a plurality of storage nodes a database whose logical design includes at least one table of which each is organized in rows and columns, a method that comprises employing the computer system to:
-
A) by estimating the cost of executing a set of training queries, choose a plurality of different vertically arranged physical projections of the logical columns that together contain at least k+1 copies of all the database'"'"'s data, where k is a positive integer; B) assign the physical projections thus chosen to the storage nodes in such a manner that the set of storage nodes remaining after any k of the nodes is removed contains all of the database'"'"'s data; and C) generate an output representative of a physical design in which the chosen physical projections are thus assigned.
-
-
33. A storage medium containing machine instructions readable by a computer system to configure the computer system as an automatic database designer that, for a database whose logical design includes at least one table of which each is organized in rows and columns, determines a layout thereof among a plurality of storage nodes by:
-
A) estimating the cost of executing a set of training queries and thereby choosing a plurality of different vertically arranged physical projections of the logical columns that together contain at least k+1 copies of all the database'"'"'s data, where k is a positive integer; B) assigning the physical projections thus chosen to the storage nodes in such a manner that the set of storage nodes remaining after any k of the nodes is removed contains all of the database'"'"'s data; and C) generating an output representative of a physical design in which the chosen physical projections are thus assigned.
-
Specification