Cost-based optimizer for an XML data repository within a database
First Claim
1. A method comprising the computer-implemented steps of:
- gathering statistics by a database server about nodes that are stored in a database repository that is managed by the database server;
wherein said nodes form a hierarchy;
wherein each node is either an XML file or an XML file container;
wherein at least one node in the hierarchy is an XML file container that contains a plurality of XML files, each of which contains a plurality of XML elements;
storing said statistics; and
in response to a request to the database server for access to one or more XML resources from said database repository, the database server computing a computational cost associated with each of two or more methods of accessing said one or more XML resources from said database repository, based on said statistics;
wherein the two or more methods of accessing said one or more XML resources from said database repository include accessing said one or more XML resources through an index and accessing said one or more XML resources without using the index;
wherein the method is performed by one or more computing devices;
wherein XML files of said nodes are XML resources, and wherein the step of computing a computational cost comprises (a) computing a selectivity value for each of one or more predicates, from said request, that contain operators on said database repository and (b) computing a computational cost of traversing, to locate a particular XML resource specified in said request, an index in which said XML resources are indexed for accessing said database repository.
1 Assignment
0 Petitions
Accused Products
Abstract
Cost-based optimizer functionality for an XML database repository provides means for optimizing the execution of database queries that access XML resources in the database repository. Statistics about XML resources that are stored in the database repository are gathered, stored and utilized by a query optimizer to compute computational costs associated with each of multiple methods of accessing particular XML resources requested in a database query. Hence, the optimizer is able to select the most efficient query execution plan based on the costs of possible access paths. In one embodiment, specific statistics about the hierarchical structure of XML resources stored in the XML database repository are gathered, stored in a relational table in the database management system, and used to compute the selectivity of query predicates and the index cost associated with traversing one or more indexes to access requested XML resources.
319 Citations
43 Claims
-
1. A method comprising the computer-implemented steps of:
-
gathering statistics by a database server about nodes that are stored in a database repository that is managed by the database server; wherein said nodes form a hierarchy; wherein each node is either an XML file or an XML file container; wherein at least one node in the hierarchy is an XML file container that contains a plurality of XML files, each of which contains a plurality of XML elements; storing said statistics; and in response to a request to the database server for access to one or more XML resources from said database repository, the database server computing a computational cost associated with each of two or more methods of accessing said one or more XML resources from said database repository, based on said statistics; wherein the two or more methods of accessing said one or more XML resources from said database repository include accessing said one or more XML resources through an index and accessing said one or more XML resources without using the index; wherein the method is performed by one or more computing devices; wherein XML files of said nodes are XML resources, and wherein the step of computing a computational cost comprises (a) computing a selectivity value for each of one or more predicates, from said request, that contain operators on said database repository and (b) computing a computational cost of traversing, to locate a particular XML resource specified in said request, an index in which said XML resources are indexed for accessing said database repository. - View Dependent Claims (2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 32, 33, 34, 35, 36, 37, 38, 39)
-
-
7. A method comprising the computer-implemented steps of:
-
gathering statistics by a database server about nodes that are stored in a database repository that is managed by the database server; wherein said nodes form a hierarchy; wherein each node is either an XML file or an XML file container; wherein at least one node in the hierarchy is an XML file container that contains a plurality of XML files, each of which contains a plurality of XML elements; storing said statistics; and in response to a single request to the database server for access to one or more XML resources from said database repository, before the database server executes said single request, the database server computing a computational cost associated with each of two or more methods of accessing said one or more XML resources from said database repository, based on said statistics; wherein the method is performed by one or more computing devices; wherein the step of computing a computational cost comprises computing a selectivity value for each of one or more predicates, from said request, that contain operators on said database repository; and wherein XML files of said nodes are XML resources, and wherein each of said XML resources is stored, in association with a location of a node in said hierarchy, in a column of a table in said database repository, and wherein an operator contained in at least one of said one or more predicates is an operator that determines whether a particular XML resource can be located in said database repository through a particular specified path, specified in said at least one predicate, through a portion of said hierarchy.
-
-
8. A method comprising the computer-implemented steps of:
-
gathering statistics by a database server about nodes that are stored in a database repository that is managed by the database server; wherein said nodes form a hierarchy; wherein each node is either an XML file or an XML file container; wherein at least one node in the hierarchy is an XML file container that contains a plurality of XML files, each of which contains a plurality of XML elements; storing said statistics; and in response to a single request to the database server for access to one or more XML resources from said database repository, before the database server executes said single request, the database server computing a computational cost associated with each of two or more methods of accessing said one or more XML resources from said database repository, based on said statistics; wherein the method is performed by one or more computing devices; wherein the step of computing a computational cost comprises computing a selectivity value for each of one or more predicates, from said request, that contain operators on said database repository; and wherein XML files of said nodes are XML resources, and wherein each of said XML resources is stored, in association with a location of a node in said hierarchy, in a column of a table in said database repository, and wherein an operator contained in at least one of said one or more predicates is an operator that determines whether a particular XML resource can be located in said database repository at a terminal location of a particular specified path, specified in said at least one predicate, through a portion of said hierarchy.
-
-
14. A computer-readable storage medium that stores instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
-
gathering statistics by a database server about nodes that are stored in a database repository that is managed by the database server; wherein said nodes form a hierarchy; wherein each node is either an XML file or an XML file container; wherein at least one node in the hierarchy is an XML file container that contains a plurality of XML files, each of which contains a plurality of XML elements; storing said statistics; and in response to a request to the database server for access to one or more XML resources from said database repository, the database server computing a computational cost associated with each of two or more methods of accessing said one or more XML resources from said database repository, based on said statistics; wherein the two or more methods of accessing said one or more XML resources from said database repository include accessing said one or more XML resources through an index and accessing said one or more XML resources without using the index; wherein XML files of said nodes are XML resources, and wherein the step of computing a computational cost comprises (a) computing a selectivity value for each of one or more predicates, from said request, that contain operators on said database repository and (b) computing a computational cost of traversing, to locate a particular XML resource specified in said request, an index in which said XML resources are indexed for accessing said database repository. - View Dependent Claims (15, 16, 17, 18, 19, 22, 23, 24, 25, 42)
-
-
20. A computer-readable storage medium that stores instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
-
gathering statistics by a database server about nodes that are stored in a database repository that is managed by the database server; wherein said nodes form a hierarchy; wherein each node is either an XML file or an XML file container; wherein at least one node in the hierarchy is an XML file container that contains a plurality of XML files, each of which contains a plurality of XML elements; storing said statistics; and in response to a single request to the database server for access to one or more XML resources from said database repository, before the database server executes said single request, the database server computing a computational cost associated with each of two or more methods of accessing said one or more XML resources from said database repository, based on said statistics; wherein the step of computing a computational cost comprises computing a selectivity value for each of one or more predicates, from said request, that contain operators on said database repository; and wherein XML files of said nodes are XML resources, and wherein each of said XML resources is stored, in association with a location of a node in said hierarchy, in a column of a table in said database repository, and wherein an operator contained in at least one of said one or more predicates is an operator that determines whether a particular XML resource can be located in said database repository through a particular specified path, specified in said at least one predicate, through a portion of said hierarchy.
-
-
21. A computer-readable storage medium that stores instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
-
gathering statistics by a database server about nodes that are stored in a database repository that is managed by the database server; wherein said nodes form a hierarchy; wherein each node is either an XML file or an XML file container; wherein at least one node in the hierarchy is an XML file container that contains a plurality of XML files, each of which contains a plurality of XML elements; storing said statistics; and in response to a single request to the database server for access to one or more XML resources from said database repository, before the database server executes said single request, the database server computing a computational cost associated with each of two or more methods of accessing said one or more XML resources from said database repository, based on said statistics; wherein the step of computing a computational cost comprises computing a selectivity value for each of one or more predicates, from said request, that contain operators on said database repository; and wherein XML files of said nodes are XML resources, and wherein each of said XML resources is stored, in association with a location of a node in said hierarchy, in a column of a table in said database repository, and wherein an operator contained in at least one of said one or more predicates is an operator that determines whether a particular XML resource can be located in said database repository at a terminal location of a particular specified path, specified in said at least one predicate, through a portion of said hierarchy.
-
-
26. A method comprising the computer-implemented steps of:
-
gathering, by a database management system, statistics about how many nodes that are stored in a repository of said database management system satisfy certain criteria; wherein said nodes form a hierarchy; wherein each node is either an XML file or an XML file container; wherein at least one node in the hierarchy is an XML file container that contains a plurality of XML files, each of which contains a plurality of XML elements; wherein XML files of said nodes are XML resources; storing said statistics in said database management system; the database management system using the statistics to determine how to process a query that accesses one or more XML resources; wherein the method is performed by one or more computing devices; and wherein the step of gathering statistics comprises gathering each of a total number of nodes, in said hierarchy, that are accessible via a path through a specified node, a total number of containers, in said hierarchy, that are accessible via a path through said specified node, a total number of nodes, in said hierarchy, that are accessible via a path through said specified node and that are in a level of said hierarchy that is immediately under a level of said specified node, and a total number of containers, in said hierarchy, that are accessible via a path through said specified node and that are in a level of said hierarchy that is immediately under said level of said specified node. - View Dependent Claims (27)
-
-
28. A computer-readable storage medium that stores instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
-
gathering, by a database management system, statistics about how many nodes that are stored in a repository of said database management system satisfy certain criteria; wherein said nodes form a hierarchy; wherein each node is either an XML file or an XML file container; wherein at least one node in the hierarchy is an XML file container that contains a plurality of XML files, each of which contains a plurality of XML elements; wherein XML files of said nodes are XML resources; storing said statistics in said database management system; the database management system using the statistics to determine how to process a query that accesses one or more XML resources; wherein the step of gathering statistics comprises gathering each of a total number of nodes, in said hierarchy, that are accessible via a path through a specified node, a total number of containers, in said hierarchy, that are accessible via a path through said specified node, a total number of nodes, in said hierarchy, that are accessible via a path through said specified node and that are in a level of said hierarchy that is immediately under a level of said specified node, and a total number of containers, in said hierarchy, that are accessible via a path through said specified node and that are in a level of said hierarchy that is immediately under said level of said specified node.
-
-
29. A method comprising the computer-implemented steps of:
in response to a request for access to one or more XML resources from a database repository within a database management system, accessing, from said database management system, statistics about a structure of a hierarchy associated with said one or more XML resources; wherein nodes form said hierarchy; wherein each node of said hierarchy is either an XML file or an XML file container; and wherein at least one node in the hierarchy is an XML file container that contains a plurality of XML files, each of which contains a plurality of XML elements; computing a computational cost associated with each of two or more methods of accessing said one or more XML resources from said database repository, based on said statistics; wherein the two or more methods of accessing said one or more XML resources from said database repository include accessing said one or more XML resources through an index and accessing said one or more XML resources without using the index; wherein the method is performed by one or more computing devices; wherein XML files of said nodes are XML resources, and wherein the step of computing a computational cost comprises (a) computing a selectivity value for each of one or more predicates, from said request, that contain operators on said database repository and (b) computing a computational cost of traversing, to locate a particular XML resource specified in said request, an index in which said XML resources are indexed for accessing said database repository.
-
30. A computer-readable storage medium that stores instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
in response to a request for access to one or more XML resources from a database repository within a database management system, accessing, from said database management system, statistics about a structure of a hierarchy associated with said one or more XML resources; wherein nodes form said hierarchy; wherein each node of said hierarchy is either an XML file or an XML file container; and wherein at least one node in the hierarchy is an XML file container that contains a plurality of XML files, each of which contains a plurality of XML elements; computing a computational cost associated with each of two or more methods of accessing said one or more XML resources from said database repository, based on said statistics; wherein the two or more methods of accessing said one or more XML resources from said database repository include accessing said one or more XML resources through an index and accessing said one or more XML resources without using the index; wherein XML files of said nodes are XML resources, and wherein the step of computing a computational cost comprises (a) computing a selectivity value for each of one or more predicates, from said request, that contain operators on said database repository and (b) computing a computational cost of traversing, to locate a particular XML resource specified in said request, an index in which said XML resources are indexed for accessing said database repository.
-
31. A system comprising:
-
one or more hardware processors; means, executing on the one or more hardware processors, for gathering statistics by a database server about nodes that are stored in a database repository that is managed by the database server; means, executing on the one or more hardware processors, for storing said statistics; and means, executing on the one or more hardware processors, for computing, in response to a request to the database server for access to one or more XML resources from said database repository and based on said statistics, a computational cost, by the database server, associated with each of two or more methods of accessing said one or more XML resources from said database repository; wherein the two or more methods of accessing said one or more XML resources from said database repository include accessing said one or more XML resources through an index and accessing said one or more XML resources without using the index; wherein said nodes form a hierarchy; wherein each node is either an XML file or an XML file container; and wherein at least one node in the hierarchy is an XML file container that contains a plurality of XML files, each of which contains a plurality of XML elements; wherein XML files of said nodes are XML resources, and wherein said means for computing a computational cost comprises (a) means for computing a selectivity value for each of one or more predicates, from said request, that contain operators on said database repository and (b) means for computing a computational cost of traversing, to locate a particular XML resource specified in said request, an index in which said XML resources are indexed for accessing said database repository.
-
-
40. A database system comprising:
-
one or more hardware processors; an XML data repository comprising XML files and XML file containers forming a hierarchy; wherein at least one XML file container contains a plurality of XML files, each of which contains a plurality of XML elements; and a database server, executing on the one or more hardware processors, that manages the XML data repository, wherein the database server is configured to; gather statistics about the XML files and the XML file containers, store said statistics, receive a request for access to one or more XML resources from the XML data repository, and compute a computational cost associated with each of two or more methods of accessing said one or more XML resources from the XML data repository, based on said statistics; wherein the two or more methods of accessing said one or more XML resources from said XML data repository include accessing said one or more XML resources through an index and accessing said one or more XML resources without using the index; wherein said XML files are XML resources, and wherein computing a computational cost comprises (a) computing a selectivity value for each of one or more predicates, from said request, that contain operators on said XML data repository and (b) computing a computational cost of traversing, to locate a particular XML resource specified in said request, an index in which said XML resources are indexed for accessing said XML data repository.
-
-
41. A method comprising the computer-implemented steps of:
-
gathering statistics by a database server about XML files and XML file containers; wherein the XML files and XML file containers are hierarchically stored in a database repository that is managed by the database server; receiving a request to the database server for access, through a view, to one or more XML resources; in response to receiving the request, computing a computational cost associated with each of two or more methods of accessing said one or more XML resources, comprising computing a selectivity value, based at least in part on the statistics, for a predicate included in the request; and determining a query plan based, at least in part, on the selectivity value; wherein the method is performed by one or more computing devices; wherein computing a computational cost associated with each of two or more methods of accessing said one or more XML resources further comprises computing a computational cost of traversing, to locate a particular XML resource specified in said request, an index in which said one or more XML resources are indexed for accessing said database repository; wherein the two or more methods of accessing said one or more XML resources include accessing said one or more XML resources through an index and accessing said one or more XML resources without using the index.
-
-
43. A computer-readable storage medium that stores instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
-
gathering statistics by a database server about XML files and XML file containers; wherein the XML files and XML file containers are hierarchically stored in a database repository that is managed by the database server; receiving a request to the database server for access, through a view, to one or more XML resources; in response to receiving the request, computing a computational cost associated with each of two or more methods of accessing said one or more XML resources, comprising computing a selectivity value, based at least in part on the statistics, for a predicate included in the request; and determining a query plan based, at least in part, on the selectivity value; wherein computing a computational cost associated with each of two or more methods of accessing said one or more XML resources further comprises computing a computational cost of traversing, to locate a particular XML resource specified in said request, an index in which said one or more XML resources are indexed for accessing said database repository; wherein the two or more methods of accessing said one or more XML resources include accessing said one or more XML resources through an index and accessing said one or more XML resources without using the index.
-
Specification