Indexing XML documents efficiently
First Claim
1. A method for providing access to data for nodes, each node of said nodes belonging to an information hierarchy, wherein said data for said nodes are stored in a plurality of rows of a table, the method comprising the steps of:
- generating a plurality of hash values that reflects an order of a plurality of pathnames associated with said nodes, wherein each hash value in the plurality of hash values is generated based on character values of a particular pathname in the plurality of pathnames, the order being character-based;
wherein each pathname of said plurality of pathnames corresponds to a row of said plurality of rows and identifies a location within an information hierarchy that includes a node whose data is stored in the row; and
storing, in a database, said plurality of hash values in an index that associates said plurality of hash values with said plurality of rows;
wherein said index is ordered by said plurality of hash values as key values;
identifying a range of hash values based on a path that includes one or more nodes;
scanning said index based on said range of hash values to find one or more rows corresponding to said range of hash values, wherein said one or more rows store data of one or more descendant nodes to said one or more nodes; and
retrieving said data of said one or more descendant nodes from said one or more rows;
wherein the method is performed by one or more computing devices.
1 Assignment
0 Petitions
Accused Products
Abstract
Information hierarchies are efficiently stored and accessed in a relational or object-relational database system. A path signature, similar to a pathname, is stored in a database system in association with data for the node identified by the pathname. For example, a path signature identifying an element is stored in a row that holds data for the element. To retrieve data for a hierarchical query that identifies the data requested using, for example, an XPATH string, a string pattern is generated that is matched by path signatures identified by the XPATH string. Pattern matching is then used to select rows associated with matching path signatures, and data from the selected rows is used to compute the XPATH query. Furthermore, hash values representing path signatures are generated in a way that preserves the ordering of data in an information hierarchy. The hash values can be indexed to provide quick access.
331 Citations
14 Claims
-
1. A method for providing access to data for nodes, each node of said nodes belonging to an information hierarchy, wherein said data for said nodes are stored in a plurality of rows of a table, the method comprising the steps of:
-
generating a plurality of hash values that reflects an order of a plurality of pathnames associated with said nodes, wherein each hash value in the plurality of hash values is generated based on character values of a particular pathname in the plurality of pathnames, the order being character-based; wherein each pathname of said plurality of pathnames corresponds to a row of said plurality of rows and identifies a location within an information hierarchy that includes a node whose data is stored in the row; and storing, in a database, said plurality of hash values in an index that associates said plurality of hash values with said plurality of rows; wherein said index is ordered by said plurality of hash values as key values; identifying a range of hash values based on a path that includes one or more nodes; scanning said index based on said range of hash values to find one or more rows corresponding to said range of hash values, wherein said one or more rows store data of one or more descendant nodes to said one or more nodes; and retrieving said data of said one or more descendant nodes from said one or more rows; wherein the method is performed by one or more computing devices. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A non-transitory computer-readable storage medium storing one or more sequences of instructions for providing access to data for nodes, each node of said nodes belonging to an information hierarchy, wherein said data for said nodes are stored in a plurality of rows of a table, wherein said one or more sequences of instructions, when executed by one or more processors, causes the one or more processors to perform:
-
generating a plurality of hash values that reflects an order of a plurality of pathnames associated with said nodes, wherein each hash value in the plurality of hash values is generated based on character values of a particular pathname in the plurality of pathnames, the order being character-based; wherein each pathname of said plurality of pathnames corresponds to a row of said plurality of rows and identifies a location within an information hierarchy that includes a node whose data is stored in the row; and storing, in a database, said plurality of hash values in an index that associates said plurality of hash values with said plurality of rows; wherein said index is ordered by said plurality of hash values as key values; identifying a range of hash values based on a path that includes one or more nodes; scanning said index based on said range of hash values to find one or more rows corresponding to said range of hash values, wherein said one or more rows store data of one or more descendant nodes to said one or more nodes; and retrieving said data of said one or more descendant nodes from said one or more rows. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
Specification