Method for scalable, fast normalization of XML documents for insertion of data into a relational database
First Claim
1. A method of transferring data from a markup language file having a hierarchical structure to a relational database, said hierarchical structure comprising a tree or forest of nodes on which depth first search imposes a total ordering, with some nodes designated as repeating nodes, and said method comprising:
- partitioning said hierarchical structure into sections, wherein each section is dedicated to at least one leaf node of said hierarchical structure, and wherein two non-repeating leaf nodes that are adjacent in frontier order and have the same parent are contained in the same section, frontier order being the order in which leaf nodes are encountered in a depth first search of said hierarchical structure;
allocating a memory section for each of said sections of said hierarchical structure according to the data types of the nodes in the section;
after completing said partitioning and allocating, parsing said markup language file to produce a stream of data pairs, wherein each of said data pairs comprises an element of node data and an element of node location information, and wherein said node location information indicates the location of the corresponding node within said hierarchical structure;
while performing said parsing process, loading said node data into the memory section allocated for the section containing the corresponding node location as said data pairs are output from said parsing process; and
transferring said node data from said sections to said relational database, wherein information is transferred from one section as soon as said loading process completes loading at least one element of node data to said one memory section and an end of section indicator has been encountered by said parsing process.
1 Assignment
0 Petitions
Accused Products
Abstract
Disclosed is a method of transferring data from a hierarchical file (having a hierarchical structure, e.g., a markup language file) to a relational database structure (made up of columns and rows. Before processing the actual data, the invention first partitions the hierarchical structure into sections, where each section is dedicated to at least one node of the hierarchical structure. The partitioning process is based on the document type definition file, which is separate from, and different than the hierarchical file. After completing the partitioning, the invention then parses the actual data contained in the hierarchical data file to produce a stream of data pairs and end of section indicators. During the data parsing process, the invention loads the data pairs into corresponding “sections” (created prior to the parsing process) as the data pairs are output from the parsing process. The invention also transfers the node data from these sections to the columns and rows of the relational database structure.
105 Citations
33 Claims
-
1. A method of transferring data from a markup language file having a hierarchical structure to a relational database, said hierarchical structure comprising a tree or forest of nodes on which depth first search imposes a total ordering, with some nodes designated as repeating nodes, and said method comprising:
-
partitioning said hierarchical structure into sections, wherein each section is dedicated to at least one leaf node of said hierarchical structure, and wherein two non-repeating leaf nodes that are adjacent in frontier order and have the same parent are contained in the same section, frontier order being the order in which leaf nodes are encountered in a depth first search of said hierarchical structure;
allocating a memory section for each of said sections of said hierarchical structure according to the data types of the nodes in the section;
after completing said partitioning and allocating, parsing said markup language file to produce a stream of data pairs, wherein each of said data pairs comprises an element of node data and an element of node location information, and wherein said node location information indicates the location of the corresponding node within said hierarchical structure;
while performing said parsing process, loading said node data into the memory section allocated for the section containing the corresponding node location as said data pairs are output from said parsing process; and
transferring said node data from said sections to said relational database, wherein information is transferred from one section as soon as said loading process completes loading at least one element of node data to said one memory section and an end of section indicator has been encountered by said parsing process. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A method of transferring data from a markup language file having a hierarchical structure to a relational database, said hierarchical structure comprising a tree or forest of nodes on which depth first search imposes a total ordering, with some nodes designated as repeating nodes, and said method comprising:
-
partitioning said hierarchical structure into sections, wherein each section is dedicated to at least one leaf node of said hierarchical structure, and wherein two non-repeating leaf nodes that are adjacent in frontier order and have the same parent are contained in the same section, frontier order being the order in which leaf nodes are encountered in a depth first search of said hierarchical structure;
allocating a memory section for each said section of said hierarchical structure according to the data types of the nodes in the section;
after completing said partitioning and allocating, parsing said markup language file to produce a stream of data pairs, wherein each of said data pairs comprises an element of node data and an element of node location information, and wherein said node location information indicates the location of the corresponding node within said hierarchical structure;
loading said node data into corresponding sections as said node data elements are output from said parsing process; and
transferring said node data from said sections to said relational database, wherein information is transferred from one section as soon as said loading process completes loading at least one element of node data to said one memory section and an end of section indicator has been encountered by said parsing process. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A method of transferring data from a markup language file having a hierarchical structure to a relational database, said hierarchical structure comprising a tree or forest of nodes on which depth first search imposes a total ordering, with some nodes designated as repeating nodes, and said method comprising:
-
partitioning said hierarchical structure into sections, wherein each section is dedicated to at least one leaf node of said hierarchical structure, and wherein two non-repeating leaf nodes that are adjacent in frontier order and have the same parent are contained in the same section, frontier order being the order in which leaf nodes are encountered in a depth first search of said hierarchical structure;
allocating a memory section for each said section of said hierarchical structure according to the data types of the nodes in the section;
after completing said partitioning and allocating, parsing said markup language file to produce a stream of data pairs, wherein each of said data pairs comprises an element of node data and an element of node location information, and wherein said node location information indicates the location of the corresponding node within said hierarchical structure;
, wherein each of said data pairs is in the form (tag, field), and wherein said field represents node data and said tag represents the location of corresponding node data within said hierarchical structure;
loading said data pairs into corresponding sections as said data pairs are output from said parsing process; and
transferring said node data from said sections to said relational database, wherein information is transferred from one section as soon as said loading process completes loading at least one element of node data to said one memory section and begins loading a different element of node data to a different memory section. - View Dependent Claims (16, 17, 18, 19, 20, 21)
-
-
22. A method of altering the hierarchical structure of a markup language file for being processed into a relational database, said method comprising:
-
identifying repeating nodes and non-repeating nodes within said hierarchical structure; and
reorganizing said hierarchical structure such that non-repeating nodes are positioned before repeating nodes within each hierarchal level of said hierarchical structure. - View Dependent Claims (23, 24, 25)
-
-
26. A method of transferring data from a markup language file having a hierarchical structure to a relational database said method comprising:
-
partitioning said hierarchical structure into sections;
allocating a memory section for each of said sections of said hierarchical structure according to the data types of the nodes in the section;
after completing said partitioning and allocating, parsing said markup language file to produce a stream of data pairs while performing said parsing process, loading said node data into the memory section allocated for the section containing the corresponding node location as said data pairs are output from said parsing process; and
transferring said node data from said sections to said relational database.
-
-
27. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform a method of transferring data from a markup language file having a hierarchical structure to a relational database, said hierarchical structure comprising a tree or forest of nodes on which depth first search imposes a total ordering, with some nodes designated as repeating nodes, and said method comprising:
-
partitioning said hierarchical structure into sections, wherein each section is dedicated to at least one leaf node of said hierarchical structure, and wherein two non-repeating leaf nodes that are adjacent in frontier order and have the same parent are contained in the same section, frontier order being the order in which leaf nodes are encountered in a depth first search of said hierarchical structure;
allocating a memory section for each said section of said hierarchical structure according to the data types of the nodes in the section;
after completing said partitioning and allocating, parsing said markup language file to produce a stream of data pairs, wherein each of said data pairs comprises an element of node data and an element of node location information, and wherein said node location information indicates the location of the corresponding node within said hierarchical structure;
while performing said parsing process, loading said node data into the memory section allocated for the section containing the corresponding node location as said data pairs are output from said parsing process; and
transferring said node data from said sections to said relational database, wherein information is transferred from one section as soon as said loading process completes loading at least one element of node data to said one memory section and begins loading a different element of node data to a different memory section. - View Dependent Claims (28, 29, 30, 31, 32, 33)
-
Specification