Encoding semi-structured data for efficient search and browsing
First Claim
1. A method for transforming an XML data input to enable search over the XML data input or components thereof, the method comprising:
a. transforming said XML data input into a logical tree in which tags or attributes of said XML data input are nodes of the logical tree, each of said nodes representing non-structural or structural information from said XML data input;
b. labeling each of said nodes of said logical tree with tokens from a token table, the token table including tokens and matching tags or attributes being tags or attributes of said XML data input, thereby obtaining labeled nodes; and
c. creating an index from said tokens to facilitate efficient search of data in said logical tree, said index representing paths of labeled nodes in said logical tree whereas the indexing process includes transforming said paths into keys to be indexed, the keys being indexed being strings of arbitrary length that each includes both non-structural and structural information from said XML data input, wherein one portion of the key comprising of tokens representing structural information and second portion of the key comprising non-structural information, wherein the structural information is a representation of the structural markup and attributes of said XML data input, and the non-structural information represents content from the XML data input.
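Steps a through c of the claim can be illustrated with a small Python sketch. It is a loose illustration under stated assumptions, not the patented implementation: the token format (`t0`, `t1`, ...), the `.` and `|` separators, the `@` prefix for attribute nodes, the use of a sorted list in place of a real index structure, and the function names `build_index` and `search` are all hypothetical choices made here for readability.

```python
import bisect
import xml.etree.ElementTree as ET

SEP = "|"  # assumed separator between the structural and content parts of a key


def build_index(xml_text):
    """Return (token_table, sorted_keys) for an XML string."""
    token_table = {}  # tag or attribute name -> short token
    keys = []

    def token_for(name):
        # Step b: label a node with a token, adding it to the table on first use.
        if name not in token_table:
            token_table[name] = "t%d" % len(token_table)
        return token_table[name]

    def walk(elem, path):
        # Step a: tags and attributes both become nodes on a root-to-node path.
        path = path + [token_for(elem.tag)]
        for name, value in elem.attrib.items():
            attr_path = path + [token_for("@" + name)]
            keys.append(".".join(attr_path) + SEP + value)
        text = (elem.text or "").strip()  # tail text is ignored in this sketch
        if text:
            # Step c: key = structural tokens + separator + content.
            keys.append(".".join(path) + SEP + text)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), [])
    keys.sort()  # a sorted list stands in for the index
    return token_table, keys


def search(index, names, content_prefix=""):
    """Find keys whose path matches `names` and whose content starts with a prefix."""
    token_table, keys = index
    if any(n not in token_table for n in names):
        return []
    prefix = ".".join(token_table[n] for n in names) + SEP + content_prefix
    start = bisect.bisect_left(keys, prefix)
    hits = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break
        hits.append(key)
    return hits
```

Because each key places the structural tokens first and the content second, a single prefix lookup over the sorted keys answers a query that constrains both the path and the beginning of the content, for example `search(index, ["library", "book", "title"], "D")`.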
Abstract
A method for encoding XML tree data that includes the step of encoding the semi-structured data into strings of arbitrary length in a way that maintains non-structural and structural information about the XML data, and enables indexing the encoded XML data in a way that facilitates efficient search and browsing.
5 Claims
Claims 2, 3, 4, and 5 depend on claim 1, whose text is given under First Claim above.
Specification