Encoding semi-structured data for efficient search and browsing
Abstract
A method for encoding XML tree data that includes the step of encoding the semi-structured data into strings of arbitrary length in a way that maintains non-structural and structural information about the XML data, and enables indexing the encoded XML data in a way that facilitates efficient search and browsing.
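As an illustration of the idea in the abstract, the following minimal sketch turns a small XML tree into variable-length strings, each carrying a root-to-leaf path (structural information) and the leaf text (non-structural information). The separator characters, the function name `encode_paths`, and the sample document are assumptions made for this example; they are not taken from the patent.

```python
import xml.etree.ElementTree as ET

SEP = "/"    # hypothetical separator between element names
VALUE = "="  # hypothetical marker that introduces a text value


def encode_paths(xml_text):
    """Return one string per text value: its root-to-node path plus the text."""
    root = ET.fromstring(xml_text)
    keys = []

    def walk(node, prefix):
        path = prefix + SEP + node.tag
        text = (node.text or "").strip()
        if text:
            # The key keeps both the structure (path) and the content (text).
            keys.append(path + VALUE + text)
        for child in node:
            walk(child, path)

    walk(root, "")
    return keys


doc = "<book><title>Dune</title><author><name>Herbert</name></author></book>"
keys = encode_paths(doc)
# keys == ["/book/title=Dune", "/book/author/name=Herbert"]
```

Because every key begins with its path, keys that share structure also share a prefix, which is the property a prefix-oriented index can exploit.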
21 Claims
1. A method for encoding semi-structured data, the method implemented by at least one processor and comprising:
a. providing a semi-structured data input, the semi-structured data input being a Markup Language (ML) data or representation thereof; and
b. obtaining an encoded semi-structured data by selectively encoding at least part of the semi-structured data into strings of arbitrary length, the strings of arbitrary length each maintaining both structural information of the semi-structured data and non-structural information, and the so encoded semi-structured data operates as keys to be indexed by an index for efficient access, said index is based on a trie,
wherein the structural information represents at least relations or order between data items provided as input, and
wherein encoding at least part of the semi-structured data includes replacing at least one of the structural information and the non-structural information with a token, the token being associated with the at least one of the structural information and the non-structural information.
Dependent claims: 2, 3, 4
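The token replacement and trie-based index recited in claim 1 can be sketched roughly as follows. This is an assumption-laden toy, not the patented implementation: the one-character token table, the dictionary-based `Trie` class, and the `=` value marker are all hypothetical choices made for illustration.

```python
class Trie:
    """Minimal character trie built from nested dictionaries."""

    def __init__(self):
        self.root = {}

    def insert(self, key):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node[None] = True  # end-of-key marker

    def with_prefix(self, prefix):
        """Return all stored keys that start with `prefix`, sorted."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        out = []
        stack = [(node, prefix)]
        while stack:
            cur, acc = stack.pop()
            for ch, child in cur.items():
                if ch is None:
                    out.append(acc)
                else:
                    stack.append((child, acc + ch))
        return sorted(out)


def tokenize(path, table):
    """Replace each element name in `path` with a one-character token."""
    for name in path:
        if name not in table:
            table[name] = chr(ord("A") + len(table))
    return "".join(table[name] for name in path)


table = {}
trie = Trie()
records = [
    (["book", "title"], "Dune"),
    (["book", "author", "name"], "Herbert"),
    (["book", "title"], "Emma"),
]
for path, value in records:
    # Key = tokenized structure + marker + untouched text value.
    trie.insert(tokenize(path, table) + "=" + value)

titles = trie.with_prefix(tokenize(["book", "title"], table))
# titles == ["AB=Dune", "AB=Emma"]
```

Replacing element names with short tokens keeps the keys compact, while the shared prefixes still let a single trie descent reach every key under a given path.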
5. A method for encoding semi-structured data, the method implemented by at least one processor and comprising:
a. providing a semi-structured data input, the semi-structured data input being a Markup Language (ML) data or representation thereof;
b. obtaining an encoded semi-structured data by selectively encoding at least part of the semi-structured data into strings of arbitrary length, said strings of arbitrary length each maintaining both non-structural information and structural information of the semi-structured data, the encoding including at least associating the structural information with a compressed representation of the structural information; and
c. in response to a query for information of interest, retrieving the information of interest using the strings of arbitrary length,
wherein the structural information represents at least relations or order between data items provided as input, and
wherein encoding at least part of the semi-structured data includes replacing at least one of the structural information and the non-structural information with a token, the token being associated with the at least one of the structural information and the non-structural information.
Dependent claims: 6, 7, 8
9. A method for encoding semi-structured data, the method implemented by at least one processor and comprising:
a. providing a semi-structured data input, the semi-structured data input being a Markup Language (ML) data or representation thereof;
b. obtaining an encoded semi-structured data by selectively encoding at least part of the semi-structured data into keys that each maintain both non-structural information and structural information of the semi-structured data;
c. creating a single index over the keys; and
d. in response to a query for information of interest, the query includes both structural components and non-structural components; retrieving the information of interest using the index, the retrieval process does not use join operations,
wherein the structural information represents at least relations or order between data items provided as input, and
wherein encoding at least part of the semi-structured data includes replacing at least one of the structural information and the non-structural information with a token, the token being associated with the at least one of the structural information and the non-structural information.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
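To make the "single index, no join operations" idea of claim 9 concrete, here is a rough sketch in which a query with a structural component (a path) and a non-structural component (a value) is answered by one prefix scan over one sorted list of keys. The key layout `path=value#doc`, the document identifiers, and the helper names are assumptions introduced for this example, and a sorted list stands in for whatever index structure an actual implementation would use. The sketch also assumes values do not contain the `#` character.

```python
from bisect import bisect_left


def build_index(keys):
    """A single index over all keys: here simply a sorted list."""
    return sorted(keys)


def prefix_scan(index, prefix):
    """Return every key starting with `prefix` via one contiguous scan."""
    start = bisect_left(index, prefix)
    out = []
    for key in index[start:]:
        if not key.startswith(prefix):
            break  # keys sharing a prefix are contiguous in sorted order
        out.append(key)
    return out


def find(index, path, value):
    """Documents in which the element at `path` has text `value`."""
    prefix = path + "=" + value + "#"
    return [key[len(prefix):] for key in prefix_scan(index, prefix)]


index = build_index([
    "/book/title=Dune#doc1",
    "/book/author/name=Herbert#doc1",
    "/book/title=Emma#doc2",
    "/book/author/name=Austen#doc2",
])

hits = find(index, "/book/author/name", "Austen")
# hits == ["doc2"]
```

Since the path and the value sit in the same key, the lookup never has to combine results from a separate structure index and a separate value index, which is where a join would otherwise appear.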
Specification