Encoding semi-structured data for efficient search and browse
Abstract
A method for encoding XML tree data that includes the step of encoding the semi-structured data into strings of arbitrary length in a way that maintains non-structural and structural information about the XML data, and enables indexing the encoded XML data in a way that facilitates efficient search and browsing.
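As a rough illustration of the kind of encoding the abstract describes, the sketch below turns a small XML tree into variable-length strings that carry both the structure (tag path and sibling position) and the content (text and attribute values), and then runs a prefix search over the sorted strings. The key layout, the `encode` and `prefix_search` helpers, and the sample document are assumptions made for illustration; the page does not specify the actual encoding.

```python
import xml.etree.ElementTree as ET
from bisect import bisect_left

def encode(elem, prefix="", position=0):
    """Yield one variable-length string per attribute or text value.

    Hypothetical layout: /tag[pos]/tag[pos]...=value, where pos is the
    element's position among same-named siblings.
    """
    path = f"{prefix}/{elem.tag}[{position}]"
    for name, value in sorted(elem.attrib.items()):
        yield f"{path}/@{name}={value}"
    text = (elem.text or "").strip()
    if text:
        yield f"{path}={text}"
    seen = {}  # per-tag sibling counters
    for child in elem:
        index = seen.get(child.tag, 0)
        seen[child.tag] = index + 1
        yield from encode(child, path, index)

def prefix_search(sorted_keys, prefix):
    """Return every key that starts with prefix, using binary search to find the first."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches

doc = ET.fromstring(
    "<library><book id='b1'><title>Dune</title><year>1965</year></book>"
    "<book id='b2'><title>Emma</title></book></library>"
)
keys = sorted(encode(doc))
second_book = prefix_search(keys, "/library[0]/book[1]/")
```

Because the strings sort lexicographically, everything under one subtree sits in a contiguous run, which is what makes the prefix search cheap. One limitation of this toy layout is that sibling positions of 10 or more do not sort in numeric order; a fixed-width or binary position field would be needed for that.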
81 Claims
-
1. A method for encoding semi-structured data, comprising:
-
a) providing a semi-structured data input;
b) obtaining an encoded semi-structured data by selectively encoding at least part of said semi-structured data into strings of arbitrary length in a way that (i) maintains non-structural and structural information associated with the semi-structured data, and (ii) the so encoded semi-structured data can be indexed for efficient access. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 47, 48, 49, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 68, 69, 73, 74)
-
-
46. A method for constructing a metadata dictionary in respect of semi-structured data, comprising:
-
a) providing a semi-structured data input;
b) constructing a metadata dictionary that facilitates compressed encoding of at least part of said semi-structured data into strings of arbitrary length in a way that at least maintains non-structural and structural information associated with the semi-structured data.
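One plausible reading of a metadata dictionary that "facilitates compressed encoding" is a table that maps each distinct tag name to a short code, so that encoded strings carry codes instead of repeated names. The sketch below assumes that reading; the `MetadataDictionary` class, the dotted hexadecimal code format, and the sample document are hypothetical and are not taken from the claim.

```python
import xml.etree.ElementTree as ET

class MetadataDictionary:
    """Assigns a small integer code to each distinct tag name (hypothetical scheme)."""

    def __init__(self):
        self.code_of = {}   # name -> code
        self.name_of = []   # code -> name

    def code(self, name):
        # First sighting of a name allocates the next free code.
        if name not in self.code_of:
            self.code_of[name] = len(self.name_of)
            self.name_of.append(name)
        return self.code_of[name]

    def encode_path(self, names):
        return ".".join(format(self.code(n), "x") for n in names)

    def decode_path(self, encoded):
        return [self.name_of[int(part, 16)] for part in encoded.split(".")]

def encode(elem, dictionary, path=()):
    """Yield 'coded-path=text' strings; structure lives in the path, content in the text."""
    path = path + (elem.tag,)
    text = (elem.text or "").strip()
    if text:
        yield dictionary.encode_path(path) + "=" + text
    for child in elem:
        yield from encode(child, dictionary, path)

doc = ET.fromstring(
    "<catalog><product><name>Lamp</name><price>20</price></product>"
    "<product><name>Desk</name></product></catalog>"
)
dictionary = MetadataDictionary()
encoded = list(encode(doc, dictionary))
```

The dictionary keeps the encoding reversible: `dictionary.decode_path("0.1.2")` recovers the original tag names, so the structural information is maintained even though the strings are shorter.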
-
-
50. A method for encoding and indexing semi-structured data, comprising:
-
a) providing a semi-structured data input;
b) obtaining an encoded semi-structured data by selectively encoding at least part of said semi-structured data into strings of arbitrary length in a way that (i) maintains non-structural and structural information associated with the semi-structured data;
c) indexing the encoded semi-structured data using layered index;
the layered index includes basic partitioned index structure;
said layered index maintains a balanced structure of blocks. - View Dependent Claims (75)
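The claim does not define "basic partitioned index structure" here, so the following is only a generic sketch of a layered, block-partitioned index that stays balanced: sorted keys are cut into fixed-size blocks, each higher layer stores the first key of every block below it, and layers are added until a single top block remains. The names `build_layers`, `lookup`, and `BLOCK_SIZE` are hypothetical, and the scheme is a stand-in, not the claimed structure.

```python
from bisect import bisect_right

BLOCK_SIZE = 4  # hypothetical; a real block would be sized to a disk page

def partition(items, block_size):
    """Cut a sorted list into consecutive blocks of at most block_size entries."""
    return [items[i:i + block_size] for i in range(0, len(items), block_size)] or [[]]

def build_layers(sorted_keys, block_size=BLOCK_SIZE):
    """Return layers from bottom (key blocks) to top (one block of separators)."""
    layer = partition(sorted_keys, block_size)
    layers = [layer]
    while len(layer) > 1:
        separators = [block[0] for block in layer]  # first key of each lower block
        layer = partition(separators, block_size)
        layers.append(layer)
    return layers

def lookup(layers, key, block_size=BLOCK_SIZE):
    """Visit exactly one block per layer; return (found, blocks_visited)."""
    block_no = 0
    visited = 0
    for depth in range(len(layers) - 1, -1, -1):
        block = layers[depth][block_no]
        visited += 1
        if depth == 0:
            return key in block, visited
        # Follow the last separator that is <= key (or the first one if none is).
        slot = max(bisect_right(block, key) - 1, 0)
        block_no = block_no * block_size + slot

# Keys of varying length, already in sorted order thanks to the zero-padded index.
keys = [f"/doc/item[{i:02d}]" + "x" * i for i in range(20)]
layers = build_layers(keys)
found, visited = lookup(layers, keys[17])
```

Every lookup descends the same number of layers regardless of key length, which is the sense in which this toy structure is balanced.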
-
-
51. A method for encoding and indexing Markup Language (ML) data, comprising:
-
a) providing an ML data input;
b) obtaining an encoded ML data by selectively encoding at least part of said ML data into strings of arbitrary length in a way that (i) maintains non-structural and structural information associated with the semi-structured data;
c) indexing the encoded semi-structured data using layered index;
the layered index includes basic partitioned index structure;
said layered index maintains a balanced structure of blocks. - View Dependent Claims (76)
-
-
52. A method for encoding and indexing semi-structured data, comprising:
-
(a) providing a semi-structured data input;
b) selectively encoding at least part of said semi-structured data into keys of arbitrary length in a way that (i) maintains non-structural and structural information associated with the semi-structured data;
c) creating a balanced index structure over the arbitrary-lengthed keys. - View Dependent Claims (53, 54, 77, 79, 80)
-
-
55. A method for indexing semi-structured data, comprising:
-
(a) providing a semi-structured data input that includes data items;
(b) indexing keys of the data items of the said semi-structured data such that with about no more than 25,000 bytes of internal memory per 1 million data items it is possible to locate an address of any such said data item with no more than 2 I/Os, irrespective of the size of the key.
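To put the figures in this claim in perspective, the arithmetic below works out the memory budget per item and then checks one hypothetical layout against it: a small resident top level with two on-disk block levels beneath it, so that a lookup costs two block reads. The fan-out of 100 and the 16-byte top-level entry are assumed values chosen for illustration, not parameters given in the claim.

```python
import math

BUDGET_BYTES = 25_000   # from the claim
ITEMS = 1_000_000       # from the claim

bytes_per_item = BUDGET_BYTES / ITEMS   # 0.025 bytes, i.e. 0.2 bits per item
items_per_byte = ITEMS / BUDGET_BYTES   # 40 items per byte of internal memory

def resident_bytes(items, fanout, entry_bytes):
    """Internal memory for the top level when two on-disk levels sit below it.

    Level 2 (disk): leaf blocks, each addressing `fanout` items.
    Level 1 (disk): middle blocks, each addressing `fanout` leaf blocks.
    Level 0 (memory): one entry of `entry_bytes` per middle block.
    """
    leaf_blocks = math.ceil(items / fanout)
    middle_blocks = math.ceil(leaf_blocks / fanout)
    return middle_blocks * entry_bytes

# Assumed parameters: fan-out 100 and fixed 16-byte resident entries.
needed = resident_bytes(ITEMS, fanout=100, entry_bytes=16)
fits = needed <= BUDGET_BYTES
```

The budget is far too small to hold even one byte per key in memory, so any scheme that meets it must keep only a compact top level resident and make its entries independent of key length, which matches the claim's "irrespective of the size of the key".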
-
-
56. A method for indexing metadata language (ML) data, comprising:
-
(a) providing an ML data input that includes data items;
(b) indexing keys of the data items of the said ML data such that with about no more than 25,000 bytes of internal memory per 1 million data items it is possible to locate an address of any such said data item with no more than 2 I/Os, irrespective of the size of the key.
-
-
67. A system for encoding semi-structured data, comprising:
-
storage for storing a semi-structured data input;
processor node configured to construct an encoded semi-structured data by selectively encoding at least part of said semi-structured data into strings of arbitrary length in a way that (i) maintains non-structural and structural information associated with the semi-structured data, and (ii) the so encoded semi-structured data can be indexed for efficient access.
-
-
70. A system for encoding and indexing semi-structured data, comprising:
-
storage for storing a semi-structured data input;
processor node configured to construct an encoded semi-structured data by selectively encoding at least part of said semi-structured data into strings of arbitrary length in a way that (i) maintains non-structural and structural information associated with the semi-structured data;
processor node configured to construct an indexing of the encoded semi-structured data using layered index;
the layered index includes basic partitioned index structure;
said layered index maintains a balanced structure of blocks.
-
-
71. A system for encoding and indexing Markup Language (ML) data, comprising:
-
storage for storing an ML data input;
processor node configured to construct an encoded ML data by selectively encoding at least part of said ML data into strings of arbitrary length in a way that (i) maintains non-structural and structural information associated with the semi-structured data;
processor node configured to construct an indexing of the encoded semi-structured data using layered index;
the layered index includes basic partitioned index structure;
said layered index maintains a balanced structure of blocks.
-
-
72. A system for encoding and indexing semi-structured data, comprising:
-
storage for storing a semi-structured data input;
processor node configured to selectively encode at least part of said semi-structured data into keys of arbitrary length in a way that (i) maintains non-structural and structural information associated with the semi-structured data;
processor node configured to create a balanced index structure over the arbitrary-lengthed keys.
-
-
78. A storage medium storing data indicative of encoded semi-structured data that includes strings of arbitrary length that (i) maintains non-structural and structural information associated with the semi-structured data, and (ii) the so encoded semi-structured data can be indexed for efficient access.
-
81. In a computer system having a storage medium of at least an internal memory and an external memory;
a data structure that includes an index over the keys of the data items;
the index is arranged in blocks, such that with about no more than 25,000 bytes of internal memory per 1 million data items it is possible to locate an address of any such said data item with no more than 2 I/Os access to the external memory, irrespective of the size of the key.
Specification