Encoding semi-structured data for efficient search and browsing
Abstract
A method for encoding XML tree data that includes the step of encoding the semi-structured data into strings of arbitrary length in a way that maintains non-structural and structural information about the XML data, and enables indexing the encoded XML data in a way that facilitates efficient search and browsing.
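As a rough illustration of the idea in the abstract, the sketch below flattens a small XML tree into strings of arbitrary length, one per attribute or text value. Each string carries the structural information (the path from the root, with sibling positions) and the non-structural information (the value). The function name `encode`, the `/tag[n]` path syntax, and the `=` separator are assumptions made for this example, not the encoding described in the patent. Tail text after child elements is ignored for brevity.

```python
import xml.etree.ElementTree as ET


def encode(xml_text):
    """Flatten an XML document into sorted 'path=value' strings (hypothetical format)."""
    root = ET.fromstring(xml_text)
    keys = []

    def walk(node, path):
        # Attributes become '<path>/@name=value' strings.
        for name in sorted(node.attrib):
            keys.append(f"{path}/@{name}={node.attrib[name]}")
        # Leading text of the element becomes '<path>=text'.
        text = (node.text or "").strip()
        if text:
            keys.append(f"{path}={text}")
        # A per-tag sibling counter keeps repeated tags distinguishable.
        seen = {}
        for child in node:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}[1]")
    # Sorted strings can be handed to any ordered index.
    return sorted(keys)


doc = '<lib><book id="7"><title>Dune</title></book><book><title>Emma</title></book></lib>'
for key in encode(doc):
    print(key)
```

Because every string starts with its full path, a prefix search over the sorted strings (for example, everything under `/lib[1]/book[1]`) corresponds to browsing a subtree.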
97 Claims
1. A method for encoding semi-structured data, comprising:
a) providing a semi-structured data input;
b) obtaining an encoded semi-structured data by selectively encoding at least part of said semi-structured data into strings of arbitrary length in a way that at least (i) maintains non-structural and structural information associated with the semi-structured data, and (ii) the so encoded semi-structured data can be indexed for efficient access.
Dependent claims: 2-45, 57-66, 73, 74, 83-88, 92
46. A method for constructing a metadata dictionary in respect of semi-structured data, comprising:
a) providing a semi-structured data input;
b) constructing a metadata dictionary that facilitates compressed encoding of at least part of said semi-structured data into strings of arbitrary length in a way that at least maintains non-structural and structural information associated with the semi-structured data.
Dependent claims: 47, 48, 49
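One possible reading of a metadata dictionary that "facilitates compressed encoding" is a table that maps each tag or attribute name to a short code, so that long paths shrink to short strings while staying decodable. The sketch below is only that reading. The helper names `build_dictionary`, `encode_path`, and `decode_path`, the hexadecimal codes, and the `.` separator are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def build_dictionary(xml_text):
    """Assign each distinct tag or attribute name a short hex code, in first-seen order."""
    codes = {}
    for node in ET.fromstring(xml_text).iter():
        for name in [node.tag] + ["@" + a for a in sorted(node.attrib)]:
            # setdefault keeps the existing code if the name was seen before.
            codes.setdefault(name, format(len(codes), "x"))
    return codes


def encode_path(names, codes):
    """Replace each name on a root-to-node path with its dictionary code."""
    return ".".join(codes[n] for n in names)


def decode_path(encoded, codes):
    """Invert encode_path using the same dictionary."""
    names = {code: name for name, code in codes.items()}
    return [names[c] for c in encoded.split(".")]


doc = '<lib><book id="7"><title>Dune</title></book><book><title>Emma</title></book></lib>'
codes = build_dictionary(doc)
short = encode_path(["lib", "book", "title"], codes)
print(codes, short, decode_path(short, codes))
```

The encoded path still names every step from the root, so structural information survives the compression. Only the spelling of each step changes.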
50. A method for encoding and indexing semi-structured data, comprising:
a) providing a semi-structured data input;
b) obtaining an encoded semi-structured data by selectively encoding at least part of said semi-structured data into strings of arbitrary length in a way that at least (i) maintains non-structural and structural information associated with the semi-structured data;
c) indexing the encoded semi-structured data using layered index;
the layered index includes basic partitioned index structure;
said layered index maintains a balanced structure of blocks.
Dependent claims: 75
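To make the terms "layered index", "partitioned" and "balanced structure of blocks" concrete, here is a minimal static sketch. Sorted string keys are partitioned into fixed-size blocks, and each higher layer indexes the first key of every block in the layer below. Every lookup then visits exactly one block per layer, which is what makes the structure balanced. The class name `LayeredIndex`, the fixed `block_size`, and the bulk-build approach are assumptions made for this example. The patent's own index structure is not reproduced here.

```python
from bisect import bisect_right


class LayeredIndex:
    """Static multi-layer index over sorted keys (illustrative only)."""

    def __init__(self, keys, block_size=4):
        self.block_size = block_size
        keys = sorted(keys)
        # Bottom layer: the sorted keys partitioned into blocks.
        bottom = [keys[i:i + block_size] for i in range(0, len(keys), block_size)]
        self.layers = [bottom or [[]]]
        # Upper layers: first key of each block below, partitioned again,
        # until a single root block remains.
        while len(self.layers[-1]) > 1:
            firsts = [block[0] for block in self.layers[-1]]
            self.layers.append(
                [firsts[i:i + block_size] for i in range(0, len(firsts), block_size)]
            )

    def contains(self, key):
        block_no = 0
        # Descend from the root layer, reading one block per layer.
        for depth in range(len(self.layers) - 1, 0, -1):
            block = self.layers[depth][block_no]
            pos = bisect_right(block, key) - 1
            if pos < 0:
                return False  # key sorts before every indexed key
            block_no = block_no * self.block_size + pos
        leaf = self.layers[0][block_no]
        i = bisect_right(leaf, key) - 1
        return i >= 0 and leaf[i] == key


idx = LayeredIndex([f"k{i:02d}" for i in range(10)], block_size=3)
print(len(idx.layers), idx.contains("k07"), idx.contains("k99"))
```

Key length plays no role in the number of blocks visited. Only the number of keys and the block size determine the number of layers.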
51. A method for encoding and indexing Markup Language (ML) data, comprising:
a) providing an ML data input;
b) obtaining an encoded ML data by selectively encoding at least part of said ML data into strings of arbitrary length in a way that at least (i) maintains non-structural and structural information associated with the semi-structured data;
c) indexing the encoded semi-structured data using layered index;
the layered index includes basic partitioned index structure;
said layered index maintains a balanced structure of blocks.
Dependent claims: 76
52. A method for encoding and indexing semi-structured data, comprising:
a) providing a semi-structured data input;
selectively encoding at least part of said semi-structured data into keys of arbitrary length in a way that at least (i) maintains non-structural and structural information associated with the semi-structured data;
b) creating a balanced index structure over the arbitrary-length keys.
Dependent claims: 53, 54, 77, 89, 90, 91
55. A method for indexing semi-structured data, comprising:
a) providing a semi-structured data input that includes data items;
b) indexing keys of the data items of the said semi-structured data such that with about no more than 25,000 bytes of internal memory per 1 million data items it is possible to locate an address of any such said data item with no more than 2 I/Os, irrespective of the size of the key.
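The memory figure in this claim can be put in perspective with a short calculation. 25,000 bytes per 1 million items is 0.025 bytes, or 0.2 bits, per item, so the in-memory part cannot hold an entry per item and must instead describe groups of items. The 4-byte size of an in-memory entry used below is purely an assumption for illustration. The claim does not state an entry size.

```python
def memory_budget(items=1_000_000, budget_bytes=25_000, entry_bytes=4):
    """Work out what the claimed internal-memory budget allows.

    entry_bytes is a hypothetical size for one in-memory index entry.
    """
    entries = budget_bytes // entry_bytes  # entries that fit in memory
    return {
        "bytes_per_item": budget_bytes / items,
        "bits_per_item": budget_bytes * 8 / items,
        "in_memory_entries": entries,
        "items_per_entry": items / entries,
    }


budget = memory_budget()
for name, value in budget.items():
    print(f"{name}: {value}")
```

Under the 4-byte assumption, each in-memory entry would have to stand for about 160 data items, which would then be resolved by the (at most two) external-memory accesses.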
56. A method for indexing metadata language (ML) data, comprising:
a) providing an ML data input that includes data items;
b) indexing keys of the data items of the said ML data such that with about no more than 25,000 bytes of internal memory per 1 million data items it is possible to locate an address of any such said data item with no more than 2 I/Os, irrespective of the size of the key.
67. A system for encoding semi-structured data, comprising:
storage for storing a semi-structured data input;
processor node configured to construct an encoded semi-structured data by selectively encoding at least part of said semi-structured data into strings of arbitrary length in a way that at least (i) maintains non-structural and structural information associated with the semi-structured data, and (ii) the so encoded semi-structured data can be indexed for efficient access.
Dependent claims: 68, 69
70. A system for encoding and indexing semi-structured data, comprising:
storage for storing a semi-structured data input;
processor node configured to construct an encoded semi-structured data by selectively encoding at least part of said semi-structured data into strings of arbitrary length in a way that at least (i) maintains non-structural and structural information associated with the semi-structured data;
processor node configured to construct an indexing of the encoded semi-structured data using layered index;
the layered index includes basic partitioned index structure;
said layered index maintains a balanced structure of blocks.
71. A system for encoding and indexing Markup Language (ML) data, comprising:
storage for storing an ML data input;
processor node configured to construct an encoded ML data by selectively encoding at least part of said ML data into strings of arbitrary length in a way that at least (i) maintains non-structural and structural information associated with the semi-structured data;
processor node configured to construct an indexing of the encoded semi-structured data using layered index;
the layered index includes basic partitioned index structure;
said layered index maintains a balanced structure of blocks.
72. A system for encoding and indexing semi-structured data, comprising:
storage for storing a semi-structured data input;
processor node configured to selectively encode at least part of said semi-structured data into keys of arbitrary length in a way that at least (i) maintains non-structural and structural information associated with the semi-structured data;
processor node configured to create a balanced index structure over the arbitrary-length keys.
78. A storage medium storing data indicative of encoded semi-structured data that includes strings of arbitrary length that at least (i) maintains non-structural and structural information associated with the semi-structured data, and (ii) the so encoded semi-structured data can be indexed for efficient access.
81. In a computer system having a storage medium of at least an internal memory and an external memory;
a data structure that includes an index over the keys of the data items;
the index is arranged in blocks, such that with about no more than 25,000 bytes of internal memory per 1 million data items it is possible to locate an address of any such said data item with no more than 2 I/Os access to the external memory, irrespective of the size of the key.
82. A storage medium storing data indicative of a metadata dictionary for semi-structured data;
the metadata dictionary facilitates compressed encoding of at least part of said semi-structured data into strings of arbitrary length in a way that at least maintains non-structural and structural information associated with the semi-structured data.
93. A storage medium storing data indicative of an index over keys of arbitrary length encoded from semi-structured data, said index is partitioned into blocks;
said index constitutes an essentially balanced structure of blocks.
Dependent claims: 94, 95, 96, 97
Specification