System, method and data structure for fast loading, storing and access to huge data sets in real time
First Claim
1. A computer-implemented method of database management, the method comprising the steps of:
- (a) receiving, at a data loader process, one or more rows of values of raw data records;
(b) decomposing, at said loader process, each of said one or more rows of values to separate field values, said separation being dictated by a data scheme;
(c) assigning a numerical value key id to each said field value of said rows of values;
(d) storing in a keys table in a non-transitory computer readable memory one unique instance for each said key id;
(e) storing sequentially each said key id in a pre-allocated non-transitory computer readable data memory block, said data memory block including rows divided into data columns according to said data scheme separation, each data column holding a plurality of said key ids of said data column, each row of said rows having a record id indicating a position of said row in said data memory block within a plurality of data blocks;
(f) building field indexes, each said field index related to a data column of said data columns in said data block and including a list of unique instances of each said key id;
(g) converting each said field index to a Super Hierarchical Bitmap (SHB) data structure store in the non-transitory computer readable memory;
(h) generating an inverted index block related to said data block including said field indexes and, for each unique instance of each said key id in each said field index, an ordered list of said record ids of said rows in said data memory block in which said key ids equivalent to said unique instance of said key id are stored; and
(i) allocating a new data memory block in said non-transitory computer readable memory when a current record id of a row exceeds a preallocated number of rows in said data memory block.
3 Assignments
0 Petitions
Accused Products
Abstract
A computerized system including a processor and a computer-readable non-transient memory in communication with the processor, the memory storing instructions that when executed manage a novel data structure and related group of algorithms that can be used as a method for representing a set and as a base for very efficient indexing, hash and compression. SHB is an improvement of hierarchical bitmap. An improved database system that can utilize the innovative data structure which includes a raw data stream provided to the system via a data processing module, data blocks, fields indexes tables and a keys table. There is provided an index creating process and a columns creating process, for transforming the data blocks and tables into index blocks and data columns.
-
Citations
30 Claims
-
1. A computer-implemented method of database management, the method comprising the steps of:
-
(a) receiving, at a data loader process, one or more rows of values of raw data records; (b) decomposing, at said loader process, each of said one or more rows of values to separate field values, said separation being dictated by a data scheme; (c) assigning a numerical value key id to each said field value of said rows of values; (d) storing in a keys table in a non-transitory computer readable memory one unique instance for each said key id; (e) storing sequentially each said key id in a pre-allocated non-transitory computer readable data memory block, said data memory block including rows divided into data columns according to said data scheme separation, each data column holding a plurality of said key ids of said data column, each row of said rows having a record id indicating a position of said row in said data memory block within a plurality of data blocks; (f) building field indexes, each said field index related to a data column of said data columns in said data block and including a list of unique instances of each said key id; (g) converting each said field index to a Super Hierarchical Bitmap (SHB) data structure store in the non-transitory computer readable memory; (h) generating an inverted index block related to said data block including said field indexes and, for each unique instance of each said key id in each said field index, an ordered list of said record ids of said rows in said data memory block in which said key ids equivalent to said unique instance of said key id are stored; and (i) allocating a new data memory block in said non-transitory computer readable memory when a current record id of a row exceeds a preallocated number of rows in said data memory block. - View Dependent Claims (5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
-
-
2. A method of identifying a value among a plurality of values, the method comprising:
-
(a) providing a first index including a list of unique instances of key ids that represent said plurality of values, said first index being represented by a data structure; (b) providing a second index including a plurality of ordered lists of record ids indicative of positions of said key ids in a data block, and a position vector that maps said first index to said second index; (c) searching for a key id that represents said value in said first index in a predetermined time irrespective of size of said first index; (d) searching in said second index, through said position vector, for at least one record id in an ordered list of said ordered lists of record ids;
said at least one record id indicating a position of said key id in said data block;
the searching in said second index is performed in a predetermined time irrespective of size of said second index; and(e) fetching the key id from said data block according to said position and retrieving said value according to the fetched key id. - View Dependent Claims (28, 29)
-
-
3. A computerized system comprising:
-
a processor; and a computer-readable non-transient memory in communication with the processor, the memory storing instructions that when executed maintain a database management system that includes; (a) a data-processing module configured to receive a plurality of rows of values of raw data records and decompose said rows of values into separate fields values, wherein said data-processing module is further configured to assign a numerical value key id to each said field value; (b) a plurality of Data Blocks each including a plurality of said key ids sequentially stored in rows divided into data columns; (c) a plurality of logical record ids, each said record id being for each of said rows of key ids, said record id having a value equal to a sequential position of said row of key ids in a data block within said plurality of Data Blocks; (d) a plurality of said Data Columns, each Data Column holding a plurality of said key ids of said data column of said Data Block; (e) a plurality of field indexes, each said field index related to a data column of said data columns of said data block and including a list of unique instances of each said key ids;
said data-processing module further configured to covert each said field index to a Super Hierarchical Bitmap (SHB) data structure;(f) a plurality of inverted index blocks each related to a data block of said data blocks, each said inverted index block including said field indexes and, for each unique instance of each said key id in each said field index, an ordered list of said record ids of said rows in said data block in which said key ids equivalent to said unique instance of said key id are stored; and (g) a keys table, including a list of one unique instance of each said key id.
-
-
4. A non-transitory computer readable storage medium tangibly embodying a data structure capable of being utilized by a processor for database management, said data structure comprising:
-
(a) one or more field indexes each represented by a Super Hierarchical Bitmap (SHB) data structure and related to a data column of a data block, each said field index including a list of unique instances of key ids of said data column; (b) an inverted index block related to said data block, said inverted index block being composed of one or more inverted index vectors each of which related to a data column in said data block and including a plurality of ordered lists of record ids of rows of said data block, each ordered list related to a unique instance of a respective key id in said field index, said rows of said data block holding key ids each having an equivalent unique instance of a key id among said list of unique instances of key ids; and (c) one or more position vectors each associated with a field index and including members each of which being associated with a unique instance of a key id of said field index, and representing a relative position of an order list of said order lists in said inverted index vector. - View Dependent Claims (30)
-
Specification