System, method and data structure for fast loading, storing and access to huge data sets in real time

US 9,405,790 B2
Filed: 12/17/2015
Issued: 08/02/2016
Est. Priority Date: 06/27/2011
Status: Active Grant


Abstract

A computerized system including a processor and a non-transient computer-readable memory in communication with the processor, the memory storing instructions that, when executed, manage a novel data structure, the Super Hierarchical Bitmap (SHB), and a related group of algorithms that can be used to represent a set and as a basis for highly efficient indexing, hashing and compression. SHB is an improvement on the hierarchical bitmap. Also described is an improved database system that can utilize this data structure and that includes a raw data stream provided to the system via a data-processing module, data blocks, field index tables and a keys table. An index-creating process and a column-creating process transform the data blocks and tables into index blocks and data columns.
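
The abstract and claims 5, 28 and 30 describe the SHB only in claim language. The Python sketch below is one possible reading of that structure, not Jethrodata's implementation: the class name `SHB`, the `rank` method and the `word_bits` parameter are hypothetical, and the counter vectors are assumed to hold an exclusive prefix count (the 1-bits in all earlier words of the same layer), since the claim wording leaves the endpoint of the count open. Each key id sets one bit in the last layer, each non-empty word becomes a 1-bit one layer up, and only non-empty words are stored below the single top word. A lookup walks top-down using the counters, so its cost depends on the number of layers (roughly log base W of the largest key id) rather than on how many keys are stored. The value `rank` returns is the key's serial number within the set, the kind of quantity claim 22's compression function is said to be based on.

```python
from itertools import accumulate


class SHB:
    """Hypothetical sketch of a layered, compressed hierarchical bitmap over key ids >= 0."""

    def __init__(self, keys, word_bits=64):
        self.w = word_bits
        positions = sorted(set(keys))
        layers = []  # built bottom-up as lists of (word index, non-empty word)
        while True:
            words = {}
            for p in positions:
                words[p // self.w] = words.get(p // self.w, 0) | (1 << (p % self.w))
            layer = sorted(words.items()) or [(0, 0)]  # keep a top word even for an empty set
            layers.append(layer)
            if layer[-1][0] == 0:  # one word at index 0: this is the top layer
                break
            # Each non-empty word becomes a 1-bit in the layer above.
            positions = [index for index, _ in layer]
        layers.reverse()  # store top-down: first compressed layer first
        # Compressed layers keep only non-empty words; empty words are implied by 0-bits above.
        self.layers = [[word for _, word in layer] for layer in layers]
        # Counter vectors (assumption): number of 1-bits in all earlier words of the layer.
        self.counters = [
            list(accumulate((bin(word).count("1") for word in layer), initial=0))[:-1]
            for layer in self.layers
        ]

    def rank(self, key):
        """Return the serial number of `key` among the stored keys, or None if absent."""
        if key < 0:
            return None
        depth = len(self.layers)
        idx = 0  # index of the current word inside its compressed layer
        for j, (layer, counter) in enumerate(zip(self.layers, self.counters)):
            p = key // self.w ** (depth - 1 - j)  # bit position in the non-compressed layer
            if j == 0 and p >= self.w:
                return None  # beyond the range covered by the top word
            word, bit = layer[idx], p % self.w
            if not (word >> bit) & 1:
                return None
            # 1-bits before this one = index of the child word in the next compressed layer
            # (or, in the last layer, the key's serial number).
            idx = counter[idx] + bin(word & ((1 << bit) - 1)).count("1")
        return idx

    def __contains__(self, key):
        return self.rank(key) is not None


shb = SHB([3, 17, 64, 200, 1000], word_bits=8)
print(shb.layers)     # [[3], [11, 128], [5, 1, 2, 32], [8, 2, 1, 1, 1]]
print(shb.rank(200))  # 3
print(4 in shb)       # False
```

Small 8-bit words are used in the usage lines only to keep the layers readable; the default of 64 bits matches a machine word.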

30 Claims

1. A computer-implemented method of database management, the method comprising the steps of:
- (a) receiving, at a data loader process, one or more rows of values of raw data records;
  
  (b) decomposing, at said loader process, each of said one or more rows of values to separate field values, said separation being dictated by a data scheme;
  
  (c) assigning a numerical value key id to each said field value of said rows of values;
  
  (d) storing in a keys table in a non-transitory computer readable memory one unique instance for each said key id;
  
  (e) storing sequentially each said key id in a pre-allocated non-transitory computer readable data memory block, said data memory block including rows divided into data columns according to said data scheme separation, each data column holding a plurality of said key ids of said data column, each row of said rows having a record id indicating a position of said row in said data memory block within a plurality of data blocks;
  
  (f) building field indexes, each said field index related to a data column of said data columns in said data block and including a list of unique instances of each said key id;
  
  (g) converting each said field index to a Super Hierarchical Bitmap (SHB) data structure stored in the non-transitory computer readable memory;
  
  (h) generating an inverted index block related to said data block including said field indexes and, for each unique instance of each said key id in each said field index, an ordered list of said record ids of said rows in said data memory block in which said key ids equivalent to said unique instance of said key id are stored; and
  
  (i) allocating a new data memory block in said non-transitory computer readable memory when a current record id of a row exceeds a preallocated number of rows in said data memory block.
- Dependent claims 5 through 27
- - 5. The method of claim 1, wherein said SHB data structure representing said field index includes:
    - (a) at least one word, wherein each said word contains a predefined number of bits, wherein each said bit is selected from the group including 1-bits and 0-bits;
      
      (b) a plurality of bit vectors, each said bit vector containing at least one word, wherein said at least one word is selected from the group including an empty word containing only said 0-bits and a non-empty word containing at least one said 1-bit;
      
      (c) one or more compressed layers representing corresponding one or more non-compressed layers, wherein:
      
      (i) each said non-compressed layer includes one said bit vector, wherein said one or more non-compressed layers are organized sequentially, such that each of said one or more non-compressed layers except for a last non-compressed layer has a subsequent non-compressed layer related thereto;
      
      (ii) each said unique instance of each said key id in said field index is represented by a 1-bit in the last non-compressed layer and wherein the position of each said 1-bit in said last non-compressed layer is equal to a value of each said unique instance of each said key id;
      
      (iii) each said non-empty word is represented by a respective 1-bit in a previous non-compressed layer such that a number of said 1-bits in said previous non-compressed layer is equivalent to a number of said non-empty words in a subsequent non-compressed layer and a position of each of said 1-bit in said previous non-compressed layer represents a corresponding position of each said non-empty word in said subsequent non-compressed layer;
      
      wherein said compressed layers other than a first compressed layer include only said non-empty words, and each position of said empty words in said non-compressed layer is represented by a position of each said 0-bit in said previous non-compressed layer, said empty words in any non-compressed layer being representative of removed empty words in any corresponding compressed layer, and each position of said removed empty words in a second compressed layer is represented by a position of each said 0-bit in said first compressed layer, so that said second compressed layer is decompressed into a second decompressed layer by calculating said positions of said removed empty words in said second compressed layer according to said positions of said 0-bits in said first compressed layer, and each said compressed layer other than said first and second compressed layers is decompressed sequentially by calculating said positions of said removed empty words in each said compressed layer according to said positions of said 0-bits in a previous decompressed layer; and
      
      (d) one or more counter vectors, each of said counter vectors related to each of said one or more compressed layers, wherein for each said word in each of said compressed layers there exists a related counter member and wherein each said counter member holds a counter value which equals a cumulative number of 1-bits, said cumulative number calculated from a first position in each of said bit vectors to each respective said word in said bit vector related to said counter member.
  - 6. The method of claim 5, wherein said converting comprises:
    - A) arranging said list of unique instances of each said key id in an ascending order from the lowest value to the highest value;
      
      B) for each said unique instance of each said key id in said list, starting from the lowest value, calculating a position of a bit representation in each of said compressed layers in said SHB data structure; and
      
      C) calculating said counter vector related to each of said compressed layers.
  - 7. The method of claim 6, wherein said field index further includes a counter related to each said unique instance of each said key id, said counter storing the number of occurrences of each said key id in said data column.
  - 8. The method of claim 7, wherein said step (g) further comprises creating an SHB index for said field index, using said SHB data structure, including:
    - allocating a value vector V associated with said SHB data structure, wherein each member of said value vector V[n] holds a value of said counter related to each said unique instance of each said key id in said field index.
  - 9. The method of claim 7, wherein said generating inverted index block comprises:
    - (1) reading, sequentially, each said row of said key id in a data column, and adding the record id of said row to the ordered list related to a unique instance of said key id, giving rise to an inverted index vector related to said data column, said inverted index vector including a plurality of ordered lists of record ids each of which related to a unique instance of a respective key id in said field index; and
      
      (2) repeating said (1) for every column of said data block, giving rise to said inverted index block being composed of at least one said inverted index vector.
  - 10. The method of claim 9, wherein said generating inverted index block further comprises:
    - allocating memory for each said inverted index vector in said inverted index block according to the total number of occurrences of said key ids in each said data column related to said inverted index vector.
  - 11. The method of claim 10, wherein said ordered lists of record ids are stored in a Compressed Hierarchical Bitmap (CHB) format.
  - 12. The method of claim 11, wherein said generating inverted index block further comprises:
    - associating each said field index with a position vector, said position vector including members each of which being associated with a unique instance of a key id of said field index, and representing a relative position of an ordered list of said ordered lists in said inverted index vector.
  - 13. The method of claim 1, wherein one unique instance of each of said values is stored in said keys table, and wherein an assigned key id of said value is equal to a position of said value in said keys table.
  - 14. The method of claim 13, wherein each of said values is selected from the group of data types including:
    - an integer, and a non-integer.
  - 15. The method of claim 14, wherein one unique instance of each said non-integer is stored in said keys table, and wherein said assigned key id of said non-integer is equal to a position of said value of said non-integer in said keys table.
  - 16. The method of claim 14, wherein for a given value of said non-integer, an SHB hash mechanism returns an existing key id for said given value.
  - 17. The method of claim 14, wherein said key id of said value of an integer is derived using a reversible function.
  - 18. The method of claim 12, further comprising searching for a value V in a first data column, said value V being assigned with a key id K, including:
    - I. calculating a single CHB result set, including:
      
      1) for each said inverted index block related to a data block, determining a fetched ordered list, including:
      
      (a) searching for said key id K in a field index using a SHB search function, said field index related to said first data column;
      
      (b) obtaining, from said position vector associated with said field index, said relative position of said ordered list of record ids related to said key id K in said inverted index vector related to said first data column; and
      
      (c) fetching said ordered list of record ids according to said relative position, said ordered list being in said CHB format, giving rise to a fetched ordered list for each inverted index block related to said data block;
      
      2) merging all of the fetched ordered lists of (c) to a single CHB result set, said single CHB result set containing record ids in said data blocks which hold key ids equivalent to said key id K; and
      
      II. fetching key ids from said data blocks according to said single CHB result set, and retrieving corresponding values of said key ids from said keys table, giving rise to a search result of said searched value V.
  - 19. The method of claim 18, further comprising searching for a complex expression with two or more operands using at least one operator, including:
    - I. for each operand of said two or more operands, calculating said single CHB result set, and placing said single CHB result set in a stack, giving rise to a plurality of single CHB result sets;
      
      II. executing at least one Boolean operation, corresponding to said at least one operator, between said plurality of single CHB result sets according to said complex expression, giving rise to a complex expression CHB result set; and
      
      III. fetching key ids from said data blocks according to said complex expression CHB result set, and retrieving corresponding values of said key ids from said keys table, giving rise to a search result of said complex expression.
  - 20. The method of claim 18, further comprising sorting said search result of said searched value V by sorting said single CHB result set pertaining to said searched value V according to an ascending order of values in a given second data column, including:
    - I. for the lowest value V1 in said second data column, calculating said single CHB result set to obtain a single CHB result set pertaining to said value V1;
      
      II. executing Boolean intersection operation between said single CHB result set pertaining to said searched value V and said single CHB result set pertaining to said value V1, giving rise to a part of a sorted CHB result set containing record ids to be listed first in the sorted CHB result set;
      
      III. repeating said steps I and II sequentially for each value Vn in said second data column to obtain said sorted CHB result set; and
      
      IV. fetching key ids from said data blocks according to said sorted CHB result set and retrieving corresponding values of said key ids from said keys table, giving rise to a sorted search result of said searched value V.
  - 21. The method of claim 19, further comprising sorting said search result of said complex expression by sorting said complex expression CHB result set according to an ascending order of values in a given second data column, including:
    - I. for the lowest value V1 in said second data column, calculating said single CHB result set to obtain a single CHB result set pertaining to said value V1;
      
      II. executing Boolean intersection operation between said complex expression CHB result set and said single CHB result set pertaining to said value V1, giving rise to a part of a sorted CHB result set containing record ids to be listed first in the sorted CHB result set;
      
      III. repeating said steps I and II sequentially for each value Vn in said second data column to obtain said sorted CHB result set; and
      
      IV. fetching key ids from said data blocks according to said sorted CHB result set and retrieving corresponding values of said key ids from said keys table, giving rise to a sorted search result of said complex expression.
  - 22. The method of claim 18, further comprising compressing said data columns, including:
    - for each value of each said data column, compressing said key id of said value to a compressed key id using a SHB compression function and saving said compressed key id to a compressed column related to said data column, said compression function based on a serial number of said key id in said SHB data structure representing said field index related to said data column.
  - 23. The method of claim 22, wherein said step II comprises:
    - fetching compressed key ids from said compressed columns, said compressed key ids corresponding to key ids indicated by said single CHB result set, decompressing said compressed key ids to said key ids and retrieving corresponding values of said key ids from said keys table, giving rise to a search result of said searched value V.
  - 24. The method of claim 23, further comprising performing aggregation operation based on said retrieved values.
  - 25. The method of claim 18, further comprising performing deletion instructions, including:
    - I. for each value to be deleted, calculating said single CHB result set;
      
      II. locking and fetching a deletion map when a transaction containing said deletion instructions is committed, wherein said deletion map includes a previous single CHB result set indicating said record ids being deleted from said data blocks;
      
      III. merging said single CHB result set with said deletion map by executing OR operation between said single CHB result set and said previous single CHB result set, giving rise to a new deletion map; and
      
      IV. replacing said deletion map with said new deletion map, confirming commitment of said deletion, and unlocking said new deletion map.
  - 26. The method of claim 25, wherein said searching for said value V in said first data column further comprises:
    - in case said deletion map exists, said merging further comprises fetching said deletion map and executing a Boolean operation NOT between said single CHB result set and said deletion map, giving rise to an updated CHB result set; and
      
      wherein said fetching further comprises fetching said key id K from said data blocks according to said updated CHB result set, and retrieving said value V from said keys table.
  - 27. The method of claim 18, further comprising performing update instructions, including:
    - I. for each value to be updated, calculating said single CHB result set;
      
      II. allocating new record ids to rows containing key ids of the values to be updated;
      
      III. locking and fetching a deletion map when a transaction containing said update instructions is committed, wherein said deletion map includes a previous single CHB result set indicating said record ids being deleted from said data blocks;
      
      IV. storing said rows containing key ids of the values to be updated in an updated data block;
      
      V. merging said single CHB result set with said deletion map by executing OR operation between said CHB result set and said previous single CHB result set, giving rise to a new deletion map; and
      
      VI. replacing said deletion map with said new deletion map, confirming commitment of said rows, and unlocking said new deletion map.
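
Claims 19 through 21, 25 and 26 combine single CHB result sets with Boolean operations. As a rough illustration only, the sketch below substitutes Python's arbitrary-precision integers for CHB bitmaps (bit n set means record id n is in the set) and builds the inverted index directly from column values, skipping the keys table, the SHB field indexes and the merging across data blocks. The class `ResultSets`, its methods and the postfix token format are hypothetical, and the locking and transaction-commit steps of claim 25 are omitted.

```python
def bits_to_ids(bits):
    """Ascending record ids of the 1-bits in a bitset."""
    ids = []
    while bits:
        low = bits & -bits  # isolate the lowest set bit
        ids.append(low.bit_length() - 1)
        bits ^= low
    return ids


class ResultSets:
    """Hypothetical helper: per-column inverted index with Python ints as bitsets."""

    def __init__(self, columns):
        self.inverted = {}  # column -> value -> bitset of record ids
        for name, values in columns.items():
            index = {}
            for rid, value in enumerate(values):
                index[value] = index.get(value, 0) | (1 << rid)
            self.inverted[name] = index
        self.deletion_map = 0  # record ids deleted so far (claim 25)

    def single(self, column, value):
        """Single result set for column == value, with deleted rows masked out (claim 26)."""
        return self.inverted[column].get(value, 0) & ~self.deletion_map

    def evaluate(self, postfix):
        """Evaluate (column, value) operands and AND/OR operators with a stack (claim 19)."""
        stack = []
        for token in postfix:
            if token in ("AND", "OR"):
                right, left = stack.pop(), stack.pop()
                stack.append(left & right if token == "AND" else left | right)
            else:
                stack.append(self.single(*token))
        (result,) = stack  # a well-formed expression leaves exactly one result set
        return result

    def delete(self, column, value):
        """OR the rows matching column == value into the deletion map (claim 25)."""
        self.deletion_map |= self.inverted[column].get(value, 0)

    def sorted_ids(self, result, sort_column):
        """Order a result set by ascending values of a second column (claims 20 and 21)."""
        ordered = []
        for value in sorted(self.inverted[sort_column]):
            ordered.extend(bits_to_ids(result & self.inverted[sort_column][value]))
        return ordered


table = ResultSets({"city": ["NY", "LA", "NY", "SF", "NY"], "age": [30, 25, 22, 30, 41]})
ny = table.single("city", "NY")
print(bits_to_ids(ny))                                                    # [0, 2, 4]
print(bits_to_ids(table.evaluate([("city", "NY"), ("age", 30), "AND"])))  # [0]
print(table.sorted_ids(ny, "age"))                                        # [2, 0, 4]
table.delete("age", 22)
print(bits_to_ids(table.single("city", "NY")))                            # [0, 4]
```

Sorting follows the claimed pattern of intersecting the result set with each value's set of the second column in ascending value order, so the cost grows with the number of distinct values in that column.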

2. A method of identifying a value among a plurality of values, the method comprising:
- (a) providing a first index including a list of unique instances of key ids that represent said plurality of values, said first index being represented by a data structure;
  
  (b) providing a second index including a plurality of ordered lists of record ids indicative of positions of said key ids in a data block, and a position vector that maps said first index to said second index;
  
  (c) searching for a key id that represents said value in said first index in a predetermined time irrespective of size of said first index;
  
  (d) searching in said second index, through said position vector, for at least one record id in an ordered list of said ordered lists of record ids, said at least one record id indicating a position of said key id in said data block, wherein the searching in said second index is performed in a predetermined time irrespective of size of said second index; and
  
  (e) fetching the key id from said data block according to said position and retrieving said value according to the fetched key id.
- Dependent claims 28 and 29
- - 28. The method of claim 2, wherein said data structure is a SHB data structure, including:
    - (a) at least one word, wherein each said word contains a predefined number of bits, wherein each said bit is selected from the group including 1-bits and 0-bits;
      
      (b) a plurality of bit vectors, each said bit vector containing at least one word, wherein said at least one word is selected from the group including an empty word containing only said 0-bits and a non-empty word containing at least one said 1-bit;
      
      (c) one or more compressed layers representing corresponding one or more non-compressed layers, wherein:
      
      (i) each said non-compressed layer includes one said bit vector, wherein said one or more non-compressed layers are organized sequentially, such that each of said one or more non-compressed layers except for a last non-compressed layer has a subsequent non-compressed layer related thereto;
      
      (ii) each unique instance of a key id in said first index is represented by a 1-bit in the last non-compressed layer and wherein the position of each said 1-bit in said last non-compressed layer is equal to a value of each said unique instance of said key id;
      
      (iii) each said non-empty word is represented by a respective 1-bit in a previous non-compressed layer such that a number of said 1-bits in said previous non-compressed layer is equivalent to a number of said non-empty words in a subsequent non-compressed layer and a position of each of said 1-bit in said previous non-compressed layer represents a corresponding position of each said non-empty word in said subsequent non-compressed layer;
      
      wherein said compressed layers other than a first compressed layer include only said non-empty words, and each position of said empty words in said non-compressed layer is represented by a position of each said 0-bit in said previous non-compressed layer, said empty words in any non-compressed layer being representative of removed empty words in any corresponding compressed layer, and each position of said removed empty words in a second compressed layer is represented by a position of each said 0-bit in said first compressed layer, so that said second compressed layer is decompressed into a second decompressed layer by calculating said positions of said removed empty words in said second compressed layer according to said positions of said 0-bits in said first compressed layer, and each said compressed layer other than said first and second compressed layers is decompressed sequentially by calculating said positions of said removed empty words in each said compressed layer according to said positions of said 0-bits in a previous decompressed layer; and
      
      (d) one or more counter vectors, each of said counter vectors related to each of said one or more compressed layers, wherein for each said word in each of said compressed layers there exists a related counter member and wherein each said counter member holds a counter value which equals a cumulative number of 1-bits, said cumulative number calculated from a first position in each of said bit vectors to each respective said word in said bit vector related to said counter member.
  - 29. The method of claim 2, wherein said position vector includes members each of which being associated with a unique instance of a key id of said first index, and represents a position of an ordered list of said ordered lists in said second index.
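
Claim 2 chains three lookups: value to key id, key id to its slot in the first index, and slot to an ordered list of record ids through the position vector. A minimal Python sketch, with hypothetical function names: `bisect_left` over a sorted list stands in for the SHB search, so this version runs in logarithmic time rather than the predetermined, size-independent time the claim recites, and the value-to-key-id step uses a linear `list.index` call for brevity. The ordered lists are stored back to back in one array, so the position vector only needs start offsets.

```python
from bisect import bisect_left
from itertools import accumulate


def build_indexes(column):
    """Build a first index, a position vector and a second index for one data column."""
    first_index = sorted(set(column))  # unique key ids, ascending
    serial = {key_id: i for i, key_id in enumerate(first_index)}
    counts = [0] * len(first_index)
    for key_id in column:
        counts[serial[key_id]] += 1
    # position[i] is where the ordered list for first_index[i] starts; position[-1] == len(column)
    position = list(accumulate(counts, initial=0))
    second_index = [0] * len(column)  # sized by total occurrences, compare claim 10
    cursor = position[:-1]  # slicing makes a copy, so position itself is not modified
    for record_id, key_id in enumerate(column):
        slot = serial[key_id]
        second_index[cursor[slot]] = record_id  # record ids arrive in ascending order
        cursor[slot] += 1
    return first_index, position, second_index


def search(value, keys_table, indexes, column):
    """Return (record ids, fetched values) for rows of `column` holding `value`."""
    first_index, position, second_index = indexes
    try:
        key_id = keys_table.index(value)  # key id == position in the keys table (claim 13)
    except ValueError:
        return [], []
    slot = bisect_left(first_index, key_id)  # stand-in for the SHB search
    if slot == len(first_index) or first_index[slot] != key_id:
        return [], []  # value is known but absent from this data block
    record_ids = second_index[position[slot]:position[slot + 1]]
    return record_ids, [keys_table[column[rid]] for rid in record_ids]


keys_table = ["red", "green", "blue", "black"]
column = [0, 2, 0, 1, 2, 0]  # key ids stored in one data column
indexes = build_indexes(column)
print(indexes)                                      # ([0, 1, 2], [0, 3, 4, 6], [0, 2, 5, 3, 1, 4])
print(search("blue", keys_table, indexes, column))  # ([1, 4], ['blue', 'blue'])
```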

3. A computerized system comprising:
- a processor; and
  
  a computer-readable non-transient memory in communication with the processor, the memory storing instructions that when executed maintain a database management system that includes:
  
  (a) a data-processing module configured to receive a plurality of rows of values of raw data records and decompose said rows of values into separate field values, wherein said data-processing module is further configured to assign a numerical value key id to each said field value;
  
  (b) a plurality of Data Blocks each including a plurality of said key ids sequentially stored in rows divided into data columns;
  
  (c) a plurality of logical record ids, each said record id being for each of said rows of key ids, said record id having a value equal to a sequential position of said row of key ids in a data block within said plurality of Data Blocks;
  
  (d) a plurality of said Data Columns, each Data Column holding a plurality of said key ids of said data column of said Data Block;
  
  (e) a plurality of field indexes, each said field index related to a data column of said data columns of said data block and including a list of unique instances of each said key id;
  
  said data-processing module further configured to convert each said field index to a Super Hierarchical Bitmap (SHB) data structure;
  
  (f) a plurality of inverted index blocks each related to a data block of said data blocks, each said inverted index block including said field indexes and, for each unique instance of each said key id in each said field index, an ordered list of said record ids of said rows in said data block in which said key ids equivalent to said unique instance of said key id are stored; and
  
  (g) a keys table, including a list of one unique instance of each said key id.
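
Claim 3, like the loading steps of claim 1, describes the data-processing module only structurally. The following is a minimal Python illustration under stated assumptions: rows arrive as comma-separated strings, a single keys table is shared by all columns, record ids are global sequence numbers so a row's block is `record_id // rows_per_block`, and each field index is a plain sorted dict of key-id occurrence counters (compare claim 7) rather than an SHB. `Loader`, `load` and `index_block` are hypothetical names.

```python
from collections import Counter, defaultdict


class Loader:
    """Hypothetical data-processing module: keys table, pre-allocated blocks, per-block indexes."""

    def __init__(self, schema, rows_per_block):
        self.schema = schema                  # column names; dictates how rows are split
        self.rows_per_block = rows_per_block
        self.keys_table = []                  # one unique instance per key id; key id == position
        self.key_of = {}                      # value -> key id
        self.blocks = []                      # each block: column -> pre-allocated list of key ids
        self.next_record_id = 0

    def key_id(self, value):
        if value not in self.key_of:
            self.key_of[value] = len(self.keys_table)
            self.keys_table.append(value)
        return self.key_of[value]

    def load(self, raw_row, sep=","):
        fields = raw_row.split(sep)
        if len(fields) != len(self.schema):
            raise ValueError(f"expected {len(self.schema)} fields, got {len(fields)}")
        offset = self.next_record_id % self.rows_per_block
        if offset == 0:  # no block yet, or the current block is full: allocate a new one
            self.blocks.append({c: [None] * self.rows_per_block for c in self.schema})
        for column, value in zip(self.schema, fields):
            self.blocks[-1][column][offset] = self.key_id(value)
        self.next_record_id += 1
        return self.next_record_id - 1

    def index_block(self, b):
        """Field indexes (unique key ids with counters) and the inverted index for block b."""
        base = b * self.rows_per_block
        field_indexes, inverted = {}, {}
        for column, key_ids in self.blocks[b].items():
            counts = Counter(k for k in key_ids if k is not None)
            field_indexes[column] = dict(sorted(counts.items()))
            lists = defaultdict(list)
            for offset, k in enumerate(key_ids):
                if k is not None:
                    lists[k].append(base + offset)  # appended in ascending record-id order
            inverted[column] = dict(sorted(lists.items()))
        return field_indexes, inverted


loader = Loader(["city", "color"], rows_per_block=2)
for row in ["NY,red", "LA,red", "NY,blue"]:
    loader.load(row)
print(loader.keys_table)   # ['NY', 'red', 'LA', 'blue']
print(len(loader.blocks))  # 2
print(loader.index_block(0))
```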

4. A non-transitory computer readable storage medium tangibly embodying a data structure capable of being utilized by a processor for database management, said data structure comprising:
- (a) one or more field indexes each represented by a Super Hierarchical Bitmap (SHB) data structure and related to a data column of a data block, each said field index including a list of unique instances of key ids of said data column;
  
  (b) an inverted index block related to said data block, said inverted index block being composed of one or more inverted index vectors each of which related to a data column in said data block and including a plurality of ordered lists of record ids of rows of said data block, each ordered list related to a unique instance of a respective key id in said field index, said rows of said data block holding key ids each having an equivalent unique instance of a key id among said list of unique instances of key ids; and
  
  (c) one or more position vectors each associated with a field index and including members each of which being associated with a unique instance of a key id of said field index, and representing a relative position of an ordered list of said ordered lists in said inverted index vector.
- Dependent claim 30
- - 30. The storage medium of claim 4, wherein said SHB data structure representing said field index includes:
    - (a) at least one word, wherein each said word contains a predefined number of bits, wherein each said bit is selected from the group including 1-bits and 0-bits;
      
      (b) a plurality of bit vectors, each said bit vector containing at least one word, wherein said at least one word is selected from the group including an empty word containing only said 0-bits and a non-empty word containing at least one said 1-bit;
      
      (c) one or more compressed layers representing corresponding one or more non-compressed layers, wherein:
      
      (i) each said non-compressed layer includes one said bit vector, wherein said one or more non-compressed layers are organized sequentially, such that each of said one or more non-compressed layers except for a last non-compressed layer has a subsequent non-compressed layer related thereto;
      
      (ii) each said unique instance of each said key id in said field index is represented by a 1-bit in the last non-compressed layer and wherein the position of each said 1-bit in said last non-compressed layer is equal to a value of each said unique instance of each said key id;
      
      (iii) each said non-empty word is represented by a respective 1-bit in a previous non-compressed layer such that a number of said 1-bits in said previous non-compressed layer is equivalent to a number of said non-empty words in a subsequent non-compressed layer and a position of each of said 1-bit in said previous non-compressed layer represents a corresponding position of each said non-empty word in said subsequent non-compressed layer;
      
      wherein said compressed layers other than a first compressed layer include only said non-empty words, and each position of said empty words in said non-compressed layer is represented by a position of each said 0-bit in said previous non-compressed layer, said empty words in any non-compressed layer being representative of removed empty words in any corresponding compressed layer, and each position of said removed empty words in a second compressed layer is represented by a position of each said 0-bit in said first compressed layer, so that said second compressed layer is decompressed into a second decompressed layer by calculating said positions of said removed empty words in said second compressed layer according to said positions of said 0-bits in said first compressed layer, and each said compressed layer other than said first and second compressed layers is decompressed sequentially by calculating said positions of said removed empty words in each said compressed layer according to said positions of said 0-bits in a previous decompressed layer; and
      
      (d) one or more counter vectors, each of said counter vectors related to each of said one or more compressed layers, wherein for each said word in each of said compressed layers there exists a related counter member and wherein each said counter member holds a counter value which equals a cumulative number of 1-bits, said cumulative number calculated from a first position in each of said bit vectors to each respective said word in said bit vector related to said counter member.
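
The decompression rule in the "wherein" clause of claim 30 (and claims 5 and 28) can be checked on a small case: the 1-bits of each decompressed layer give, in order, the positions of the next layer's non-empty words, and its 0-bits give the positions of the removed empty words. The Python sketch below applies that rule top-down; the function names are hypothetical, and the input is a hand-built example with 8-bit words for the key ids 3, 17, 64, 200 and 1000. Full decompression grows by a factor of the word size per layer, so it suits illustration rather than large sets.

```python
def decompress_layers(compressed, w):
    """Rebuild the non-compressed layers (top-down) from compressed layers of w-bit words."""
    layers = [list(compressed[0])]  # the first compressed layer is stored whole
    for comp in compressed[1:]:
        prev = layers[-1]
        # 1-bits in the previous layer mark this layer's non-empty words, in order;
        # every 0-bit marks an empty word that compression removed.
        ones = [i * w + b for i, word in enumerate(prev) for b in range(w) if (word >> b) & 1]
        if len(ones) != len(comp):
            raise ValueError("parent 1-bit count must equal the number of non-empty words")
        full = [0] * (len(prev) * w)  # empty words restored as zeros
        for pos, word in zip(ones, comp):
            full[pos] = word
        layers.append(full)
    return layers


def keys_in(layer, w):
    """Key ids are the positions of the 1-bits in the last non-compressed layer."""
    return [i * w + b for i, word in enumerate(layer) for b in range(w) if (word >> b) & 1]


compressed = [[3], [11, 128], [5, 1, 2, 32], [8, 2, 1, 1, 1]]  # 8-bit words
layers = decompress_layers(compressed, 8)
print([len(layer) for layer in layers])  # [1, 8, 64, 512]
print(keys_in(layers[-1], 8))            # [3, 17, 64, 200, 1000]
```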

Current Assignee
Jethrodata Ltd.
Original Assignee
Jethrodata Ltd.
Inventors
Raufman, Boaz
Primary Examiner(s)
AL HASHEMI, SANA A

Application Number

US14/972,478
Publication Number

US 20160103869A1
Time in Patent Office

229 Days
Field of Search

707/609, 707/687, 707/790, 707/821, 707/953
US Class Current

1/1
CPC Class Codes

G06F 16/2272   Management thereof

G06F 16/2365   Ensuring data consistency a...

G06F 16/901   Indexing; Data structures t...

G06F 7/78   for changing the order of d...
