System, method and data structure for fast loading, storing and access to huge data sets in real time
First Claim
1. A computerized system comprising:
- a processor; and
a computer-readable non-transient memory in communication with the processor, the memory storing instructions that when executed manage a Super Hierarchiel Bitmaps (SHB) data structure representing a set of integers that includes;
(a) at least one word, wherein each said word contains a predefined number of, bits, wherein each said bit is selected from the group including 1-bits and 0-bits;
(b) a plurality of bit vectors, each said bit vector containing at least one word, wherein said at least one word is selected from the group including an empty word containing only said 0-bits and a non-empty word containing at least one said 1-bit;
(c) one or more compressed layers representing corresponding one or more non-compressed layers, wherein;
(i) each said non-compressed layer includes one said bit vector, wherein said one or more non-compressed layers are organized sequentially, such that each of said one or more non-compressed layers except for a last non-compressed layer has a subsequent non-compressed layer related thereto;
(ii) the set of integers containing a plurality of positive integer members, wherein each member is represented by a 1-bit in the last non-compressed layer and wherein the position of each said 1-bit in said last non-compressed layer is equal to a value of said positive integer member;
(iii) each said non-empty word is represented by a respective 1-bit in a previous non-compressed layer such that a number of said 1-bits in said previous non-compressed layer is equivalent to a number of said non-empty words in a subsequent non-compressed layer and a position of each of said 1-bit in said previous non-compressed layer represents a corresponding position of each said non-empty word in said subsequent non-compressed layer;
wherein said compressed layers other than a first compressed layer include only said non-empty words, and each position of said empty words in said non-compressed layer is represented by a position of each said 0-bit in said previous non-compressed layer, said empty words in any non-compressed layer being representative of removed empty words in any corresponding compressed layer, and each position of said removed empty words in a second compressed layer is represented by a position of each said 0-bit in said first compressed layer, so that said second compressed layer is decompressed into a second decompressed layer by calculating said positions of said removed empty words in said second compressed layer according to said positions of said 0-bits in said first compressed layer, and each said compressed layer other than said first and second compressed layers is decompressed sequentially by calculating said positions of said removed empty words in each said compressed layer according to said positions of said 0-bits in a previous decompressed layer; and
(d) one or more counter vectors, each of said counter vectors related to each of said one or more compressed layers, wherein for each said word in each of said compressed layers there exists a related counter member and wherein each said counter member holds a counter value which equals a cumulative number of 1-bits, said cumulative number calculated from a first position in each of said bit vectors to each respective said word in said bit vector related to said counter member.
2 Assignments
0 Petitions
Accused Products
Abstract
A computerized system including a processor and a computer-readable non-transient memory in communication with the processor, the memory storing instructions that when executed manage a novel data structure and related group of algorithms that can be used as a method for representing a set and as a base for very efficient indexing, hash and compression. SHB is an improvement of hierarchical bitmap. An improved database system that can utilize the innovative data structure which includes a raw data stream provided to the system via a data processing module, data blocks, fields indexes tables and a keys table. There is provided an index creating process and a columns creating process, for transforming the data blocks and tables into index blocks and data columns.
6 Citations
25 Claims
-
1. A computerized system comprising:
-
a processor; and a computer-readable non-transient memory in communication with the processor, the memory storing instructions that when executed manage a Super Hierarchiel Bitmaps (SHB) data structure representing a set of integers that includes; (a) at least one word, wherein each said word contains a predefined number of, bits, wherein each said bit is selected from the group including 1-bits and 0-bits; (b) a plurality of bit vectors, each said bit vector containing at least one word, wherein said at least one word is selected from the group including an empty word containing only said 0-bits and a non-empty word containing at least one said 1-bit; (c) one or more compressed layers representing corresponding one or more non-compressed layers, wherein; (i) each said non-compressed layer includes one said bit vector, wherein said one or more non-compressed layers are organized sequentially, such that each of said one or more non-compressed layers except for a last non-compressed layer has a subsequent non-compressed layer related thereto; (ii) the set of integers containing a plurality of positive integer members, wherein each member is represented by a 1-bit in the last non-compressed layer and wherein the position of each said 1-bit in said last non-compressed layer is equal to a value of said positive integer member; (iii) each said non-empty word is represented by a respective 1-bit in a previous non-compressed layer such that a number of said 1-bits in said previous non-compressed layer is equivalent to a number of said non-empty words in a subsequent non-compressed layer and a position of each of said 1-bit in said previous non-compressed layer represents a corresponding position of each said non-empty word in said subsequent non-compressed layer; wherein said compressed layers other than a first compressed layer include only said non-empty words, and each position of said empty words in said non-compressed layer is represented by a position of each said 0-bit in said previous non-compressed layer, said empty words in any non-compressed layer being representative of removed empty words in any corresponding compressed layer, and each position of said removed empty words in a second compressed layer is represented by a position of each said 0-bit in said first compressed layer, so that said second compressed layer is decompressed into a second decompressed layer by calculating said positions of said removed empty words in said second compressed layer according to said positions of said 0-bits in said first compressed layer, and each said compressed layer other than said first and second compressed layers is decompressed sequentially by calculating said positions of said removed empty words in each said compressed layer according to said positions of said 0-bits in a previous decompressed layer; and (d) one or more counter vectors, each of said counter vectors related to each of said one or more compressed layers, wherein for each said word in each of said compressed layers there exists a related counter member and wherein each said counter member holds a counter value which equals a cumulative number of 1-bits, said cumulative number calculated from a first position in each of said bit vectors to each respective said word in said bit vector related to said counter member. - View Dependent Claims (7, 8, 9, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
-
-
2. A computerized method of searching for an integer member U, comprising:
-
A. providing a Super Hierarchiel Bitmaps (SHB) data structure representing a set of integer members, the SHB data structure being stored on a computer-readable non-transient memory in communication with a processor, the SHB data structure including; (a) at least one word, wherein each said word contains a predefined number of bits, wherein each said bit is selected from the group including 1-bits and 0-bits; (b) a plurality of bit vectors, each said bit vector containing at least one word, wherein said at least one word is selected from the group including an empty word containing only said 0-bits and a non-empty word containing at least one said 1-bit; (c) one or more compressed layers representing corresponding one or more non-compressed layers, wherein; (i) each said non-compressed layer includes one said bit vector, wherein said one or more non-compressed layers are organized sequentially, such that each of said one or more non-compressed layers except for a last non-compressed layer has a subsequent non-compressed layer related thereto; (ii) the set of integers containing a plurality of positive integer members, wherein each member is represented by a 1-bit in the last non-compressed layer and wherein the position of each said 1-bit in said last non-compressed layer is equal to a value of said positive integer member; (iii) each said non-empty word is represented by a respective 1-bit in a previous non-compressed layer such that a number of said 1-bits in said previous non-compressed layer is equivalent to a number of said non-empty words in a subsequent non-compressed layer and a position of each of said 1-bit in said previous non-compressed layer represents a corresponding position of each said non-empty word in said subsequent non-compressed layer; wherein said compressed layers other than a first compressed layer include only said non-empty words, and each position of said empty words in said non compressed layer is represented by a position of each said 0-bit in said previous non-compressed layer, said empty words in any non-compressed layer being representative of removed empty words in any corresponding compressed layer, and each position of said removed empty words in a second compressed layer is represented by a position of each said 0-bit in said first compressed layer, so that said second compressed layer is decompressed into a second decompressed layer by calculating said positions of said removed empty words in said second compressed layer according to said positions of said 0-bits in said first compressed layer, and each said compressed layer other than said first and second compressed layers is decompressed sequentially by calculating said positions of said removed empty words in each said compressed layer according to said positions of said 0-bits in a previous decompressed layer; and (d) one or more counter vectors, each of said counter vectors related to each of said one or more compressed layers, wherein for each said word in each of said compressed layers there exists a related counter member and wherein each said counter member holds a counter value which equals a cumulative number of 1-bits, said cumulative number calculated from a first position in each of said bit vectors to each respective said word in said bit vector related to said counter member; B. searching for said integer member U represented by a bit representation in a last compressed layer in said SHB data structure using a SHB Search function. - View Dependent Claims (10, 15)
-
-
3. A computerized method of creating a Super Hierarchiel Bitmaps (SHB) data structure representing a set of integer members using a SHB Create function, the SHB data structure being stored on a computer-readable non-transient memory in communication with a processor, comprising:
-
A. obtaining said set of integer members ordered in an ascending order from the lowest value to the highest value; B. creating said SHB data structure that represents said set of integer members, said SHB data structure includes; (a) at least one word, wherein each said word contains a predefined number of bits, wherein each said bit is selected from the group including 1-bits and 0-bits; (b) a plurality of bit vectors, each said bit vector containing at least one word, wherein said at least one word is selected from the group including an empty word containing only said 0-bits and a non-empty word containing at least one said 1-bit; (c) one or more compressed layers representing corresponding one or more non-compressed layers, wherein; (i) each said non-compressed layer includes one said bit vector, wherein said one or more non-compressed layers are organized sequentially, such that each of said one or more non-compressed layers except for a last non-compressed layer has a subsequent non-compressed layer related thereto; (ii) the set of integers containing a plurality of positive integer members, wherein each member is represented by a 1-bit in the last non-compressed layer and wherein the position of each said 1-bit in said last non-compressed layer is equal to a value of said positive integer member; (iii) each said non-empty word is represented by a respective 1-bit in a previous non-compressed layer such that a number of said 1-bits in said previous non-compressed layer is equivalent to a number of said non-empty words in a subsequent non-compressed layer and a position of each of said 1-bit in said previous non-compressed layer represents a corresponding position of each said non-empty word in said subsequent non-compressed layer; wherein said compressed layers other than a first compressed layer include only said non-empty words, and each position of said empty words in said non compressed layer is represented by a position of each said 0-bit in said previous non-compressed layer, said empty words in any non-compressed layer being representative of removed empty words in any corresponding compressed layer, and each position of said removed empty words in a second compressed layer is represented by a position of each said 0-bit in said first compressed layer, so that said second compressed layer is decompressed into a second decompressed layer by calculating said positions of said removed empty words in said second compressed layer according to said positions of said 0-bits in said first compressed layer, and each said compressed layer other than said first and second compressed layers is decompressed sequentially by calculating said positions of said removed empty words in each said compressed layer according to said positions of said 0-bits in a previous decompressed layer; and (d) one or more counter vectors, each of said counter vectors related to each of said one or more compressed layers, wherein for each said word in each of said compressed layers there exists a related counter member and wherein each said counter member holds a counter value which equals a cumulative number of 1-bits, said cumulative number calculated from a first position in each of said bit vectors to each respective said word in said bit vector related to said counter member.
-
-
4. A computerized method of creating a Super Hierarchiel Bitmaps (SHB) index using an SHB data structure, the SHB data structure representing a set of integer members, said SHB data structure being stored on a computer-readable non-transient memory in communication with a processor, comprising:
-
A. providing index pairs each including a set member constituting an integer member, and a mapped value related to said set member; B. arranging said index pairs to an ordered set of index pairs according to an ascending order of integer members of said index pairs, wherein said integer members of said ordered set of index pairs being representatives of said set of integer members ordered in said ascending order; C. creating said SHB data structure that represents said set of integer members, said SHB data structure includes; (a) at least one word, wherein each said word contains a predefined number of bits, wherein each said bit is selected from the group including 1-bits and 0-bits; (b) a plurality of bit vectors, each said bit vector containing at least one word, wherein said at least one word is selected from the group including an empty word containing only said 0-bits and a non-empty word containing at least one said 1-bit; (c) one or more compressed layers representing corresponding one or more non-compressed layers, wherein; (i) each said non-compressed layer includes one said bit vector, wherein said one or more non-compressed layers are organized sequentially, such that each of said one or more non-compressed layers except for a last non-compressed layer has a subsequent non-compressed layer related thereto; (ii) the set of integers containing a plurality of positive integer members, wherein each member is represented by a 1-bit in the last non-compressed layer and wherein the position of each said 1-bit in said last non-compressed layer is equal to a value of said positive integer member; (iii) each said non-empty word is represented by a respective 1-bit in a previous non-compressed layer such that a number of said 1-bits in said previous non-compressed layer is equivalent to a number of said non-empty words in a subsequent non-compressed layer and a position of each of said 1-bit in said previous non-compressed layer represents a corresponding position of each said non-empty word in said subsequent non-compressed layer; wherein said compressed layers other than a first compressed layer include only said non-empty words, and each position of said empty words in said non compressed layer is represented by a position of each said 0-bit in said previous non-compressed layer, said empty words in any non-compressed layer being representative of removed empty words in any corresponding compressed layer, and each position of said removed empty words in a second compressed layer is represented by a position of each said 0-bit in said first compressed layer, so that said second compressed layer is decompressed into a second decompressed layer by calculating said positions of said removed empty words in said second compressed layer according to said positions of said 0-bits in said first compressed layer, and each said compressed layer other than said first and second compressed layers is decompressed sequentially by calculating said positions of said removed empty words in each said compressed layer according to said positions of said 0-bits in a previous decompressed layer; and (d) one or more counter vectors, each of said counter vectors related to each of said one or more compressed layers, wherein for each said word in each of said compressed layers there exists a related counter member and wherein each said counter member holds a counter value which equals a cumulative number of 1-bits, said cumulative number calculated from a first position in each of said bit vectors to each respective said word in said bit vector related to said counter member; D. allocating a value vector V associated with said created SHB data structure, wherein each member of said value vector V[n] holds a mapped value related to a respective integer member of a nth index pair of said ordered set of index pairs.
-
-
5. A computerized method of creating hash for a plurality of keys in a key set using a Super Hierarchiel Bitmaps (SHB) index based on a SHB data structure, the SHB data structure representing a set of integer members, said SHB data structure being stored on a computer-readable non-transient memory in communication with a processor, comprising:
-
A. hash said keys with a hash function ƒ
to respective hashed values;B. collect said keys in a key vector K and said respective hashed values thereof in a hashed value vector H, so that a position m of a key k in said key vector is equal to a corresponding position m of a respective hashed value of said key k in said hashed value vector, such that H[m]=ƒ
(K[m]);C. create said SHB data structure representing said hashed values in said hashed value vector H, wherein the hashed values being representatives of said set of integer members, said SHB data structure includes; (a) at least one word, wherein each said word contains a predefined number of bits, wherein each said bit is selected from the group including 1-bits and 0-bits; (b) a plurality of bit vectors, each said bit vector containing at least one word, wherein said at least one word is selected from the group including an empty word containing only said 0-bits and a non-empty word containing at least one said 1-bit; (c) one or more compressed layers representing corresponding one or more non-compressed layers, wherein; (i) each said non-compressed layer includes one said bit vector, wherein said one or more non-compressed layers are organized sequentially, such that each of said one or more non-compressed layers except for a last non-compressed layer has a subsequent non-compressed layer related thereto; (ii) the set of integers containing a plurality of positive integer members, wherein each member is represented by a 1-bit in the last non-compressed layer and wherein the position of each said 1-bit in said last non-compressed layer is equal to a value of said positive integer member; (iii) each said non-empty word is represented by a respective 1-bit in a previous non-compressed layer such that a number of said 1-bits in said previous non-compressed layer is equivalent to a number of said non-empty words in a subsequent non-compressed layer and a position of each of said 1-bit in said previous non-compressed layer represents a corresponding position of each said non-empty word in said subsequent non-compressed layer; wherein said compressed layers other than a first compressed layer include only said non-empty words, and each position of said empty words in said non compressed layer is represented by a position of each said 0-bit in said previous non-compressed layer, said empty words in any non-compressed layer being representative of removed empty words in any corresponding compressed layer, and each position of said removed empty words in a second compressed layer is represented by a position of each said 0-bit in said first compressed layer, so that said second compressed layer is decompressed into a second decompressed layer by calculating said positions of said removed empty words in said second compressed layer according to said positions of said 0-bits in said first compressed layer, and each said compressed layer other than said first and second compressed layers is decompressed sequentially by calculating said positions of said removed empty words in each said compressed layer according to said positions of said 0-bits in a previous decompressed layer; and (d) one or more counter vectors, each of said counter vectors related to each of said one or more compressed layers, wherein for each said word in each of said compressed layers there exists a related counter member and wherein each said counter member holds a counter value which equals a cumulative number of 1-bits, said cumulative number calculated from a first position in each of said bit vectors to each respective said word in said bit vector related to said counter member; D. create a position vector V, wherein V[s] equals the position m of said key k in said key vector K, such that K[V[s]]=k, wherein s indicates a serial number of a respective hashed value H[m] in said SHB data structure; whereby the hashed values in said SHB data structure being representatives of said integer members, and V[s] being representatives of said mapped values related to said respective integer members.
-
-
6. A computer program product that includes a computer-readable non-transient memory storing instructions for performing a method of searching for an integer member U, comprising:
-
A. providing a Super Hierarchiel Bitmaps (SHB) data structure representing a set of integer members, the SHB data structure being stored on a computer-readable non-transient memory in communication with a processor, the SHB data structure including; (a) at least one word, wherein each said word contains a predefined number of bits, wherein each said bit is selected from the group including 1-bits and 0-bits; (b) a plurality of bit vectors, each said bit vector containing at least one word, wherein said at least one word is selected from the group including an empty word containing only said 0-bits and a non-empty word containing at least one said 1-bit; (c) one or more compressed layers representing corresponding one or more non-compressed layers, wherein; (i) each said non-compressed layer includes one said bit vector, wherein said one or more non-compressed layers are organized sequentially, such that each of said one or more non-compressed layers except for a last non-compressed layer has a subsequent non-compressed layer related thereto; (ii) the set of integers containing a plurality of positive integer members, wherein each member is represented by a 1-bit in the last non-compressed layer and wherein the position of each said 1-bit in said last non-compressed layer is equal to a value of said positive integer member; (iii) each said non-empty word is represented by a respective 1-bit in a previous non-compressed layer such that a number of said 1-bits in said previous non-compressed layer is equivalent to a number of said non-empty words in a subsequent non-compressed layer and a position of each of said 1-bit in said previous non-compressed layer represents a corresponding position of each said non-empty word in said subsequent non-compressed layer; wherein said compressed layers other than a first compressed layer include only said non-empty words, and each position of said empty words in said non compressed layer is represented by a position of each said 0-bit in said previous non-compressed layer, said empty words in any non-compressed layer being representative of removed empty words in any corresponding compressed layer, and each position of said removed empty words in a second compressed layer is represented by a position of each said 0-bit in said first compressed layer, so that said second compressed layer is decompressed into a second decompressed layer by calculating said positions of said removed empty words in said second compressed layer according to said positions of said 0-bits in said first compressed layer, and each said compressed layer other than said first and second compressed layers is decompressed sequentially by calculating said positions of said removed empty words in each said compressed layer according to said positions of said 0-bits in a previous decompressed layer; and (d) one or more counter vectors, each of said counter vectors related to each of said one or more compressed layers, wherein for each said word in each of said compressed layers there exists a related counter member and wherein each said counter member holds a counter value which equals a cumulative number of 1-bits, said cumulative number calculated from a first position in each of said bit vectors to each respective said word in said bit vector related to said counter member; searching for said integer member U represented by a bit representation in a last compressed layer in said SHB data structure using a SHB Search function.
-
Specification