Databases
First Claim
1. A method of constructing a computer database for storing in compressed form information comprising a plurality of tuples, wherein each tuple comprises a data value in each of a plurality of fields with corresponding fields in the tuples constituting a column, the method comprising the steps of:
(a) defining a plurality of domains;
(b) receiving said information in an uncompressed form and assigning each data value into a corresponding domain, wherein data values in the same field are assigned to the same domain;
(c) generating for each domain a domain dictionary which matches each distinct data value assigned to that domain with a corresponding distinct token, and storing each domain dictionary in a domain-dictionary store;
(d) creating a tokenised store for each column and storing therein the corresponding distinct token for each pertaining field data value;
wherein the method further comprises arranging that;
(e) all of the tokens for a domain are initially of the same size and that size being substantially the minimum size necessary to provide the required plurality of distinct tokens for the initial size of that domain;
(f) when further tuples are received for storage for each field of the tuple the data value is compared in the corresponding domain dictionary and if no match is found a new token within the existing token size is generated and added to the domain dictionary but if all tokens within the existing token size are utilised a new token is generated having a size which is at least 1-bit wider than previously, such broadened tokens being entered into a new tokenised store for that column; and
(g) the tokenised stores and the domain dictionary stores are arranged in the computer memory as respective data blocks or respective sets of data blocks which are independently relocatable within the computer memory whereby ones of the data blocks may be modified in size to accommodate such broadened tokens without requiring consequential modifications to be made to others of the data blocks.
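Steps (a) to (f) can be illustrated with a minimal sketch. It is not the patented implementation: the names `min_token_bits`, `Domain` and `Column`, the use of sequential integers as tokens, and the rule of widening by exactly one bit are assumptions made for illustration. Step (g), the independently relocatable data blocks, is not modelled; each store is simply a separate Python list.

```python
from dataclasses import dataclass, field


def min_token_bits(distinct_values: int) -> int:
    """Smallest bit width giving at least `distinct_values` distinct tokens."""
    return max(1, (distinct_values - 1).bit_length())


@dataclass
class Domain:
    """Hypothetical domain dictionary: distinct value <-> distinct integer token."""
    token_of: dict = field(default_factory=dict)
    value_of: list = field(default_factory=list)
    bits: int = 1

    @classmethod
    def from_values(cls, values):
        """Steps (c) and (e): build the dictionary, then size tokens minimally."""
        domain = cls()
        for value in values:
            if value not in domain.token_of:
                domain.token_of[value] = len(domain.value_of)
                domain.value_of.append(value)
        domain.bits = min_token_bits(len(domain.value_of))
        return domain

    def tokenise(self, value):
        """Step (f): look the value up, adding a token if there is no match."""
        token = self.token_of.get(value)
        if token is None:
            token = len(self.value_of)
            if token >= (1 << self.bits):   # every token of this size is in use
                self.bits += 1              # broaden tokens by one bit
            self.token_of[value] = token
            self.value_of.append(value)
        return token


class Column:
    """Tokenised stores for one column, one store per token width."""

    def __init__(self, domain):
        self.domain = domain
        self.stores = []                    # list of (bits, [tokens])

    def append(self, value):
        token = self.domain.tokenise(value)
        if not self.stores or self.stores[-1][0] != self.domain.bits:
            # Broadened tokens go into a new store; older stores are untouched.
            self.stores.append((self.domain.bits, []))
        self.stores[-1][1].append(token)

    def values(self):
        """Decode every stored token back to its data value."""
        return [self.domain.value_of[t] for _, tokens in self.stores for t in tokens]


initial = ["red", "green", "blue", "red"]
colours = Domain.from_values(initial)       # 3 distinct values -> 2-bit tokens
column = Column(colours)
for value in initial + ["black", "white"]:  # "white" is the 5th distinct value
    column.append(value)
```

In this sketch the fifth distinct value no longer fits in 2 bits, so it is written to a second store with 3-bit tokens while the first store keeps its original width.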
Abstract
A computer database for storing information in tuple form wherein each tuple comprises fields with corresponding fields in the tuples constituting a column, generates for each data domain a domain dictionary which matches each distinct data value in that domain with a corresponding distinct token, and stores each domain dictionary. A tokenised store for each data field column is created. All of the tokens for a domain are initially of the same size, being substantially the minimum size necessary to provide the required plurality of distinct tokens for the initial size of that domain. When further tuples are received for storage, for each field of the tuple the data value is compared in the corresponding domain dictionary. If no match is found, a new token within the existing token size is generated and added to the domain dictionary, but if all tokens within the existing token size are utilised a new token is generated having a size which is at least 1-bit wider than previously, such broadened token being entered into a new tokenised store for that column.
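To show why a minimal token size saves space, the following sketch packs fixed-width tokens into bytes. The abstract does not specify a physical layout, so the little-endian bit order and the helper names `pack_tokens` and `unpack_tokens` are assumptions chosen for illustration only.

```python
def pack_tokens(tokens, bits):
    """Pack integer tokens into bytes, `bits` bits each, lowest bits first."""
    acc = 0
    for i, token in enumerate(tokens):
        if not 0 <= token < (1 << bits):
            raise ValueError(f"token {token} does not fit in {bits} bits")
        acc |= token << (i * bits)
    nbytes = (len(tokens) * bits + 7) // 8   # round up to whole bytes
    return acc.to_bytes(nbytes, "little")


def unpack_tokens(data, bits, count):
    """Inverse of pack_tokens: recover `count` tokens of `bits` bits each."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]


tokens = [0, 1, 2, 0, 3]                     # five tokens from a 4-value domain
packed = pack_tokens(tokens, bits=2)         # 10 bits, stored in 2 bytes
restored = unpack_tokens(packed, bits=2, count=len(tokens))
```

Here five 2-bit tokens occupy two bytes, regardless of how long the original data values were.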
37 Citations
7 Claims
Claim 1 is set out in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7.
Specification