Indexed file system and a method and a mechanism for accessing data records from such a system
First Claim
1. A data access mechanism for a computer filing system, the access mechanism including:
- an extensible hierarchical directory having a plurality of directory entries at least some of which provide pointers to respective indexed data files;
logic for analysing a data key input to said filing system, by comparison of the data key with entries in the extensible hierarchical directory, to identify one of said plurality of directory entries providing a pointer to a respective indexed data file of the respective indexed data files;
a plurality of indexed data files for storing data; and
a plurality of indices, each corresponding to one of said indexed data files;
wherein each of said indexed data files is locatable via one of said pointers, wherein the index for each of said indexed data files contains identifiers and storage addresses of data blocks therein, each of said indexed data files having associated logic for transforming an input data key to generate a transformed key and logic for comparing a transformed key with data block identifiers in the index to identify a match and to obtain the address of an identified data block within the indexed data file, and each of said indexed data files having associated logic for identifying from records within the identified data block one or more data records related to the input data key.
1 Assignment
0 Petitions
Accused Products
Abstract
A computer filing system includes a data access and allocation mechanism including a directory and a plurality of indexed data files or hash tables. The directory is preferably a radix tree including directory entries which contain pointers to respective ones of the hash tables. Using a plurality of hash tables avoids the whole database ever having to be re-hashed all at once. If a hash table exceeds a preset maximum size as data is added, it is replaced by two hash tables and the directory is updated to include two separate directory entries each containing a pointer to one of the new hash tables. The directory is locally extensible such that new levels are added to the directory only where necessary to distinguish between the hash tables. Local extensibility prevents unnecessary expansion of the size of the directory while also allowing the size of the hash tables to be controlled. This allows optimisation of the data access mechanism such that an optimal combination of directory-look-up and hashing processes is used. Additionally, if the number of keys mapped to an indexed data file is less than a threshold number (corresponding to the number of entries which can be held in a reasonable index), the index for the data file is built with a one-to-one relationship between keys and index entries such that each index entry identifies a data block holding data for only one key. This avoids the overhead of the collision detection of hashing when it ceases to be useful.
-
Citations
12 Claims
-
1. A data access mechanism for a computer filing system, the access mechanism including:
-
an extensible hierarchical directory having a plurality of directory entries at least some of which provide pointers to respective indexed data files;
logic for analysing a data key input to said filing system, by comparison of the data key with entries in the extensible hierarchical directory, to identify one of said plurality of directory entries providing a pointer to a respective indexed data file of the respective indexed data files;
a plurality of indexed data files for storing data; and
a plurality of indices, each corresponding to one of said indexed data files;
wherein each of said indexed data files is locatable via one of said pointers, wherein the index for each of said indexed data files contains identifiers and storage addresses of data blocks therein, each of said indexed data files having associated logic for transforming an input data key to generate a transformed key and logic for comparing a transformed key with data block identifiers in the index to identify a match and to obtain the address of an identified data block within the indexed data file, and each of said indexed data files having associated logic for identifying from records within the identified data block one or more data records related to the input data key. - View Dependent Claims (2, 3, 4)
the directory is an hierarchical directory comprising a plurality of branch nodes pointing to other nodes to define paths through the directory from a root node to a plurality of leaf nodes each forming the lowest level of the hierarchy for a respective path, each leaf node providing a pointer to an indexed data file; and
the logic for analysing an input data key by comparison with the directory is adapted to iteratively analyse the input data key by comparison of the key with directory entries at each successive node of a path through the directory to select a branch to a next node leading to a respective leaf node, each iteration of the analysis at each successive node using one or more additional bits of information from the input key to distinguish between keys and select a branch.
-
-
3. A data access mechanism according to claim 1, wherein:
-
the indexed data files each have a predefined maximum size and are extensible up to said predefined maximum; and
the data access mechanism includes logic for identifying indexed data files exceeding the predefined maximum size and for responding to a first indexed data file exceeding the predefined maximum size by replacing the first indexed data file with two or more separate indexed data files and by extending the hierarchical directory by addition of new directory nodes branching from the node of the directory which previously identified the pointer to the first indexed data file, the new nodes forming a new lowest level of the directory hierarchy for the respective branch, such that the extended directory is adapted to use one or more additional bits of information from input data keys to distinguish between the keys and select a branch to identify a respective pointer to one of said two or more replacement indexed data files.
-
-
4. A data access mechanism according to claim 3, wherein said logic for replacing the first indexed data file is adapted to determine when an array of keys mapped to a replacement indexed data file includes no more than a threshold number of keys and, in response to said determination, to build the index for said replacement indexed data file such that each key within said array of keys corresponds to a separate data block within said file.
-
5. An indexed file system including a data processing system having a processor, memory, peripheral storage, and communication buses for transferring data between said processor, memory and peripheral storage, said indexed file system also including a data allocation and access mechanism comprising:
-
an extensible hierarchical directory including a plurality of directory entries at least some of which provide pointers to respective indexed data files;
logic for use by said processor to analyse a data key input to said filing system, by comparison of the data key with entries in the extensible hierarchical directory, to identify one of said plurality of directory entries providing a pointer to a respective indexed data file of the respective indexed data files;
a plurality of indexed data files for storing data in said peripheral storage; and
a plurality of indices, each corresponding to one of said indexed data files;
wherein each of said indexed data files is locatable via one of said pointers, wherein the index for each of said indexed data file, contains identifiers and storage addresses of data blocks within the indexed data file, each of said indexed data files having associated logic for use by said processor to transform an input data key to generate a transformed key and logic for use by said processor to compare the transformed key with data block identifiers in the index to identify a match and to obtain the address of an identified data block within the indexed data file, and each of said indexed data files having associated logic for use by said processor to identify from records within the identified data block one or more data records related to the input data key.
-
-
6. A filing system for a computer including:
-
a directory having a plurality of directory entries at least some of which contain pointers to respective indexed data files;
logic for analysing a data key input to said filing system, by comparison of the data key with entries in the directory, to identify one of the plurality of directory entries providing a pointer to a respective indexed data file for storing data relating to the input key;
a plurality of indexed data files for storing data, each locatable via one of said pointers; and
a plurality of indices, each corresponding to one of said indexed data files;
wherein the index for each indexed data file contains identifiers and storage addresses of data blocks within the indexed data file and has associated logic for comparing an input data key with data block identifiers in the index to identify a match and to obtain the address of an identified data block within the indexed data file; and
wherein the filing system includes logic for;
(a) identifying indexed data files which exceed a preset maximum file size; and
(b) replacing said excessive-sized indexed data files with a plurality of indexed data files, including updating the directory to include separate directory entries each containing a pointer to one of said replacement indexed data files. - View Dependent Claims (7)
-
-
8. A method of accessing data records in a computer file system using an access mechanism the method comprising the steps of:
-
providing an extensible hierarchical directory having a plurality of directory entries at least some of which provide pointers to respective indexed data files;
limiting the respective indexed data files to a predefined maximum size which are extensible up to said predefined maximum;
analysing a data key input to said file system by comparing the data key with entries in the extensible hierarchical directory in order to identify one of said plurality of directory entries which provide a pointer to a respective indexed data file;
storing data in a plurality of indexed data files;
providing a plurality of indices each corresponding to one of said indexed data files;
locating each of said indexed data files via one of said pointers, the index for each of said indexed data files contains identifiers and storage addresses of data blocks within the indexed data file;
transforming the input data key to generate a transformed key;
comparing the transformed key with data block identifiers in the index to identify a match and to obtain the address of an identified data block within each of said indexed data files;
identifying from records within an identified data block one or more data records related to the input data key;
analysing an input data key, in response to receipt of the input data key, by comparison with entries in the directory to identify a directory entry providing a pointer to a respective indexed data file of the indexed data files;
accessing the index for said respective indexed data file;
transforming the input data key to generate the transformed key which is compared with data block identifiers in the index of the indexed data file;
comparing the transformed key with data block identifiers in said index to identify a match;
obtaining from said index the address of the identified data block;
identifying from records within the identified data block one or more records related to the input data key.
-
-
9. A method of reorganising a database including reorganising an access mechanism comprising the steps of:
-
providing an extensible hierarchical directory comprising a plurality of branch nodes pointing to other nodes to define paths throuth the directory from a root node to a plurality of leaf nodes each forming the lowest level of the hierarchy for a respective path, each leaf node providing a pointer to an indexed data file;
analysing a data key input to said file system by comparing the data key with entries in the extensible hierarchical directory in order to identify one of said plurality of directory entries which provide a pointer to a respective indexed data file;
storing data in a plurality of indexed data files;
providing a plurality of indices each corresponding to one of said indexed data files;
locating each of said indexed data files via one of said pointers, the index for each of said indexed data files contains identifiers and storage addresses of data blocks within the indexed data file;
transforming the input data key to generate a transformed key;
comparing the transformed key with data block identifiers in the index to identify a match and to obtain the address of an identified data block within each of said indexed data files;
identifying from records within an identified data block one or more data records related to the input data key; and
analysing an input data key by comparison with the extensible hierarchical directory by adapting to iteratively analyse the input data key by comparison of the key with directory entries at each successive node of a path through the directory to select a branch to a next node leading to a respective leaf node, each iteration of the analysis at each successive node using one or more additional bits of information from the input key to distinguish between keys and select a branch;
periodically scanning said indexed data files to determine the size of each indexed data file;
comparing the size of each indexed data file with a preset file size limit to identify indexed data files larger than said preset size limit; and
for indexed data files larger than said preset size limit, replacing the indexed data file with two or more separate indexed data files and extending the hierarchical directory by addition of new directory nodes branching from the node of the directory which previously identified the pointer to the first indexed data file, the new nodes forming a new lowest level of the directory hierarchy for the respective branch, the extended directory then using one or more additional bits of information from input data keys to distinguish between the keys and select a branch to identify a respective pointer to one of said two or more replacement indexed data files. - View Dependent Claims (10)
determining when an array of keys mapped to a replacement indexed data file includes no more than a threshold number of keys and, in response to said determination, building the index for said replacement indexed data file such that each key within said array of keys corresponds to a separate data block within said file.
-
-
11. A computer program product comprising computer program code recorded on a machine readable recording medium, the program code including instructions for implementing the steps of:
-
providing an extensible hierarchical directory having a plurality of directory entries at least some of which provide pointers to respective indexed data files;
limiting the respective indexed data files to a predefined maximum size which are extensible up to said predefined maximum;
analysing a data key input to said file system by comparing the data key with entries in the directory in order to identify one of said plurality of directory entries which provide a pointer to a respective indexed data file;
storing data in a plurality of indexed data files of the indexed data files;
providing a plurality of indices each corresponding to one of said indexed data files;
locating each of said indexed data files via one of said pointers, the index for each indexed data file contains identifiers and storage addresses of data blocks within the indexed data file;
transforming the input data key to generate a transformed key and logic for comparing a transformed key with data block identifiers in the index to identify a match and to obtain the address of an identified data block within the indexed data file;
identifying from records within an identified data block one or more data records related to the input data key;
analysing an input data key, in response to receipt of the input data key, by comparison with entries in the directory to identify a directory entry providing a pointer to a respective indexed data file of the indexed data files;
accessing the index for said respective indexed data file;
transforming the input data key to generate a transformed key which is compared with data block identifiers in the index of the indexed data file;
comparing the transformed key with data block identifiers in said index to identify a match;
obtaining from said index the address of the identified data block;
identifying from records within the identified data block one or more records related to the input data key. - View Dependent Claims (12)
identifying indexed data files exceeding the predefined maximum size;
responding to a first indexed data file exceeding the predefined maximum size by replacing the first indexed data file with two or more separate indexed data files and by extending the hierarchical directory by addition of new directory nodes branching from the node of the directory which previously identified the pointer to the first indexed data file; and
forming a new lowest level of the directory hierarchy nodes for the respective branch with the new, such that the extended directory is adapted to use one or more additional bits of information from input data keys to distinguish between the keys and select a branch to identify a respective pointer to one of said two or more replacement indexed data files.
-
Specification