Methods of organizing data and processing queries in a database system, and database system and software product for implementing such methods
First Claim
1. A method of organizing information in a database system, wherein a group of attributes is defined and words of a collection of data are assigned to said attributes, wherein the group of attributes is divided into a plurality of sub-groups each associated with a respective data table, each data table having a column for each attribute of the associated sub-group and rows for containing data table records comprising at least one word assigned to an attribute of the associated sub-group, wherein links are defined between the data tables records, each link having a target table and a corresponding source table having a link column containing link values each designating a record of said target table, whereby each of said link values represents a link between the record of the source table including said link value and the record of the target table designated by said link value, the method comprising the steps of:
- allocating respective identifiers to data graphs, wherein each data graph represents related attribute values respectively assigned to the attributes of said group, wherein each attribute value of a data graph is either a default value or a word of said collection of data, and wherein the words of each data graph are from linked data table records;
storing a plurality of word thesauruses respectively associated with attributes of said group, wherein for each word assigned at least once to an attribute in the collection of data, the word thesaurus associated with said attribute has a respective entry containing said word; and
storing data representing data graph identifier lists respectively associated with the word thesaurus entries, wherein the data graph identifier list associated with a thesaurus entry relating to a word assigned to an attribute includes any identifier allocated to a data graph having said word assigned to said attribute.
3 Assignments
0 Petitions
Accused Products
Abstract
A reference table has columns associated with data attributes and rows containing related words assigned to those attributes in a collection of data, those words coming from different data tables having independent numbers of records. The stored data include word thesauruses associated with the attributes, and reference table row identifier lists respectively associated with thesaurus entries. Each word thesaurus associated with an attribute has a respective entry for each word assigned to this data attribute in the collection of data. The reference table, which may be a virtual table, defines a unified algebraic framework for the entries of all the thesauruses. Query criteria can be examined with reference to the relevant thesauruses to obtain a row-ID list or bitmap vector which represents all the reference table rows matching the query criteria, if any. The results can then be delivered through the original data tables or, preferably, by means of the thesauruses.
234 Citations
201 Claims
-
1. A method of organizing information in a database system, wherein a group of attributes is defined and words of a collection of data are assigned to said attributes, wherein the group of attributes is divided into a plurality of sub-groups each associated with a respective data table, each data table having a column for each attribute of the associated sub-group and rows for containing data table records comprising at least one word assigned to an attribute of the associated sub-group, wherein links are defined between the data tables records, each link having a target table and a corresponding source table having a link column containing link values each designating a record of said target table, whereby each of said link values represents a link between the record of the source table including said link value and the record of the target table designated by said link value, the method comprising the steps of:
-
allocating respective identifiers to data graphs, wherein each data graph represents related attribute values respectively assigned to the attributes of said group, wherein each attribute value of a data graph is either a default value or a word of said collection of data, and wherein the words of each data graph are from linked data table records;
storing a plurality of word thesauruses respectively associated with attributes of said group, wherein for each word assigned at least once to an attribute in the collection of data, the word thesaurus associated with said attribute has a respective entry containing said word; and
storing data representing data graph identifier lists respectively associated with the word thesaurus entries, wherein the data graph identifier list associated with a thesaurus entry relating to a word assigned to an attribute includes any identifier allocated to a data graph having said word assigned to said attribute. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52)
for any target table record which is not designated in the link column of the corresponding source table, inserting a dummy record in an additional row of said source table, having a link value designating said target table row in the link column and a default value in each attribute column, such insertion of dummy records being repeated until any record of any target table is designated in the link column of the corresponding source table;
initializing the data representing data graph identifier lists, to represent empty lists; and
processing the records of at least one root table which is not the target table for any link, the processing of each root table record comprising;
constructing a respective data graph consisting of related attribute values of said root table record and of corresponding records of the other data tables successively determined on the basis of the contents of the link columns in said records;
allocating an available identifier to said data graph; and
for each attribute, selecting the entry of the associated word thesaurus relating to the value assigned to said attribute in said data graph and updating the data representing the data graph identifier list associated with the selected entry to add the identifier allocated to said data graph to said list.
-
-
9. A method according to claim 8, wherein the processing of each root table record further comprises:
generating a row in a link table having a plurality of columns respectively associated with the data tables, wherein said row of the link table contains, in each column associated with a data table, either a value indicating that the data graph has been constructed from a dummy record with respect to said data table or a link value designating a row of said data table where the attribute values represented in the data graph are stored, said link table being used for retrieving from the data tables words of the collection of data represented in the data graphs.
-
10. A method according to claim 9, further comprising the step of deleting the dummy records from the data tables.
-
11. A method according to claim 9, further comprising the step of deleting the link columns from the data tables.
-
12. A method according to claim 1, further comprising the step of storing at least one macroword thesaurus associated with an attribute of said group and with a prefix length shorter than a word length of said attribute, having a respective entry for each prefix value having said prefix length and matching a corresponding prefix of at least one word assigned at least once to said attribute in the collection of data, said entry of the macroword thesaurus being associated with a data graph identifier list including each identifier allocated to a data graph having, assigned to said attribute, a word whose corresponding prefix matches said prefix value.
-
13. A method according to claim 12, wherein at least one attribute has a plurality of stored macroword thesauruses, associated with different prefix lengths.
-
14. A method according to claim 12, wherein the entries of each macroword thesaurus associated with an attribute are sorted based on the prefix values, and the entries of the word thesaurus associated with said attribute are sorted based on the words assigned to said attribute.
-
15. A method according to claim 14, wherein at least one attribute has Q stored macroword thesauruses associated with respective prefix lengths shorter than a word length of said attribute, Q being a positive integer, each of said Q macroword thesauruses having a thesaurus level parameter q such that 1≦
- q≦
Q, the prefix length being a decreasing function of the level parameter q, the word thesaurus associated with said attribute having a level parameter q=0, wherein each macroword thesaurus having a level parameter q≧
1 contains, in each entry provided for a level q prefix value, data designating the entry of the level q−
1 thesaurus which corresponds to the lowest or highest level q−
1 macroword or word whose corresponding prefix matches said level q prefix value.
- q≦
-
16. A method according to claim 1, wherein at least one thesaurus entry comprises data for pointing to at least one memory region where coding data representing the data graph identifier list associated with said entry are stored.
-
17. A method according to claim 16, wherein said memory region comprises a file individually allocated to said thesaurus entry.
-
18. A method according to claim 17, wherein said data for pointing to at least one memory region comprise the attribute value for which said thesaurus entry is provided, said attribute value being part of a respective file name used for accessing each file allocated to said thesaurus entry.
-
19. A method according to claim 16, wherein said memory region comprises a portion of a data container having a plurality of records each having a next address field, said portion being defined as a record chain in said data container, the chains being defined by means of the next address fields, and wherein said data for pointing to at least one memory region comprise a respective address of a first record of a chain in said data container.
-
20. A method according to claim 19, further comprising the step of grouping the records stored in the data container so that the records of each chain have contiguous addresses.
-
21. A method according to claim 19, wherein said data container is individually allocated to a thesaurus.
-
22. A method according to claim 19, wherein said data container is shared by a plurality of thesauruses.
-
23. A method according to claim 19, wherein the thesaurus associated with an attribute of said group has an index register where the thesaurus entries are sorted based on the words assigned to said attribute, each entry including a word index for pointing to a row of an auxiliary table, and wherein each row of the auxiliary table contains an address in the data container of a first record of a chain.
-
24. A method according to claim 19, wherein said data for pointing to at least one memory region comprise a respective address of a last record of a chain in said data container.
-
25. A method according to claim 16, wherein said coding data representing a data graph identifier list comprises integers respectively equal to the identifiers of said list.
-
26. A method according to claim 1, wherein a plurality of formats are provided for representing the data graph identifier lists, and wherein each thesaurus entry contains an indication of the format used for representing the data graph identifier list associated therewith.
-
27. A method according to claim 1, wherein said data representing data graph identifier lists comprise, for at least one thesaurus entry, coding data obtained by a coding scheme having n successive coding layers, n being a number at least equal to 1, each layer having a predetermined patter for dividing a range covering integers of an input list of said layer into subsets, said identifier list being the input list of the first layer for said thesaurus entry, wherein for any layer other than the last layer, an integer list representing the position, in the pattern of said layer, of each subset containing at least one integer of the input list forms the input list for the next layer, and wherein said coding data comprise, for each layer and each subset containing at least one integer of the input list, data representing the position of each integer of the input list within said subset and, at least if said layer is the last layer, data representing the position of said subset in the pattern of said layer.
-
28. A method according to claim 27, wherein the coding data are stored in a plurality of files including files respectively allocated to thesaurus entries.
-
29. A method according to claim 27, wherein the coding data are stored in a plurality of files including at least one file allocated to a respective thesaurus, for containing the coding data relating to the entries of said thesaurus.
-
30. A method according to claim 27, wherein the coding data are stored in at least one file allocated to a plurality of thesauruses, for containing the coding data relating to the entries of said plurality of thesauruses.
-
31. A method according to claim 27, wherein the pattern of each layer is such that the integer subsets are consecutive intervals consisting of the same number of integers.
-
32. A method according to claim 31, wherein said number of integers is a whole power of 2 for each layer.
-
33. A method according to claim 27, wherein the data representing the position of each integer of the input list of each layer within a subset consist of a bitmap segment in which each bit is associated with a respective integer of the subset to indicate whether said integer belongs to the input list of said layer.
-
34. A method according to claim 27, wherein the position of each subset in the layer n pattern is represented by an integer rank which is included in the coding data, in association with the data representing the position of the integers of the input list of layer n within said subset.
-
35. A method according to claim 27, wherein the position of a subset in the pattern of each layer is represented by an integer rank which is included in the coding data for said layer, in association with the data representing the position of the integers of the input list of said layer within said subset.
-
36. A method according to claim 27, wherein n≧
- 2 and layer k data containers each having a plurality of records are provided in a computer memory for 1≦
k≦
n, each record of a layer k data container being associated with a layer k integer rank representing the position of a subset in the layer k pattern, and wherein each record of a layer k data container associated with a layer k rank representing the position of a subset in the layer k pattern has a first field for containing data for retrieving the position within said subset of any integer of a layer k input list relating to a data graph identifier list, whereby a combination of said layer k rank with any position retrievable from the data contained in said first field determines a layer k−
1 rank with which a respective record of the layer k−
1 data container is associated if k>
1, and an identifier of said data graph identifier list if k=1.
- 2 and layer k data containers each having a plurality of records are provided in a computer memory for 1≦
-
37. A method according to claim 36, wherein, for 1≦
- k≦
n, said data contained in the first field of a record of the layer k data container for retrieving the position of any integer of a layer k input list within a subset comprise a bitmap segment in which each bit is associated with a respective integer of said subset to indicate whether said integer belongs to said layer k input list.
- k≦
-
38. A method according to claim 37, wherein, for 1≦
- k≦
n, each record of the layer k data container associated with a layer k rank further has a second field for containing said layer k rank.
- k≦
-
39. A method according to claim 38, wherein each data container comprises at least two files where the first and second fields of the records of said data container are respectively stored, said files being accessible separately.
-
40. A method according to claim 36, wherein, for 1≦
- k<
n, each record of the layer k data container further has a second field for containing a number representing the position of an integer of a layer k+1 input list within a subset of the layer k+1 pattern,and wherein, for 1<
k≦
n, said data contained in the first field of a record of the layer k data container associated with a layer k rank for retrieving the position of any integer of a layer k input list within a subset of the layer k pattern comprise a pointer to at least one record of the layer k−
1 data container in which the second field contains a number representing the position of an integer of said layer k input list within said subset of the layer k pattern, whereby said record of the layer k−
1 data container is associated with the layer k−
1 rank determined by the combination of said layer k rank with the position represented by said number.
- k<
-
41. A method according to claim 40, wherein said data contained in the first field of a record of the layer 1 data container for retrieving the position of any integer of a data graph identifier list within a subset comprise a bitmap segment in which each bit is associated with a respective integer of said subset to indicate whether said integer represents a data graph identifier of said list.
-
42. A method according to claim 40, wherein each record of the layer n data container associated with a layer n rank further has a second field for containing said layer n rank.
-
43. A method according to claim 40, wherein each layer k data container for 1<
- k<
n comprises at least two files where the first and second fields of the records of said data container are respectively stored, said files being accessible separately.
- k<
-
44. A method according to claim 36, wherein, for 1≦
- k≦
n, each record of the layer k data container further has a next address field, whereby record chains are defined in the layer k data container by means of the next address fields, and wherein at least some of the thesaurus entries are respectively associated with record chains in the layer n data container, whereby the coding data relating to one of said entries for layer n are stored in or retrievable from the record chain associated therewith in the layer n data container.
- k≦
-
45. A method according to claim 44, wherein, for 1≦
- k<
n, said thesaurus entries are respectively associated with record chains in the layer k data container, whereby the coding data relating to one of said entries for layer k are stored in or retrievable from the record chain associated therewith in the layer k data container.
- k<
-
46. A method according to claim 44, wherein, for 1<
- k≦
n, each record of the layer k data container further has a head address field for pointing to an address of a first record of a respective chain in the layer k−
1 data container.
- k≦
-
47. A method according to claim 44, wherein each layer k data container for 1≦
- k≦
n comprises at least two files where the first fields and the next address fields of the records of said data container are respectively stored, said files being accessible separately.
- k≦
-
48. A method according to claim 1, wherein an integer range covering the identifiers allocated to the data graphs is partitioned into a plurality of predetermined portions, wherein at least some of the data representing identifier lists are distributed into a plurality of storage sections respectively associated with said portions, wherein a storage section associated with one of said portions contains data representing identifier sub-lists consisting of identifiers of said portion.
-
49. A method according to claim 48, wherein a respective storage unit is provided for each of said portions of the data graph identifier range, to receive the storage sections associated with said portion.
-
50. A method according to claim 49, wherein at least some of the word thesauruses have a plurality of sections respectively associated with said portions, wherein a section, associated with one of said portions, of a word thesaurus associated with an attribute has a respective entry for each word assigned to said attribute in a data graph to which an identifier of said portion is allocated, said entry containing data for retrieving an identifier sub-list from the storage section associated with said portion.
-
51. A method according to claim 49, further comprising the step of storing at least one macroword thesaurus associated with an attribute of said group and with a prefix length shorter than a word length of said attribute, having a respective entry for each prefix value having said prefix length and matching a corresponding prefix of at least one word assigned at least once to said attribute in the collection of data, said entry of the macroword thesaurus being associated with a data graph identifier list including each identifier allocated to a data graph having, assigned to said attribute, a word whose corresponding prefix matches said prefix value, wherein at least some of the macroword thesauruses have a plurality of sections respectively associated with said portions, wherein a section, associated with one of said portions, of a macroword thesaurus associated with an attribute and with a prefix length has a respective entry for each prefix value having said prefix length and matching a corresponding prefix of at least one word assigned to said attribute in a data graph to which an identifier of said portion is allocated, said entry containing data for retrieving an identifier sub-list from the storage section associated with said portion.
-
52. A method according to claim 48, wherein each thesaurus entry has a plurality of fields respectively associated with said portions, for containing data for retrieving respective identifier sub-lists from the storage sections.
-
53. A method of processing a query in a database system, wherein a group of attributes is defined and words of a collection of data are assigned to said attributes, the group of attributes being divided into a plurality of sub-groups respectively associated with a plurality of data tables having independent numbers of records, with links between respective records from the data tables, wherein identifiers are respectively allocated to data graphs, each data graph representing related attribute values respectively assigned to the attributes of said group, each attribute value of a data graph being either a default value or a word of said collection of data, wherein a plurality of thesauruses each associated with a respective attribute of said group and data representing data graph identifier lists respectively associated with entries of said thesauruses are stored, wherein each thesaurus associated with one attribute is defined with reference to a partition into subsets of a set of words which can be assigned to said one attribute and has a respective entry for each subset including at least one word assigned to said one attribute in the collection of data, the data graph identifier list associated with said thesaurus entry including any identifier allocated to a data graph having a word of said subset assigned to said one attribute, the method comprising the steps of:
-
analyzing query criteria to determine a combination involving thesaurus entries relevant to the query;
determining a matching data graph identifier list based on said combination and on the stored data representing the data graph identifier lists associated with said relevant thesaurus entries;
processing said matching data graph identifier list to output a response. - View Dependent Claims (54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119)
selecting at least one range of words defined for said attribute in the query criteria; and
mapping the words of the selected range which are assigned to said attribute in the collection of data with one or more subsets, the thesaurus entry for each of said one or more subset being retained as a relevant entry for the selected range, and wherein the step of determining the matching data graph identifier list comprises merging respective portions of the identifier lists represented by the data of the relevant thesaurus entries retained for said selected range.
-
-
55. A method according to claim 54, wherein the mapping is performed so as to retain a minimum number of relevant thesaurus entries for each selected range.
-
56. A method according to claim 54, wherein each thesaurus associated with an attribute is defined with reference to a partition such that each subset consists of one word or of consecutive words of the set of words which can be assigned to said attribute, the entries of said thesaurus being sorted based on the words assigned to said attribute, and wherein the step of analyzing the query criteria comprises at least one dichotomy search in at least one thesaurus for identifying relevant thesaurus entries.
-
57. A method according to claim 56, wherein the thesauruses comprise word thesauruses each associated with a respective attribute of the group, with reference to a partition into subsets each consisting of one word.
-
58. A method according to claim 57, wherein each word thesaurus associated with an attribute of the group to which the default value is assigned in at least one of the data graphs further has an entry for the default value, whereby one of said data graph identifier lists is associated with said thesaurus entry for the default value and includes any identifier allocated to a data graph having said default value assigned to said attribute.
-
59. A method according to claim 57, wherein the thesauruses further comprise at least one macroword thesaurus associated with an attribute of said group and with a prefix length shorter than a word length of said attribute, whereby said macroword thesaurus is defined with reference to a partition into subsets each consisting of words beginning by a common prefix having said prefix length.
-
60. A method according to claim 59, wherein at least one attribute of said group has a plurality of macroword thesauruses, associated with different prefix lengths.
-
61. A method according to claim 54, wherein the step of analyzing the query criteria comprises determining said combination involving relevant thesaurus entries as a tree having at least one leaf node, each leaf node corresponding to at least one relevant thesaurus entry retained for a respective attribute.
-
62. A method according to claim 61, wherein said tree has a plurality of nodes including said at least one leaf node and at least one operator node, each operator node representing a Boolean operator applied to at least one partial criterion represented by another node of said tree, one of the operator nodes being a root node representing all the query criteria.
-
63. A method according to claim 62, wherein the nodes of said tree further include at least one preset node for which a data graph identifier list has been determined prior to said step of analyzing the query criteria.
-
64. A method according to claim 63, wherein the data graph identifier list of said preset node is determined from at least one matching data graph identifier list obtained when processing a previous query.
-
65. A method according to claim 62, wherein the step of determining the matching data graph identifier list comprises obtaining a respective identifier list for each node of said tree, whereby the identifier list obtained for each leaf node corresponding to at least one relevant thesaurus entry is the merger of respective portions of the identifier lists associated with said at least one relevant thesaurus entry, and the identifier list obtained for each operator node representing a Boolean operator applied to at least one partial criterion is obtained by applying said Boolean operator to the identifier lists obtained for the node representing said at least one partial criterion, said matching data graph identifier list being determined as the identifier list obtained for the root node.
-
66. A method according to claim 65, wherein each of said obtained identifier lists is produced in the form of a bitmap vector consisting of bits assigned to respective data graphs to indicate whether the identifiers allocated to said data graphs belong to said obtained list.
-
67. A method according to claim 65, wherein a coding scheme comprising n successive coding layers is used to provide coding data representing the identifier list associated with a thesaurus entry, n being a number at least equal to 1, each layer having a predetermined pattern for dividing a range covering integers of an input list of said layer into subsets, said identifier list being the input list of the first layer for said thesaurus entry, wherein for any layer other than the last layer, an integer list representing the position, in the patter of said layer, of each subset containing at least one integer of the input list forms the input list for the next layer,
and wherein the coding data comprise, for each layer and each subset containing at least one integer of the input list, data representing the position of each integer of the input list within said subset and, at least if said layer is the last layer, data representing the position of said subset in the pattern of said layer. -
68. A method according to claims 67, wherein the patter of each layer is such that the integer subsets are consecutive intervals consisting of the same number of integers.
-
69. A method according to claim 68, wherein said number of integers is a whole power of 2 for each layer.
-
70. A method according to claim 67, wherein said data representing the position of an integer of an input list within a subset consist of a bitmap segment.
-
71. A method according to claim 67, wherein the step of determining the matching data graph identifier list comprises determining a layer n integer list for each node of said tree, whereby the layer n integer list determined for a leaf node consists of a layer n input list associated, in the coding scheme, with the merger of the identifier lists represented in the relevant thesaurus entries to which said leaf node corresponds, and whereby the layer n integer list obtained for each operator node representing a Boolean operator applied to at least one partial criterion is obtained by applying said Boolean operator to the layer n integer lists determined for the nodes representing said at least one partial criterion, and wherein a layer n result list is determined as the layer n integer list obtained for the root node.
-
72. A method according to claim 71, wherein the nodes of said tree further include at least one preset node for which a data graph identifier list has been determined prior to said step of analyzing the query criteria, said data graph identifier list being subjected to the coding scheme to provide a layer n input list which is determined as said layer n integer list for said preset node.
-
73. A method according to claim 71, wherein, in the coding scheme, the coding data representing the position of each integer of an input list within a subset for the coding layer n define a layer n bitmap segment in which each bit is associated with a respective integer of the subset to indicate whether said integer belongs to said input list, while the data representing the position of said subset in the layer n pattern comprise a layer n integer rank associated with said layer n bitmap segment, and wherein the step of determining a layer n integer list for a leaf node comprises:
-
initializing a layer n bitmap vector with logical zeroes;
obtaining the layer n ranks and associated bitmap segments from the coding data for each relevant thesaurus entry to which said leaf node corresponds; and
for each of said layer n ranks, superimposing the layer n bitmap segment associated therewith onto a segment of said layer n bitmap vector having a position determined by said layer n rank, the superimposition being performed according to a bitwise Boolean OR operation, said layer n list for the leaf node corresponding to the resulting layer n bitmap vector.
-
-
74. A method according to claim 71, wherein n>
- 1 and the step of determining the matching data graph identifier list further comprises, for k decreasing from n−
1 to 1, determining a layer k integer list for each node of said tree, whereby the layer k integer list determined for a leaf node consists of any integer of a layer k input list, associated in the coding scheme with the identifier list represented in a relevant thesaurus entry to which said leaf node corresponds, which belongs to a layer k subset whose position is represented in the layer k+1 result list, and whereby the layer k integer list obtained for each operator node representing a Boolean operator applied to at least one partial criterion is obtained by applying said Boolean operator to the layer k integer lists determined for the nodes representing said at least one partial criterion, wherein a layer k result list is determined as the layer k integer list obtained for the root node, and wherein said matching data graph identifier list corresponds to the determined layer 1 result list.
- 1 and the step of determining the matching data graph identifier list further comprises, for k decreasing from n−
-
75. A method according to claim 74, wherein the nodes of said tree further include at least one preset node for which a data graph identifier list has been determined prior to said step of analyzing the query criteria, said data graph identifier list being subjected to the coding scheme to provide a layer k input list which is determined as said layer k integer list for said preset node.
-
76. A method according to claim 74, wherein, in the coding scheme, the coding data representing the position of each integer of an input list within a subset for a coding layer k<
- n define a layer k bitmap segment in which each bit is associated with a respective integer of the subset to indicate whether said integer belongs to said input list, while the coding data further comprise a layer k integer rank associated with said layer k bitmap segment to represent the position of said subset in the layer k pattern, and wherein the step of determining a layer k integer list for a leaf node comprises;
initializing a layer k bitmap vector with logical zeroes;
obtaining the layer k ranks from the coding data for each relevant thesaurus entry to which said leaf node corresponds; and
selecting any obtained layer k rank belonging to the layer k+1 result list and superimposing the associated layer k bitmap segment onto a segment of said layer k bitmap vector having a position determined by the selected layer k rank, the superimposition being performed according to a bitwise Boolean OR operation, said layer k list for the leaf node corresponding to the resulting layer k bitmap vector.
- n define a layer k bitmap segment in which each bit is associated with a respective integer of the subset to indicate whether said integer belongs to said input list, while the coding data further comprise a layer k integer rank associated with said layer k bitmap segment to represent the position of said subset in the layer k pattern, and wherein the step of determining a layer k integer list for a leaf node comprises;
-
77. A method according to claim 76, wherein, for 1≦
- k<
n, the layer k ranks and the layer k bitmap segments associated therewith for at least one thesaurus entry are stored at corresponding addresses in distinct first and second files, and wherein the step of determining a layer k integer list for a leaf node comprises;providing a rank table in a RAM memory, having records associated with the addresses in said first and second files;
filling the rank table by writing any selected layer k rank into the rank table record associated with the address of the selected layer k rank in said first file; and
for any record of the filled rank table containing a layer k rank and associated with an address in the second file, reading the associated layer k bitmap segment at said address in the second file and superimposing the read layer k bitmap segment onto a segment of said layer k bitmap vector having a position determined by said layer k rank.
- k<
-
78. A method according to claim 74, wherein the step of determining the matching data graph identifier list further comprises, for any coding layer k such that 1<
- k≦
n, determining a layer k′
filtering list for k≦
k′
≦
n consisting of the layer k′
input list obtained by providing the layer k result list as an input list in layer k of the coding scheme,wherein, in the coding scheme, the coding data representing the position of each integer of an input list within a subset for a coding layer k<
n define a layer k bitmap segment in which each bit is associated with a respective integer of the subset to indicate whether said integer belongs to said input list, while a layer k integer rank associated with said layer k bitmap segment represents the position of said subset in the layer k pattern, and wherein the step of determining a layer k integer list for a leaf node for k<
n comprises;
/a/ initializing a layer k bitmap vector with logical zeroes;
/b/ selecting the layer n ranks obtained from the coding data for each relevant thesaurus entry to which said leaf node corresponds, and setting k′
=n;
/c/ for each selected layer k′
rank;
/c1/ if the selected layer k′
rank represents the position in the layer k′
pattern of a subset which includes at least one integer of the layer k′
filtering list, obtaining the layer k′
bitmap segment with which the selected layer k′
rank is associated;
/c2/ for any integer of the layer k′
filtering list whose position within said subset is represented in said layer k′
bitmap segment, selecting a respective layer k′
−
1 rank determined from the selected layer k′
rank and said position represented in said layer k′
bitmap segment;
/c3/ if k′
>
k+1, executing step /c/ with k′
decremented by one unit; and
/c4/ if k′
−
1 =k, obtaining any layer k bitmap segment with which a selected layer k′
−
1 rank is associated, and superimposing said layer k bitmap segment onto a segment of said layer k bitmap vector having a position determined by said selected layer k′
−
1 rank, the superimposition being performed according to a bitwise Boolean OR operation,said layer k list for the leaf node corresponding to the resulting layer k bitmap vector.
- k≦
-
79. A method according to claim 78, wherein, for 1≦
- k<
n, the layer k bitmap segments for at least one thesaurus entry are stored in at least one layer k file at addresses respectively corresponding to the layer k ranks associated therewith, and wherein, for 1≦
k<
n, the step of determining a layer k integer list for a leaf node comprises;providing a rank table in a RAM memory, having records associated with the addresses in said layer k file;
filling the rank table by writing any selected layer k rank into the rank table record associated with the address corresponding to the selected layer k rank; and
for any record of the filled rank table containing a layer k rank and associated with an address in said layer k file, reading the associated layer k bitmap segment at said address and superimposing the read layer k bitmap segment onto a segment of said layer k bitmap vector having a position determined by said layer k rank.
- k<
-
80. A method according to claim 53, wherein a link table is stored, having a plurality of rows respectively associated with the data graphs and a plurality of columns respectively associated with the attribute sub-groups, wherein each row of the link table contains, in each one of the columns, either a value indicating that each attribute value represented in the data graph associated with said row and assigned to an attribute of the sub-group associated with said one of the columns is a default value or a link value for retrieving at least one stored word of the collection of data represented in the data graph associated with said row and assigned to an attribute of the sub-group associated with said one of the columns, and wherein the step of processing the matching data graph identifier list comprises reading at least one value in any row of the link table associated with a data graph identified in the matching data graph identifier list.
-
81. A method according to claim 80, wherein said data tables are stored, wherein each link value contained in the column of the link table associated with an attribute sub-group comprises data for identifying a record of the data table associated with said sub-group, and wherein the step of processing the matching data graph identifier list further comprises reading at least part of any data table record identified by a link value read in a row of the link table.
-
82. A method according to claim 53, wherein the step of processing the matching data graph identifier list comprises, for at least one attribute specified in the query, selecting a thesaurus associated with said attribute and detecting entries of the selected thesaurus with which identifier lists having a non-empty intersection with the matching data graph identifier list are associated.
-
83. A method according to claim 82, wherein said attribute specified in the query has Q+1 stored thesauruses associated with different prefix lengths, Q being an integer at least equal to 0, each of said Q+1 thesauruses having a thesaurus level parameter q such that 0≦
- q≦
Q, whereby the prefix length is a decreasing function of the level parameter q and corresponds to a word length of said attribute for q=0, wherein each of said Q+1 thesauruses is defined with reference to a respective partition into subsets each consisting of words beginning by a common prefix having the prefix length associated with said thesaurus, the entries of said thesaurus being sorted based on the prefix values.
- q≦
-
84. A method according to claim 83, wherein, the selected thesaurus having a level parameter QA≧
- 0, the detection of entries in the selected thesaurus comprises the steps of;
/a/ providing respective level q target lists and respective level q thesaurus ranges covering consecutive entries of the level q thesaurus for QA≦
q≦
Q;
/b/ initializing the level Q target list with the matching data graph identifier list, initializing the level parameter q with the value Q, and selecting a first entry of the level Q thesaurus range;
/c/ determining an intersection list between the level q target list and the identifier list associated with the selected entry of the level q thesaurus range;
/d/ if the intersection list determined in the preceding step /c/ is empty, selecting another entry of the level q thesaurus range and repeating step /c/;
/e/ if q is greater than QA;
/e1/ setting the level q−
1 target list as equal to the intersection list determined in the preceding step /c/;
/e2/ setting the level q−
1 thesaurus range as consisting of the entries of the level q−
1 thesaurus relating to level q−
1 prefixes which begin with the level q prefix of the selected level q thesaurus entry, and selecting a first entry of the level q−
1 thesaurus range;
/e3/ decrementing q by one unit and returning to step /c/;
/f/ if q is equal to QA;
/f1/ including the selected level QA thesaurus entry in the detected entries;
/f2/ if the level Q target list is equal to the intersection list determined in the preceding step /c/, terminating the detection of entries in the selected thesaurus;
/f3/ removing the integers of the intersection list determined in the preceding step /c/ from any target list including at least one integer which is not in said intersection list;
/f4/ setting q as the smallest level parameter for which the target list includes at least one integer which is not in said intersection list;
/f5/ selecting another entry in the level q thesaurus range and returning to step /c/.
- 0, the detection of entries in the selected thesaurus comprises the steps of;
-
85. A method according to claim 84, wherein Q≧
- 1 and each thesaurus having a level parameter q≧
1 further contains, in each entry provided for a level q prefix value, data designating the entry of the level q−
1 thesaurus which corresponds to the lowest or highest level q−
1 prefix beginning with the level q prefix of said level q thesaurus entry, and wherein step /e2/ comprises selecting the level q−
1 thesaurus entry designated in the selected level q thesaurus entry.
- 1 and each thesaurus having a level parameter q≧
-
86. A method according to claim 84, wherein a coding scheme comprising n successive coding layers is used to provide coding data representing the identifier list associated with a level q thesaurus entry for 0≦
- q≦
Q, n being a number at least equal to 1, each layer having a predetermined pattern for dividing a range covering integers of an input list of said layer into subsets, said identifier list being the input list of the first layer for said thesaurus entry, wherein for any layer other than the last layer, an integer list representing the position, in the pattern of said layer, of each subset containing at least one integer of the input list forms the input list for the next layer,wherein the coding data comprise, for each layer and each subset containing at least one integer of the input list, data representing the position of each integer of the input list within said subset and, at least if said layer is the last layer, data representing the position of said subset in the pattern of said layer, and wherein each level q target list forms a layer 1 and level q filtering list and is submitted as a layer 1 input list in the coding scheme for QA≦
q≦
Q to provide respective layer k and level q filtering lists for 1<
k≦
n if n>
1, said layer k and level q filtering list provided from a level q target list being the layer k input list obtained from said level q target list in the coding scheme.
- q≦
-
87. A method according to claim 86, wherein the pattern of each layer is such that the integer subsets are consecutive intervals consisting of the same number of integers.
-
88. A method according to claim 87, wherein said number of integers is a whole power of 2 for each layer.
-
89. A method according to claim 86, wherein the step /c/ of determining the intersection list between a level q target list and an identifier list comprises, from k=n:
-
/c1/ computing a layer k intersection list between the layer k input list obtained from said identifier list in the coding scheme and the layer k and level q filtering list corresponding to said level q target list;
/c2/ if the computed layer k intersection list is empty, determining said intersection list between the level q target list and the identifier list as being empty;
/c3/ if k=1, determining said intersection list between the level q target list and the identifier list as the computed layer 1 intersection list; and
/c4/ if k>
1, decrementing k by one unit and repeating from step /c1/.
-
-
90. A method according to claim 89, wherein, in the coding scheme, the coding data representing the position of each integer of an input list within a subset for a coding layer k≦
- n define a layer k bitmap segment in which each bit is associated with a respective integer of the subset to indicate whether said integer belongs to said input list, while the data representing the position of said subset in the layer k pattern comprise a layer k integer rank associated with said layer k bitmap segment, and wherein the step /c1/ of computing a layer k intersection list between a layer k input list obtained from an identifier list in the coding scheme and a layer k and level q filtering list, represented by a first layer k bitmap vector, comprises;
initializing a second layer k bitmap vector with logical zeroes;
obtaining layer k ranks from the coding data representing said identifier list; and
selecting any obtained layer k rank which represents the position in the layer k pattern of a subset including at least one integer of said layer k and level q filtering list, obtaining the layer k bitmap segment with which the selected layer k rank is associated, and determining a segment of the second layer k bitmap vector having a position determined by the selected layer k rank by combining the obtained layer k bitmap segment with a segment of the first layer k bitmap vector having a position determined by the selected layer k rank according to a bitwise Boolean AND operation, said layer k intersection list corresponding to the resulting second layer k bitmap vector.
- n define a layer k bitmap segment in which each bit is associated with a respective integer of the subset to indicate whether said integer belongs to said input list, while the data representing the position of said subset in the layer k pattern comprise a layer k integer rank associated with said layer k bitmap segment, and wherein the step /c1/ of computing a layer k intersection list between a layer k input list obtained from an identifier list in the coding scheme and a layer k and level q filtering list, represented by a first layer k bitmap vector, comprises;
-
91. A method according to claim 90, wherein, for 1≦
- k<
n, the layer k ranks and the layer k bitmap segments associated therewith for at least one thesaurus entry are stored at corresponding addresses in distinct first and second files, and wherein the step /c1/ of computing a layer k intersection list between a layer k input list obtained from an identifier list in the coding scheme and a layer k and level q filtering list, represented by a first layer k bitmap vector, comprises;providing a rank table in a RAM memory, having records associated with the addresses in said first and second files;
filling the rank table by writing any selected layer k rank into the rank table record associated with the address of said selected layer k rank in said first file; and
for any record of the filled rank table containing a layer k rank and associated with an address in the second file, reading the associated layer k bitmap segment at said address in the second file and combining the read layer k bitmap segment with a segment of the first layer k bitmap vector having a position determined by said layer k rank according to a bitwise Boolean AND operation to determine a segment of the second layer k bitmap vector having a position determined by said layer k rank.
- k<
-
92. A method according to claim 90, further comprising determining a pre-filtering flag for each entry of a level q thesaurus, said pre-filtering flag having a first value when said entry is associated with a data graph identifier list represented by coding data which do not define any layer n rank representing the position in the layer n pattern of a subset which includes at least one integer of a layer n and level q filtering list, and wherein the step /c/ of determining the intersection list between a level q target list, corresponding to said layer n and level q filtering list, and an identifier list associated with an entry of the level q thesaurus comprises determining said intersection list as being empty if the pre-filtering flag determined for said entry has said first value.
-
93. A method according to claim 92, wherein, for any level q thesaurus entry associated with a data graph identifier list represented by coding data which define a layer n rank representing the position in the layer n pattern of a subset which includes at least one integer of the layer n and level q filtering list, the layer n bitmap segment associated with said layer n rank is obtained and said first value is allocated to the pre-filtering flag determined for said entry if the obtained layer n bitmap segment does not represent the position of any integer of said layer n and level q filtering list within said subset.
-
94. A method according to claim 86, wherein n>
- 1 and in the coding scheme, the coding data representing the position of each integer of an input list within a subset for a coding layer k≦
n define a layer k bitmap segment in which each bit is associated with a respective integer of the subset to indicate whether said integer belongs to said input list, while a layer k integer rank associated with said layer k bitmap segment represents the position of said subset in the layer k pattern, and wherein the step /c/ of determining the intersection list between a level q target list, corresponding to layer k and level q filtering lists represented by a respective first layer k bitmap vectors for 1≦
k≦
n, and an identifier list comprises;/c1/ initializing a second bitmap vector with logical zeroes;
/c2/ selecting layer n ranks obtained from the coding data representing said identifier list, and setting k=n;
/c3/ for each selected layer k rank;
/c31/ if the selected layer k rank represents the position in the layer k pattern of a subset which includes at least one integer of said layer k and level q filtering list, obtaining the layer k bitmap segment with which the selected layer k rank is associated;
/c32/ for any integer of the layer k and level q filtering list whose position within said subset is represented in said layer k bitmap segment, selecting a respective layer k−
1 rank determined from the selected layer k rank and said position represented in said layer k bitmap segment;
/c33/ if k>
2, executing step /c3/ with k decremented by one unit; and
/c34/ if k=2, obtaining any layer 1 bitmap segment with which a selected layer 1 rank is associated, and combining the obtained layer 1 bitmap segment with a segment of the first layer 1 bitmap vector having a position determined by said layer 1 rank according to a bitwise Boolean AND operation to determine a segment of the second bitmap vector having a position determined by said layer 1 rank, said intersection list corresponding to the resulting second bitmap vector.
- 1 and in the coding scheme, the coding data representing the position of each integer of an input list within a subset for a coding layer k≦
-
95. A method according to claim 94, wherein the layer 1 bitmap segments for at least one thesaurus entry are stored in at least one layer 1 file at addresses respectively corresponding to the layer 1 ranks associated therewith, and the step /c/ of determining an intersection list comprises:
-
providing a rank table in a RAM memory, having records associated with the addresses in said layer 1 file;
filling the rank table by writing any layer 1 rank selected in step /c32/ into the rank table record associated with the address corresponding to the selected layer 1 rank; and
for any record of the filled rank table containing a layer 1 rank and associated with an address in said layer 1 file, reading the associated layer 1 bitmap segment at said address and combining the read layer 1 bitmap segment with a segment of the first layer 1 bitmap vector having a position determined by said layer 1 rank according to a bitwise Boolean AND operation to determine a segment of the second layer 1 bitmap vector having a position determined by said layer 1 rank.
-
-
96. A method according to claim 94, further comprising determining a pre-filtering flag for each entry of a level q thesaurus, said pre-filtering flag having a first value when said entry is associated with a data graph identifier list represented by coding data which do not define any layer n rank representing the position in the layer n pattern of a subset which includes at least one integer of a layer n and level q filtering list, and wherein the step /c/ of determining the intersection list between a level q target list, corresponding to said layer n and level q filtering list, and an identifier list associated with an entry of the level q thesaurus comprises determining said intersection list as being empty if the pre-filtering flag determined for said entry has said first value.
-
97. A method according to claim 96, wherein, for any level q thesaurus entry associated with a data graph identifier list represented by coding data which define a layer n rank representing the position in the layer n pattern of a subset which includes at least one integer of the layer n and level q filtering list, the layer n bitmap segment associated with said layer n rank is obtained and said first value is allocated to the pre-filtering flag determined for said entry if the obtained layer n bitmap segment does not represent the position of any integer of said layer n and level q filtering list within said subset.
-
98. A method according to claim 82, wherein the step of processing the matching data graph identifier list further comprises writing output data associated with any detected entry of a selected thesaurus into an output table.
-
99. A method according to claim 98, wherein the output table includes a respective row corresponding to each identifier of the matching data graph identifier list, and wherein output data associated with a detected entry of a selected thesaurus are written into any row of the output table corresponding to a data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said detected thesaurus entry.
-
100. A method according to claim 99, wherein each data graph identifier has a respective row of the output table corresponding thereto, wherein the rows of the output table are initialized with a default value before writing the output data, and wherein the rows of the output table which do not contain the default value are read after writing the output data.
-
101. A method according to claim 99, wherein the output table is associated with an index file having a respective record for each data graph identifier, containing either a default value or a pointer designating a respective row of the output table corresponding to said data graph identifier, wherein the records of the index file are initialized with a default value before writing the output data, and wherein the step of writing output data associated with a detected entry of a first selected thesaurus comprises, for each data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said detected entry of the first selected thesaurus:
-
allocating an available row of the output table to correspond to said data graph identifier, writing output data into the allocated row; and
writing a pointer to the allocated row into the record of the index file provided for said data graph identifier.
-
-
102. A method according to claim 99, wherein the output table has a plurality of columns each associated with a respective attribute for which a thesaurus is selected, and wherein output data associated with a detected entry of a thesaurus selected for an attribute associated with a column of the output table are written into said column.
-
103. A method according to claim 101, wherein the output table has a plurality of columns each associated with a respective attribute for which a thesaurus is selected, wherein output data associated with a detected entry of a thesaurus selected for an attribute associated with a column of the output table are written into said column, and wherein the step of writing output data associated with a detected entry of at least one second selected thesaurus comprises, for each data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said detected entry of the second selected thesaurus:
-
reading the pointer contained in the record of the index file provided for said data graph identifier; and
writing output data into the row of the output table designated by said pointer.
-
-
104. A method according to claim 98, wherein the selected thesaurus being a word thesaurus defined with reference to a partition into subsets each consisting of one word, the output data associated with a detected entry comprise the word for which said detected entry is provided.
-
105. A method according to claim 98, wherein the selected thesaurus being a macroword thesaurus associated with a prefix length and defined with reference to a partition into subsets each consisting of words beginning by a common prefix having said prefix length, the output data associated with a detected entry comprise the prefix value for which said detected entry is provided.
-
106. A method according to claim 98, wherein the output data associated with a detected entry comprise an address of said detected entry in the selected thesaurus.
-
107. A method according to claim 98, wherein the output data associated with a detected entry of a selected thesaurus comprise a numerical value derived from said thesaurus entry.
-
108. A method according to claim 107, wherein, for a detected entry of at least one selected thesaurus, said numerical value is calculated by applying a mathematical function to a thesaurus value stored in said entry.
-
109. A method according to claim 107, wherein, for a detected entry of at least one selected thesaurus, said numerical value is calculated by applying a mathematical function to a plurality of values including a thesaurus value stored in said entry and at least one value already present in the output table.
-
110. A method according to claim 107, wherein the output table includes a respective row corresponding to each identifier of the matching data graph identifier list, and wherein a numerical value derived from a detected thesaurus entry is written into any row of the output table corresponding to a data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said detected thesaurus entry.
-
111. A method according to claim 110, wherein the numerical value, derived from a detected entry of a first selected thesaurus and written into any row of the output table corresponding to a data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said entry of the first selected thesaurus, is obtained from a thesaurus value stored in said entry,
and wherein the numerical value, derived from a detected entry of at least one second selected thesaurus and written into a row of the output table corresponding to a data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said entry of the second selected thesaurus, is calculated by applying a mathematical function to a plurality of values including a thesaurus value stored in said entry and at least one value already present in said row of the output table. -
112. A method according to claim 110, further comprising calculating an output value from a set of numerical values which have been respectively written into the rows of the output table.
-
113. A method according to claim 53, wherein an integer range covering the identifiers allocated to the data graphs is partitioned into a plurality of predetermined portions, wherein at least some of the data representing identifier lists are distributed into a plurality of storage sections respectively associated with said portions, wherein a storage section associated with one of said portions contains data representing identifier sub-lists consisting of identifiers of said portion,
and wherein the step of determining a matching data graph identifier list is executed separately for the different portions of the data graph identifier range, by means of the respective storage sections. -
114. A method according to claim 113, wherein the step of processing the matching data graph identifier is at least partially executed separately for the different portions of the data graph identifier range, by means of the respective storage sections.
-
115. A method according to claim 113, wherein the thesauruses have a plurality of sections respectively associated with said portions, wherein a section, associated with one of said portions, of a thesaurus associated with an attribute and defined with reference to a partition into subsets has a respective entry for each subset of said partition which includes at least one word assigned to said attribute in a data graph to which an identifier of said portion is allocated, said entry containing data representing an identifier sub-list including each identifier of said portion allocated to a data graph having a word of said subset assigned to said attribute, and wherein the step of analyzing the query criteria is at least partially executed separately for the different portions of the data graph identifier range, by means of the respective thesaurus sections.
-
116. A method according to claim 113, wherein the separate step executions are carried out in parallel by respective processors for the different portions of the data graph identifier range.
-
117. A method according to claim 116, wherein each thesaurus entry has a plurality of fields respectively associated with said portions, for containing data for retrieving respective identifier sub-lists from the storage sections, wherein the step of analyzing the query criteria is executed centrally for all the portions of the data graph identifier range, and wherein the relevant thesaurus entries used by a processor executing the step of determining a matching data graph identifier list by means of a storage section are designated by the data for retrieving identifier sub-lists from said storage section.
-
118. A method according to claim 117, wherein the step of analyzing the query criteria is executed by a query server connected to said processors through a communication network.
-
119. A method according to claim 116, wherein a list update server is connected, through the communication network, to a plurality of storage units respectively coupled to said processors, the list update server controlling the storage units to maintain the storage sections.
-
120. A database system, for managing information from a plurality of data tables having different numbers of records, wherein the group of attributes is divided into a plurality of sub-groups respectively associated with the data tables, with links between respective records from the data tables, each data table having a column for each attribute of the associated sub-group and rows for containing data table records comprising at least one word assigned to an attribute of the associated sub-group in a collection of data, wherein respective identifiers are allocated to data graphs, each data graph representing related attribute values respectively assigned to the attributes of said group, each attribute value of a data graph being either a default value or a word of said collection of data, the words of each data graph being from linked data table records, the database system comprising:
-
means for storing a plurality of thesauruses respectively associated with attributes of said group, wherein each thesaurus associated with an attribute is defined with reference to a partition into subsets of a set of words which can be assigned to said attribute and has a respective entry for each subset including at least one word assigned to said attribute in the collection of data; and
means for storing data representing data graph identifier lists respectively associated with the thesaurus entries, wherein the data graph identifier list associated with a entry, relating to a subset, of a thesaurus associated with an attribute includes any identifier allocated to a data graph having a word of said subset assigned to said attribute. - View Dependent Claims (121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161)
means for analyzing query criteria to determine a combination involving thesaurus entries relevant to a query;
means for determining a matching data graph identifier list based on said combination and on the stored data representing the data graph identifier lists associated with said relevant thesaurus entries;
output means for processing said matching data graph identifier list to output a response.
-
-
128. A database system according to claim 127, wherein the means for analyzing the query criteria comprise:
-
means for selecting at least one range of words defined for one attribute in the query criteria; and
mapping means for mapping the words of the selected range which are assigned to said attribute in the collection of data with one or more subsets, and for retaining the thesaurus entry for each of said one or more subset as a relevant entry for the selected range, and wherein the means for determining the matching data graph identifier list comprise means for merging respective portions of the identifier lists represented by the data of the relevant thesaurus entries retained for said selected range.
-
-
129. A database system according to claim 128, wherein the mapping means are arranged to retain a minimum number of relevant thesaurus entries or each selected range.
-
130. A database system according to claim 128, wherein each thesaurus associated with an attribute is defined with reference to a partition such that each subset consists of one word or of consecutive words of the set of words which can be assigned to said attribute, the entries of said thesaurus being sorted based on the words assigned to said attribute, and wherein the means for analyzing the query criteria comprise dichotomy search means in at least one thesaurus for identifying relevant thesaurus entries.
-
131. A database system according to claim 130, wherein the thesauruses comprise word thesauruses each associated with a respective attribute of the group, with reference to a partition into subsets each consisting of one word.
-
132. A database system according to claim 131, wherein each word thesaurus associated with an attribute of the group to which the default value is assigned in at least one of the data graphs further has an entry for the default value, whereby one of said data graph identifier lists is associated with said thesaurus entry for the default value and includes any identifier allocated to a data graph having said default value assigned to said attribute.
-
133. A database system according to claim 130, wherein the thesauruses further comprise at least one macroword thesaurus associated with an attribute of said group and with a prefix length shorter than a word length of said attribute, whereby said macroword thesaurus is defined with reference to a partition into subsets each consisting of words beginning by a common prefix having said prefix length.
-
134. A database system according to claim 128, wherein the means for analyzing the query criteria comprise means for determining said combination involving relevant thesaurus entries as a tree having at least one leaf node, each leaf node corresponding to at least one relevant thesaurus entry retained for a respective attribute.
-
135. A database system according to claim 134, wherein said tree has a plurality of nodes including said at least one leaf node and at least one operator node, each operator node representing a Boolean operator applied to at least one partial criterion represented by another node of said tree, one of the operator nodes being a root node representing all the query criteria.
-
136. A database system according to claim 135, wherein the means for determining the matching data graph identifier list comprise means for obtaining a respective identifier list for each node of said tree, whereby the identifier list obtained for each leaf node corresponding to at least one relevant thesaurus entry is the merger of respective portions of the identifier lists associated with said at least one relevant thesaurus entry, and the identifier list obtained for each operator node representing a Boolean operator applied to at least one partial criterion is obtained by applying said Boolean operator to the identifier lists obtained for the node representing said at least one partial criterion, said matching data graph identifier list being determined as the identifier list obtained for the root node.
-
137. A database system according to claim 136, wherein each of said obtained identifier lists is produced in the form of a bitmap vector consisting of bits assigned to respective data graphs to indicate whether the identifiers allocated to said data graphs belong to said obtained list.
-
138. A database system according to claim 136, wherein a coding scheme comprising n successive coding layers is used to provide coding data representing the identifier list associated with a thesaurus entry, n being a number at least equal to 1, each layer having a predetermined pattern for dividing a range covering integers of an input list of said layer into subsets, said identifier list being the input list of the first layer for said thesaurus entry, wherein for any layer other than the last layer, an integer list representing the position, in the pattern of said layer, of each subset containing at least one integer of the input list forms the input list for the next layer,
and wherein the coding data comprise, for each layer and each subset containing at least one integer of the input list, data representing the position of each integer of the input list within said subset and, at least if said layer is the last layer, data representing the position of said subset in the pattern of said layer. -
139. A database system according to claims 138, wherein the pattern of each layer is such that the integer subsets are consecutive intervals consisting of the same number of integers.
-
140. A database system according to claim 139, wherein said number of integers is a whole power of 2 for each layer.
-
141. A database system according to claim 138, wherein the means for determining the matching data graph identifier list comprise means for determining a layer n integer list for each node of said tree, whereby the layer n integer list determined for a leaf node consists of a layer n input list associated, in the coding scheme, with the merger of the identifier lists represented in the relevant thesaurus entries to which said leaf node corresponds, and whereby the layer n integer list obtained for each operator node representing a Boolean operator applied to at least one partial criterion is obtained by applying said Boolean operator to the layer n integer lists determined for the nodes representing said at least one partial criterion, and wherein a layer n result list is determined as the layer n integer list obtained for the root node.
-
142. A database system according to claim 141, wherein the nodes of said tree further include at least one preset node for which a data graph identifier list has been determined in advance, said data graph identifier list being subjected to the coding scheme to provide a layer n input list which is determined as said layer n integer list for said preset node.
-
143. A database system according to claim 141, wherein n>
- 1 and the means for determining the matching data graph identifier list further comprise, for k decreasing from n−
1 to 1, means for determining a layer k integer list for each node of said tree, whereby the layer k integer list determined for a leaf node consists of any integer of a layer k input list, associated in the coding scheme with the identifier list represented in a relevant thesaurus entry to which said leaf node corresponds, which belongs to a layer k subset whose position is represented in the layer k+1 result list, and whereby the layer k integer list obtained for each operator node representing a Boolean operator applied to at least one partial criterion is obtained by applying said Boolean operator to the layer k integer lists determined for the nodes representing said at least one partial criterion, and means for determining a layer k result list as the layer k integer list obtained for the root node, and wherein said matching data graph identifier list corresponds to the determined layer 1 result list.
- 1 and the means for determining the matching data graph identifier list further comprise, for k decreasing from n−
-
144. A database system according to claim 127, wherein the output means comprise means for selecting a thesaurus associated with an attribute specified in the query, and means for detecting entries of the selected thesaurus with which identifier lists having a non-empty intersection with the matching data graph identifier list are associated.
-
145. A database system according to claim 144, wherein the output means further comprise means for writing output data associated with any detected entry of a selected thesaurus into an output table.
-
146. A database system according to claim 145, wherein the output table includes a respective row corresponding to each identifier of the matching data graph identifier list, and wherein said means for writing output data are arranged to write output data associated with a detected entry of a selected thesaurus into any row of the output table corresponding to a data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said detected thesaurus entry.
-
147. A database system according to claim 146, wherein the output table has a plurality of columns each associated with a respective attribute for which a thesaurus is selected, and wherein said means for writing output data are arranged to write into a column of the output table output data associated with a detected entry of a thesaurus selected for an attribute associated with said column.
-
148. A database system according to claim 145, wherein the selected thesaurus being a word thesaurus defined with reference to a partition into subsets each consisting of one word, the output data associated with a detected entry comprise the word for which said detected entry is provided.
-
149. A database system according to claim 145, wherein the selected thesaurus being a macroword thesaurus associated with a prefix length and defined with reference to a partition into subsets each consisting of words beginning by a common prefix having said prefix length, the output data associated with a detected entry comprise the prefix value for which said detected entry is provided.
-
150. A database system according to claim 145, wherein the output data associated with a detected entry comprise an address of said detected entry in the selected thesaurus.
-
151. A database system according to claim 145, wherein the output data associated with a detected entry of a selected thesaurus comprise a numerical value derived from said thesaurus entry.
-
152. A database system according to claim 151, further comprising means for calculating said numerical value for a detected entry of at least one selected thesaurus by applying a mathematical function to a thesaurus value stored in said entry.
-
153. A database system according to claim 151, further comprising means for calculating said numerical value for a detected entry of at least one selected thesaurus by applying a mathematical function to a plurality of values including a thesaurus value stored in said entry and at least one value already present in the output table.
-
154. A database system according to claim 151, wherein the output table includes a respective row corresponding to each identifier of the matching data graph identifier list, and wherein said means for writing output data are arranged to write a numerical value derived from a detected thesaurus entry into any row of the output table corresponding to a data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said detected thesaurus entry.
-
155. A database system according to claim 154, wherein the numerical value, derived from a detected entry of a first selected thesaurus and written into any row of the output table corresponding to a data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said entry of the first selected thesaurus, is obtained from a thesaurus value stored in said entry,
and wherein the numerical value, derived from a detected entry of at least one second selected thesaurus and written into a row of the output table corresponding to a data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said entry of the second selected thesaurus, is calculated by applying a mathematical function to a plurality of values including a thesaurus value stored in said entry and at least one value already present in said row of the output table. -
156. A database system according to claim 154, further comprising means for calculating an output value from a set of numerical values which have been respectively written into the rows of the output table.
-
157. A database system according to claim 127, wherein the means for storing data representing identifier lists comprise a plurality of storage sections respectively associated with distinct portions of an integer range covering the identifiers allocated to the data graphs according to a predetermined partition, wherein each storage section associated with one of said portions contains data representing identifier sub-lists consisting of identifiers of said portion,
and wherein the means for determining the matching data graph identifier list comprise means for determining respective matching identifier sub-lists included in the different portions of the data graph identifier range, by means of the respective storage sections. -
158. A database system according to claim 157, comprising a processor to determine said matching identifier sub-lists sequentially.
-
159. A database system according to claim 157, comprising a plurality of processors respectively associated to the different portions of the data graph identifier range, to determine said matching identifier sub-lists in parallel.
-
160. A database system according to claim 159, wherein the means for analyzing query criteria comprise a query server connected to said processors through a communication network.
-
161. A database system according to claim 160, further comprising a list update server connected, through the communication network, to a plurality of storage units respectively coupled to said processors, the list update server controlling the storage units to maintain the storage sections.
-
162. A computer program product for managing information from a plurality of data tables having different numbers of records, wherein the group of attributes is divided into a plurality of sub-groups respectively associated with the data tables, with links between respective records from the data tables, each data table having a column for each attribute of the associated sub-group and rows for containing data table records comprising at least one word assigned to an attribute of the associated sub-group in a collection of data, wherein respective identifiers are allocated to data graphs, each data graph representing related attribute values respectively assigned to the attributes of said group, each attribute value of a data graph being either a default value or a word of said collection of data, the words of each data graph being from linked data table records, the computer program product comprising:
-
instructions for storing a plurality of thesauruses respectively associated with attributes of said group, wherein each thesaurus associated with an attribute is defined with reference to a partition into subsets of a set of words which can be assigned to said attribute and has a respective entry for each subset including at least one word assigned to said attribute in the collection of data; and
instructions for storing data representing data graph identifier lists respectively associated with the thesaurus entries, wherein the data graph identifier list associated with a entry, relating to a subset, of a thesaurus associated with an attribute includes any identifier allocated to a data graph having a word of said subset assigned to said attribute. - View Dependent Claims (163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201)
instructions for analyzing query criteria to determine a combination involving thesaurus entries relevant to the query;
instructions for determining a matching data graph identifier list based on said combination and on the stored data representing the data graph identifier lists associated with said relevant thesaurus entries;
instructions for outputting a response by processing said matching data graph identifier list.
-
-
170. A computer program product according to claim 169, wherein the instructions for analyzing the query criteria comprise:
-
instructions for selecting at least one range of words defined for one attribute in the query criteria; and
instructions for mapping the words of the selected range which are assigned to said attribute in the collection of data with one or more subsets, and for retaining the thesaurus entry for each of said one or more subset as a relevant entry for the selected range, and wherein the instructions for determining the matching data graph identifier list comprises instructions for merging respective portions of the identifier lists represented by the data of the relevant thesaurus entries retained for said selected range.
-
-
171. A computer program product according to claim 170, wherein the instructions for mapping the words of the selected range are arranged to retain a minimum number of relevant thesaurus entries for each selected range.
-
172. A computer program product according to claim 170, wherein each thesaurus associated with an attribute is defined with reference to a partition such that each subset consists of one word or of consecutive words of the set of words which can be assigned to said attribute, the entries of said thesaurus being sorted based on the words assigned to said attribute, and wherein the instructions for analyzing the query criteria comprise instructions for performing a dichotomy search in at least one thesaurus for identifying relevant thesaurus entries.
-
173. A computer program product according to claim 172, wherein the thesauruses comprise word thesauruses each associated with a respective attribute of the group, with reference to a partition into subsets each consisting of one word.
-
174. A computer program product according to claim 173, wherein each word thesaurus associated with an attribute of the group to which the default value is assigned in at least one of the data graphs further has an entry for the default value, whereby one of said data graph identifier lists is associated with said thesaurus entry for the default value and includes any identifier allocated to a data graph having said default value assigned to said attribute.
-
175. A computer program product according to claim 172, wherein the thesauruses further comprise at least one macroword thesaurus associated with an attribute of said group and with a prefix length shorter than a word length of said attribute, whereby said macroword thesaurus is defined with reference to a partition into subsets each consisting of words beginning by a common prefix having said prefix length.
-
176. A computer program product according to claim 170, wherein the instructions for analyzing the query criteria comprise instructions for determining said combination involving relevant thesaurus entries as a tree having at least one leaf node, each leaf node corresponding to at least one relevant thesaurus entry retained for a respective attribute.
-
177. A computer program product according to claim 176, wherein said tree has a plurality of nodes including said at least one leaf node and at least one operator node, each operator node representing a Boolean operator applied to at least one partial criterion represented by another node of said tree, one of the operator nodes being a root node representing all the query criteria.
-
178. A computer program product according to claim 177, wherein the instructions for determining the matching data graph identifier list comprises instructions for obtaining a respective identifier list for each node of said tree, whereby the identifier list obtained for each leaf node corresponding to at least one relevant thesaurus entry is the merger of respective portions of the identifier lists associated with said at least one relevant thesaurus entry, and the identifier list obtained for each operator node representing a Boolean operator applied to at least one partial criterion is obtained by applying said Boolean operator to the identifier lists obtained for the node representing said at least one partial criterion, said matching data graph identifier list being determined as the identifier list obtained for the root node.
-
179. A computer program product according to claim 178, wherein each of said obtained identifier lists is produced in the form of a bitmap vector consisting of bits assigned to respective data graphs to indicate whether the identifiers allocated to said data graphs belong to said obtained list.
-
180. A computer program product according to claim 178, wherein a coding scheme comprising n successive coding layers is used to provide coding data representing the identifier list associated with a thesaurus entry, n being a number at least equal to 1, each layer having a predetermined pattern for dividing a range covering integers of an input list of said layer into subsets, said identifier list being the input list of the first layer for said thesaurus entry, wherein for any layer other than the last layer, an integer list representing the position, in the pattern of said layer, of each subset containing at least one integer of the input list forms the input list for the next layer,
and wherein the coding data comprise, for each layer and each subset containing at least one integer of the input list, data representing the position of each integer of the input list within said subset and, at least if said layer is the last layer, data representing the position of said subset in the pattern of said layer. -
181. A computer program product according to claims 180, wherein the pattern of each layer is such that the integer subsets are consecutive intervals consisting of the same number of integers.
-
182. A computer program product according to claim 181, wherein said number of integers is a whole power of 2 for each layer.
-
183. A computer program product according to claim 180, wherein the instructions for determining the matching data graph identifier list comprise instructions for determining a layer n integer list for each node of said tree, whereby the layer n integer list determined for a leaf node consists of a layer n input list associated, in the coding scheme, with the merger of the identifier lists represented in the relevant thesaurus entries to which said leaf node corresponds, and whereby the layer n integer list obtained for each operator node representing a Boolean operator applied to at least one partial criterion is obtained by applying said Boolean operator to the layer n integer lists determined for the nodes representing said at least one partial criterion, and wherein a layer n result list is determined as the layer n integer list obtained for the root node.
-
184. A computer program product according to claim 183, wherein the nodes of said tree further include at least one preset node for which a data graph identifier list has been determined in advance, said data graph identifier list being subjected to the coding scheme to provide a layer n input list which is determined as said layer n integer list for said preset node.
-
185. A computer program product according to claim 183, wherein n>
- 1 and the instructions for determining the matching data graph identifier list further comprise, for k decreasing from n−
1 to 1, instructions for determining a layer k integer list for each node of said tree, whereby the layer k integer list determined for a leaf node consists of any integer of a layer k input list, associated in the coding scheme with the identifier list represented in a relevant thesaurus entry to which said leaf node corresponds, which belongs to a layer k subset whose position is represented in the layer k+1 result list, and whereby the layer k integer list obtained for each operator node representing a Boolean operator applied to at least one partial criterion is obtained by applying said Boolean operator to the layer k integer lists determined for the nodes representing said at least one partial criterion, and instructions for determining a layer k result list as the layer k integer list obtained for the root node, and wherein said matching data graph identifier list corresponds to the determined layer 1 result list.
- 1 and the instructions for determining the matching data graph identifier list further comprise, for k decreasing from n−
-
186. A computer program product according to claim 169, wherein said instructions for outputting a response comprise instructions for selecting a thesaurus associated with an attribute specified in the query, and instructions for detecting entries of the selected thesaurus with which identifier lists having a non-empty intersection with the matching data graph identifier list are associated.
-
187. A computer program product according to claim 186, wherein said instructions for outputting a response comprise further comprise instructions for writing output data associated with any detected entry of a selected thesaurus into an output table.
-
188. A computer program product according to claim 187, wherein the output table includes a respective row corresponding to each identifier of the matching data graph identifier list, and wherein said instructions for writing output data are arranged to control the writing of output data associated with a detected entry of a selected thesaurus into any row of the output table corresponding to a data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said detected thesaurus entry.
-
189. A computer program product according to claim 188, wherein the output table has a plurality of columns each associated with a respective attribute for which a thesaurus is selected, and wherein said instructions for writing output data are arranged to control the writing into a column of the output table of output data associated with a detected entry of a thesaurus selected for an attribute associated with said column.
-
190. A computer program product according to claim 187, wherein the selected thesaurus being a word thesaurus defined with reference to a partition into subsets each consisting of one word, the output data associated with a detected entry comprise the word for which said detected entry is provided.
-
191. A computer program product according to claim 187, wherein the selected thesaurus being a macroword thesaurus associated with a prefix length and defined with reference to a partition into subsets each consisting of words beginning by a common prefix having said prefix length, the output data associated with a detected entry comprise the prefix value for which said detected entry is provided.
-
192. A computer program product according to claim 187, wherein the output data associated with a detected entry comprise an address of said detected entry in the selected thesaurus.
-
193. A computer program product according to claim 187, wherein the output data associated with a detected entry of a selected thesaurus comprise a numerical value derived from said thesaurus entry.
-
194. A computer program product according to claim 193, further comprising instructions for calculating said numerical value for a detected entry of at least one selected thesaurus by applying a mathematical function to a thesaurus value stored in said entry.
-
195. A computer program product according to claim 193, further comprising instructions for calculating said numerical value for a detected entry of at least one selected thesaurus by applying a mathematical function to a plurality of values including a thesaurus value stored in said entry and at least one value already present in the output table.
-
196. A computer program product according to claim 193, wherein the output table includes a respective row corresponding to each identifier of the matching data graph identifier list, and wherein said instructions for writing output data are arranged to control the writing of a numerical value derived from a detected thesaurus entry into any row of the output table corresponding to a data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said detected thesaurus entry.
-
197. A computer program product according to claim 196, wherein the numerical value, derived from a detected entry of a first selected thesaurus and written into any row of the output table corresponding to a data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said entry of the first selected thesaurus, is obtained from a thesaurus value stored in said entry,
and wherein the numerical value, derived from a detected entry of at least one second selected thesaurus and written into a row of the output table corresponding to a data graph identifier belonging to both the matching data graph identifier list and the identifier list associated with said entry of the second selected thesaurus, is calculated by applying a mathematical function to a plurality of values including a thesaurus value stored in said entry and at least one value already present in said row of the output table. -
198. A computer program product according to claim 196, further comprising instructions for calculating an output value from a set of numerical values which have been respectively written into the rows of the output table.
-
199. A computer program product according to claim 169, wherein the instructions for storing data representing identifier lists are arranged to supervise a plurality of storage sections respectively associated with distinct portions of an integer range covering the identifiers allocated to the data graphs according to a predetermined partition, wherein each storage section associated with one of said portions contains data representing identifier sub-lists consisting of identifiers of said portion,
and wherein the instructions for determining the matching data graph identifier list comprise instructions for determining respective matching identifier sub-lists included in the different portions of the data graph identifier range, by means of the respective storage sections. -
200. A computer program product according to claim 199, wherein said matching identifier sub-lists are determined sequentially.
-
201. A computer program product according to claim 199, wherein said matching identifier sub-lists are determined in parallel.
Specification