Method of processing queries in a database system, and database system and software product for implementing such method
First Claim
1. A method of processing a query in a database system, wherein a plurality of row identifiers are defined to designate respective rows of a reference table having columns respectively associated with data attributes, said rows containing groups of related words assigned to said attributes in a collection of data, wherein a plurality of thesauruses each associated with a respective attribute and data representing reference table row identifier lists respectively associated with entries of said thesauruses are stored, wherein each thesaurus associated with one attribute is defined with reference to a partition into subsets of a set of words which can be assigned to said one attribute and has a respective entry for each subset including at least one word assigned to said one attribute in the collection of data, the reference table row identifier list associated with said thesaurus entry including any identifier allocated to a row of the reference table having a word of said subset assigned to said one attribute, the method comprising the steps of:
- determining a matching reference table row identifier list based on a combination of thesaurus entries relevant to the query and on the stored data representing the reference table row identifier lists associated with said relevant thesaurus entries; and
processing said matching row identifier list to output a response, wherein the step of processing the matching row identifier list comprises, for at least one attribute specified in the query, selecting a thesaurus associated with said attribute and detecting entries of the selected thesaurus with which identifier lists having a non-empty intersection with the matching row identifier list are associated.
3 Assignments
0 Petitions
Accused Products
Abstract
A reference table, which may not be stored, has columns associated with data attributes and rows containing related words assigned to those attributes in a collection of data. The stored data include thesauruses associated with the attributes, and reference table row identifier lists respectively associated with thesaurus entries. Each thesaurus is defined with reference to a partition into subsets of the words which can be assigned to the associated attribute, and has a respective entry for each subset including an assigned word, the row identifier list associated with this entry including any identifier allocated to a row of the reference table having a word of the subset assigned to the associated attribute. A matching reference table row identifier list is determined from the data representing the row identifier lists associated with thesaurus entries relevant to the query. To output a response, a thesaurus associated with at least one attribute is selected, and the entries of the selected thesaurus with which identifier lists having a non-empty intersection with the matching row identifier list are associated are detected.
-
Citations
55 Claims
-
1. A method of processing a query in a database system, wherein a plurality of row identifiers are defined to designate respective rows of a reference table having columns respectively associated with data attributes, said rows containing groups of related words assigned to said attributes in a collection of data, wherein a plurality of thesauruses each associated with a respective attribute and data representing reference table row identifier lists respectively associated with entries of said thesauruses are stored, wherein each thesaurus associated with one attribute is defined with reference to a partition into subsets of a set of words which can be assigned to said one attribute and has a respective entry for each subset including at least one word assigned to said one attribute in the collection of data, the reference table row identifier list associated with said thesaurus entry including any identifier allocated to a row of the reference table having a word of said subset assigned to said one attribute, the method comprising the steps of:
-
determining a matching reference table row identifier list based on a combination of thesaurus entries relevant to the query and on the stored data representing the reference table row identifier lists associated with said relevant thesaurus entries; and
processing said matching row identifier list to output a response, wherein the step of processing the matching row identifier list comprises, for at least one attribute specified in the query, selecting a thesaurus associated with said attribute and detecting entries of the selected thesaurus with which identifier lists having a non-empty intersection with the matching row identifier list are associated. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37)
/a/ providing respective level q target lists and respective level q thesaurus ranges covering consecutive entries of the level q thesaurus for QA≦
q≦
Q;
/b/ initializing the level Q target list with the matching row identifier list, initializing the level parameter q with the value Q, and selecting a first entry of the level Q thesaurus range;
/c/ determining an intersection list between the level q target list and the identifier list associated with the selected entry of the level q thesaurus range;
/d/ if the intersection list determined in the preceding step /c/ is empty, selecting another entry of the level q thesaurus range and repeating step /c/;
/e/ if q is greater than QA;
/e1/ setting the level q−
1 target list as equal to the intersection list determined in the preceding step /c/;
/e2/ setting the level q−
1 thesaurus range as consisting of the entries of the level q−
1 thesaurus relating to level q−
1 prefixes which begin with the level q prefix of the selected level q thesaurus entry, and selecting a first entry of the level q−
1 thesaurus range;
/e3/ decrementing q by one unit and returning to step /c/;
/f/ if q is equal to QA;
/f1/ including the selected level QA thesaurus entry in the detected entries;
/f2/ if the level Q target list is equal to the intersection list determined in the preceding step /c/, terminating the detection of entries in the selected thesaurus;
/f3/ removing the integers of the intersection list determined in the preceding step /c/ from any target list including at least one integer which is not in said intersection list;
/f4/ setting q as the smallest level parameter for which the target list includes at least one integer which is not in said intersection list;
/f5/ selecting another entry in the level q thesaurus range and returning to step /c/.
-
-
4. A method according to claim 3, wherein Q≧
- 1 and each thesaurus having a level parameter q≧
1 further contains, in each entry provided for a level q prefix value, data designating the entry of the level q−
1 thesaurus which corresponds to the lowest or highest level q−
1 prefix beginning with the level q prefix of said level q thesaurus entry, and wherein step /e2/ comprises selecting the level q−
1 thesaurus entry designated in the selected level q thesaurus entry.
- 1 and each thesaurus having a level parameter q≧
-
5. A method according to claim 3, wherein a coding scheme comprising n successive coding layers is used to provide coding data representing the identifier list associated with a level q thesaurus entry for 0≦
- q≦
Q, n being a number at least equal to 1, each layer having a predetermined pattern for dividing a range covering integers of an input list of said layer into subsets, said identifier list being the input list of the first layer for said thesaurus entry, wherein for any layer other than the last layer, an integer list representing the position, in the pattern of said layer, of each subset containing at least one integer of the input list forms the input list for the next layer,wherein the coding data comprise, for each layer and each subset containing at least one integer of the input list, data representing the position of each integer of the input list within said subset and, at least if said layer is the last layer, data representing the position of said subset in the pattern of said layer, and wherein each level q target list forms a layer 1 and level q filtering list and is submitted as a layer 1 input list in the coding scheme for QA≦
q≦
Q to provide respective layer k and level q filtering lists for 1<
k≦
n if n>
1, said layer k and level q filtering list provided from a level q target list being the layer k input list obtained from said level q target list in the coding scheme.
- q≦
-
6. A method according to claim 5, wherein the pattern of each layer is such that the integer subsets are consecutive intervals consisting of the same number of integers.
-
7. A method according to claim 6, wherein said number of integers is a whole power of 2 for each layer.
-
8. A method according to claim 5, wherein the step /c/ of determining the intersection list between a level q target list and an identifier list comprises, from k=n:
-
/c1/ computing a layer k intersection list between the layer k input list obtained from said identifier list in the coding scheme and the layer k and level q filtering list corresponding to said level q target list;
/c2/ if the computed layer k intersection list is empty, determining said intersection list between the level q target list and the identifier list as being empty;
/c3/ if k=1, determining said intersection list between the level q target list and the identifier list as the computed layer 1 intersection list; and
/c4/ if k>
1, decrementing k by one unit and repeating from step /c1/.
-
-
9. A method according to claim 8, wherein, in the coding scheme, the coding data representing the position of each integer of an input list within a subset for a coding layer k≦
- n define a layer k bitmap segment in which each bit is associated with a respective integer of the subset to indicate whether said integer belongs to said input list, while the data representing the position of said subset in the layer k pattern comprise a layer k integer rank associated with said layer k bitmap segment, and wherein the step /c1/ of computing a layer k intersection list between a layer k input list obtained from an identifier list in the coding scheme and a layer k and level q filtering list, represented by a first layer k bitmap vector, comprises;
initializing a second layer k bitmap vector with logical zeroes;
obtaining layer k ranks from the coding data representing said identifier list; and
selecting any obtained layer k rank which represents the position in the layer k pattern of a subset including at least one integer of said layer k and level q filtering list, obtaining the layer k bitmap segment with which the selected layer k rank is associated, and determining a segment of the second layer k bitmap vector having a position determined by the selected layer k rank by combining the obtained layer k bitmap segment with a segment of the first layer k bitmap vector having a position determined by the selected layer k rank according to a bitwise Boolean AND operation, said layer k intersection list corresponding to the resulting second layer k bitmap vector.
- n define a layer k bitmap segment in which each bit is associated with a respective integer of the subset to indicate whether said integer belongs to said input list, while the data representing the position of said subset in the layer k pattern comprise a layer k integer rank associated with said layer k bitmap segment, and wherein the step /c1/ of computing a layer k intersection list between a layer k input list obtained from an identifier list in the coding scheme and a layer k and level q filtering list, represented by a first layer k bitmap vector, comprises;
-
10. A method according to claim 9, wherein, for 1≦
- k<
n, the layer k ranks and the layer k bitmap segments associated therewith for at least one thesaurus entry are stored at corresponding addresses in distinct first and second files, and wherein the step /c1/ of computing a layer k intersection list between a layer k input list obtained from an identifier list in the coding scheme and a layer k and level q filtering list, represented by a first layer k bitmap vector, comprises;providing a rank table in a RAM memory, having records associated with the addresses in said first and second files;
filling the rank table by writing any selected layer k rank into the rank table record associated with the address of said selected layer k rank in said first file; and
for any record of the filled rank table containing a layer k rank and associated with an address in the second file, reading the associated layer k bitmap segment at said address in the second file and combining the read layer k bitmap segment with a segment of the first layer k bitmap vector having a position determined by said layer k rank according to a bitwise Boolean AND operation to determine a segment of the second layer k bitmap vector having a position determined by said layer k rank.
- k<
-
11. A method according to claim 9, further comprising determining a pre-filtering flag for each entry of a level q thesaurus, said pre-filtering flag having a first value when said entry is associated with a reference table row identifier list represented by coding data which do not define any layer n rank representing the position in the layer n pattern of a subset which includes at least one integer of a layer n and level q filtering list, and wherein the step /c/ of determining the intersection list between a level q target list, corresponding to said layer n and level q filtering list, and an identifier list associated with an entry of the level q thesaurus comprises determining said intersection list as being empty if the pre-filtering flag determined for said entry has said first value.
-
12. A method according to claim 11, wherein, for any level q thesaurus entry associated with a row identifier list represented by coding data which define a layer n rank representing the position in the layer n pattern of a subset which includes at least one integer of the layer n and level q filtering list, the layer n bitmap segment associated with said layer n rank is obtained and said first value is allocated to the pre-filtering flag determined for said entry if the obtained layer n bitmap segment does not represent the position of any integer of said layer n and level q filtering list within said subset.
-
13. A method according to claim 5, wherein n>
- 1 and in the coding scheme, the coding data representing the position of each integer of an input list within a subset for a coding layer k≦
n define a layer k bitmap segment in which each bit is associated with a respective integer of the subset to indicate whether said integer belongs to said input list, while a layer k integer rank associated with said layer k bitmap segment represents the position of said subset in the layer k pattern, and wherein the step /c/ of determining the intersection list between a level q target list, corresponding to layer k and level q filtering lists represented by a respective first layer k bitmap vectors for 1≦
k≦
n, and an identifier list comprises;/c1/ initializing a second bitmap vector with logical zeroes;
/c2/ selecting layer n ranks obtained from the coding data representing said identifier list, and setting k=n;
/c3/ for each selected layer k rank;
/c3/ if the selected layer k rank represents the position in the layer k pattern of a subset which includes at least one integer of said layer k and level q filtering list, obtaining the layer k bitmap segment with which the selected layer k rank is associated;
/c32/ for any integer of the layer k and level q filtering list whose position within said subset is represented in said layer k bitmap segment, selecting a respective layer k−
1 rank determined from the selected layer k rank and said position represented in said layer k bitmap segment;
/c33/ if k>
2, executing step /c3/ with k decremented by one unit; and
/c34/ if k=2, obtaining any layer 1 bitmap segment with which a selected layer 1 rank is associated, and combining the obtained layer 1 bitmap segment with a segment of the first layer 1 bitmap vector having a position determined by said layer 1 rank according to a bitwise Boolean AND operation to determine a segment of the second bitmap vector having a position determined by said layer 1 rank, said intersection list corresponding to the resulting second bitmap vector.
- 1 and in the coding scheme, the coding data representing the position of each integer of an input list within a subset for a coding layer k≦
-
14. A method according to claim 13, wherein the layer 1 bitmap segments for at least one thesaurus entry are stored in at least one layer 1 file at addresses respectively corresponding to the layer 1 ranks associated therewith, and the step /c/ of determining an intersection list comprises:
-
providing a rank table in a RAM memory, having records associated with the addresses in said layer 1 file;
filling the rank table by writing any layer 1 rank selected in step /c32/ into the rank table record associated with the address corresponding to the selected layer 1 rank; and
for any record of the filled rank table containing a layer 1 rank and associated with an address in said layer 1 file, reading the associated layer 1 bitmap segment at said address and combining the read layer 1 bitmap segment with a segment of the first layer 1 bitmap vector having a position determined by said layer 1 rank according to a bitwise Boolean AND operation to determine a segment of the second layer 1 bitmap vector having a position determined by said layer 1 rank.
-
-
15. A method according to claim 13, further comprising determining a pre-filtering flag for each entry of a level q thesaurus, said pre-filtering flag having a first value when said entry is associated with a reference table row identifier list represented by coding data which do not define any layer n rank representing the position in the layer n pattern of a subset which includes at least one integer of a layer n and level q filtering list, and wherein the step /c/ of determining the intersection list between a level q target list, corresponding to said layer n and level q filtering list, and an identifier list associated with an entry of the level q thesaurus comprises determining said intersection list as being empty if the pre-filtering flag determined for said entry has said first value.
-
16. A method according to claim 15, wherein, for any level q thesaurus entry associated with a row identifier list represented by coding data which define a layer n rank representing the position in the layer n pattern of a subset which includes at least one integer of the layer n and level q filtering list, the layer n bitmap segment associated with said layer n rank is obtained and said first value is allocated to the pre-filtering flag determined for said entry if the obtained layer n bitmap segment does not represent the position of any integer of said layer n and level q filtering list within said subset.
-
17. A method according to claim 1, wherein the step of processing the matching row identifier list further comprises writing output data associated with any detected entry of a selected thesaurus into an output table.
-
18. A method according to claim 17, wherein the output table includes a respective row corresponding to each identifier of the matching row identifier list, and wherein output data associated with a detected entry of a selected thesaurus are written into any row of the output table corresponding to a reference table row identifier belonging to both the matching row identifier list and the identifier list associated with said detected thesaurus entry.
-
19. A method according to claim 18, wherein each reference table row identifier has a respective row of the output table corresponding thereto, wherein the rows of the output table are initialized with a default value before writing the output data, and wherein the rows of the output table which do not contain the default value are read after writing the output data.
-
20. A method according to claim 18, wherein the output table is associated with an index file having a respective record for each reference table row identifier, containing either a default value or a pointer designating a respective row of the output table corresponding to said reference table row identifier, wherein the records of the index file are initialized with a default value before writing the output data, and wherein the step of writing output data associated with a detected entry of a first selected thesaurus comprises, for each reference table row identifier belonging to both the matching row identifier list and the identifier list associated with said detected entry of the first selected thesaurus:
-
allocating an available row of the output table to correspond to said reference table row identifier, writing output data into the allocated row; and
writing a pointer to the allocated row into the record of the index file provided for said reference table row identifier.
-
-
21. A method according to claim 18, wherein the output table has a plurality of columns each associated with a respective attribute for which a thesaurus is selected, and wherein output data associated with a detected entry of a thesaurus selected for an attribute associated with a column of the output table are written into said column.
-
22. A method according to claim 20, wherein the output table has a plurality of columns each associated with a respective attribute for which a thesaurus is selected, wherein output data associated with a detected entry of a thesaurus selected for an attribute associated with a column of the output table are written into said column, and wherein the step of writing output data associated with a detected entry of at least one second selected thesaurus comprises, for each reference table row identifier belonging to both the matching row identifier list and the identifier list associated with said detected entry of the second selected thesaurus:
-
reading the pointer contained in the record of the index file provided for said reference table row identifier; and
writing output data into the row of the output table designated by said pointer.
-
-
23. A method according to claim 17, wherein the selected thesaurus being a word thesaurus defined with reference to a partition into subsets each consisting of one word, the output data associated with a detected entry comprise the word for which said detected entry is provided.
-
24. A method according to claim 17, wherein the selected thesaurus being a macroword thesaurus associated with a prefix length and defined with reference to a partition into subsets each consisting of words beginning by a common prefix having said prefix length, the output data associated with a detected entry comprise the prefix value for which said detected entry is provided.
-
25. A method according to claim 17, wherein the output data associated with a detected entry comprise an address of said detected entry in the selected thesaurus.
-
26. A method according to claim 17, wherein the output data associated with a detected entry of a selected thesaurus comprise a numerical value derived from said thesaurus entry.
-
27. A method according to claim 26, wherein, for a detected entry of at least one selected thesaurus, said numerical value is calculated by applying a mathematical function to a thesaurus value stored in said entry.
-
28. A method according to claim 26, wherein, for a detected entry of at least one selected thesaurus, said numerical value is calculated by applying a mathematical function to a plurality of values including a thesaurus value stored in said entry and at least one value already present in the output table.
-
29. A method according to claim 26, wherein the output table includes a respective row corresponding to each identifier of the matching row identifier list, and wherein a numerical value derived from a detected thesaurus entry is written into any row of the output table corresponding to a reference table row identifier belonging to both the matching row identifier list and the identifier list associated with said detected thesaurus entry.
-
30. A method according to claim 29, wherein the numerical value, derived from a detected entry of a first selected thesaurus and written into any row of the output table corresponding to a reference table row identifier belonging to both the matching row identifier list and the identifier list associated with said entry of the first selected thesaurus, is obtained from a thesaurus value stored in said entry,
and wherein the numerical value, derived from a detected entry of at least one second selected thesaurus and written into a row of the output table corresponding to a reference table row identifier belonging to both the matching row identifier list and the identifier list associated with said entry of the second selected thesaurus, is calculated by applying a mathematical function to a plurality of values including a thesaurus value stored in said entry and at least one value already present in said row of the output table. -
31. A method according to claim 29, further comprising calculating an output value from a set of numerical values which have been respectively written into the rows of the output table.
-
32. A method according to claim 1, wherein an integer range covering the identifiers designating the rows of the reference table is partitioned into a plurality of predetermined portions, wherein at least some of the data representing identifier lists are distributed into a plurality of storage sections respectively associated with said portions, wherein a storage section associated with one of said portions contains data representing identifier sub-lists consisting of identifiers of said portion;
and wherein the step of determining a matching row identifier list is executed separately for the different portions of the reference table row identifier range, by means of the respective storage sections.
-
33. A method according to claim 32, wherein the step of processing the matching row identifier is at least partially executed separately for the different portions of the reference table row identifier range, by means of the respective storage sections.
-
34. A method according to claim 32, wherein the separate step executions are carried out in parallel by respective processors for the different portions of the reference table row identifier range.
-
35. A method according to claim 34, wherein the combination of thesaurus entries relevant to the query is determined, on the basis of criteria specified in the query, by a query server connected to said processors through a communication network.
-
36. A method according to claim 35, wherein a list update server is connected, through the communication network, to a plurality of storage units respectively coupled to said processors, the list update server controlling the storage units to maintain the storage sections.
-
37. A method according to claim 1, wherein said reference table is a virtual table which is not stored.
-
38. A database system for managing information from a collection of data, wherein a plurality of row identifiers are defined to designate respective rows of a reference table having columns respectively associated with data attributes, said rows containing groups of related words assigned to said attributes in the collection of data, the database system comprising:
-
means for storing a plurality of thesauruses respectively associated with attributes of said group, wherein each thesaurus associated with an attribute is defined with reference to a partition into subsets of a set of words which can be assigned to said attribute and has a respective entry for each subset including at least one word assigned to said attribute in the collection of data;
means for storing data representing identifier lists respectively associated with the thesaurus entries, wherein the identifier list associated with a entry, relating to a subset, of a thesaurus associated with an attribute includes any row identifier designating a row of the reference table having a word of said subset assigned to said attribute;
means for determining a matching reference table row identifier list based on a combination of thesaurus entries relevant to a query and on the stored data representing the reference table row identifier lists associated with said relevant thesaurus entries; and
means for processing said matching row identifier list to output a response, and wherein the means for processing the matching row identifier list comprise means for selecting a thesaurus associated with an attribute specified in the query and means for detecting entries of the selected thesaurus with which identifier lists having a non-empty intersection with the matching row identifier list are associated. - View Dependent Claims (39)
-
-
40. A computer program product for processing queries in a database system, wherein a plurality of row identifiers are defined to designate respective rows of a reference table having columns respectively associated with data attributes, said rows containing groups of related words assigned to said attributes in a collection of data, the database system comprising:
-
means for storing a plurality of thesauruses respectively associated with attributes of said group, wherein each thesaurus associated with an attribute is defined with reference to a partition into subsets of a set of words which can be assigned to said attribute and has a respective entry for each subset including at least one word assigned to said attribute in the collection of data; and
means for storing data representing identifier lists respectively associated with the thesaurus entries, wherein the identifier list associated with a entry, relating to a subset, of a thesaurus associated with an attribute includes any row identifier designating a row of the reference table having a word of said subset assigned to said attribute, the computer program product comprising; instructions for determining a matching reference table row identifier list based on a combination of thesaurus entries relevant to a query and on the stored data representing the reference table row identifier lists associated with said relevant thesaurus entries; and
instructions for processing said matching row identifier list to output a response, wherein the instructions for processing the matching row identifier list comprise instructions for selecting a thesaurus associated with an attribute specified in the query and instructions for detecting entries of the selected thesaurus with which identifier lists having a non-empty intersection with the matching row identifier list are associated. - View Dependent Claims (41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55)
and wherein the numerical value, derived from a detected entry of at least one second selected thesaurus and written into a row of the output table corresponding to a reference table row identifier belonging to both the matching reference table row identifier list and the identifier list associated with said entry of the second selected thesaurus, is calculated by applying a mathematical function to a plurality of values including a thesaurus value stored in said entry and at least one value already present in said row of the output table. -
52. A computer program product according to claim 50, further comprising instructions for calculating an output value from a set of numerical values which have been respectively written into the rows of the output table.
-
53. A computer program product according to claim 40, wherein the instructions for storing data representing identifier lists are arranged to supervise a plurality of storage sections respectively associated with distinct portions of an integer range covering the identifiers allocated to the reference table rows according to a predetermined partition, wherein each storage section associated with one of said portions contains data representing identifier sub-lists consisting of identifiers of said portion,
and wherein the instructions for determining the matching reference table row identifier list comprise instructions for determining respective matching identifier sub-lists included in the different portions of the reference table row identifier range, by means of the respective storage sections. -
54. A computer program product according to claim 53, wherein said matching identifier sub-lists are determined sequentially.
-
55. A computer program product according to claim 53, wherein said matching identifier sub-lists are determined in parallel.
-
Specification