Methods and systems for database organization
Abstract
A relational database having a plurality of records is organized by using a processing arrangement to perform a clustering operation on the records so as to create a number of clusters. At least one of the clusters is characterized by a selected metadata parameter. The clustering operation is performed to optimize a calculated value of a selected precision factor for the selected metadata parameter. The selected metadata parameter is selected to optimize execution of a database query and the value of the selected precision factor is related to efficiency of execution of the database query.
28 Claims
1. A method of organizing data in a data processing system, data in the data processing system including a plurality of individual data elements arranged in at least one table having columns and rows, each of the data elements corresponding to a row and column, said method comprising:
grouping a plurality of the rows of the at least one table into a row unit, wherein the at least one table comprises a plurality of row units, and wherein a data unit corresponds to the row unit and a column, wherein each data unit comprises a plurality of data elements;
gathering information about each data unit and storing the information in a corresponding information unit, and using the information in the information units to minimize the number of data unit access requests during resolving the data queries received by the system; and
using a processing arrangement to perform a clustering operation on the rows to create the plurality of row units, each of the row units characterized by information units gathering information about corresponding data units, to create a plurality of clusters, at least one of said plurality of clusters characterized by a selected metadata parameter, wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected information unit, the selected information unit is selected to minimize the number of data unit access requests during resolving the data queries received by the system, and the selected precision factor is related to efficiency of using the selected information unit to minimize the number of data unit access requests during resolving the data queries received by the system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
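For illustration only, and not as part of the claim language, the relationship between row units, data units, and information units can be sketched as follows. The sketch assumes each information unit stores only the minimum and maximum of its data unit, and it uses a plain sort as a stand-in for the clustering operation; the names `InfoUnit`, `build_info_units`, and `data_units_to_access` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class InfoUnit:
    """Hypothetical information unit: min/max summary of one data unit."""
    lo: int
    hi: int


def build_info_units(column: Sequence[int], rows_per_unit: int) -> List[InfoUnit]:
    """Group consecutive rows into row units and summarize each data unit."""
    units = []
    for start in range(0, len(column), rows_per_unit):
        chunk = column[start:start + rows_per_unit]
        units.append(InfoUnit(min(chunk), max(chunk)))
    return units


def data_units_to_access(units: List[InfoUnit], q_lo: int, q_hi: int) -> int:
    """Count data units whose [lo, hi] range overlaps the query range.

    Units that do not overlap can be skipped without being read.
    """
    return sum(1 for u in units if u.hi >= q_lo and u.lo <= q_hi)


values = [7, 1, 9, 3, 8, 2, 6, 4]
unclustered = build_info_units(values, rows_per_unit=2)
# Assumption: sorting stands in for the claimed clustering operation.
clustered = build_info_units(sorted(values), rows_per_unit=2)

print(data_units_to_access(unclustered, 1, 2))  # data units touched before clustering
print(data_units_to_access(clustered, 1, 2))    # data units touched after clustering
```

Under these assumptions, the narrower the ranges recorded in the information units, the fewer data units a range query has to open.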
15. A method of organizing a relational database having a plurality of records, said method comprising using a processing arrangement to perform a clustering operation on the records to create a plurality of clusters, at least one of said plurality of clusters characterized by a selected metadata parameter, wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected metadata parameter, wherein the clustering operation comprises a first clustering operation and a second clustering operation, wherein said second clustering operation is performed only when said second clustering operation is predicted to improve the value of the selected precision factor by at least a selected threshold amount, and wherein the selected metadata parameter is a first selected metadata parameter and the at least one of said plurality of clusters is characterized by the first selected metadata parameter and a second selected metadata parameter, wherein the selected precision factor is a first selected precision factor for the first metadata parameter and a second selected precision factor for the second metadata parameter;
the first selected metadata parameter has a first weighting factor applied to the first selected precision factor and the second selected metadata parameter has a second weighting factor applied to the second selected precision factor;
a cluster quality parameter comprises a sum of the weighted first selected precision factor and the weighted second selected precision factor; and
each of the first and second clustering operations improves the cluster quality parameter.
Dependent claims: 16, 17, 18, 19, 20
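As a rough illustration of the weighted sum recited above, the following sketch computes a cluster quality parameter from two precision factors and gates a second clustering pass on a predicted gain. It is a sketch under assumptions: the claim applies the threshold to the precision factor, whereas this example applies it to the weighted sum for simplicity, and the function names and numeric values are hypothetical.

```python
from typing import Sequence


def cluster_quality(precisions: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of weighted precision factors, one factor per metadata parameter."""
    if len(precisions) != len(weights):
        raise ValueError("one weight is required per precision factor")
    return sum(w * p for w, p in zip(weights, precisions))


def should_run_second_pass(current: Sequence[float],
                           predicted: Sequence[float],
                           weights: Sequence[float],
                           threshold: float) -> bool:
    """Run the second clustering pass only if the predicted gain is large enough.

    Assumption: the gain is measured on the weighted sum, not on a single factor.
    """
    gain = cluster_quality(predicted, weights) - cluster_quality(current, weights)
    return gain >= threshold


weights = (0.75, 0.25)    # hypothetical weighting factors
current = (0.5, 0.4)      # precision factors after the first pass
predicted = (0.7, 0.4)    # predicted precision factors after a second pass

print(should_run_second_pass(current, predicted, weights, threshold=0.1))
print(should_run_second_pass(current, predicted, weights, threshold=0.2))
```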
21. A method of organizing a relational database having a plurality of records, said method comprising using a processing arrangement to perform a clustering operation on the records to create a plurality of clusters, at least one of said plurality of clusters characterized by a selected metadata parameter, wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected metadata parameter, wherein performing a clustering operation comprises assigning a first record to a first cluster container, said first cluster container being one of a selected number of cluster containers;
assigning each subsequent record to a selected one of (a) the first cluster container and (b) a different cluster container, said selection being made to maximize an average value of the selected precision factor; and
adjusting the selected number of cluster containers in response to at least one of (i) a speed of executing the method and (ii) the selected precision factor average value, and wherein the quantity of cluster containers is adjusted at least one of downward when the speed is below a threshold, upward when the value of the selected precision factor is below a threshold, and based on a weighted function of the speed and the selected precision factor when both are below a threshold.
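The adjustment rule in this claim can be pictured with a small decision function. This is a minimal sketch, assuming a step size of one container, positive thresholds, and a weighted comparison of relative shortfalls for the case where both speed and precision are below threshold; none of these specifics come from the claim, and `adjust_container_count` is a hypothetical name.

```python
def adjust_container_count(count: int,
                           speed: float,
                           precision: float,
                           min_speed: float,
                           min_precision: float,
                           speed_weight: float = 0.5) -> int:
    """Return a new container count based on speed and average precision.

    Assumptions: thresholds are positive, and the count changes by one step.
    """
    slow = speed < min_speed
    imprecise = precision < min_precision
    if slow and imprecise:
        # Weighted function of the two relative shortfalls (hypothetical form).
        speed_gap = (min_speed - speed) / min_speed
        precision_gap = (min_precision - precision) / min_precision
        score = speed_weight * speed_gap - (1.0 - speed_weight) * precision_gap
        if score > 0:
            return max(1, count - 1)   # speed shortfall dominates: fewer containers
        if score < 0:
            return count + 1           # precision shortfall dominates: more containers
        return count
    if slow:
        return max(1, count - 1)
    if imprecise:
        return count + 1
    return count


print(adjust_container_count(8, speed=50, precision=0.9, min_speed=100, min_precision=0.5))
print(adjust_container_count(8, speed=200, precision=0.2, min_speed=100, min_precision=0.5))
```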
22. A method of organizing a relational database having a plurality of records, said method comprising using a processing arrangement to perform a clustering operation on the records to create a plurality of clusters, at least one of said plurality of clusters characterized by a selected metadata parameter, wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected metadata parameter, wherein performing a clustering operation comprises assigning a first record to a first cluster container, said first cluster container being one of a selected number of cluster containers;
assigning each subsequent record to a selected one of (a) the first cluster container and (b) a different cluster container, said selection being made to maximize an average value of the selected precision factor, and wherein said selection is made to improve a cluster quality parameter, said cluster quality parameter comprising a product of a weighting factor and the selected precision factor, said weighting factor corresponding to an average historical value of the selected precision factor.
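The product form of the cluster quality parameter can be written out in a few lines. This sketch assumes the historical values are kept as a simple list and that a weight of 1.0 is used when no history exists; both choices, and the name `cluster_quality`, are illustrative assumptions rather than anything stated in the claim.

```python
from statistics import fmean
from typing import Sequence


def cluster_quality(precision_now: float, precision_history: Sequence[float]) -> float:
    """Weighting factor (average historical precision) times current precision."""
    # Assumption: with no history yet, fall back to a neutral weight of 1.0.
    weight = fmean(precision_history) if precision_history else 1.0
    return weight * precision_now


print(cluster_quality(0.5, [0.2, 0.4]))
print(cluster_quality(0.5, []))
```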
23. A method of organizing a relational database having a plurality of records, said method comprising using a processing arrangement to perform a clustering operation on the records to create a plurality of clusters, at least one of said plurality of clusters characterized by a selected metadata parameter, wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected metadata parameter, wherein performing a clustering operation comprises assigning a first record to a first cluster container, said first cluster container being one of a selected number of cluster containers;
assigning each subsequent record to a selected one of (a) the first cluster container and (b) a different cluster container, said selection being made to maximize an average value of the selected precision factor, and wherein said selected number of cluster containers comprises a designated trash container and said step of assigning each subsequent record comprises assigning each subsequent record to a selected one of (a) the first cluster container, (b) the trash container, and (c) a different cluster container.
Dependent claims: 24, 25
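One way to picture record-by-record assignment with a trash container is the greedy sketch below. It makes several assumptions that are not in the claim: records are single integers, the precision factor is the hypothetical score 1 / (1 + max - min) of a container, each candidate is scored by the receiving container's precision after the add (a simplification of the claimed average), and a record goes to the trash container when no container would stay above a minimum precision.

```python
from typing import List, Sequence, Tuple


def precision(values: List[int]) -> float:
    """Hypothetical precision factor: a narrower min/max range scores higher."""
    return 1.0 / (1 + max(values) - min(values))


def assign_records(records: Sequence[int],
                   n_containers: int,
                   min_precision: float) -> Tuple[List[List[int]], List[int]]:
    """Greedily place each record in a container, or in the trash container."""
    containers: List[List[int]] = [[] for _ in range(n_containers)]
    trash: List[int] = []
    for value in records:
        # Precision each container would have if it received this record.
        scores = [precision(c + [value]) for c in containers]
        best = max(range(n_containers), key=scores.__getitem__)
        if scores[best] < min_precision:
            trash.append(value)        # outlier: would degrade every container
        else:
            containers[best].append(value)
    return containers, trash


containers, trash = assign_records([10, 50, 11, 51, 900, 12],
                                   n_containers=2, min_precision=0.2)
print(containers)
print(trash)
```

In this sketch the first record lands in the first container because ties are broken by position, and the outlying value is diverted so that it does not widen the range of either cluster.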
26. A method of organizing data in a data processing system, data in the data processing system including a plurality of individual data elements arranged in at least one table having columns and rows, each of the data elements corresponding to a row and column, said method comprising grouping a plurality of the rows of the at least one table into a row unit, wherein the at least one table comprises a plurality of row units, and wherein a data unit corresponds to the row unit and a column, wherein each data unit comprises a plurality of data elements;
gathering information about each data unit and storing the information in a corresponding information unit, and using the information in the information units to minimize the number of data unit access requests during resolving the data queries received by the system; and
(i) duplicating a plurality of records in said relational database to create at least a first and a second plurality of records, said first plurality of records being identical to said second plurality of records;
(ii) performing a first clustering operation on each of the first plurality of records and second plurality of records to create a first plurality of clusters and a second plurality of clusters, each said cluster in the plurality of clusters characterized by at least one respective metadata parameter, each respective metadata parameter having an associated precision factor, each said associated precision factor having a respective weighting factor applied thereto, wherein said first clustering operation maximizes a calculated value of a weighted selected precision factor; and
responsive to a database query, (iii) executing said database query on a selected one of the first plurality of records and second plurality of records, said selection being made on the basis of a correlation between a property of the query and the calculated value of the selected precision factor for each of said first plurality of records and second plurality of records.
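Step (iii) can be illustrated with a small routing function. This sketch assumes the relevant property of the query is the column it filters on, and that each duplicated set of records carries a per-column precision value computed after its own clustering; the copy names, column names, and numbers are hypothetical.

```python
from typing import Dict


def choose_copy(query_column: str,
                precision_by_copy: Dict[str, Dict[str, float]]) -> str:
    """Pick the record copy whose clustering gives the best precision for the column."""
    return max(
        precision_by_copy,
        key=lambda name: precision_by_copy[name].get(query_column, 0.0),
    )


# Two identical sets of records, each clustered to favor a different column.
copies = {
    "copy_by_date": {"order_date": 0.9, "customer_id": 0.2},
    "copy_by_customer": {"order_date": 0.3, "customer_id": 0.8},
}

print(choose_copy("customer_id", copies))
print(choose_copy("order_date", copies))
```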
27. A method of executing a query of data in a data processing system, the data in the data processing system including a plurality of individual data elements arranged in at least one table having columns and rows, each of the data elements corresponding to a row and column, said method comprising grouping a plurality of the rows of the at least one table into a row unit, wherein the at least one table comprises a plurality of row units, and wherein a data unit corresponds to the row unit and a column, wherein each data unit comprises a plurality of data elements;
gathering information about each data unit and storing the information in a corresponding information unit, and using the information in the information units to minimize the number of data unit access requests during resolving the data queries received by the system; and
using a processing arrangement to perform a clustering operation on the rows to create the plurality of row units, each of the row units characterized by information units gathering information about corresponding data units, wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected information unit, the selected information unit is selected to minimize the number of data unit access requests during resolving the data queries received by the system, and the selected precision factor is related to efficiency of using the selected information unit to minimize the number of data unit access requests during resolving the data queries received by the system;
using the selected information unit to minimize the number of data unit access requests during resolving the data queries received by the system and returning a response to the query.
28. A computer-readable medium having computer readable instructions stored thereon for execution by a processor to perform a method of organizing data in a data processing system, data in the data processing system including a plurality of individual data elements arranged in at least one table having columns and rows, each of the data elements corresponding to a row and column, said computer-readable medium including computer readable instructions directed to grouping a plurality of the rows of the at least one table into a row unit, wherein the at least one table comprises a plurality of row units, and wherein a data unit corresponds to the row unit and a column, wherein each data unit comprises a plurality of data elements;
computer readable instructions directed to gathering information about each data unit and storing the information in a corresponding information unit, and using the information in the information units to minimize the number of data unit access requests during resolving the data queries received by the system; and
computer readable instructions directed to performing a clustering operation on the rows to create the plurality of row units, each of the row units characterized by information units gathering information about corresponding data units, to create a plurality of clusters, at least one of said plurality of clusters characterized by a selected metadata parameter, wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected information unit, the selected information unit is selected to minimize the number of data unit access requests during resolving the data queries received by the system, and the selected precision factor is related to efficiency of using the selected information unit to minimize the number of data unit access requests during resolving the data queries received by the system.
Specification