Methods and systems for database organization

US 8,266,147 B2
Filed: 11/26/2008
Issued: 09/11/2012
Est. Priority Date: 09/18/2006
Status: Active Grant

Abstract

A relational database having a plurality of records is organized by using a processing arrangement to perform a clustering operation on the records so as to create a number of clusters. At least one of the clusters is characterized by a selected metadata parameter. The clustering operation is performed to optimize a calculated value of a selected precision factor for the selected metadata parameter. The selected metadata parameter is selected to optimize execution of a database query and the value of the selected precision factor is related to efficiency of execution of the database query.
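
As a rough illustration of why the clustering described above can matter, the sketch below assumes the metadata parameter is a per-unit minimum and maximum (one of the options listed in claim 3) and uses plain sorting as a stand-in for the clustering operation. The names `InfoUnit`, `build_info_units`, and `units_accessed` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class InfoUnit:
    """Hypothetical summary of one data unit: min and max of one column."""
    lo: int
    hi: int


def build_info_units(values: Sequence[int], rows_per_unit: int) -> List[InfoUnit]:
    """Group consecutive rows into row units and summarize each data unit."""
    units = []
    for start in range(0, len(values), rows_per_unit):
        chunk = values[start:start + rows_per_unit]
        units.append(InfoUnit(min(chunk), max(chunk)))
    return units


def units_accessed(units: List[InfoUnit], q_lo: int, q_hi: int) -> int:
    """Count data units whose [lo, hi] range overlaps the query range.

    A unit with no overlap can be ruled out from its summary alone,
    so it would not need to be read.
    """
    return sum(1 for u in units if u.hi >= q_lo and u.lo <= q_hi)


# Illustrative column: values 0..15 in an interleaved load order.
column = [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]

unclustered = build_info_units(column, rows_per_unit=4)
clustered = build_info_units(sorted(column), rows_per_unit=4)

# Query: value BETWEEN 0 AND 3
print(units_accessed(unclustered, 0, 3))  # 2
print(units_accessed(clustered, 0, 3))    # 1
```

In this toy case the clustered layout produces narrower min/max ranges, so the same query touches one data unit instead of two.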

28 Claims

1. A method of organizing data in a data processing system, data in the data processing system including a plurality of individual data elements arranged in at least one table having columns and rows, each of the data elements corresponding to a row and column, said method comprising:
- grouping a plurality of the rows of the at least one table into a row unit, wherein the at least one table comprises a plurality of row units, and wherein a data unit corresponds to the row unit and a column, wherein each data unit comprises a plurality of data elements;
  
  gathering information about each data unit and storing the information in a corresponding information unit, and using the information in the information units to minimize the number of data unit access requests during resolving the data queries received by the system; and
  
  using a processing arrangement to perform a clustering operation on the rows to create the plurality of row units, each of the row units characterized by information units gathering information about corresponding data units, records to create a plurality of clusters, at least one of said plurality of clusters characterized by a selected metadata parameter, wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected information unit, the selected information unit is selected to minimize the number of data unit access requests during resolving the data queries received by the system, and the selected precision factor is related to efficiency of using the selected information unit to minimize the number of data unit access requests during resolving the data queries received by the system metadata parameter.
  - 2. The method of claim 1, wherein the selected information unit is used to minimize the number of data unit access requests during resolving the data queries received by the system.
  - 3. The method of claim 1, wherein the selected information unit characterizes a corresponding data unit by identifying at least one of:
    - a minimum value of all data elements in the data unit;
      
      a maximum value of all data elements in the data unit;
      
      a number of non-null values found within the data elements in the data unit;
      
      a histogram mapping an occurrence of at least one value in the data unit;
      
      total value of the data elements in the data unit that provide information about occurrence of a character in an alphanumeric string;
      
      information about correlation of the data elements in the data unit with at least one data element of at least one other data unit; and
      
      common occurrence of at least one value in the data unit and in the at least one other data unit.
  - 4. The method of claim 1, wherein, the precision factor comprises a measure of effectiveness of the selected information unit in maximizing efficiency of using the selected information unit to minimize the number of data unit access requests during resolving the data queries received by the system.
  - 5. The method of claim 1, wherein the clustering operation comprises a first clustering operation and a second clustering operation, wherein said second clustering operation is performed only when said second clustering operation is predicted to improve the value of the selected precision factor by at least a selected threshold amount.
  - 6. The method of claim 1, further comprising repeating the method when a change occurs to at least one of the relational database and parameters related to the relational database.
  - 7. The method of claim 6, wherein the change comprises at least one of a change to an existing record, addition of a record, deletion of a record.
  - 8. The method of claim 6, wherein the change comprises a change in value of at least one of an information unit, a precision factor, a weighting factor, and a cluster quality parameter.
  - 9. The method of claim 8, wherein the change in value results from executing a database query.
  - 10. The method of claim 8 wherein the change in value is based on statistics of previously created information units.
  - 11. The method of claim 1, wherein performing a clustering operation comprises:
    - assigning a first record to a first cluster container, said first cluster container being one of a selected number of cluster containers;
      
      assigning each subsequent record to a selected one of (a) the first cluster container and (b) a different cluster container, said selection being made to maximize an average value of the selected precision factor.
  - 12. The method of claim 11, further comprising:
    - adjusting the selected number of cluster containers in response to at least one of (i) a speed of executing the method and (ii) the selected precision factor average value.
  - 13. The method of claim 11, wherein each cluster container has a selectable maximum size and further comprising adjusting, responsive to at least one calculated value of a precision factor of said cluster container, the selected maximum size of said data container.
  - 14. The method of claim 13, wherein, when a total number of records assigned to a cluster container corresponds to the selectable maximum size of the cluster container, a data cluster is formed from said records, and said cluster container is emptied and enabled to be assigned additional subsequent records.
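
The container-based assignment of claim 11 can be sketched as a greedy loop. This is a minimal sketch under stated assumptions, not the patented procedure: the claims do not give a formula for the precision factor, so `precision` below is an invented linear score over a min/max range, and `domain` (the width of the value range) is a hypothetical parameter. The maximum container size and emptying behavior of claims 13 and 14 are not modeled.

```python
from typing import List


def precision(values: List[int], domain: int) -> float:
    """Assumed precision factor: 1 minus the min/max range as a share of
    the domain width. An empty container scores 1.0."""
    if not values:
        return 1.0
    return 1.0 - (max(values) - min(values)) / domain


def assign_records(records: List[int], n_containers: int, domain: int) -> List[List[int]]:
    """Assign each record to the container that keeps the average
    precision over all containers as high as possible."""
    containers: List[List[int]] = [[] for _ in range(n_containers)]
    for rec in records:
        # Only the chosen container's score changes, so maximizing the
        # per-container change also maximizes the average over containers.
        best = max(
            range(n_containers),
            key=lambda i: precision(containers[i] + [rec], domain)
            - precision(containers[i], domain),
        )
        containers[best].append(rec)
    return containers


result = assign_records([1, 100, 2, 101, 3, 102], n_containers=2, domain=200)
print(result)  # [[1, 2, 3], [100, 101, 102]]
```

Ties go to the lowest-numbered container, which is why the first record lands in the first container.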

15. A method of organizing a relational database having a plurality of records, said method comprising using a processing arrangement to perform a clustering operation on the records to create a plurality of clusters, at least one of said plurality of clusters characterized by a selected metadata parameter, wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected metadata parameter, wherein the clustering operation comprises a first clustering operation and a second clustering operation, wherein said second clustering operation is performed only when said second clustering operation is predicted to improve the value of the selected precision factor by at least a selected threshold amount, and wherein the selected metadata parameter is a first selected metadata parameter and the at least one of said plurality of clusters is characterized by the first selected metadata parameter and a second selected metadata parameter, wherein the selected precision factor is a first selected precision factor for the first metadata parameter and a second selected precision factor for the second metadata parameter;
- the first selected metadata parameter has a first weighting factor applied to the first selected precision factor and the second selected metadata parameter has a second weighting factor applied to the second selected precision factor;
  
  a cluster quality parameter comprises a sum of the weighted first selected precision factor and the weighted second selected precision factor; and
  
  each of the first and second clustering operations improves the cluster quality parameter.
  - 16. The method of claim 15, wherein each of the first weighting factor and the second weighting factor is at least one of (i) a default value, (ii) a manually set value, (iii) automatically recalculated during the regrouping step.
  - 17. The method of claim 15, wherein a value of at least one of the first weighting factor and the second weighting factor is selected based on an estimated correlation between said associated metadata parameter and query execution efficiency.
  - 18. The method of claim 15, wherein at least one of the first weighting factor and the second weighting factor is automatically recalculated by applying a weight derived from a previous iteration of the method.
  - 19. The method of claim 18, wherein:
    - at least one calculated value of at least one of the first selected precision factor and the second selected precision factor is recorded; and
      
      at least one of the first weighting factor and the second weighting factor is automatically recalculated while performing the second clustering operation using a statistical value derived from the corresponding at least one recorded first selected precision value and second selected precision value.
  - 20. The method of claim 15, wherein performing a clustering operation comprises:
    - assigning a first record to a first cluster container, said first cluster container being one of a selected number of cluster containers;
      
      assigning each subsequent record to a selected one of (a) the first cluster container and (b) a different cluster container, said selection being made to improve the cluster quality parameter.
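
The weighted cluster quality parameter of claim 15 and its threshold-gated second clustering operation might be sketched as follows. The metadata parameter names, weights, and precision values are made up for illustration. Applying the threshold to the aggregated quality rather than to a single precision factor is an assumption of this sketch.

```python
from typing import Dict


def cluster_quality(precisions: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of weighted precision factors, one term per metadata parameter."""
    return sum(weights[name] * precisions[name] for name in precisions)


def should_recluster(current: float, predicted: float, threshold: float) -> bool:
    """Run a second clustering pass only if the predicted gain meets the threshold."""
    return predicted - current >= threshold


# Hypothetical weights and precision values for two metadata parameters.
weights = {"min_max": 0.75, "histogram": 0.25}
precisions_now = {"min_max": 0.5, "histogram": 0.25}
precisions_predicted = {"min_max": 0.75, "histogram": 0.5}

q_now = cluster_quality(precisions_now, weights)         # 0.4375
q_pred = cluster_quality(precisions_predicted, weights)  # 0.6875

print(should_recluster(q_now, q_pred, threshold=0.125))  # True
print(should_recluster(q_now, q_pred, threshold=0.5))    # False
```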

21. A method of organizing a relational database having a plurality of records, said method comprising using a processing arrangement to perform a clustering operation on the records to create a plurality of clusters, at least one of said plurality of clusters characterized by a selected metadata parameter, wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected metadata parameter, wherein performing a clustering operation comprises assigning a first record to a first cluster container, said first cluster container being one of a selected number of cluster containers;
- assigning each subsequent record to a selected one of (a) the first cluster container and (b) a different cluster container, said selection being made to maximize an average value of the selected precision factor; and
  
  adjusting the selected number of cluster containers in response to at least one of (i) a speed of executing the method and (ii) the selected precision factor average value, and wherein the quantity of cluster containers is adjusted at least one of downward when the speed is below a threshold, upward when the value of the selected precision factor is below a threshold, and based on a weighted function of the speed and the selected precision factor when both are below a threshold.
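
One possible reading of the container-count adjustment in claim 21 is sketched below. The claim does not define the weighted function used when both speed and precision fall below their thresholds. The relative-shortfall comparison here, together with the `speed_weight`, `step`, and `floor` parameters, is an assumption made only for illustration.

```python
def adjust_container_count(
    count: int,
    speed: float,
    precision: float,
    speed_min: float,
    precision_min: float,
    speed_weight: float = 0.5,
    step: int = 1,
    floor: int = 1,
) -> int:
    """Return a new container count given measured speed and precision."""
    slow = speed < speed_min
    imprecise = precision < precision_min
    if slow and imprecise:
        # Assumed weighted function: compare relative shortfalls and move
        # in the direction of the more pressing problem.
        speed_gap = (speed_min - speed) / speed_min
        precision_gap = (precision_min - precision) / precision_min
        score = (1.0 - speed_weight) * precision_gap - speed_weight * speed_gap
        if score > 0:
            count += step
        elif score < 0:
            count -= step
    elif slow:
        count -= step  # fewer containers: less work per record
    elif imprecise:
        count += step  # more containers: more choices per record
    return max(floor, count)


print(adjust_container_count(8, speed=50, precision=0.9, speed_min=100, precision_min=0.5))   # 7
print(adjust_container_count(8, speed=200, precision=0.2, speed_min=100, precision_min=0.5))  # 9
print(adjust_container_count(8, speed=90, precision=0.1, speed_min=100, precision_min=0.5))   # 9
```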

22. A method of organizing a relational database having a plurality of records, said method comprising using a processing arrangement to perform a clustering operation on the records to create a plurality of clusters, at least one of said plurality of clusters characterized by a selected metadata parameter, wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected metadata parameter, wherein performing a clustering operation comprises assigning a first record to a first cluster container, said first cluster container being one of a selected number of cluster containers;
- assigning each subsequent record to a selected one of (a) the first cluster container and (b) a different cluster container, said selection being made to maximize an average value of the selected precision factor, and wherein said selection is made to improve a cluster quality parameter, said cluster quality parameter comprising a product of a weighting factor and the selected precision factor, said weighting factor corresponding to an average historical value of the selected precision factor.

23. A method of organizing a relational database having a plurality of records, said method comprising using a processing arrangement to perform a clustering operation on the records to create a plurality of clusters, at least one of said plurality of clusters characterized by a selected metadata parameter, wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected metadata parameter, wherein performing a clustering operation comprises assigning a first record to a first cluster container, said first cluster container being one of a selected number of cluster containers;
- assigning each subsequent record to a selected one of (a) the first cluster container and (b) a different cluster container, said selection being made to maximize an average value of the selected precision factor, and wherein said selected number of cluster containers comprises a designated trash container and said step of assigning each subsequent record comprises assigning each subsequent record to a selected one of (a) the first cluster container, (b) the trash container, and (c) a different cluster container.
  - 24. The method of claim 23, wherein a record is assigned to the trash container when a result of assigning said record to any other container is found to decrease the average value of the selected precision factor more than a threshold amount.
  - 25. The method of claim 23, wherein, when a total number of records assigned to a cluster container remains unchanged during a portion of a clustering operation, said portion exceeding a specified number of records, said cluster container is emptied and said total number of records is moved to the trash container.
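
The trash container of claims 23 and 24 can be illustrated with a small extension of a greedy assignment loop. As before, the linear `precision` score and the `domain` parameter are assumptions, since the claims leave the precision factor undefined. `max_drop` is a hypothetical name for the threshold amount. The stale-container rule of claim 25 is not modeled.

```python
from typing import List, Tuple


def precision(values: List[int], domain: int) -> float:
    """Assumed precision factor: 1 minus the min/max range as a share of
    the domain width. An empty container scores 1.0."""
    if not values:
        return 1.0
    return 1.0 - (max(values) - min(values)) / domain


def cluster_with_trash(
    records: List[int], n_containers: int, domain: int, max_drop: float
) -> Tuple[List[List[int]], List[int]]:
    """Greedy assignment that diverts outlier records to a trash container."""
    containers: List[List[int]] = [[] for _ in range(n_containers)]
    trash: List[int] = []
    for rec in records:
        # Loss of precision each container would suffer by taking the record.
        drops = [
            precision(c, domain) - precision(c + [rec], domain) for c in containers
        ]
        best = min(range(n_containers), key=drops.__getitem__)
        # Dividing by the container count converts a single container's
        # loss into the loss of the average over all containers.
        if drops[best] / n_containers > max_drop:
            trash.append(rec)
        else:
            containers[best].append(rec)
    return containers, trash


containers, trash = cluster_with_trash(
    [1, 100, 2, 101, 50, 3], n_containers=2, domain=200, max_drop=0.05
)
print(containers)  # [[1, 2, 3], [100, 101]]
print(trash)       # [50]
```

The value 50 fits neither group well, so even the best container would lower the average precision by more than the threshold and the record is set aside.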

26. A method of organizing data in a data processing system, data in the data processing system including a plurality of individual data elements arranged in at least one table having columns and rows, each of the data elements corresponding to a row and column, said method comprising grouping a plurality of the rows of the at least one table into a row unit, wherein the at least one table comprises a plurality of row units, and wherein a data unit corresponds to the row unit and a column, wherein each data unit comprises a plurality of data elements;
- gathering information about each data unit and storing the information in a corresponding information unit, and using the information in the information units to minimize the number of data unit access requests during resolving the data queries received by the system; and
  
  (i) duplicating a plurality of records in said relational database to create at least a first and a second plurality of records, said first plurality of records being identical to said second plurality of records;
  
  (ii) performing a first clustering operation on each of the first plurality of records and second plurality of records to create a first plurality of clusters and a second plurality of clusters, each said cluster in the plurality of clusters characterized by at least one respective metadata parameter, each respective metadata parameter having an associated precision factor, each said associated precision factor having a respective weighting factor applied thereto, wherein said first clustering operation maximizes a calculated value of a weighted selected precision factor; and
  
  responsive to a database query, (iii) executing said database query on a selected one of the first plurality of records and second plurality of records, said selection being made on the basis of a correlation between a property of the query and the calculated value of the selected precision factor for each of said first plurality of records and second plurality of records.
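
Claim 26 can be pictured as keeping two identically populated copies of a table, each clustered for a different column, and routing a query to whichever copy has the better precision for the column the query filters on. The sketch below is only one way to read the claim: the two-column rows, the sort-based clustering, the linear precision score, and the names `mean_precision` and `choose_copy` are all assumptions.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # hypothetical two-column record (a, b)


def mean_precision(rows: List[Row], col: int, rows_per_unit: int, domain: int) -> float:
    """Average over row units of an assumed precision score for one column:
    1 minus the min/max range as a share of the domain width."""
    scores = []
    for start in range(0, len(rows), rows_per_unit):
        vals = [r[col] for r in rows[start:start + rows_per_unit]]
        scores.append(1.0 - (max(vals) - min(vals)) / domain)
    return sum(scores) / len(scores)


def choose_copy(
    copies: Dict[str, List[Row]], filter_col: int, rows_per_unit: int = 4, domain: int = 16
) -> str:
    """Pick the copy whose clustering is most precise for the filtered column."""
    return max(
        copies,
        key=lambda name: mean_precision(copies[name], filter_col, rows_per_unit, domain),
    )


# Sixteen records in which column b is a scrambled permutation of column a.
rows = [(i, (i * 7) % 16) for i in range(16)]

# Two identical sets of records, each clustered (here: sorted) differently.
copies = {
    "by_a": sorted(rows, key=lambda r: r[0]),
    "by_b": sorted(rows, key=lambda r: r[1]),
}

print(choose_copy(copies, filter_col=0))  # by_a
print(choose_copy(copies, filter_col=1))  # by_b
```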

27. A method of executing a query of data in a data processing system, the data in the data processing system including a plurality of individual data elements arranged in at least one table having columns and rows, each of the data elements corresponding to a row and column, said method comprising grouping a plurality of the rows of the at least one table into a row unit, wherein the at least one table comprises a plurality of row units, and wherein a data unit corresponds to the row unit and a column, wherein each data unit comprises a plurality of data elements;
- gathering information about each data unit and storing the information in a corresponding information unit, and using the information in the information units to minimize the number of data unit access requests during resolving the data queries received by the system; and
  
  using a processing arrangement to perform a clustering operation on the rows to create the plurality of row units, each of the row units characterized by information units gathering information about corresponding data units records wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected information unit, the selected information unit is selected to minimize the number of data unit access requests during resolving the data queries received by the system, and the selected precision factor is related to efficiency of using the selected information unit to minimize the number of data unit access requests during resolving the data queries received by the system metadata;
  
  using the selected information unit to minimize the number of data unit access requests during resolving the data queries received by the system and returning a response to the query.

28. A computer-readable medium having computer readable instructions stored thereon for execution by a processor to perform a method of organizing data in a data processing system, data in the data processing system including a plurality of individual data elements arranged in at least one table having columns and rows, each of the data elements corresponding to a row and column, said computer-readable medium including computer readable instructions directed to grouping a plurality of the rows of the at least one table into a row unit, wherein the at least one table comprises a plurality of row units, and wherein a data unit corresponds to the row unit and a column, wherein each data unit comprises a plurality of data elements;
- computer readable instructions directed to gathering information about each data unit and storing the information in a corresponding information unit, and using the information in the information units to minimize the number of data unit access requests during resolving the data queries received by the system; and
  
  computer readable instructions directed to performing a clustering operation on the rows to create the plurality of row units, each of the row units characterized by information units gathering information about corresponding data units records to create a plurality of clusters, at least one of said plurality of clusters characterized by a selected metadata parameter, wherein said clustering operation optimizes a calculated value of a selected precision factor for the selected information unit, the selected information unit is selected to minimize the number of data unit access requests during resolving the data queries received by the system, and the selected precision factor is related to efficiency of using the selected information unit to minimize the number of data unit access requests during resolving the data queries received by the system metadata.

Current Assignee: Deep SEAS LLC
Original Assignee: Infobright, Inc.
Inventors: Eastwood, Victoria; Wroblewski, Jakub; Slezak, Dominik; Kowalski, Marcin
Primary Examiner(s): Pulliam, Christyann
Assistant Examiner(s): Chojnacki, Mellissa M
Application Number: US12/324,630
Publication Number: US 20090106210A1
Time in Patent Office: 1,385 Days
Field of Search: 707/737, 707/769
US Class Current: 707/737
CPC Class Codes: G06F 16/24 Querying; G06F 16/285 Clustering or classification
