Estimation of clustering for access planning

US 7,024,422 B2
Filed: 07/31/2002
Issued: 04/04/2006
Est. Priority Date: 07/31/2002
Status: Active Grant

First Claim

1. A method for generating a clustering statistic for an attribute of a relation to be used in optimizing execution of a query directed to one or more attributes of said relation, comprising:

accessing records of said relation from a database in electronic storage;

determining clustered storage locations of records in said relation, said clustered storage locations being locations where said records would be found in the event that said records were clustered relative to said attribute;

computing a correlation between actual storage locations of records in said relation and said clustered storage locations of said records; and

generating said clustering statistic based upon said correlation;

utilizing said statistic in execution of a query and retrieval of said records from said electronic storage.

Abstract

A method for computing a clustering factor that is particularly suitable for use with existing indexes. The clustering factor is generated by first determining clustered storage locations of records in a relation, i.e., locations where the records would be found if they were clustered relative to the attribute (e.g., locations for the records if they were ordered in storage in accordance with the attribute). Then, the actual storage locations of the records are correlated to the clustered storage locations, and a clustering statistic is generated based upon the correlation.
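
The first step in the abstract, determining where each record would sit if the relation were ordered by the attribute, can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `clustered_pages`, the fixed `records_per_page` capacity, and the use of page numbers as "storage locations" are all assumptions made for illustration.

```python
from typing import Any, List, Sequence


def clustered_pages(values: Sequence[Any], records_per_page: int) -> List[int]:
    """For each record, return the page it would occupy if the relation
    were stored in ascending order of the attribute.

    `values[i]` is the attribute value of record i. A fixed number of
    records per page is a simplifying assumption, not part of the source.
    """
    if records_per_page <= 0:
        raise ValueError("records_per_page must be positive")

    # Record indices in attribute order; sorted() is stable, so records
    # with equal values keep their original relative order.
    order = sorted(range(len(values)), key=lambda i: values[i])

    pages = [0] * len(values)
    for rank, record in enumerate(order):
        # The record at sorted position `rank` lands on page rank // capacity.
        pages[record] = rank // records_per_page
    return pages


if __name__ == "__main__":
    # Four records, two per page: values 10 and 20 share page 0,
    # values 30 and 40 share page 1.
    print(clustered_pages([30, 10, 20, 40], records_per_page=2))
```

Claim 6 describes deriving these locations from an existing index over the attribute; in this sketch an in-memory sort stands in for the index scan.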

15 Claims

1. A method for generating a clustering statistic for an attribute of a relation to be used in optimizing execution of a query directed to one or more attributes of said relation, comprising:
- accessing records of said relation from a database in electronic storage;
  
  determining clustered storage locations of records in said relation, said clustered storage locations being locations where said records would be found in the event that said records were clustered relative to said attribute;
  
  computing a correlation between actual storage locations of records in said relation and said clustered storage locations of said records; and
  
  generating said clustering statistic based upon said correlation;
  
  utilizing said statistic in execution of a query and retrieval of said records from said electronic storage.
- Dependent claims (2, 3, 4, 5, 6)
  - 2. The method of claim 1 wherein said locations where said records would be found in the event that said records were clustered relative to said attribute, are locations where said records would be found in the event that said records were ordered in storage in accordance with said attribute.
  - 3. The method of claim 1 wherein computing a correlation comprises computing an entropy of said actual storage locations and said clustered storage locations.
  - 4. The method of claim 3 wherein computing a correlation comprises computing a joint entropy of said actual and clustered storage locations.
  - 5. The method of claim 4 wherein computing a correlation comprises subtracting said joint entropy of said actual and clustered storage locations from a sum of said entropy of said actual storage locations and said entropy of said clustered storage locations.
  - 6. The method of claim 1 wherein said clustered storage locations are computed using an index formed over an attribute for which the clustering statistic is being computed.
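
Claims 3 through 5 describe the correlation as the sum of the two individual entropies minus the joint entropy, which has the form of mutual information. The sketch below shows that arithmetic in Python under stated assumptions: storage locations are treated as discrete page labels, probabilities are estimated from record counts, and logarithms are base 2. None of these choices, nor the names `entropy` and `clustering_correlation`, come from the claims, and the claims do not say whether the resulting statistic is normalized.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def clustering_correlation(actual: Sequence[Hashable],
                           clustered: Sequence[Hashable]) -> float:
    """H(actual) + H(clustered) - H(actual, clustered), as in claim 5.

    actual[i] and clustered[i] are the actual and the hypothetical
    clustered location (assumed here to be page numbers) of record i.
    """
    if len(actual) != len(clustered):
        raise ValueError("location lists must have the same length")
    # The joint entropy is the entropy of the paired locations.
    joint = list(zip(actual, clustered))
    return entropy(actual) + entropy(clustered) - entropy(joint)


if __name__ == "__main__":
    # Perfectly clustered: each actual page maps to one clustered page.
    print(clustering_correlation([0, 0, 1, 1], [0, 0, 1, 1]))
    # Fully scattered: knowing the actual page says nothing about the
    # clustered page, so the correlation is zero.
    print(clustering_correlation([0, 1, 0, 1], [0, 0, 1, 1]))
```

In this sketch a larger value indicates that the actual layout tracks the attribute order more closely; how an optimizer would scale or threshold the value is left open.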

7. A computer system implementing a relational database system and generating a clustering statistic for an attribute of a relation of said relational database, to be used in optimizing execution of a query directed to one or more attributes of said relation, comprising:
- electronic storage for said relational database, including a relation having a plurality of tuples including values for a plurality of attributes; and
  
  computing circuitry performing query optimization and query execution upon said relational database, said query optimization including generating a clustering statistic for an attribute of said relation by determining clustered storage locations of records in said relation, said clustered storage locations being locations where said records would be found in the event that said records were clustered relative to said attribute, computing a correlation between actual storage locations of records in said relation and said clustered storage locations of said records, generating said clustering statistic based upon said correlation, and utilizing said statistic in execution of a query and retrieval of said records from said electronic storage.
- Dependent claims (8, 9, 10, 11, 12)
  - 8. The computer system of claim 7 wherein said locations where said records would be found in the event that said records were clustered relative to said attribute, are locations where said records would be found in the event that said records were ordered in storage in accordance with said attribute.
  - 9. The computer system of claim 7 wherein computing a correlation comprises computing an entropy of said actual storage locations and said clustered storage locations.
  - 10. The computer system of claim 9 wherein computing a correlation comprises computing a joint entropy of said actual and clustered storage locations.
  - 11. The computer system of claim 10 wherein computing a correlation comprises subtracting said joint entropy of said actual and clustered storage locations from a sum of said entropy of said actual storage locations and said entropy of said clustered storage locations.
  - 12. The computer system of claim 7 wherein said clustered storage locations are computed using an index formed over an attribute for which the clustering statistic is being computed.

13. A program product for implementing a relational database system and generating a clustering statistic for an attribute of a relation of said relational database, to be used in optimizing execution of a query directed to one or more attributes of said relation, comprising:
- a relational database, including a relation that is electronically stored and accessed and has a plurality of tuples including values for a plurality of attributes; and
  
  relational database software performing query optimization and query execution upon said relational database, said query optimization including generating a clustering statistic for an attribute of said relation by determining clustered storage locations of records in said relation, said clustered storage locations being locations where said records would be found in the event that said records were clustered relative to said attribute, computing a correlation between actual storage locations of records in said relation and said clustered storage locations of said records, and generating said clustering statistic based upon said correlation, and utilizing said statistic in execution of a query and retrieval of said records from electronic storage; and
  
  a signal bearing media holding said relational database and relational database software.
- Dependent claims (14, 15)
  - 14. The program product of claim 13 wherein the signal bearing media comprises transmission media.
  - 15. The program product of claim 13 wherein the signal bearing media comprises recordable media.

Current Assignee: X Corp. (f/k/a Twitter, Inc.) (X Holdings Corp.)
Original Assignee: International Business Machines Corporation
Inventors: Abdo, Abdo Esmail
Primary Examiner(s): Rones, Charles
Assistant Examiner(s): AL HASHEMI, SANA A
Application Number: US10/209,515
Publication Number: US 20040024746A1
Time in Patent Office: 1,343 Days
Field of Search: 707/1, 707/2, 707/3, 707/100, 707/101, 707/104.1, 707/204, 711/170
US Class Current: 1/1
CPC Class Codes:
G06F 16/284   Relational databases
Y10S 707/99942   Manipulating data structure...
Y10S 707/99943   Generating database or data...
Y10S 707/99945   Object-oriented database st...
