Systems and methods for estimating functional relationships in a database

  • US 7,562,067 B2
  • Filed: 05/06/2005
  • Issued: 07/14/2009
  • Est. Priority Date: 05/06/2005
  • Status: Active Grant
First Claim

1. A system that facilitates estimating functional relationships associated with one or more columns in a database, the system comprising at least a processor executing the following components:

  • a sampling component that receives a random sample of records within the database;

    an estimate generator component that calculates an estimate of strength of the functional relationships associated with the one or more columns based at least in part upon a subset of the received sample and a selected measure;

    an estimate selector component that facilitates selection of a measure of strength to be calculated by the estimate generator component;

    an overhead calculator component that estimates a measure of overhead associated with a column in the database by utilizing:

    $$\text{Estimated Overhead}(A) = \frac{N}{\hat{S}_{J_A}(R)}, \quad \text{where } \hat{S}_{J_A}(R) = \frac{1}{p^2} \cdot \left| S_{k,1} \bowtie_A S_{k,2} \right|,$$

    $p$ is a sampling fraction ($k/N$), $N$ is a number of rows that have a column $A$ in a relation $R$, and $S_{k,1}$ and $S_{k,2}$ are two independent uniform random samples of size $k$ drawn from the relation $R$;

    a row strength computation component that estimates strength $|\hat{X}|$ of a column comprising one or more default values as a key column based at least on a number of clean records within the column in the database by utilizing:

    $$|\hat{X}| = |\hat{X}_{small}| + |\hat{X}_{large}|$$

    wherein $X_{small}$ is a set of "dirty" rows in a relation $R$ that have either zero or one conflicting representative tuple pairs in a set of tuples $S$, and $X_{large}$ corresponds to a set of "dirty" rows that have more than one conflicting pair represented in $S$;

    the estimate generator component calculates an estimate of strength of a column as a key column as a function of the received samples utilizing the overhead calculator component, or the row strength computation component based at least on the selection of a measure of strength.
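The overhead estimate recited in the claim can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the claim text: the formula is read as N divided by the estimated self-join size on column A, the relation is held in memory as a list of dictionaries, each sample is drawn without replacement, and the function names `estimate_self_join_size` and `estimated_overhead` are hypothetical.

```python
import random
from collections import Counter


def estimate_self_join_size(rows, column, k, rng):
    """Estimate the size of the self-join of `rows` on `column`.

    Draws two independent uniform samples of size k, counts the pairs
    that agree on `column`, and scales the count by 1 / p^2.
    """
    n = len(rows)
    p = k / n  # sampling fraction k / N
    sample_1 = rng.sample(rows, k)  # assumption: sampling without replacement
    sample_2 = rng.sample(rows, k)
    # Count how often each value of the column occurs in the second sample.
    counts_2 = Counter(row[column] for row in sample_2)
    # Each row in the first sample joins with every equal-valued row in the second.
    sampled_join_size = sum(counts_2[row[column]] for row in sample_1)
    return sampled_join_size / (p * p)


def estimated_overhead(rows, column, k, rng=None):
    """Return N / (estimated self-join size), or None if no pair matched."""
    rng = rng or random.Random()
    n = len(rows)
    s_hat = estimate_self_join_size(rows, column, k, rng)
    if s_hat == 0:
        # No matching pair was sampled, so the ratio is undefined here.
        return None
    return n / s_hat
```

Under this reading, setting k equal to N makes both samples contain every row, so the join size is exact: a column with all-distinct values yields 1.0, while a column where every row holds the same value yields 1/N. Smaller k trades accuracy for less work, and the result then varies with the random draw.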
