Method and apparatus for rapid identification of column heterogeneity

  • US 8,176,016 B1
  • Filed: 11/17/2006
  • Issued: 05/08/2012
  • Est. Priority Date: 11/17/2006
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method for identifying data heterogeneity, the method comprising:

  • receiving data associated with a column in a database;

    computing a cluster entropy solely for the data of the column as a measure of data heterogeneity, wherein the cluster entropy is computed by:

    determining a plurality of soft clusters from the data;

    assigning a probability to each of the plurality of soft clusters equal to a fraction of data points of the data that each of the plurality of soft clusters contains; and

    computing the cluster entropy based on a resulting distribution of the plurality of soft clusters, wherein the entropy of the resulting distribution comprises the cluster entropy;

    determining, via a processor, whether the data of the column is heterogeneous in accordance with the cluster entropy; and

    providing a determination of whether the data of the column is heterogeneous as an output to a user.
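
The entropy computation recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the soft clustering has already been done and is supplied as a membership matrix whose rows each sum to 1; the claim does not say how the soft clusters are determined. The function names `cluster_entropy` and `is_heterogeneous`, the use of base-2 logarithms, and the `threshold` value are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def cluster_entropy(memberships: Sequence[Sequence[float]]) -> float:
    """Entropy (in bits) of the cluster distribution implied by soft memberships.

    memberships[i][k] is the degree to which column value i belongs to
    soft cluster k. Each row is assumed to sum to 1 (an assumption of this
    sketch, not something stated in the claim).
    """
    n = len(memberships)
    if n == 0:
        raise ValueError("need at least one data point")
    k = len(memberships[0])
    # Probability of each cluster = fraction of the data points it contains.
    probs = [sum(row[j] for row in memberships) / n for j in range(k)]
    # Shannon entropy of that distribution; empty clusters contribute nothing.
    return sum(-p * math.log2(p) for p in probs if p > 0)


def is_heterogeneous(memberships: Sequence[Sequence[float]],
                     threshold: float = 0.5) -> bool:
    """Flag the column as heterogeneous when the entropy exceeds a threshold.

    The threshold is a hypothetical tuning parameter.
    """
    return cluster_entropy(memberships) > threshold


# A column whose values all fall in one cluster versus one split evenly in two.
uniform_column = [[1.0, 0.0]] * 4
mixed_column = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
```

Under these assumptions, `uniform_column` has zero entropy and is reported as homogeneous, while `mixed_column` has one bit of entropy and is reported as heterogeneous.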
