
System for analysing data relationships to support data query execution

  • US 10,691,651 B2
  • Filed: 09/14/2017
  • Issued: 06/23/2020
  • Est. Priority Date: 09/15/2016
  • Status: Active Grant
First Claim

1. A computer-implemented method of identifying relationships between data tables, each data table comprising a plurality of data records, the method comprising:

  • evaluating a plurality of candidate relationships, each candidate relationship defined between a first column associated with a first data table and a second column associated with a second data table, the evaluating comprising computing relationship metrics for each candidate relationship, wherein the relationship metrics for a candidate relationship provide a measure of a relationship between the first column and the second column, the computing comprising:

    computing a first metric indicating a degree of distinctness of values of at least one of the first and second columns;

    computing a second metric indicating a measure of overlap between values of the first column and values of the second column;

    the method further comprising identifying one or more relationships between data tables in dependence on the computed relationship metrics;

    wherein the method is performed in a plurality of processing stages including:

    a first processing stage, comprising generating a map table which maps values appearing in the data tables to column locations of those data values;

    a second processing stage, comprising computing numbers of distinct data values for respective columns and numbers of distinct intersecting values for respective column pairs, the second processing stage comprising processing a plurality of partitions of the map table in parallel; and

    a third processing stage, comprising computing the relationship metrics based on the output of the second processing stage.
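
The three processing stages recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the function names, the in-memory representation of tables as lists of dictionaries, the round-robin partitioning, the use of a thread pool for the parallel stage, the two metric formulas (distinct values divided by row count, and shared distinct values divided by the smaller distinct count), the restriction to column pairs from different tables, and the `min_overlap` threshold. The claim itself does not fix any of these.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def build_map_table(tables):
    """Stage 1: map each value to the (table, column) locations where it appears."""
    value_map = defaultdict(set)
    for table, rows in tables.items():
        for row in rows:
            for column, value in row.items():
                value_map[value].add((table, column))
    return value_map


def split_into_partitions(value_map, n_partitions):
    """Hypothetical round-robin split; each value lands in exactly one partition."""
    parts = [{} for _ in range(n_partitions)]
    for i, (value, locations) in enumerate(value_map.items()):
        parts[i % n_partitions][value] = locations
    return parts


def count_partition(part):
    """Stage 2 (per partition): distinct counts per column and per column pair."""
    distinct, intersecting = Counter(), Counter()
    for locations in part.values():
        distinct.update(locations)  # one distinct value for each column
        intersecting.update(combinations(sorted(locations), 2))  # shared value
    return distinct, intersecting


def compute_metrics(distinct, intersecting, row_counts):
    """Stage 3: assumed metric formulas, derived from the stage 2 counts."""
    metrics = {}
    for (a, b), shared in intersecting.items():
        if a[0] == b[0]:
            continue  # assumption: only consider columns from different tables
        metrics[(a, b)] = {
            # first metric: the higher distinct-to-row ratio of the two columns
            "distinctness": max(distinct[a] / row_counts[a[0]],
                                distinct[b] / row_counts[b[0]]),
            # second metric: shared distinct values over the smaller distinct set
            "overlap": shared / min(distinct[a], distinct[b]),
        }
    return metrics


def identify_relationships(tables, n_partitions=2, min_overlap=0.8):
    value_map = build_map_table(tables)
    parts = split_into_partitions(value_map, n_partitions)
    with ThreadPoolExecutor() as pool:  # partitions are processed in parallel
        results = list(pool.map(count_partition, parts))
    distinct, intersecting = Counter(), Counter()
    for part_distinct, part_intersecting in results:
        distinct.update(part_distinct)
        intersecting.update(part_intersecting)
    row_counts = {table: len(rows) for table, rows in tables.items()}
    metrics = compute_metrics(distinct, intersecting, row_counts)
    return {pair: m for pair, m in metrics.items() if m["overlap"] >= min_overlap}
```

Because each distinct value is a single key of the map table, it falls into exactly one partition, so the per-partition counts can simply be summed. In this sketch, a column pair with high overlap and at least one highly distinct column looks like a key relationship, for example a hypothetical `customers.id` column referenced by `orders.customer_id`.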
