Techniques for relationship discovery between datasets

  • US 10,650,000 B2
  • Filed: 09/14/2017
  • Issued: 05/12/2020
  • Est. Priority Date: 09/15/2016
  • Status: Active Grant
First Claim

1. A method comprising, at a computer system:

  • generating first profile metadata for each column of a first plurality of columns in a first dataset stored in a first data source;

    generating second profile metadata for each column of a second plurality of columns in a second dataset stored in a second data source;

    identifying a plurality of column pairs between the first dataset and the second dataset, wherein each column pair in the plurality of column pairs includes a different one of the first plurality of columns and a different one of the second plurality of columns;

    determining one or more column pairs from the plurality of identified column pairs to exclude;

    excluding at least one column pair from the one or more determined column pairs;

    for each of the one or more column pairs remaining after the excluding step:

    based on a type of join specified via a graphical interface, computing a plurality of scores for the column pair, each of the plurality of scores computed based on a different one of a plurality of scoring functions, the score indicating a measure for joining columns in the column pair;

    computing a plurality of weighted scores, each of the plurality of weighted scores computed for a different one of the plurality of scores based on applying one of a plurality of weights to the different one of the plurality of scores; and

    determining a pair score for the column pair, the pair score being a summation of the plurality of weighted scores;

    based on the pair score for each of the one or more column pairs, selecting a first column pair from the one or more column pairs;

    generating a third dataset based on merging, according to the type of join, the first dataset at a first column within the first column pair with the second dataset at a second column in the first column pair; and

    generating the graphical interface to display the generated third dataset.
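
The claimed steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the profile fields, the type-mismatch exclusion rule, the three scoring functions (`overlap`, `containment`, `uniqueness`), the weights, and the mapping from join type to scorers are all assumptions chosen for illustration, and the graphical-interface step is replaced by a plain `print`. Datasets are modeled as dictionaries mapping column names to lists of values.

```python
from itertools import product


def profile(dataset):
    """Build profile metadata (type, distinct values, row count) per column."""
    meta = {}
    for name, values in dataset.items():
        present = [v for v in values if v is not None]
        meta[name] = {
            "type": type(present[0]).__name__ if present else "empty",
            "distinct": set(present),
            "rows": len(values),
        }
    return meta


def overlap(a, b):
    """Jaccard similarity of the two columns' distinct values."""
    union = a["distinct"] | b["distinct"]
    return len(a["distinct"] & b["distinct"]) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of the left column's distinct values found on the right."""
    if not a["distinct"]:
        return 0.0
    return len(a["distinct"] & b["distinct"]) / len(a["distinct"])


def uniqueness(a, b):
    """How key-like the more unique of the two columns is."""
    ratios = [len(m["distinct"]) / m["rows"] for m in (a, b) if m["rows"]]
    return max(ratios, default=0.0)


# Hypothetical mapping from join type to (scoring function, weight) pairs.
SCORERS = {
    "inner": [(overlap, 0.7), (uniqueness, 0.3)],
    "left": [(containment, 0.7), (uniqueness, 0.3)],
}


def rank_pairs(left, right, join_type):
    """Return {(left_col, right_col): pair_score} for non-excluded pairs."""
    lmeta, rmeta = profile(left), profile(right)
    scores = {}
    for lcol, rcol in product(lmeta, rmeta):
        a, b = lmeta[lcol], rmeta[rcol]
        if a["type"] != b["type"]:  # assumed exclusion rule: type mismatch
            continue
        # Pair score is the sum of the weighted scores.
        scores[(lcol, rcol)] = sum(w * fn(a, b) for fn, w in SCORERS[join_type])
    return scores


def to_rows(dataset):
    """Convert a column-oriented dataset into a list of row dictionaries."""
    names = list(dataset)
    return [dict(zip(names, vals)) for vals in zip(*dataset.values())]


def merge(left, right, lcol, rcol, join_type):
    """Join left and right on the chosen columns ('inner' or 'left')."""
    index = {}
    for row in to_rows(right):
        index.setdefault(row[rcol], []).append(row)
    merged = []
    for lrow in to_rows(left):
        matches = index.get(lrow[lcol], [])
        if not matches and join_type == "left":
            matches = [dict.fromkeys(right)]  # keep the left row, null right side
        for rrow in matches:
            merged.append({**lrow, **{f"right.{k}": v for k, v in rrow.items()}})
    return merged


# Illustrative data only.
customers = {"cust_id": [1, 2, 3, 4], "name": ["Ann", "Bob", "Cy", "Di"]}
orders = {"order_id": [10, 11, 12], "customer": [1, 1, 3], "item": ["pen", "ink", "cap"]}

scores = rank_pairs(customers, orders, "inner")
best = max(scores, key=scores.get)          # highest-scoring column pair
third = merge(customers, orders, *best, "inner")
for row in third:
    print(row)
```

In this sketch the join type changes which scoring functions contribute to the pair score, which is one plausible reading of "based on a type of join"; a left join favors containment of the left column's values, while an inner join favors symmetric overlap.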
