COLUMN WEIGHT CALCULATION FOR DATA DEDUPLICATION

  • US 20170351717A1
  • Filed: 06/02/2016
  • Published: 12/07/2017
  • Est. Priority Date: 06/02/2016
  • Status: Active Grant
First Claim

1. A computer-implemented method for identifying potentially duplicative records in a data set, the method comprising:

    collecting a data profile on the data set, where the data set comprises a plurality of data records organized by distinct attributes, such that separate elements of each record belong to a distinct attribute of the data set, and where the data profile provides descriptive information regarding the attributes, including at least a data classification for one or more of the attributes;

    determining a weight to be associated with a particular attribute based, at least in part, on the data profile;

    comparing at least one element of a first record of the data set to at least one element of a second record of the data set, where both the element of the first record and the element of the second record are associated with the particular attribute of the data set, to determine a degree of similarity between the two elements; and

    determining a likelihood that the first and second records are duplicative over each other, based, at least in part, on the degree of similarity between the two compared elements, where the effect that the degree of similarity between the two compared elements has on the overall determination as to the likelihood that the first and second records are duplicative is increased or decreased based on the determined weight associated with the particular attribute.
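
The four steps of the claim (collecting a profile, determining a weight, comparing elements, determining a weighted likelihood) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the classification names, the weight values in CLASS_WEIGHTS, the use of difflib's SequenceMatcher as the similarity measure, the weighted-average scoring, and the sample records.

```python
from difflib import SequenceMatcher

# Hypothetical mapping from data classification to weight. The claim only says
# the weight is based on the data profile; these values are illustrative.
CLASS_WEIGHTS = {"identifier": 3.0, "name": 2.0, "free_text": 0.5}
DEFAULT_WEIGHT = 1.0


def collect_profile(records, classify):
    """Build a minimal data profile: one data classification per attribute."""
    return {attr: {"classification": classify(attr)} for attr in records[0]}


def attribute_weight(profile, attr):
    """Derive an attribute's weight from its classification in the profile."""
    return CLASS_WEIGHTS.get(profile[attr]["classification"], DEFAULT_WEIGHT)


def similarity(a, b):
    """Degree of similarity between two elements, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def duplicate_likelihood(rec_a, rec_b, profile):
    """Weighted average of per-attribute similarities for two records."""
    total = 0.0
    weight_sum = 0.0
    for attr in profile:
        w = attribute_weight(profile, attr)
        total += w * similarity(rec_a[attr], rec_b[attr])
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


# Made-up sample data: same e-mail, near-identical name, unrelated notes.
records = [
    {"email": "ann@example.com", "name": "Ann Lee", "notes": "called Monday"},
    {"email": "ann@example.com", "name": "Anne Lee", "notes": "prefers email"},
]
classes = {"email": "identifier", "name": "name", "notes": "free_text"}

profile = collect_profile(records, classes.get)
score = duplicate_likelihood(records[0], records[1], profile)
print(f"duplicate likelihood: {score:.3f}")
```

In this sketch the matching e-mail address raises the score more than the differing notes lower it, which is the kind of increased or decreased effect the final step of the claim describes.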
