
Duplicate data elimination system

  • US 20040249789A1
  • Filed: 06/04/2003
  • Published: 12/09/2004
  • Est. Priority Date: 06/04/2003
  • Status: Active Grant
First Claim

1. A process for finding similar data records from a set of data records, comprising:

  • providing a number of data records from which one or more canonical data records are identified;

  • determining a similarity score for data records based on the contents of the records;

  • grouping together data records whose similarity score with respect to each other is greater than a threshold to form one or more groups of data records that form nodes of a graph wherein edges between nodes represent a similarity score between records of a group; and

  • within each said group, identifying a canonical record based on the similarity of data records to each other within the group.
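
The steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not name a similarity metric, a grouping rule, or a selection criterion, so everything specific here is an assumption made for illustration: `difflib.SequenceMatcher` stands in for the content-based score, groups are taken as connected components of the above-threshold graph, and the canonical record is the node with the highest summed edge score inside its group. The function names (`similarity`, `group_records`, `canonical_record`, `find_canonicals`) and the default threshold of 0.8 are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Content-based score in [0, 1]. SequenceMatcher is a stand-in metric (assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_records(records, threshold):
    """Build the similarity graph and return (groups, edges).

    edges maps an index pair (i, j) with i < j to its score, kept only when
    the score exceeds the threshold. groups are connected components of that
    graph, each a list of record indices (assumed grouping rule).
    """
    edges = {}
    for i, j in combinations(range(len(records)), 2):
        score = similarity(records[i], records[j])
        if score > threshold:
            edges[(i, j)] = score

    # Union-find over record indices to collect connected components.
    parent = list(range(len(records)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values()), edges


def canonical_record(group, edges):
    """Pick the index whose summed edge score within the group is highest (assumed criterion)."""
    def total(i):
        return sum(edges.get((min(i, j), max(i, j)), 0.0) for j in group if j != i)
    return max(group, key=total)


def find_canonicals(records, threshold=0.8):
    """Return one canonical record per group, in order of first appearance."""
    groups, edges = group_records(records, threshold)
    return [records[canonical_record(g, edges)] for g in groups]


records = [
    "John Smith, 12 Main St",
    "Jon Smith, 12 Main St",
    "John Smith, 12 Main Street",
    "Acme Corp, 500 Market Ave",
]
canonicals = find_canonicals(records)
```

With these sample records, the three "Smith" variants fall into one group and the "Acme" record forms a group of its own. Connected components are a looser reading than requiring every pair in a group to exceed the threshold; a stricter reading of "with respect to each other" would group by cliques instead.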
