Robust system for interactively learning a record similarity measurement

  • US 20040181526A1
  • Filed: 03/11/2003
  • Published: 09/16/2004
  • Est. Priority Date: 03/11/2003
  • Status: Abandoned Application
First Claim

1. A system for learning a record similarity measurement, said system comprising:

    a set of record clusters, each record in each cluster having a list of fields and data contained in each said field;

    a predetermined threshold score for two of said records in one of said clusters to be considered similar;

    at least one decision tree constructed from a predetermined portion of said set of clusters, said decision tree encoding rules for determining a field similarity score of a related set of said fields; and

    a set of record pairs that may be determined to be duplicate records, said set of record pairs each having a record similarity score determined by said field similarity scores, said record pairs having a record similarity score greater than or equal to said predetermined threshold score being determined to be duplicate records.
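
The elements of claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation, and several parts are assumptions made only for illustration: the function names, the threshold value of 0.8, the use of `difflib.SequenceMatcher` for string comparison, and the averaging of field scores into a record score (the claim says only that the record score is "determined by" the field scores). The decision tree here is written by hand as nested rules; in the claimed system it would be constructed from a portion of the record clusters.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical threshold; the claim only requires that one be predetermined.
THRESHOLD = 0.8


def field_similarity(a: str, b: str) -> float:
    """Hand-written stand-in for a decision tree that scores one field pair."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:                       # rule 1: exact match after normalization
        return 1.0
    if not a or not b:               # rule 2: one side is empty
        return 0.0
    ratio = SequenceMatcher(None, a, b).ratio()
    return ratio if ratio >= 0.5 else 0.0   # rule 3: keep only close matches


def record_similarity(r1: dict, r2: dict) -> float:
    """Combine field scores into a record score (assumed here: the mean)."""
    fields = r1.keys() & r2.keys()
    if not fields:
        return 0.0
    return sum(field_similarity(r1[f], r2[f]) for f in fields) / len(fields)


def find_duplicates(cluster: list, threshold: float = THRESHOLD) -> list:
    """Return index pairs whose record score is >= the threshold."""
    return [
        (i, j)
        for (i, r1), (j, r2) in combinations(enumerate(cluster), 2)
        if record_similarity(r1, r2) >= threshold
    ]


cluster = [
    {"name": "John Smith", "city": "Boston"},
    {"name": "Jon Smith", "city": "Boston"},
    {"name": "Mary Jones", "city": "Denver"},
]
duplicates = find_duplicates(cluster)
```

With this sample cluster, only the first two records are flagged as a duplicate pair, since their name fields are close and their city fields match exactly.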
