Method and system for finding similar records in mixed free-text and structured data

  • US 7,440,946 B2
  • Filed: 01/13/2006
  • Issued: 10/21/2008
  • Est. Priority Date: 03/07/2001
  • Status: Expired due to Term
First Claim

1. A method for determining whether records are similar in a database, the method comprising:

  • (a) selecting two records in the database;

    (b) accessing two corresponding fields of the selected records;

    (c) determining a type of data in the accessed fields;

    (d) applying a match function to the accessed corresponding fields to generate a match score based on the type of data in the accessed fields, wherein, if the type of data in the fields is nominal, the match function applied is a Boolean match function, if the type of data in the fields is ordinal, the match function applied is an ordinal match function, and if the type of data in the fields is unstructured data, the match function applied is a vector-based match function;

    (e) repeating steps b through d for one or more additional corresponding fields of the selected records to generate one or more additional match scores; and

    (f) generating a similarity score that indicates a degree of similarity between the two records from the match scores, wherein the similarity score is generated as follows:


    similarity_score = w1*match(a1i, a1j) + w2*match(a2i, a2j) + . . . + wn*match(ani, anj)

    wherein similarity_score is the similarity score, i identifies a first selected record in the database, j identifies a second selected record in the database, n identifies a field position for a given field ani in the first selected record and a corresponding field position for a given field anj in the second selected record, match indicates the match function used to generate the match scores, and wn indicates a predefined weight for each match score.
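
Steps (a) through (f) can be illustrated with a minimal Python sketch. The claim names the three kinds of match function but does not define them, so everything below is an assumption made for illustration: the Boolean match is plain equality, the ordinal match is one minus the normalized rank distance on an ordered scale, and the vector-based match is cosine similarity over whitespace-split term counts. The names `SCHEMA`, `boolean_match`, `ordinal_match`, `vector_match`, and `similarity_score`, the field names, and the weights are all hypothetical and do not come from the patent.

```python
import math
from collections import Counter

def boolean_match(a, b):
    # Nominal data: assumed here to score 1.0 on exact equality, else 0.0.
    return 1.0 if a == b else 0.0

def ordinal_match(a, b, scale):
    # Ordinal data: assumed here to be 1 minus the normalized rank distance
    # between the two values on an ordered scale (a list of allowed values).
    if len(scale) < 2:
        return boolean_match(a, b)
    distance = abs(scale.index(a) - scale.index(b))
    return 1.0 - distance / (len(scale) - 1)

def vector_match(a, b):
    # Unstructured text: assumed here to be cosine similarity between
    # term-count vectors built by lowercasing and splitting on whitespace.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[term] * vb[term] for term in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical schema: the data type, the predefined weight wn, and
# (for ordinal fields only) the ordered scale of each field.
SCHEMA = {
    "make": {"type": "nominal", "weight": 0.3},
    "severity": {"type": "ordinal", "weight": 0.2,
                 "scale": ["low", "medium", "high"]},
    "notes": {"type": "unstructured", "weight": 0.5},
}

def similarity_score(record_i, record_j, schema=SCHEMA):
    # Steps (b)-(e): visit each pair of corresponding fields, pick the match
    # function by data type, and collect the weighted match scores.
    total = 0.0
    for field, spec in schema.items():
        a, b = record_i[field], record_j[field]
        if spec["type"] == "nominal":
            score = boolean_match(a, b)
        elif spec["type"] == "ordinal":
            score = ordinal_match(a, b, spec["scale"])
        else:
            score = vector_match(a, b)
        # Step (f): accumulate wn * match(ani, anj).
        total += spec["weight"] * score
    return total

record_i = {"make": "Ford", "severity": "low",
            "notes": "engine stalls at idle"}
record_j = {"make": "Ford", "severity": "high",
            "notes": "engine stalls when cold"}
print(similarity_score(record_i, record_j))
```

Because the hypothetical weights sum to 1 and each match function returns a value between 0 and 1, the resulting score also falls between 0 and 1. The claim itself only requires that the weights be predefined.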
