
Method and system for finding similar records in mixed free-text and structured data

  • US 7,076,485 B2
  • Filed: 03/06/2002
  • Issued: 07/11/2006
  • Est. Priority Date: 03/07/2001
  • Status: Expired due to Term
First Claim

1. A method for determining whether records are similar in a database containing both structured and unstructured, free-text data, the method comprising the steps of:

  • accessing two of the records from the database for evaluation;

    evaluating a match between the two records as a weighted match between each of a plurality of available fields, such that a matching process is selected as appropriate from among a group of matching processes including strict Boolean, ordinal, and vector-based matching processes, wherein:

    when a strict Boolean matching process is selected, applying a match function as an exact match test,

    when an ordinal matching process is selected, applying a match function that makes use of information concerning the size and ordering of the data domain, and

    when a vector-based matching process is selected, applying a match function that uses a vector space frequency test; and

    calculating a similarity score between the two records, as follows:

    $$\mathrm{sim}(\mathit{record}_i, \mathit{record}_j) = w_1 \cdot \mathrm{match}(a_{1i}, a_{1j}) + w_2 \cdot \mathrm{match}(a_{2i}, a_{2j}) + \cdots + w_n \cdot \mathrm{match}(a_{ni}, a_{nj})$$

    wherein
      $\mathrm{sim}$ is a similarity function that determines the similarity score for the two records,
      $\mathit{record}_i$ is a first record of the two records and is identified in the database by an iterator $i$,
      $\mathit{record}_j$ is a second record of the two records and is identified in the database by an iterator $j$,
      iterator $n$ identifies a field position for a given field $a_{ni}$ in $\mathit{record}_i$ and a corresponding field position for a given field $a_{nj}$ in $\mathit{record}_j$,
      $\mathrm{match}$ indicates the match function, and
      a symbol $w_n$ indicates a predefined weight for each result of each match function.
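The weighted-sum scoring in claim 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. The claim names three kinds of match function but gives no formulas, so the ordinal test (distance normalized by the domain size) and the vector test (cosine similarity of raw term-frequency vectors) are hypothetical choices. The field names, weights, and sample records are also made up for illustration.

```python
import math
import re
from collections import Counter
from typing import Any, Callable, Sequence, Tuple

# Hypothetical match functions: the claim names the three categories
# (strict Boolean, ordinal, vector-based) but not their exact formulas.

def boolean_match(a: Any, b: Any) -> float:
    """Strict Boolean: 1.0 on an exact match, else 0.0."""
    return 1.0 if a == b else 0.0


def make_ordinal_match(lo: float, hi: float) -> Callable[[float, float], float]:
    """Ordinal: closeness scaled by the size of the domain [lo, hi] (assumed linear)."""
    span = hi - lo

    def match(a: float, b: float) -> float:
        return max(0.0, 1.0 - abs(a - b) / span)

    return match


def _term_counts(text: str) -> Counter:
    # Lowercase and split into alphanumeric tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def vector_match(a: str, b: str) -> float:
    """Vector space: cosine similarity of term-frequency vectors (assumed)."""
    ca, cb = _term_counts(a), _term_counts(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


# Each field spec pairs a field name with its weight w_n and its match function.
FieldSpec = Tuple[str, float, Callable[[Any, Any], float]]


def sim(record_i: dict, record_j: dict, fields: Sequence[FieldSpec]) -> float:
    """sim = w_1*match(a_1i, a_1j) + ... + w_n*match(a_ni, a_nj)."""
    return sum(w * match(record_i[name], record_j[name]) for name, w, match in fields)


# Usage with hypothetical fields and records.
fields = [
    ("state", 0.3, boolean_match),                  # structured, exact
    ("year", 0.2, make_ordinal_match(1990, 2010)),  # structured, ordered domain
    ("notes", 0.5, vector_match),                   # free text
]
r1 = {"state": "NY", "year": 2001, "notes": "engine failure on takeoff"}
r2 = {"state": "NY", "year": 2003, "notes": "engine failure during landing"}
print(round(sim(r1, r2, fields), 3))  # prints 0.73
```

Here the weights sum to 1 so the score stays in [0, 1]; the claim only requires the weights to be predefined, so that normalization is a design choice of this sketch.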
