Method and system for finding similar records in mixed free-text and structured data
Abstract
A technique for data mining where the available data contains both structured and unstructured (free-text) data. The present invention combines the information available from different types of data to provide a single similarity score indicating the degree of similarity between records. A data evaluation application selects two records from a database and compares corresponding fields from the two records. The application determines whether to apply a nominal matching process, an ordinal matching process, or a vector-space matching process depending on the type of data in each pair of corresponding fields. The application sums the matching scores for all the fields in the records to compute the similarity score.
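The summation described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the field names, the `matchers` mapping, and both scoring functions are hypothetical, and `word_overlap` is only a simplified placeholder for a vector-space comparison. The abstract does not say how individual scores are scaled or weighted, so the sketch assumes each score lies in [0, 1] and is added without weights.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]
Matcher = Callable[[Any, Any], float]


def similarity_score(a: Record, b: Record, matchers: Dict[str, Matcher]) -> float:
    """Sum the per-field match scores for two records (unweighted, by assumption)."""
    return sum(match(a[field], b[field]) for field, match in matchers.items())


def exact(x: Any, y: Any) -> float:
    """Hypothetical matcher for a structured field: 1.0 if equal, else 0.0."""
    return 1.0 if x == y else 0.0


def word_overlap(x: str, y: str) -> float:
    """Hypothetical placeholder for a free-text matcher: Jaccard overlap of word sets."""
    xs, ys = set(x.lower().split()), set(y.lower().split())
    union = xs | ys
    return len(xs & ys) / len(union) if union else 0.0


# Illustrative records with one structured field and one free-text field.
matchers = {"model": exact, "notes": word_overlap}
r1 = {"model": "X100", "notes": "pump leaks oil"}
r2 = {"model": "X100", "notes": "oil pump failure"}
score = similarity_score(r1, r2, matchers)
```

Here the `model` fields match exactly and the `notes` fields share two of four distinct words, so the two per-field scores are added into one number for the record pair.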
89 Citations
15 Claims
1. A method for determining whether records are similar in a database, the method comprising:
(a) selecting two records in the database;
(b) accessing two corresponding fields of the selected records;
(c) determining a type of data in the accessed fields; and
(d) applying a match function to the accessed corresponding fields to generate a match score based on the type of data in the accessed fields, wherein, if the type of data in the fields is nominal, the match function applied is a Boolean match function, if the type of data in the fields is ordinal, the match function applied is an ordinal match function, and if the type of data in the fields is unstructured data, the match function applied is a vector-based match function.
Dependent claims: 2, 3, 4, 5, 6
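Steps (c) and (d) of claim 1 might look roughly like the sketch below. It is an illustration under stated assumptions, not the claimed implementation: the `FieldType` enum, the function names, the linear form of the ordinal score, and the use of cosine similarity over raw term counts are all choices made here for the example. The claim only requires a Boolean, an ordinal, and a vector-based match function. The field type is passed in by the caller, which assumes it has already been determined, for example from a schema.

```python
import math
from collections import Counter
from enum import Enum


class FieldType(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    UNSTRUCTURED = "unstructured"


def boolean_match(a, b) -> float:
    """Nominal data: 1.0 for identical values, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def ordinal_match(a: float, b: float, lo: float, hi: float) -> float:
    """Ordinal data: score falls linearly with distance across [lo, hi] (assumed form)."""
    if hi == lo:
        return 1.0
    return 1.0 - abs(a - b) / (hi - lo)


def vector_match(a: str, b: str) -> float:
    """Unstructured text: cosine similarity of raw term-count vectors (assumed weighting)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def match_score(a, b, field_type: FieldType, lo: float = 0.0, hi: float = 1.0) -> float:
    """Choose the match function from the field's data type, as in step (d)."""
    if field_type is FieldType.NOMINAL:
        return boolean_match(a, b)
    if field_type is FieldType.ORDINAL:
        return ordinal_match(a, b, lo, hi)
    if field_type is FieldType.UNSTRUCTURED:
        return vector_match(a, b)
    raise ValueError(f"unsupported field type: {field_type}")
```

For example, `match_score(2, 4, FieldType.ORDINAL, lo=1, hi=5)` compares two ratings on a 1 to 5 scale, while `match_score("oil pump leak", "pump failure", FieldType.UNSTRUCTURED)` compares two free-text entries by their shared terms.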
7. A data processing system comprising:
a database having a plurality of records;
a processor capable of accessing records in the database, the processor configured to:
(a) select two records in the database;
(b) access two corresponding fields of the selected records;
(c) determine a type of data in the accessed fields; and
(d) apply a match function to the accessed corresponding fields to generate a match score based on the type of data in the accessed fields, wherein, if the type of data in the fields is nominal, the match function applied is a Boolean match function, if the type of data in the fields is ordinal, the match function applied is an ordinal match function, and if the type of data in the fields is unstructured data, the match function applied is a vector-based match function.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15
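The record-selection side of claim 7 could be sketched as below, with an in-memory list standing in for the database. This is a hypothetical illustration: the claim only says the processor selects two records, so scanning every unordered pair is an assumption, as are the `score_all_pairs` name, the `status` field, and the inline Boolean matcher.

```python
from itertools import combinations
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def score_all_pairs(
    database: List[Record],
    field: str,
    match: Callable[[Any, Any], float],
) -> List[Tuple[int, int, float]]:
    """Select each unordered pair of records and score one corresponding field."""
    results = []
    for (i, a), (j, b) in combinations(enumerate(database), 2):
        # Access the same field in both selected records and apply the matcher.
        results.append((i, j, match(a[field], b[field])))
    return results


# Illustrative database with a single nominal field.
database = [{"status": "open"}, {"status": "closed"}, {"status": "open"}]
pairs = score_all_pairs(database, "status", lambda x, y: 1.0 if x == y else 0.0)
```

Each result tuple holds the indices of the two selected records and the match score for the chosen field. A real system would likely restrict which pairs are compared, since the number of pairs grows quadratically with the number of records.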
Specification