Robust system for interactively learning a string similarity measurement
Abstract
A system learns a string similarity measurement. The system includes a set of record clusters. Each record in each cluster has a list of fields and data contained in each field. The system further includes a set of initial weights for determining edit-distance measurements and an initial field similarity function for assigning a field similarity score to each pair of field values in each cluster. The set of initial weights and the field similarity function are modified by user feedback to produce an optimal set of edit-distance weights and an optimal field similarity function.
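To make the pieces named in the abstract concrete, the following is a minimal sketch of a weighted edit distance and a field similarity function applied to every pair of field values in a cluster. It is illustrative only and is not taken from the patent: the names `EditWeights`, `weighted_edit_distance`, `field_similarity`, and `score_cluster`, the three-weight scheme (insert, delete, substitute), and the normalization to the range [0, 1] are all assumptions.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class EditWeights:
    """Hypothetical per-operation costs; the patent does not fix this layout."""
    insert: float = 1.0
    delete: float = 1.0
    substitute: float = 1.0


def weighted_edit_distance(a: str, b: str, w: EditWeights) -> float:
    """Cheapest cost of turning `a` into `b` under the given operation weights."""
    # prev[j] holds the cost of turning the processed prefix of `a` into b[:j].
    prev = [j * w.insert for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * w.delete]
        for j, cb in enumerate(b, start=1):
            sub_cost = 0.0 if ca == cb else w.substitute
            curr.append(min(
                prev[j] + w.delete,       # drop ca
                curr[j - 1] + w.insert,   # add cb
                prev[j - 1] + sub_cost,   # keep or replace
            ))
        prev = curr
    return prev[-1]


def field_similarity(a: str, b: str, w: EditWeights) -> float:
    """Assumed scoring: 1.0 for identical values, falling toward 0.0."""
    worst = max(len(a), len(b)) * max(w.insert, w.delete, w.substitute)
    if worst == 0:
        return 1.0  # both values empty, or all weights zero
    return 1.0 - weighted_edit_distance(a, b, w) / worst


def score_cluster(cluster, field, w):
    """Score every pair of values of one field within one record cluster."""
    return [
        (r1[field], r2[field], field_similarity(r1[field], r2[field], w))
        for r1, r2 in combinations(cluster, 2)
    ]
```

Here a cluster is assumed to be a list of dictionaries keyed by field name, so `score_cluster(cluster, "name", EditWeights())` returns one `(value, value, score)` triple per pair of records.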
18 Claims
1. A system for learning a string similarity measurement, said system comprising:
a set of record clusters, each record in each cluster having a list of fields and data contained in each said field;
a set of initial weights for determining edit-distance measurements;
an initial field similarity function for assigning a field similarity score to each pair of field values in each cluster;
said set of initial weights and said field similarity function being modified by user feedback to produce an optimal set of edit-distance weights and an optimal field similarity function.
7. A method for learning a string similarity measurement, said method comprising the steps of:
providing a set of record clusters, each record in each cluster having a list of fields and data contained in each field;
providing a set of initial weights for determining edit-distance measurements;
providing an initial field similarity function for assigning a field similarity score to each pair of field values in each cluster;
modifying the set of initial weights and the field similarity function by user feedback to produce an optimal set of edit-distance weights and an optimal field similarity function.
13. A computer program product for interactively learning a string similarity measurement, said product comprising:
an input set of record clusters, each record in each cluster having a list of fields and data contained in each field;
a set of initial weights for determining edit-distance measurements;
an initial field similarity function for assigning a field similarity score to each pair of field values in each cluster;
said set of initial weights and said field similarity function being modified by user feedback to produce an optimal set of edit-distance weights and an optimal field similarity function.
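The claims above recite that the initial weights are modified by user feedback, but this excerpt does not say how. As one possible reading, the sketch below treats feedback as user-labeled pairs (match or non-match) and picks, from a small grid of candidate weights, the set that agrees with the most labels. The grid search, the fixed threshold, the label format, and every name here (`edit_distance`, `similarity`, `agreement`, `refine_weights`) are assumptions made for illustration, not the patented procedure.

```python
from itertools import product


def edit_distance(a, b, ins, dele, sub):
    """Weighted edit distance with per-operation costs (assumed scheme)."""
    prev = [j * ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * dele]
        for j, cb in enumerate(b, start=1):
            keep_or_sub = prev[j - 1] + (0.0 if ca == cb else sub)
            curr.append(min(prev[j] + dele, curr[j - 1] + ins, keep_or_sub))
        prev = curr
    return prev[-1]


def similarity(a, b, weights):
    """Assumed field similarity: distance normalized into [0, 1]."""
    ins, dele, sub = weights
    worst = max(len(a), len(b)) * max(weights)
    if worst == 0:
        return 1.0
    return 1.0 - edit_distance(a, b, ins, dele, sub) / worst


def agreement(weights, feedback, threshold):
    """Count labeled pairs whose predicted match status equals the user's label."""
    return sum(
        (similarity(a, b, weights) >= threshold) == is_match
        for a, b, is_match in feedback
    )


def refine_weights(initial, feedback, threshold=0.7, grid=(0.5, 1.0, 2.0)):
    """Return the candidate weights that agree with the most user labels.

    The initial weights are kept unless a candidate is strictly better.
    """
    best, best_score = initial, agreement(initial, feedback, threshold)
    for candidate in product(grid, repeat=3):
        score = agreement(candidate, feedback, threshold)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

In an interactive setting this function would be called again each time the user labels more pairs, with the previous result passed in as `initial`. A grid search is used here only because it is easy to verify; a gradient-based or probabilistic learner could fill the same role.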
Specification