Systems and methods for record linkage and paraphrase generation using surrogate learning
Abstract
A method of using unlabeled data to train a classifier is disclosed. In one embodiment related to record linkage, the method entails retrieving a set of candidate data records from a master database based on at least one update record. Next, a surrogate learning technique is used to identify one of the candidate data records as a match for the update record. Lastly, the exemplary method links or merges the update record and the identified candidate data record.
10 Claims
1. A method of using a processor and a memory for classifying data associated with a feature space X to a set of classes y={0,1}, wherein features defining the feature space X are partitioned into X=X1×X2, a random feature vector x∈X is denoted correspondingly as x=(x1, x2), and feature x1 is a binary random variable, the method comprising:
estimating P(x1|x2) from a set of unlabeled data;
estimating P(x1=0|x2) from a set of labeled data;
determining whether to classify a portion of the data to y=0 or y=1 based on the estimated P(x1=0|x2); and
logically associating the portion of the data in the memory with the class y=0 or the class y=1 based on the determination.
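As a rough illustration of the estimate-then-decide flow in claim 1, the following minimal sketch estimates P(x1=0|x2) by smoothed counting over (x1, x2) pairs whose class is unknown, then picks a decision threshold from a small labeled set. Several things here are assumptions of the sketch and do not come from the claim text: x2 is treated as a single discrete value, `alpha` is a Laplace smoothing constant, a larger P(x1=0|x2) is assumed to indicate y=0, and the labeled data is used only to calibrate the threshold. All function names are hypothetical.

```python
from collections import Counter


def estimate_p_x1_zero_given_x2(unlabeled, alpha=1.0):
    """Estimate P(x1=0 | x2) by smoothed counting.

    `unlabeled` holds (x1, x2) pairs with x1 in {0, 1} and x2 hashable;
    no class label y is needed. `alpha` is an assumed smoothing constant.
    """
    zero_counts = Counter()
    total_counts = Counter()
    for x1, x2 in unlabeled:
        total_counts[x2] += 1
        if x1 == 0:
            zero_counts[x2] += 1

    def p_zero(x2):
        # Unseen x2 values fall back to 0.5 through the smoothing terms.
        return (zero_counts[x2] + alpha) / (total_counts[x2] + 2 * alpha)

    return p_zero


def choose_threshold(p_zero, labeled):
    """Pick the cut on P(x1=0 | x2) that classifies the labeled set best.

    `labeled` holds (x2, y) pairs with y in {0, 1}. Assumes (not stated in
    the claim) that a larger P(x1=0 | x2) indicates y=0.
    """
    scored = [(p_zero(x2), y) for x2, y in labeled]
    best_t, best_correct = 0.5, -1
    for t in sorted({s for s, _ in scored}):
        correct = sum((0 if s >= t else 1) == y for s, y in scored)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def classify(p_zero, threshold, x2):
    """Assign y=0 when the estimated P(x1=0 | x2) reaches the threshold."""
    return 0 if p_zero(x2) >= threshold else 1


# Toy data: x1 is mostly 0 when x2 == "a" and mostly 1 when x2 == "b".
unlabeled = [(0, "a")] * 8 + [(1, "a")] * 2 + [(0, "b")] * 1 + [(1, "b")] * 9
labeled = [("a", 0), ("b", 1)]

p_zero = estimate_p_x1_zero_given_x2(unlabeled)
threshold = choose_threshold(p_zero, labeled)
labels = {x2: classify(p_zero, threshold, x2) for x2 in ("a", "b")}
```

With this toy data the two x2 values receive different labels; a real feature space would need a proper conditional model in place of the counting table.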
6. A system having a processor and a memory for classifying data associated with a feature space X to a set of classes y={0,1}, wherein features defining the feature space X are partitioned into X=X1×X2, a random feature vector x∈X is denoted correspondingly as x=(x1, x2), and feature x1 is a binary random variable, the system further comprising:
means for estimating P(x1|x2) from a set of unlabeled data;
means for estimating P(x1=0|x2) from a set of labeled data;
means for determining whether to classify a portion of the data to y=0 or y=1 based on the estimated P(x1=0|x2); and
means, responsive to the determination, for logically associating the portion of the data in the memory with the class y=0 or the class y=1.
10. A method of using a processor and a memory for linking or merging update records with a master database of data records, the method comprising:
retrieving a set of candidate data records from the master database based on at least one update record;
using surrogate learning to identify one of the candidate data records as a match for the one update record; and
linking or merging the update record and the identified one of the candidate data records.
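The retrieve, identify, and merge steps of claim 10 could be wired together roughly as follows. This is only a sketch under stated assumptions: records are modeled as Python dicts, candidates are retrieved by exact agreement on a single blocking field, and the `field_agreement` scorer is a simple placeholder standing in for the surrogate-learning matcher, which the claim does not spell out. All names are hypothetical.

```python
def retrieve_candidates(master, update, block_key):
    """Return (index, record) pairs that share the update's blocking value."""
    return [
        (i, rec)
        for i, rec in enumerate(master)
        if rec.get(block_key) == update.get(block_key)
    ]


def field_agreement(candidate, update):
    """Placeholder scorer: count update fields the candidate agrees with.

    A surrogate-learning classifier would replace this function.
    """
    return sum(1 for key, value in update.items() if candidate.get(key) == value)


def link_update(master, update, block_key, score=field_agreement):
    """Merge `update` into its best-scoring candidate; return that index.

    Returns None and leaves `master` untouched when no candidate is found.
    """
    candidates = retrieve_candidates(master, update, block_key)
    if not candidates:
        return None
    best_index, best_record = max(candidates, key=lambda pair: score(pair[1], update))
    # Fields from the update record overwrite or extend the master record.
    master[best_index] = {**best_record, **update}
    return best_index


master = [
    {"name": "Ann Lee", "city": "Austin"},
    {"name": "Ann Lee", "city": "Boston"},
    {"name": "Bob Roy", "city": "Austin"},
]
update = {"name": "Ann Lee", "city": "Boston", "phone": "555-0100"}

matched_index = link_update(master, update, block_key="name")
no_match = link_update(master, {"name": "Cy Doe"}, block_key="name")
```

Passing the scorer as a parameter keeps the retrieval and merge logic independent of how the match is decided, so a trained classifier can be swapped in without touching the rest of the pipeline.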
Specification