Systems, methods, and software for entity relationship resolution
First Claim
1. A system comprising:
one or more processors;
an entity resolution database (“ERD”) resolution engine adapted to retrieve, responsive to a first set of data in one or more data fields in a public record, a set of candidate named entity records from a master named entity database based on one of a set of two or more blocking queries, wherein each blocking query in the set of two or more blocking queries comprises a query for a last name and a first name, and a city name, all extracted from the public record, and a query for a last name and a first name, all from the public record;
the ERD resolution engine further adapted to automatically determine a permutation for each blocking query in the set of two or more blocking queries and an order of execution for the set of two or more blocking queries based on the first set of data;
the ERD resolution engine further adapted to calculate similarity scores for the first set of data in the one or more of the data fields in the public record and a second set of data in a set of data fields in the set of candidate named entity records by comparing the second set of data in the set of data fields in the set of candidate named entity records retrieved by the set of blocking queries with the first set of data in the one or more data fields in the public record; and
the ERD resolution engine further adapted to determine a confidence rating for one or more of the set of similarity scores between the public record and the candidate named entity record.
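The retrieval step recited above can be illustrated with a minimal sketch. It assumes a tiny in-memory list in place of the master named entity database, reads "permutation" loosely as the choice of which fields a query uses, and orders queries from most to least specific; none of these choices is stated in the claim. The names `PublicRecord`, `MASTER`, `plan_blocking_queries`, and `retrieve_candidates` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PublicRecord:
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    city: Optional[str] = None


# Hypothetical stand-in for the master named entity database.
MASTER = [
    {"id": 1, "first_name": "ANN", "last_name": "LEE", "city": "RENO"},
    {"id": 2, "first_name": "ANN", "last_name": "LEE", "city": "TAMPA"},
    {"id": 3, "first_name": "BOB", "last_name": "LEE", "city": "RENO"},
]

# The two blocking queries named in the claim, most specific first (assumed order).
TEMPLATES = [
    ("last_name", "first_name", "city"),
    ("last_name", "first_name"),
]


def plan_blocking_queries(record):
    """Keep only the queries whose fields are all present in the record."""
    return [t for t in TEMPLATES if all(getattr(record, f) for f in t)]


def retrieve_candidates(record, master=MASTER):
    """Run the planned queries in order; return the first non-empty result."""
    for fields in plan_blocking_queries(record):
        hits = [
            row for row in master
            if all(row[f] == getattr(record, f).upper() for f in fields)
        ]
        if hits:
            return fields, hits
    return None, []
```

In this sketch a record with no city skips the three-field query entirely, and a record whose city matches nothing falls back to the name-only query, so candidates come from exactly one of the blocking queries.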
Abstract
To facilitate access to public records, the present inventors devised, among other things, an entity resolution system. The exemplary system includes a master records database of 300 million entities, which is partitioned into multiple distinct portions. The exemplary system extracts entity information from input public records and constructs one or more blocking queries against specific portions of the master records database to identify one or more sets of candidate records. Feature vectors are defined for the candidate records, and machine learning techniques, such as Support Vector Machines, are used to determine which of the candidate records from the master records database match the input public records. Candidate records that match are logically associated with the public records, enabling ready access via direct or indirect queries.
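The matching step described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: per-field similarity uses Python's `difflib.SequenceMatcher`, the weights and bias are made-up values standing in for a trained linear Support Vector Machine, and the logistic mapping to a confidence value is likewise an assumption. All function and constant names are hypothetical.

```python
import math
from difflib import SequenceMatcher

FIELDS = ("first_name", "last_name", "city")

# Made-up parameters; a trained linear SVM would supply these.
WEIGHTS = [2.0, 3.0, 1.0]
BIAS = -4.5


def field_similarity(a, b):
    """Similarity in [0, 1] for two field values; missing values score 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def feature_vector(public_record, candidate):
    """One similarity score per compared field."""
    return [
        field_similarity(public_record.get(f), candidate.get(f))
        for f in FIELDS
    ]


def decision_value(features):
    """Linear decision function w . x + b, as a linear SVM would compute."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def confidence(features):
    """Map the decision value into (0, 1) with a logistic function."""
    return 1.0 / (1.0 + math.exp(-decision_value(features)))


def is_match(public_record, candidate, threshold=0.5):
    """Declare a match when the confidence exceeds the threshold."""
    return confidence(feature_vector(public_record, candidate)) > threshold
```

A candidate whose fields all agree with the public record yields a feature vector of ones and a positive decision value, while a candidate that shares no characters in any field yields zeros and is rejected.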
10 Citations
20 Claims
1. A system comprising:
one or more processors;
an entity resolution database (“ERD”) resolution engine adapted to retrieve, responsive to a first set of data in one or more data fields in a public record, a set of candidate named entity records from a master named entity database based on one of a set of two or more blocking queries, wherein each blocking query in the set of two or more blocking queries comprises a query for a last name and a first name, and a city name, all extracted from the public record, and a query for a last name and a first name, all from the public record;
the ERD resolution engine further adapted to automatically determine a permutation for each blocking query in the set of two or more blocking queries and an order of execution for the set of two or more blocking queries based on the first set of data;
the ERD resolution engine further adapted to calculate similarity scores for the first set of data in the one or more of the data fields in the public record and a second set of data in a set of data fields in the set of candidate named entity records by comparing the second set of data in the set of data fields in the set of candidate named entity records retrieved by the set of blocking queries with the first set of data in the one or more data fields in the public record; and
the ERD resolution engine further adapted to determine a confidence rating for one or more of the set of similarity scores between the public record and the candidate named entity record.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method comprising:
retrieving a set of candidate named entity records from a master named entity database based on one of a set of two or more blocking queries, with each blocking query based on one or more data fields in a public record, and wherein each blocking query comprises a query for a last name and a first name, and a city name, all extracted from the public record, and a query for a last name and a first name, all from the public record, and wherein a permutation for each blocking query in the set of two or more blocking queries and an order of execution for the set of two or more blocking queries is automatically determined based on the one or more data fields in the public record;
calculating similarity scores for one or more of the data fields in the public record and a set of data fields in the set of candidate named entity records by comparing the set of data fields in the set of candidate named entity records retrieved by the set of blocking queries with the one or more data fields in the public record; and
determining a confidence rating for one or more of the set of similarity scores between the public record and the candidate named entity record.
Dependent claims: 11, 12, 13, 14, 15, 16
17. An entity resolution system comprising:
a computer based system comprising an input adapted to receive user-defined inputs, a processor adapted to process executable code and user-defined inputs and a memory adapted to store the executable code and user-defined inputs, the executable code comprising;
a retrieval code set stored on the memory, when executed by the processor, being responsive to a first set of data in one or more data fields in a public record and adapted to retrieve a set of candidate named entity records from a master named entity database based on one of a set of two or more blocking queries, wherein each blocking query in the set of two or more blocking queries includes a query for a last name and a first name, and a city name, all extracted from the public record, and a query for a last name and a first name, all from the public record;
the retrieval set of code further adapted to automatically determine a permutation for each blocking query in the set of two or more blocking queries and an order of execution for the set of two or more blocking queries based on the first set of data;
a matching code set stored on the memory and being adapted to, when executed by the processor, calculate similarity scores for the first set of data in the one or more of the data fields in the public record and a second set of data from a set of data fields in the set of candidate named entity records by comparing the second set of data from the set of data fields in the set of candidate named entity records retrieved by the set of blocking queries with the first set of data from the one or more data fields in the public record; and
a confidence code set stored on the memory and being adapted to, when executed by the processor, determine a confidence rating for one or more of the set of similarity scores between the public record and the candidate named entity record.
Dependent claims: 18, 19, 20
Specification