System and method for indexing information about entities from different information sources
3 Assignments
0 Petitions
Abstract
A system and method are provided for indexing a data record from an information source into a database that contains a plurality of data records. Indexing comprises receiving a data record from an information source, the received data record having a predetermined number of fields containing information about a particular entity, and standardizing and validating the data in the received data record. A system and method are also provided for retrieving records that refer to an entity characterized by a specific set of data values. Retrieval compares a predetermined number of fields within the received data record with a predetermined number of fields within the data records already in the database, selects as candidates the data records already in the database whose data in some of the predetermined fields is identical to the data in the corresponding fields of the received data record, and scores the candidates to determine which data records contain information about the same entity.
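The standardize, validate, and select-candidates steps in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption, not the patent's disclosed method: the field names (`first_name`, `last_name`, `birth_date`, `ssn`), the cleaning rules, the validation checks, and the hypothetical `BLOCKING_FIELDS` tuple that stands in for "some of the predetermined fields" on which an identical value makes a stored record a candidate.

```python
import re

# Hypothetical field layout; the patent only says "a predetermined number of fields".
FIELDS = ("first_name", "last_name", "birth_date", "ssn")

# Hypothetical choice of fields on which an identical value makes a stored record a candidate.
BLOCKING_FIELDS = ("ssn", "birth_date", "last_name")


def standardize(record: dict) -> dict:
    """Return a copy with names upper-cased and stripped of non-letters, and SSN reduced to digits."""
    out = {}
    for name in FIELDS:
        value = (record.get(name) or "").strip()
        if name in ("first_name", "last_name"):
            value = re.sub(r"[^A-Za-z]", "", value).upper()
        elif name == "ssn":
            value = re.sub(r"\D", "", value)
        out[name] = value
    return out


def validate(record: dict) -> list:
    """Return the problems found in a standardized record; an empty list means it passed."""
    problems = []
    if not record["last_name"]:
        problems.append("missing last_name")
    if record["ssn"] and len(record["ssn"]) != 9:
        problems.append("ssn must have 9 digits")
    if record["birth_date"] and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", record["birth_date"]):
        problems.append("birth_date must be YYYY-MM-DD")
    return problems


def select_candidates(incoming: dict, database: list) -> list:
    """Stored records identical to the incoming one in at least one non-empty blocking field."""
    return [
        stored
        for stored in database
        if any(incoming[f] and incoming[f] == stored[f] for f in BLOCKING_FIELDS)
    ]


database = [
    standardize({"first_name": "Robert", "last_name": "Smith", "birth_date": "1960-01-02", "ssn": "123-45-6789"}),
    standardize({"first_name": "Ann", "last_name": "Jones", "birth_date": "1971-05-06", "ssn": "987654321"}),
]
incoming = standardize({"first_name": "bob", "last_name": " smith ", "birth_date": "1960-01-02"})
print(validate(incoming))  # []
print([r["first_name"] for r in select_candidates(incoming, database)])  # ['ROBERT']
```

Requiring an exact match on only one blocking field keeps the candidate set small while tolerating errors in the other fields; note that "BOB" versus "ROBERT" is left for the scoring and correction stages. A production system would look candidates up through an index on each blocking field rather than scanning the whole database as this sketch does.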
253 Citations
29 Claims
1. A system for associating a data record from an information source into a database, the database containing a plurality of data records, the system comprising:
means for receiving a data record from an information source, the received data record having a predetermined number of fields containing information about a particular entity;
means for comparing selected fields within the received data record with corresponding fields within the data records already in the database;
means, responsive to comparison, for identifying data records already in the database having data within some of the selected fields that match to the data in the fields of the received data record as possible matching candidates, the identifying means further comprising one or more control databases for identifying errors in the data contained in one or more fields of the received data record in order to correct the data in the received data record and means for matching the corrected data in the received data record with the data records already in the database; and
means for scoring the identified matching candidates using a predetermined scoring criteria which measures a likelihood of a match between the received data record and the data records in the database based on the selected fields to determine if the received data record and a data record in the database contains information about the same entity thereby associating data records about the same entity despite errors contained in the data records.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
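The claim leaves the "predetermined scoring criteria" unspecified. One common way to measure the likelihood that two records describe the same entity is a weighted field-agreement score compared against a threshold, sketched below. The weights in the hypothetical `WEIGHTS` table, the exact-equality comparison, and the 0.8 threshold are all illustrative assumptions, and the sketch assumes the incoming record has already been standardized and corrected.

```python
# Hypothetical per-field weights standing in for the unspecified
# "predetermined scoring criteria": agreement on a rarer identifier counts for more.
WEIGHTS = {"ssn": 0.40, "birth_date": 0.25, "last_name": 0.20, "first_name": 0.15}


def score(incoming: dict, candidate: dict) -> float:
    """Weighted share of comparable fields that agree exactly.

    A field that is empty in either record is skipped, so missing data
    neither raises nor lowers the score.
    """
    agreeing = comparable = 0.0
    for field, weight in WEIGHTS.items():
        a, b = incoming.get(field), candidate.get(field)
        if not a or not b:
            continue
        comparable += weight
        if a == b:
            agreeing += weight
    return agreeing / comparable if comparable else 0.0


def best_match(incoming: dict, candidates: list, threshold: float = 0.8):
    """Return (candidate, score) for the top-scoring candidate at or above threshold, or None."""
    scored = [(c, score(incoming, c)) for c in candidates]
    passing = [pair for pair in scored if pair[1] >= threshold]
    return max(passing, key=lambda pair: pair[1], default=None)


incoming = {"first_name": "ROBERT", "last_name": "SMITH", "birth_date": "1960-01-02", "ssn": ""}
same_person = {"first_name": "ROBERT", "last_name": "SMITH", "birth_date": "1960-01-02", "ssn": "123456789"}
sibling = {"first_name": "ROBERTA", "last_name": "SMITH", "birth_date": "1960-01-02", "ssn": "555443333"}

print(score(incoming, same_person))  # 1.0
print(round(score(incoming, sibling), 2))  # 0.75
print(best_match(incoming, [sibling, same_person])[0] is same_person)  # True
```

Normalizing by the weight of the fields that are actually present lets a record with a missing SSN still reach a full score on the remaining fields. Probabilistic record-linkage schemes such as Fellegi-Sunter derive the weights from match and non-match frequencies instead of fixing them by hand.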
15. A system for associating data records from a plurality of sources containing information about the same entity together despite errors in the information contained in the data records, the system comprising:
means for comparing an incoming data record to a database of data records based on a comparison of selected fields in the incoming data record and in the data records in the database to identify matching data records based on the selected fields; and
means for controlling the comparison means comprising one or more control databases for identifying errors in the data contained in one or more fields of the received data record in order to correct the data in the received data record and means for matching the corrected data in the received data record with the data records already in the database, the one or more control databases comprises:
a rules database for storing rules for automatically determining the associations between data records containing information about the same entity,
a links database for storing said associations between the data records about a same entity in a separate database from the data record database,
an exception database for storing an action to be taken when a received data record cannot be processed,
an anonymous name database for storing known anonymous names which appear in the data records in the data records database,
a canonical name database for storing a relationship between a full given name and a nickname that is in a data record in the data record database, and
a threshold database for storing a threshold used for the comparison of the data records.
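How the control databases of claim 15 could cooperate is shown in the minimal sketch below, which uses in-memory dicts and sets as stand-ins. The class `ControlDatabases`, the functions `correct` and `associate`, the keys `"auto_link"` and `"missing_name"`, and the policy that an unmatched record starts a new entity are all hypothetical. The rules database is reduced here to the single rule "link when the best score clears the stored threshold".

```python
from dataclasses import dataclass, field


@dataclass
class ControlDatabases:
    """In-memory stand-ins for some of the control databases named in claim 15 (illustrative assumption)."""
    canonical_names: dict = field(default_factory=dict)  # nickname -> full given name
    anonymous_names: set = field(default_factory=set)    # known placeholder names, e.g. "JOHN DOE"
    thresholds: dict = field(default_factory=dict)       # threshold name -> value
    exceptions: dict = field(default_factory=dict)       # reason a record cannot be processed -> action
    links: dict = field(default_factory=dict)            # entity id -> linked record ids, kept apart from records


def correct(record: dict, ctl: ControlDatabases) -> dict:
    """Return a corrected copy: nickname replaced by full given name; known anonymous names blanked."""
    fixed = dict(record)
    first = fixed.get("first_name", "")
    fixed["first_name"] = ctl.canonical_names.get(first, first)
    full_name = f"{fixed['first_name']} {fixed.get('last_name', '')}".strip()
    if full_name in ctl.anonymous_names:
        # An anonymous name says nothing about identity, so drop it rather than match on it.
        fixed["first_name"] = fixed["last_name"] = ""
    return fixed


def associate(record_id: str, record: dict, candidate_scores: list, ctl: ControlDatabases) -> tuple:
    """Link record_id to the best-scoring entity if it clears the stored threshold.

    candidate_scores is a list of (entity_id, score) pairs. Returns ("exception", action)
    when the record cannot be processed, otherwise ("linked", entity_id).
    """
    if not record.get("last_name"):
        return ("exception", ctl.exceptions.get("missing_name", "manual_review"))
    threshold = ctl.thresholds.get("auto_link", 1.0)
    best = max(candidate_scores, key=lambda pair: pair[1], default=None)
    if best is not None and best[1] >= threshold:
        entity_id = best[0]
    else:
        entity_id = record_id  # hypothetical policy: an unmatched record starts a new entity
    ctl.links.setdefault(entity_id, set()).add(record_id)
    return ("linked", entity_id)


ctl = ControlDatabases(
    canonical_names={"BOB": "ROBERT", "BILL": "WILLIAM"},
    anonymous_names={"JOHN DOE", "JANE DOE"},
    thresholds={"auto_link": 0.8},
    exceptions={"missing_name": "route_to_review_queue"},
)
bob = correct({"first_name": "BOB", "last_name": "SMITH"}, ctl)
doe = correct({"first_name": "JOHN", "last_name": "DOE"}, ctl)
print(bob)  # {'first_name': 'ROBERT', 'last_name': 'SMITH'}
print(associate("r2", doe, [], ctl))  # ('exception', 'route_to_review_queue')
print(associate("r1", bob, [("E1", 0.92), ("E2", 0.50)], ctl))  # ('linked', 'E1')
print(ctl.links)  # {'E1': {'r1'}}
```

Keeping `links` separate from the records mirrors the claim's separate links database: an association can be added or undone without rewriting any source record.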
16. A method for associating a data record from an information source into a database, the database containing a plurality of data records, the method comprising:
receiving a data record from an information source, the received data record having a predetermined number of fields containing information about a particular entity;
comparing selected fields within the received data record with corresponding fields within the data records already in the database;
identifying data records already in the database, based on the comparison, having data within some of the selected fields that match to the data in the fields of the received data record as possible matching candidates, the identifying further comprising identifying errors in the data contained in one or more fields of the received data record using one or more control databases in order to correct the data in the received data record and matching the corrected data in the received data record with the data records already in the database; and
scoring the identified matching candidates using a predetermined scoring criteria which measures a likelihood of a match between the received data record and the data records in the database based on the selected fields to determine if the received data record and a data record in the database contains information about the same entity thereby associating data records about the same entity despite errors contained in the data records.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29.
Specification