Boolean rule-based system for clustering similar records
Abstract
A system identifies similar records. The system includes a collection of records, a set of Boolean rules, and a cell list structure. Each record in the collection has a list of fields and data contained in each field. The set of Boolean rules operates upon the data in each field. The cell list structure is generated from the collection of records. The cell list structure has a list of cells for each field and a list of pointers to each cell of the list of cells for each record. The set of Boolean rules identifies the similar records from the cell list structure.
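The cell list structure described above can be pictured with a small sketch. It assumes that a "cell" is one distinct (normalized) value observed in a field and that a "pointer" is simply the index of that cell; the abstract does not spell out either detail, and the names `build_cell_lists`, `cells`, and `pointers` are hypothetical.

```python
def build_cell_lists(records, fields):
    """Build a per-field list of cells plus per-record pointers into them.

    Assumption: one cell per distinct normalized value in a field.
    cells[field]       -> list of distinct values seen in that field
    pointers[i][field] -> index into cells[field] for record i
    """
    cells = {f: [] for f in fields}
    index = {f: {} for f in fields}  # value -> cell position, per field
    pointers = []
    for rec in records:
        row = {}
        for f in fields:
            value = rec.get(f, "").strip().lower()  # illustrative normalization
            if value not in index[f]:
                index[f][value] = len(cells[f])
                cells[f].append(value)
            row[f] = index[f][value]
        pointers.append(row)
    return cells, pointers


records = [
    {"name": "J. Smith", "city": "Boston"},
    {"name": "j. smith", "city": "Boston"},
    {"name": "A. Jones", "city": "Austin"},
]
cells, pointers = build_cell_lists(records, ["name", "city"])
```

Under these assumptions, two records that point to the same cell in a field agree on that field, so a rule can compare pointers instead of re-reading the raw data.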
10 Claims
1. A system for identifying similar records, said system comprising:
a collection of records, each said record in said collection having a list of fields and data contained in each said field;
a set of Boolean rules for operating upon the data in each said field;
a cell list structure generated from said collection of records, said cell list structure having a list of cells for each field and a list of pointers to each said cell of said list of cells for each said record, said set of Boolean rules identifying the similar records from said cell list structure.
Dependent claims: 2, 3
4. A method for cleansing electronic data, said method comprising the steps of:
inputting a collection of records, each record in the collection representing an entity having a list of fields and data contained in each of the fields;
selecting a plurality of Boolean clustering rules for operating upon the data in each field in each record;
generating a list of clusters by applying the plurality of Boolean clustering rules to the collection of records, the list of clusters comprising a list of candidate duplicate records determined by the plurality of Boolean clustering rules; and
outputting the list of clusters.
Dependent claims: 5, 6, 7
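The steps of claim 4 (input records, select Boolean clustering rules, generate clusters of candidate duplicates, output them) might be sketched as follows. This is an illustration only: the pairwise comparison, the union-find merging, and the helper names `same`, `all_of`, `any_of`, and `cluster` are assumptions made here, not details taken from the claim.

```python
from itertools import combinations


def same(field):
    """Boolean predicate: both records hold the same non-empty value in `field`."""
    def pred(a, b):
        x = a.get(field, "").strip().lower()
        y = b.get(field, "").strip().lower()
        return bool(x) and x == y
    return pred


def all_of(*preds):
    """Boolean AND of several predicates."""
    return lambda a, b: all(p(a, b) for p in preds)


def any_of(*preds):
    """Boolean OR of several predicates."""
    return lambda a, b: any(p(a, b) for p in preds)


def cluster(records, rule):
    """Group record indices whose pairs satisfy `rule` (assumed union-find merge)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if rule(records[i], records[j]):
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    # Only groups with more than one record are candidate duplicates.
    return [g for g in groups.values() if len(g) > 1]


records = [
    {"name": "J. Smith", "city": "Boston", "phone": "555-0100"},
    {"name": "j. smith", "city": "boston", "phone": ""},
    {"name": "John Smith", "city": "Boston", "phone": "555-0100"},
    {"name": "A. Jones", "city": "Austin", "phone": "555-0199"},
]
# Rule: (same name AND same city) OR same phone.
rule = any_of(all_of(same("name"), same("city")), same("phone"))
clusters = cluster(records, rule)
print(clusters)
```

Comparing every pair is quadratic in the number of records, which is acceptable for a sketch; a structure such as the cell list of claim 1 would presumably be used to avoid that cost at scale.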
8. A system for identifying similar records, said system comprising:
a collection of records, each said record in said collection having a list of fields and data contained in each said field;
a set of fuzzy logic rules for operating upon the data in each said field;
a cell list structure generated from said collection of records, said cell list structure having a list of cells for each field and a list of pointers to each said cell of said list of cells for each said record, said set of fuzzy logic rules identifying the similar records from said cell list structure.
Dependent claims: 9, 10
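Claim 8 replaces the Boolean rules with fuzzy logic rules. One common reading, assumed here and not stated in the claim, is that each field comparison yields a degree of match between 0 and 1, that AND and OR become `min` and `max`, and that a threshold decides similarity. The helper names and the 0.8 threshold below are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(field):
    """Degree of match in [0, 1] between two records on `field`."""
    def degree(a, b):
        x = a.get(field, "").strip().lower()
        y = b.get(field, "").strip().lower()
        if not x or not y:
            return 0.0  # missing data gives no evidence of a match
        return SequenceMatcher(None, x, y).ratio()
    return degree


def fuzzy_and(*degrees):
    """Fuzzy AND: the weakest field match limits the result."""
    return lambda a, b: min(d(a, b) for d in degrees)


def fuzzy_or(*degrees):
    """Fuzzy OR: the strongest field match decides the result."""
    return lambda a, b: max(d(a, b) for d in degrees)


def is_similar(a, b, rule, threshold=0.8):
    """Treat two records as similar when the rule's degree meets the threshold."""
    return rule(a, b) >= threshold


r1 = {"name": "Jon Smith", "city": "Boston"}
r2 = {"name": "John Smith", "city": "Boston"}
r3 = {"name": "A. Jones", "city": "Austin"}

rule = fuzzy_and(similarity("name"), similarity("city"))
print(is_similar(r1, r2, rule), is_similar(r1, r3, rule))
```

The `min`/`max` operators are only one of several fuzzy conjunction and disjunction choices; a product or weighted average could be substituted without changing the overall shape of the sketch.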
Specification