APPARATUS, SYSTEMS, AND METHODS FOR GROUPING DATA RECORDS
Abstract
The present application relates to apparatus, systems, and methods for grouping data records based on entities referenced by the data records. The disclosed grouping mechanism can include determining a pair-wise similarity between a large number of data records, and clustering a subset of the data records based on their pair-wise similarity.
25 Claims
1. An apparatus comprising:
- a processor configured to run one or more modules stored in memory, wherein the one or more modules are configured to:
identify at least one pair of data records for which to determine a similarity value;
determine the similarity value for the at least one pair of data records based, at least in part, on a plurality of attributes associated with the at least one pair of data records; and
associate the at least one pair of data records with one or more clusters, each associated with a unique entity, based on the similarity value for the at least one pair of data records.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A method for clustering a plurality of data records into at least one cluster, the method comprising:
- identifying, at a candidate reduction module in a computing device, at least one pair of the plurality of data records for which to determine a similarity value;
- determining, at a similarity computation module residing in the computing device, in communication with the candidate reduction module, the similarity value for the at least one pair based, at least in part, on a plurality of attributes associated with the at least one pair of data records; and
- associating, at a clustering computation module residing in the computing device, in communication with the similarity computation module, the at least one pair of data records with one or more clusters, each associated with a unique entity, based on the similarity value for the at least one pair of data records.
Dependent claims: 15, 16, 17, 18, 19
20. A computer program product, tangibly embodied in a non-transitory computer-readable storage medium, the computer program product including instructions operable to cause a data processing system to:
- identify at least one pair of data records for which to determine a similarity value;
- determine the similarity value for the at least one pair of data records based, at least in part, on a plurality of attributes associated with the at least one pair of data records; and
- associate the at least one pair of data records with one or more clusters, each associated with a unique entity, based on the similarity value for the at least one pair of data records.
Dependent claims: 21, 22, 23, 24, 25
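The three steps recited in the independent claims above (identifying candidate pairs, determining a similarity value from a plurality of attributes, and associating pairs with clusters) can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything specific in it is an assumption made for illustration: blocking on a shared ZIP code as the candidate reduction step, a weighted average of `difflib.SequenceMatcher` ratios as the similarity value, a fixed threshold of 0.8, union-find as the clustering step, and the sample records, attribute names, and weights.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def candidate_pairs(records, block_key):
    """Candidate reduction (assumed strategy): pair only records sharing a blocking key."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    for members in blocks.values():
        yield from combinations(members, 2)


def similarity(a, b, weights):
    """Similarity value (assumed form): weighted average of per-attribute string ratios in [0, 1]."""
    total = sum(weights.values())
    score = 0.0
    for attr, w in weights.items():
        ratio = SequenceMatcher(None, a[attr].lower(), b[attr].lower()).ratio()
        score += w * ratio
    return score / total


def cluster(records, weights, block_key, threshold=0.8):
    """Clustering (assumed strategy): union-find merge of every candidate pair at or above threshold."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps lookups short
            i = parent[i]
        return i

    for i, j in candidate_pairs(records, block_key):
        if similarity(records[i], records[j], weights) >= threshold:
            parent[find(i)] = find(j)

    # Collect record indices by cluster root; each group stands for one entity.
    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Hypothetical records: the first two are meant to reference the same entity.
records = [
    {"name": "Joe's Pizza", "phone": "212-555-0101", "zip": "10001"},
    {"name": "Joes Pizza", "phone": "212-555-0101", "zip": "10001"},
    {"name": "Blue Bottle Coffee", "phone": "212-555-0199", "zip": "10001"},
    {"name": "Joe's Pizza", "phone": "415-555-0123", "zip": "94103"},
]
clusters = cluster(records, {"name": 0.6, "phone": 0.4}, lambda r: r["zip"])
print(clusters)
```

In this sketch the blocking step limits comparisons to records in the same ZIP code, so the fourth record is never compared with the first despite the identical name. Whether that trade-off is acceptable depends on the data; the claims themselves do not prescribe a particular candidate reduction, similarity measure, or clustering algorithm.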
Specification