Bulk deduplication detection
Abstract
Some embodiments of the present invention include a system and method for removing duplicate records from a group of records in a database system. The method includes generating a first cluster of records from the group of records, generating a second cluster of records from the group of records, identifying sets of duplicate records in the first cluster of records, and identifying sets of duplicate records in the second cluster of records. The method also includes merging at least two sets of duplicate records associated with both the first cluster and the second cluster of records to form a merged set of duplicate records. The merging is performed based on the at least two sets of duplicate records having a common record. Duplicate records in the group of records may then be removed by removing duplicate records from the merged set of duplicate records.
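The flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the record fields (`id`, `email`, `phone`), the matching rule `is_duplicate`, the choice of which records fall into each cluster, and the "keep the lowest id" survivor rule are all assumptions made for the example.

```python
from itertools import combinations

def is_duplicate(a, b):
    # Hypothetical matching rule: same email or same phone.
    return a["email"] == b["email"] or a["phone"] == b["phone"]

def merge_sets_with_common_record(dup_sets):
    """Merge any sets of record ids that share at least one record."""
    merged = []
    for current in dup_sets:
        current = set(current)
        untouched = []
        for existing in merged:
            if existing & current:      # a common record links the two sets
                current |= existing
            else:
                untouched.append(existing)
        untouched.append(current)
        merged = untouched
    return merged

def find_duplicate_sets(cluster):
    """Identify sets of duplicate record ids within a single cluster."""
    pairs = [{a["id"], b["id"]}
             for a, b in combinations(cluster, 2) if is_duplicate(a, b)]
    return merge_sets_with_common_record(pairs)

def remove_duplicates(records, merged_sets):
    """Keep one survivor per merged set (lowest id, by assumption)."""
    drop = set()
    for dup_set in merged_sets:
        drop |= dup_set - {min(dup_set)}
    return [r for r in records if r["id"] not in drop]

records = [
    {"id": 1, "email": "a@x.com", "phone": "111"},
    {"id": 2, "email": "a@x.com", "phone": "222"},
    {"id": 3, "email": "b@x.com", "phone": "222"},
    {"id": 4, "email": "c@x.com", "phone": "333"},
]
by_id = {r["id"]: r for r in records}

# Two overlapping clusters drawn from the same group (assumed membership).
first_cluster = [by_id[i] for i in (1, 2, 4)]
second_cluster = [by_id[i] for i in (2, 3, 4)]

merged = merge_sets_with_common_record(
    find_duplicate_sets(first_cluster) + find_duplicate_sets(second_cluster)
)
deduped = remove_duplicates(records, merged)

print([sorted(s) for s in merged])   # [[1, 2, 3]]
print([r["id"] for r in deduped])    # [1, 4]
```

Record 2 appears in a duplicate set from each cluster, so the two sets are merged into one before any record is removed. Records 1 and 3 never share a cluster, yet they end up in the same merged set.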
20 Claims
1. A computer-implemented method comprising:
generating, by a database system, a first cluster of records from a group of records;
generating, by the database system, a second cluster of records from the group of records;
causing, by the database system, sets of duplicate records in the first cluster of records to be identified;
causing, by the database system, sets of duplicate records in the second cluster of records to be identified;
merging, by the database system, at least two sets of duplicate records associated with both the first cluster and the second cluster of records to form a merged set of duplicate records, wherein a set of duplicate records is implemented using a linked list having a head node and a body node for each record in the set of duplicate records and wherein the merging is performed based on the at least two sets of duplicate records having a common record and comprises merging a linked list associated with each set of duplicate records; and
removing, by the database system, one or more duplicate records from the merged set of duplicate records.
Dependent claims: 2, 3, 4, 5, 6, 7
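The linked-list limitation in claim 1 (a head node plus one body node per record, with lists merged when they share a common record) might look like the sketch below. The claim does not say what the head node stores, so the `first`/`last` pointers and `size` counter are assumptions, as are the class and function names `HeadNode`, `BodyNode`, `merge_into`, and `merge_duplicate_lists`.

```python
class BodyNode:
    """One node per record in a set of duplicate records."""
    def __init__(self, record_id):
        self.record_id = record_id
        self.next = None

class HeadNode:
    """Entry point of one duplicate set (assumed layout: first, last, size)."""
    def __init__(self, record_ids=()):
        self.first = None
        self.last = None
        self.size = 0
        for record_id in record_ids:
            self.append(record_id)

    def append(self, record_id):
        node = BodyNode(record_id)
        if self.last is None:
            self.first = node
        else:
            self.last.next = node
        self.last = node
        self.size += 1

    def record_ids(self):
        ids, node = [], self.first
        while node is not None:
            ids.append(node.record_id)
            node = node.next
        return ids

def merge_into(target, source):
    """Append to target every record of source that target does not hold yet."""
    seen = set(target.record_ids())
    for record_id in source.record_ids():
        if record_id not in seen:
            target.append(record_id)
            seen.add(record_id)

def merge_duplicate_lists(lists):
    """Merge linked lists that share a common record; inputs are not mutated."""
    merged = []
    for lst in lists:
        ids = set(lst.record_ids())
        overlapping = [m for m in merged if ids & set(m.record_ids())]
        if not overlapping:
            merged.append(HeadNode(lst.record_ids()))
            continue
        target = overlapping[0]
        merge_into(target, lst)
        for other in overlapping[1:]:   # lst bridged two existing lists
            merge_into(target, other)
            merged.remove(other)
    return merged

# Duplicate sets as they might come out of two clusters (made-up ids).
from_first_cluster = [HeadNode([1, 2]), HeadNode([7, 8])]
from_second_cluster = [HeadNode([2, 3])]

result = merge_duplicate_lists(from_first_cluster + from_second_cluster)
print([head.record_ids() for head in result])   # [[1, 2], [7, 8]] becomes [[1, 2, 3], [7, 8]]
```

Keeping a `last` pointer in the head node makes each append constant time. The overlap check here rescans list contents for clarity; a production version would more likely keep a record-to-head index.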
8. An apparatus for identifying duplicate records in a database object, the apparatus comprising:
one or more processors; and
a non-transitory computer readable medium storing a plurality of instructions, which when executed, cause the one or more processors to:
generate a first cluster of records from a group of records;
generate a second cluster of records from the group of records;
cause sets of duplicate records in the first cluster of records to be identified;
cause sets of duplicate records in the second cluster of records to be identified;
merge at least two sets of duplicate records associated with both the first cluster and the second cluster of records to form a merged set of duplicate records, wherein a set of duplicate records is implemented using a linked list having a head node and a body node for each record in the set of duplicate records and wherein the merging is performed based on the at least two sets of duplicate records having a common record and comprises merging a linked list associated with each set of duplicate records; and
remove one or more duplicate records from the merged set of duplicate records.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product comprising computer-readable program code to be executed by one or more processors when retrieved from a non-transitory computer-readable medium, the program code including instructions to:
generate a first cluster of records from the group of records;
generate a second cluster of records from the group of records;
cause sets of duplicate records in the first cluster of records to be identified;
cause sets of duplicate records in the second cluster of records to be identified;
merge at least two sets of duplicate records associated with both the first cluster and the second cluster of records to form a merged set of duplicate records, wherein a set of duplicate records is implemented using a linked list having a head node and a body node for each record in the set of duplicate records and wherein the merging is performed based on the at least two sets of duplicate records having a common record and comprises merging a linked list associated with each set of duplicate records; and
remove one or more duplicate records from the merged set of duplicate records.
Dependent claims: 16, 17, 18, 19, 20
Specification