Method and system for focused multi-blocking to increase link identification rates in record comparison
First Claim
1. A computer-implemented method comprising:
a computer system identifying a target group of electronic customer records, each electronic customer record having data fields containing data pertaining to a customer;
the computer system, at one or more master nodes, using a mapping system to divide linking tasks into sub-tasks that are distributed to worker nodes in a multi-level tree structure for parallel processing;
the computer system, at one or more first worker nodes of the worker nodes, identifying a first focused blocker, the first focused blocker identifying a data value for an electronic customer record data field;
the computer system analyzing one or more electronic customer records from within the target group of electronic customer records to, for each electronic customer record within the target group of electronic customer records, identify one or more first focused blocker keys, the one or more first focused blocker keys comprising one or more data values from the each electronic customer record of the target group of electronic customer records corresponding to the data value for the electronic customer record data field;
the computer system further analyzing the one or more electronic customer records from within the target group of electronic customer records, and producing one or more additional focused blocker keys based on the further analysis;
the computer system associating the one or more additional focused blocker keys with the one or more electronic customer records from within the target group of electronic customer records;
the computer system, at the one or more first worker nodes of the worker nodes, analyzing the target group of electronic customer records to identify a first focused group of electronic customer records, the first focused group of electronic customer records comprising:
electronic customer records comprising a first focused blocker data value; and
electronic customer records associated with at least one of the one or more additional focused blocker keys;
the computer system, at the one or more first worker nodes of the worker nodes, comparing pairs of electronic customer records from the first focused group of electronic customer records to identify linked records, each linked record comprising two or more electronic customer records which pertain to a single customer entity;
in parallel to the computer system identifying the first focused blocker, the computer system, at one or more second worker nodes of the worker nodes, identifying a second focused blocker, the second focused blocker identifying a third data value for the electronic customer record data field, wherein the third data value is different than the data value identified by the first focused blocker for the electronic customer record data field;
in parallel to the computer system analyzing the target group of electronic customer records to identify the first focused group of electronic customer records, the computer system, at the one or more second worker nodes of the worker nodes, analyzing the target group of electronic customer records to identify a second focused group of electronic customer records, the second focused group of electronic customer records comprising electronic customer records comprising the third data value;
in parallel to the computer system comparing the pairs of electronic customer records from the first focused group of electronic customer records, the computer system, at the one or more second worker nodes of the worker nodes, comparing pairs of electronic customer records from the second focused group of electronic customer records to identify second linked records, each linked record of the second linked records comprising two or more electronic customer records which pertain to a second single customer entity; and
the computer system, at the one or more master nodes, using a reduction system to receive and combine responses from the worker nodes for output.
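Claim 1 adds a second source of membership in the focused group: additional focused blocker keys produced by a "further analysis" of each record. The claim does not say what that analysis is, so the sketch below assumes a hypothetical normalization (trim, upper-case, split hyphenated values) purely for illustration; the function names and the `id`/`last`/`dob` fields are likewise assumptions. The focused group is the union of records whose field equals the blocker value and records whose associated additional keys include it.

```python
from itertools import combinations


def first_blocker_keys(record: dict, field: str) -> set[str]:
    """First focused blocker keys: the record's own value for the blocking field."""
    value = record.get(field)
    return {value} if value else set()


def additional_blocker_keys(record: dict, field: str) -> set[str]:
    """Hypothetical 'further analysis': trim, upper-case and split compound
    values, so ' smith-Jones' yields {'SMITH', 'JONES'}."""
    raw = (record.get(field) or "").strip().upper()
    return {part for part in raw.replace(" ", "-").split("-") if part}


def associate_additional_keys(records: list[dict], field: str) -> dict[str, set[str]]:
    """Associate each record id with its additional focused blocker keys."""
    return {r["id"]: additional_blocker_keys(r, field) for r in records}


def focused_group(records, field, blocker_value, extra_keys):
    """Records holding the blocker value, plus records whose associated
    additional keys include that value."""
    return [
        r for r in records
        if blocker_value in first_blocker_keys(r, field)
        or blocker_value in extra_keys.get(r["id"], set())
    ]


def compare_pairs(group, is_match):
    """Pairwise comparison inside the focused group; returns matching id pairs."""
    return [(a["id"], b["id"]) for a, b in combinations(group, 2) if is_match(a, b)]


def same_dob(a, b):
    """Hypothetical match rule used only for this example."""
    return a["dob"] == b["dob"]


records = [
    {"id": "r1", "last": "SMITH", "dob": "1980-02-01"},
    {"id": "r2", "last": "smith-Jones", "dob": "1980-02-01"},
    {"id": "r3", "last": "Smith ", "dob": "1980-02-01"},
    {"id": "r4", "last": "JONES", "dob": "1975-07-12"},
]

extra = associate_additional_keys(records, "last")
group = focused_group(records, "last", "SMITH", extra)
print([r["id"] for r in group])        # ['r1', 'r2', 'r3']
print(compare_pairs(group, same_dob))  # [('r1', 'r2'), ('r1', 'r3'), ('r2', 'r3')]
```

Without the additional keys the `SMITH` group would hold only `r1` and no pair could be compared; with them, the variants `"smith-Jones"` and `"Smith "` join the group and three candidate links surface.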
Abstract
Techniques for comparing customer records to identify linked customer records pertaining to a single customer entity are provided. The techniques include identifying a target group of electronic customer records having data fields containing data pertaining to a customer, identifying one or more focused blockers, each identifying a data value for an electronic customer record data field, and analyzing the target group of electronic customer records to identify a focused group of electronic customer records containing the focused blocker data value. The techniques also include comparing pairs of electronic customer records from the focused group of electronic customer records to identify linked records which pertain to a single customer entity.
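The abstract's core idea, restricting pairwise comparison to the records that share a focused blocker value, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: records are assumed to be plain dicts keyed by field name, and the helper names (`focused_group`, `link_pairs`, `simple_match`) and the match rule (same date of birth and same first initial) are hypothetical, since the source does not specify how pairs are scored.

```python
from itertools import combinations
from typing import Callable

Record = dict[str, str]


def focused_group(records: list[Record], field: str, value: str) -> list[Record]:
    """Keep only the records whose `field` holds the focused blocker value."""
    return [r for r in records if r.get(field) == value]


def link_pairs(
    group: list[Record], is_match: Callable[[Record, Record], bool]
) -> list[tuple[str, str]]:
    """Compare every pair inside the focused group; return the ids of matching pairs."""
    return [(a["id"], b["id"]) for a, b in combinations(group, 2) if is_match(a, b)]


def simple_match(a: Record, b: Record) -> bool:
    """Hypothetical match rule: same date of birth and same first initial."""
    return a["dob"] == b["dob"] and a["first"][:1] == b["first"][:1]


records = [
    {"id": "r1", "first": "Jon", "last": "SMITH", "dob": "1980-02-01"},
    {"id": "r2", "first": "John", "last": "SMITH", "dob": "1980-02-01"},
    {"id": "r3", "first": "Ann", "last": "SMITH", "dob": "1975-07-12"},
    {"id": "r4", "first": "John", "last": "JONES", "dob": "1980-02-01"},
]

group = focused_group(records, "last", "SMITH")  # r1, r2, r3
print(link_pairs(group, simple_match))          # [('r1', 'r2')]
```

Only the three `SMITH` records are compared (3 pairs rather than the 6 a full cross-comparison of four records needs); `r4` is never compared even though it would satisfy the match rule, which illustrates how a single block can miss links.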
21 Claims
1. A computer-implemented method comprising: (text identical to the First Claim reproduced above)
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer-implemented method comprising:
a computer system identifying a target group of electronic records, each electronic record having data fields containing data pertaining to one or more events;
the computer system, at one or more master nodes, using a mapping system to divide linking tasks into sub-tasks that are distributed to worker nodes in a multi-level tree structure for parallel processing;
the computer system, at one or more first worker nodes of the worker nodes, identifying a first focused blocker, the first focused blocker identifying a data value for an electronic record data field;
the computer system analyzing one or more electronic records from within the target group of electronic records to, for each electronic customer record within the target group of electronic records, identify one or more first focused blocker keys, the one or more first focused blocker keys comprising one or more data values from the each electronic customer record of the target group of electronic records corresponding to the data value for the electronic record data field;
the computer system further analyzing the one or more electronic records from within the target group of electronic records, and producing one or more additional focused blocker keys based on the further analysis;
the computer system associating the one or more additional focused blocker keys with the one or more electronic records from within the target group of electronic records;
the computer system, at the one or more first worker nodes of the worker nodes, analyzing the target group of electronic records to identify a first focused group of electronic records, the first focused group of electronic records comprising:
electronic records comprising a first focused blocker data value; and
electronic records associated with at least one of the one or more additional focused blocker keys;
the computer system, at the one or more first worker nodes of the worker nodes, comparing pairs of electronic records from the first focused group of electronic records to identify linked records, each linked record comprising two or more electronic records which pertain to a single event of the one or more events;
in parallel to the computer system identifying the first focused blocker, the computer system, at one or more second worker nodes of the worker nodes, identifying a second focused blocker, the second focused blocker identifying a third data value for the electronic record data field, wherein the third data value is different than the data value identified by the first focused blocker for the electronic record data field;
in parallel to the computer system analyzing the target group of electronic records to identify the first focused group of electronic records, the computer system, at the one or more second worker nodes of the worker nodes, analyzing the target group of electronic records to identify a second focused group of electronic customer records, the second focused group of electronic customer records comprising electronic customer records comprising the third data value;
in parallel to the computer system comparing the pairs of electronic records from the first focused group of electronic records, the computer system, at the one or more second worker nodes of the worker nodes, comparing pairs of electronic records from the second focused group of electronic customer records to identify second linked records, each linked record of the second linked records comprising two or more electronic customer records which pertain to a second single customer entity; and
the computer system, at the one or more master nodes, using a reduction system to receive and combine responses from the worker nodes for output.
Dependent claims: 11, 12
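Claim 10 applies the same scheme to event records and, like claim 1, runs a second focused blocker, whose data value differs from the first's for the same field, in parallel, with a master node mapping sub-tasks out and reducing the workers' responses. The following is a minimal sketch of that map/reduce shape, assuming one sub-task per blocker value, Python threads standing in for worker nodes, and a hypothetical event-match rule (same date). `worker`, `master`, and the `city`/`date` fields are illustrative names, not taken from the source.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def worker(records, field, blocker_value):
    """One worker node: build the focused group for a single blocker value
    and compare every pair inside it (hypothetical rule: same date)."""
    group = [r for r in records if r.get(field) == blocker_value]
    return {
        (a["id"], b["id"])
        for a, b in combinations(group, 2)
        if a["date"] == b["date"]
    }


def master(records, field, blocker_values, max_workers=4):
    """Map: one sub-task per blocker value, run in parallel.
    Reduce: union the linked pairs returned by the workers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, records, field, v) for v in blocker_values]
        linked = set()
        for future in futures:
            linked |= future.result()
    return sorted(linked)


events = [
    {"id": "e1", "city": "BOSTON", "date": "2020-05-01"},
    {"id": "e2", "city": "BOSTON", "date": "2020-05-01"},
    {"id": "e3", "city": "DENVER", "date": "2021-01-09"},
    {"id": "e4", "city": "DENVER", "date": "2021-01-09"},
    {"id": "e5", "city": "DENVER", "date": "2022-03-15"},
]

# First and second focused blockers: different values of the same field.
print(master(events, "city", ["BOSTON", "DENVER"]))  # [('e1', 'e2'), ('e3', 'e4')]
```

Reducing with a set union means a pair found by two overlapping focused groups is reported only once. Threads keep the sketch self-contained; a real deployment would presumably place workers on separate processes or machines.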
13. A computer system comprising:
a computer system comprising a hardware processor, the computer system programmed to:
identify a target group of electronic customer records, each electronic customer record having data fields containing data pertaining to a customer;
use a mapping system, at one or more master nodes, to divide linking tasks into sub-tasks that are distributed to worker nodes in a multi-level tree structure for parallel processing;
identify, at one or more first worker nodes of the worker nodes, a first focused blocker, the first focused blocker identifying a data value for an electronic customer record data field;
analyze one or more electronic customer records from within the target group of electronic customer records to, for each electronic customer record within the target group of electronic customer records, identify one or more first focused blocker keys, the one or more first focused blocker keys comprising one or more data values from the each electronic customer record of the target group of electronic customer records corresponding to the data value for the electronic customer record data field;
further analyze the one or more electronic customer records from within the target group of electronic customer records, and produce one or more additional focused blocker keys based on the further analysis;
associate the one or more additional focused blocker keys with the one or more electronic customer records from within the target group of electronic customer records;
analyze, at the one or more first worker nodes of the worker nodes, the target group of electronic customer records to identify a first focused group of electronic customer records, the first focused group of electronic customer records comprising:
electronic customer records comprising a first focused blocker data value; and
electronic customer records associated with at least one of the one or more additional focused blocker keys;
compare, at the one or more first worker nodes of the worker nodes, pairs of electronic customer records from the first focused group of electronic customer records to identify linked records, each linked record comprising two or more electronic customer records which pertain to a single customer entity;
in parallel to identifying the first focused blocker, identify, at one or more second worker nodes of the worker nodes, a second focused blocker, the second focused blocker identifying a third data value for the electronic customer record data field, wherein the third data value is different than the data value identified by the first focused blocker for the electronic customer record data field;
in parallel to analyzing the target group of electronic customer records to identify the first focused group of electronic customer records, analyze, at the one or more second worker nodes of the worker nodes, the target group of electronic customer records to identify a second focused group of electronic customer records, the second focused group of electronic customer records comprising electronic customer records comprising the third data value;
in parallel to comparing the pairs of electronic customer records from the first focused group of electronic customer records, compare, at the one or more second worker nodes of the worker nodes, pairs of electronic customer records from the second focused group of electronic customer records to identify second linked records, each linked record of the second linked records comprising two or more electronic customer records which pertain to a second single customer entity; and
use a reduction system, at the one or more master nodes, to receive and combine responses from the worker nodes for output.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21
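Claims 1, 10, and 13 all distribute sub-tasks to worker nodes "in a multi-level tree structure". The source does not describe the tree's shape, so the sketch below assumes a fixed fan-out: each internal node splits its list of blocker values into at most `fanout` chunks for its children, leaves build and compare focused groups, and every internal node unions its children's results on the way back up to the root, which plays the master. `tree_node`, `leaf`, `fanout`, and the `zip`/`dob` fields are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def leaf(records, field, value, is_match):
    """Leaf worker: focused group for one blocker value, then pairwise comparison."""
    group = [r for r in records if r.get(field) == value]
    return {(a["id"], b["id"]) for a, b in combinations(group, 2) if is_match(a, b)}


def tree_node(records, field, values, is_match, fanout=2):
    """Internal node: split the blocker values into at most `fanout` chunks,
    hand each chunk to a child node in parallel, and union the children's results."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if len(values) <= 1:
        return set().union(*(leaf(records, field, v, is_match) for v in values))
    size = -(-len(values) // fanout)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        results = pool.map(
            lambda chunk: tree_node(records, field, chunk, is_match, fanout), chunks
        )
        return set().union(*results)


def same_dob(a, b):
    """Hypothetical match rule used only for this example."""
    return a["dob"] == b["dob"]


customers = [
    {"id": "c1", "zip": "02139", "dob": "1980-02-01"},
    {"id": "c2", "zip": "02139", "dob": "1980-02-01"},
    {"id": "c3", "zip": "80202", "dob": "1975-07-12"},
    {"id": "c4", "zip": "80202", "dob": "1975-07-12"},
    {"id": "c5", "zip": "94105", "dob": "1990-11-30"},
    {"id": "c6", "zip": "94105", "dob": "1991-01-01"},
]

# Root call acts as the master; with fanout=2, two of the three values
# are routed through an intermediate node before reaching a leaf.
root = tree_node(customers, "zip", ["02139", "80202", "94105"], same_dob)
print(sorted(root))  # [('c1', 'c2'), ('c3', 'c4')]
```

Each internal node opens its own small thread pool, so nested calls cannot starve one another for workers; the fan-out only changes the tree's depth, not the combined result.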
Specification