
Method and system for focused multi-blocking to increase link identification rates in record comparison

  • US 9,760,654 B2
  • Filed: 04/26/2013
  • Issued: 09/12/2017
  • Est. Priority Date: 04/26/2013
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

  • a computer system identifying a target group of electronic customer records, each electronic customer record having data fields containing data pertaining to a customer;

    the computer system, at one or more master nodes, using a mapping system to divide linking tasks into sub-tasks that are distributed to worker nodes in a multi-level tree structure for parallel processing;

    the computer system, at one or more first worker nodes of the worker nodes, identifying a first focused blocker, the first focused blocker identifying a data value for an electronic customer record data field;

    the computer system analyzing one or more electronic customer records from within the target group of electronic customer records to, for each electronic customer record within the target group of electronic customer records, identify one or more first focused blocker keys, the one or more first focused blocker keys comprising one or more data values from the each electronic customer record of the target group of electronic customer records corresponding to the data value for the electronic customer record data field;

    the computer system further analyzing the one or more electronic customer records from within the target group of electronic customer records, and producing one or more additional focused blocker keys based from the further analysis;

    the computer system associating the one or more additional focused blocker keys with the one or more electronic customer records from within the target group of electronic customer records;

    the computer system, at the one or more first worker nodes of the worker nodes, analyzing the target group of electronic customer records to identify a first focused group of electronic customer records, the first focused group of electronic customer records comprising:

    electronic customer records comprising a first focused blocker data value; and

    electronic customer records associated with at least one of the one or more additional focused blocker keys;

    the computer system, at the one or more first worker nodes of the worker nodes, comparing pairs of electronic customer records from the first focused group of electronic customer records to identify linked records, each linked record comprising two or more electronic customer records which pertain to a single customer entity;

    in parallel to the computer system identifying the first focused blocker, the computer system, at one or more second worker nodes of the worker nodes, identifying a second focused blocker, the second focused blocker identifying a third data value for the electronic customer record data field, wherein the third data value is different than the data value identified by the first focused blocker for the electronic customer record data field;

    in parallel to the computer system analyzing the target group of electronic customer records to identify the first focused group of electronic customer records, the computer system, at the one or more second worker nodes of the worker nodes, analyzing the target group of electronic customer records to identify a second focused group of electronic customer records, the second focused group of electronic customer records comprising electronic customer records comprising the third data value; and

    in parallel to the computer system comparing the pairs of electronic customer records from the first focused group of electronic customer records, the computer system, at the one or more second worker nodes of the worker nodes, comparing pairs of electronic customer records from the second focused group of electronic customer records to identify second linked records, each linked record of the second linked records comprising two or more electronic customer records which pertain to a second single customer entity; and

    the computer system, at the one or more master nodes, using a reduction system to receive and combine responses from the worker nodes for output.
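
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the record fields (`last_name`, `phone`, `zip`), the sample data, the phone-digit normalization used to derive additional focused blocker keys, and the matching rule in `is_match` are all assumptions made for illustration. A thread pool stands in for the worker nodes, and the multi-level tree of nodes is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical customer records; the field names are assumptions.
RECORDS = [
    {"id": 1, "last_name": "Smith", "phone": "555-0101", "zip": "10001"},
    {"id": 2, "last_name": "Smith", "phone": "5550101", "zip": "10001"},
    {"id": 3, "last_name": "Smyth", "phone": "555 0101", "zip": "10001"},
    {"id": 4, "last_name": "Jones", "phone": "555-0199", "zip": "20002"},
    {"id": 5, "last_name": "Jones", "phone": "555-0199", "zip": "20002"},
    {"id": 6, "last_name": "Smith", "phone": "555-0777", "zip": "30003"},
]


def digits(text):
    """Keep only the digits, so differently formatted phone numbers compare equal."""
    return "".join(ch for ch in text if ch.isdigit())


def focused_group(records, field, value):
    """Build one focused group for the focused blocker (field, value).

    The group holds records carrying the blocker value plus records that
    share an additional key (here: a normalized phone number, an assumed
    choice) with one of those records.
    """
    seeds = [r for r in records if r[field] == value]
    extra_keys = {digits(r["phone"]) for r in seeds}
    return [
        r for r in records
        if r[field] == value or digits(r["phone"]) in extra_keys
    ]


def is_match(a, b):
    """Toy comparison rule (assumption): same normalized phone and same ZIP."""
    return digits(a["phone"]) == digits(b["phone"]) and a["zip"] == b["zip"]


def worker(task):
    """One worker: form the focused group, then compare pairs inside it only."""
    records, field, value = task
    group = focused_group(records, field, value)
    return {
        (a["id"], b["id"])
        for a, b in combinations(group, 2)
        if is_match(a, b)
    }


def link_records(records, field, values):
    """Master: split into one sub-task per blocker value, then combine results."""
    tasks = [(records, field, v) for v in values]      # mapping step
    with ThreadPoolExecutor() as pool:                 # workers run in parallel
        results = list(pool.map(worker, tasks))
    return sorted(set().union(*results))               # reduction step


links = link_records(RECORDS, "last_name", ["Smith", "Jones"])
```

In this sketch, record 3 ("Smyth") does not carry the blocker value "Smith" but enters the first focused group through the shared phone key, which is the situation the additional focused blocker keys are meant to capture. Pairs are compared only within each focused group rather than across the whole target group.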
