Reducing comparisons for token-based entity resolution

  • US 10,191,942 B2
  • Filed: 10/14/2016
  • Issued: 01/29/2019
  • Est. Priority Date: 10/14/2016
  • Status: Active Grant
First Claim

1. A system for reducing an amount of comparisons during entity resolution of records, the system comprising:

    an in-memory database system configured to store a plurality of records; and

    token-based entity resolution circuitry configured to determine whether a current record is similar to one or more other records in the database, the token-based entity resolution circuitry including:

    a token creator configured to create tokens from the plurality of records;

    a token-record mapping creator configured to create a token-record mapping of tokens to records;

    a token importance calculator configured to calculate token importance values for the tokens, each token importance value representing a level of amount of information contained within a respective token;

    a token pruner configured to identify a token of the current record as unimportant based on token importance values of the tokens of the current record, the token pruner configured to remove the unimportant token from the token-record mapping, the identification and removal of the unimportant token comprising:

    identifying a token having a highest token importance value within the current record;

    marking at least one token as unimportant when a token importance value of the at least one token is less than a predetermined threshold relative to the highest token importance value in the current record; and

    removing the at least one unimportant token from the token-record mapping such that records linked to the at least one unimportant token are not selected for comparison with the current record; and

    a record selector configured to select only records sharing at least one common token with the current record such that the at least one common token does not include the token identified as unimportant; and

    a record comparator configured to compare the current record with each of the selected records to determine whether the current record matches any of the selected records.
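
The flow recited in the claim (create tokens, map tokens to records, score token importance, prune tokens that fall below a threshold relative to the record's highest-scoring token, select candidates, compare) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the patent: whitespace tokenization, inverse document frequency as the importance measure, a ratio of 0.5 as the "predetermined threshold", Jaccard similarity as the comparator, and all function names. The sketch also prunes by building a filtered view of the mapping for the current record rather than mutating a shared mapping.

```python
import math
from collections import defaultdict


def tokenize(text):
    """Token creator (assumed: lowercase whitespace split)."""
    return set(text.lower().split())


def build_mapping(records):
    """Token-record mapping: token -> set of record ids containing it."""
    mapping = defaultdict(set)
    for rid, text in records.items():
        for tok in tokenize(text):
            mapping[tok].add(rid)
    return mapping


def token_importance(mapping, n_records):
    """Importance per token (assumed: IDF, so rare tokens score higher)."""
    return {tok: math.log(n_records / len(rids)) for tok, rids in mapping.items()}


def select_candidates(current_id, records, mapping, importance, ratio=0.5):
    """Prune unimportant tokens of the current record, then select records
    that share at least one remaining token with it."""
    tokens = tokenize(records[current_id])
    highest = max(importance[t] for t in tokens)
    # A token is unimportant when its score is below ratio * highest.
    pruned_view = {t: mapping[t] for t in tokens if importance[t] >= ratio * highest}
    candidates = set()
    for rids in pruned_view.values():
        candidates |= rids
    candidates.discard(current_id)
    return candidates


def jaccard(a, b):
    """Record comparator (assumed: Jaccard similarity over token sets)."""
    ta, tb = tokenize(a), tokenize(b)
    return len(ta & tb) / len(ta | tb)


# Hypothetical sample data, not from the patent.
records = {
    1: "acme corp berlin",
    2: "acme corporation berlin",
    3: "globex corp munich",
    4: "initech corp hamburg",
}
mapping = build_mapping(records)
importance = token_importance(mapping, len(records))
candidates = select_candidates(1, records, mapping, importance)
scores = {rid: jaccard(records[1], records[rid]) for rid in candidates}
```

In this sample, the token "corp" appears in three of the four records, so its score falls below half of the highest score in record 1 and it is pruned. Records 3 and 4, which share only "corp" with record 1, are therefore never compared against it; only record 2 remains a candidate.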
