Robust matching for identity screening
Abstract
The techniques described herein are directed to robust matching for identity screening. In some examples, the techniques can include generating a similarity score for received identity information compared to a reference record. In some examples, the techniques can utilize a region associated with the received identity information to weight tokens composing the identity information or of the reference record to adjust the similarity score. Moreover, the techniques can include multiple tokenizers, transformation providers, and token weight providers and the techniques can be configured to select between the multiple tokenizers, transformation providers, and token weight providers based at least in part on a region to improve the accuracy of the similarity score. The techniques can determine whether or not to flag or otherwise affirm an identity of an individual or entity associated with the entity information based at least in part on the similarity score.
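As a rough illustration of selecting between multiple tokenizers based on a region, the sketch below keeps a small registry keyed by region code. The registry, the region codes, and both tokenizer functions are assumptions made for this example; the abstract does not specify how a tokenizer is chosen or what any tokenizer does internally.

```python
import re
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def particle_aware_tokenizer(text: str) -> List[str]:
    """Hypothetical tokenizer that also splits on hyphens and apostrophes."""
    return [t for t in re.split(r"[\s\-']+", text.lower()) if t]


# Hypothetical registry: the region codes and assignments are illustrative only.
TOKENIZERS: Dict[str, Tokenizer] = {
    "default": whitespace_tokenizer,
    "FR": particle_aware_tokenizer,
}


def tokenize(query: str, region: str) -> List[str]:
    """Tokenize the query with the tokenizer registered for the region,
    falling back to the default tokenizer for unknown regions."""
    tokenizer = TOKENIZERS.get(region, TOKENIZERS["default"])
    return tokenizer(query)
```

With this registry, the same query string produces different query tokens depending on the identified region, which is the behavior the abstract relies on for the later weighting and comparison steps.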
18 Claims
1. A system comprising:
one or more hardware processors;
multiple tokenizers configured to tokenize, by the one or more hardware processors and based at least in part on an identified region, a query string to receive query tokens;
a transformation provider configured to:
generate, by the one or more hardware processors, one or more transformation rules based at least in part on the identified region and the query tokens;
rank, by the one or more hardware processors, the one or more transformation rules based at least in part on the identified region and the query tokens;
select, by the one or more hardware processors, one or more of the transformation rules, wherein a number of the one or more transformation rules selected is based on the rank of the one or more transformation rules and a tolerated risk value; and
transform, by the one or more hardware processors and based at least in part on the selected one or more transformation rules, the query tokens to obtain a query record including transformed tokens that account for regional token variations, the regional token variations being associated with the identified region; and
a token weight provider configured to assign, by the one or more hardware processors, token weights for the transformed tokens of the query record based at least in part on the identified region;
a comparer configured to determine, by the one or more hardware processors and based at least in part on the token weights, similarity values between the transformed tokens of the query record and a reference record.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
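One possible reading of the rank, select, and transform steps of claim 1 is sketched below. It assumes each transformation rule carries a numeric risk estimate, ranks rules from lowest to highest risk, and keeps adding rules until the cumulative risk would exceed the tolerated risk value. The `Rule` class, the risk figures, and the three example rules are hypothetical; the claim only states that the number of selected rules depends on the rank and a tolerated risk value.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    name: str
    risk: float  # assumed: estimated chance the rule introduces a false match
    apply: Callable[[List[str]], List[str]]


def select_rules(rules: List[Rule], tolerated_risk: float) -> List[Rule]:
    """Rank rules by ascending risk and keep as many as fit within the budget."""
    ranked = sorted(rules, key=lambda r: r.risk)
    selected: List[Rule] = []
    total = 0.0
    for rule in ranked:
        if total + rule.risk > tolerated_risk:
            break  # every remaining rule is at least as risky
        selected.append(rule)
        total += rule.risk
    return selected


def transform(tokens: List[str], rules: List[Rule]) -> List[str]:
    """Apply the selected rules in rank order to obtain the transformed tokens."""
    for rule in rules:
        tokens = rule.apply(tokens)
    return tokens


# Illustrative rules only; real rules would depend on the identified region.
NICKNAMES = {"bill": "william", "bob": "robert"}
RULES = [
    Rule("drop_single_letters", 0.20, lambda ts: [t for t in ts if len(t) > 1]),
    Rule("strip_titles", 0.01, lambda ts: [t for t in ts if t not in {"mr", "mrs", "dr"}]),
    Rule("expand_nicknames", 0.05, lambda ts: [NICKNAMES.get(t, t) for t in ts]),
]
```

In this sketch a higher tolerated risk value admits more rules, so a permissive screening configuration applies more aggressive transformations than a strict one.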
9. A method comprising:
identifying a first region associated with a query string and a second region associated with the query string;
based at least in part on the identified first region and a language associated with the identified first region:
selecting, from multiple tokenizers, a first tokenizer associated with the identified first region or the language of the identified first region; and
generating a first set of transformation rules based at least in part on the identified first region and the language of the identified first region;
based at least in part on the identified second region and a language associated with the identified second region:
selecting, from the multiple tokenizers, a second tokenizer associated with the identified second region or the language of the identified second region; and
generating a second set of transformation rules based on the identified second region and the language of the identified second region;
tokenizing the query string by the first tokenizer to receive query tokens and the query string by the second tokenizer to receive additional query tokens;
transforming, by one or more transformation rules of the first set of transformation rules, the query tokens to form a query record;
transforming, by one or more transformation rules of the second set of transformation rules, the additional query tokens to form an addendum to the query record;
weighting tokens of the query record and weighting tokens of a reference record based at least in part on frequencies with which the tokens of the query record appear in the reference record and frequencies with which the tokens of the reference record appear in the reference record, respectively; and
weighting tokens of the addendum based at least in part on frequencies with which the tokens of the addendum appear in the reference record.
Dependent claims: 10, 11, 12, 13, 14, 15
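The frequency-based weighting in claim 9 could be approximated with an inverse-frequency scheme, in which tokens that appear often in the reference data receive lower weights than rare ones. The sketch below makes two assumptions that the claim does not state: the reference record is modeled as a list of tokenized entries, and the weight uses a smoothed logarithmic formula chosen for this example.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def token_frequencies(reference_entries: Iterable[List[str]]) -> Counter:
    """Count, for each token, how many reference entries contain it."""
    counts: Counter = Counter()
    for entry in reference_entries:
        counts.update(set(entry))  # count each token once per entry
    return counts


def weight_tokens(tokens: List[str], counts: Counter, n_entries: int) -> Dict[str, float]:
    """Assign smoothed inverse-frequency weights (an assumed formula).

    A token present in every entry gets weight 1.0; rarer tokens get more.
    """
    return {
        t: math.log((1 + n_entries) / (1 + counts[t])) + 1.0
        for t in tokens
    }
```

Under this scheme a very common given name contributes little to a match, while an uncommon family name contributes much more. The same function can weight the query record, the reference record, and the addendum, since all three are weighted against frequencies in the reference record.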
16. A method comprising:
tokenizing, based at least in part on an identified region, a query string to receive query tokens;
generating one or more transformation rules based at least in part on the identified region and the query tokens;
ranking, by the one or more hardware processors, the one or more transformation rules based at least in part on the identified region and the query tokens;
selecting one or more of the transformation rules, wherein a number of the one or more transformation rules selected is based on the rank of the one or more transformation rules and a tolerated risk value;
transforming, based at least in part on the selected one or more transformation rules, the query tokens to obtain a query record including transformed tokens that account for regional token variations, the regional token variations being associated with the identified region;
assigning token weights for the transformed tokens of the query record based at least in part on the identified region;
determining, based at least in part on the token weights, similarity values between the transformed tokens of the query record and a reference record;
receiving an identification;
determining a similarity score corresponding to a similarity between the received identification and an entity in the reference record, the similarity score being weighted based at least in part on an identified region associated with the reference record and regional lingual traits associated with the identified region, and the similarity score exceeding a score threshold;
affirming that the received identification corresponds to an entity associated with the identification in the reference record based at least in part on the similarity score; and
flagging the received identification based at least in part on the affirmation.
Dependent claims: 17, 18
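The final steps of claim 16 (a weighted similarity score, a score threshold, and flagging) might look like the following sketch. The weighted-overlap formula, the default weight of 1.0 for unweighted tokens, and the function names are assumptions for illustration; the claim does not define how the similarity score is computed.

```python
from typing import Dict, List, Tuple


def similarity_score(
    query_tokens: List[str],
    reference_tokens: List[str],
    weights: Dict[str, float],
) -> float:
    """Weighted overlap: weight of shared tokens divided by weight of all tokens.

    Tokens without an explicit weight default to 1.0 (an assumption).
    """
    q, r = set(query_tokens), set(reference_tokens)
    total = sum(weights.get(t, 1.0) for t in q | r)
    if total == 0:
        return 0.0
    matched = sum(weights.get(t, 1.0) for t in q & r)
    return matched / total


def screen(
    query_tokens: List[str],
    reference_tokens: List[str],
    weights: Dict[str, float],
    score_threshold: float,
) -> Tuple[bool, float]:
    """Flag the identification when the score exceeds the threshold."""
    score = similarity_score(query_tokens, reference_tokens, weights)
    return score > score_threshold, score
```

For example, with weights `{"john": 1.0, "nakamura": 3.0, "k": 0.5}`, the query tokens `["john", "k", "nakamura"]` scored against the reference tokens `["john", "nakamura"]` share 4.0 of 4.5 total weight, so a threshold of 0.8 would flag the identification.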
Specification