Duplicate account identification and scoring
2 Assignments
0 Petitions
Abstract
A system matches accounts based on attributes of the accounts, and scores the matched account pairs based on a probability of the matched accounts being duplicate accounts. The system can utilize the matched and scored account pairs to determine duplicate accounts, and terminate at least one of the accounts in a duplicate account pair.
57 Citations
39 Claims
1. A method performed by a computer system, the method comprising:
receiving, using a communication interface associated with the computer system, a trigger event for a first user account;
receiving, using a communication interface associated with the computer system, a string related to the first account;
converting, using one or more processors associated with the computer system, the string into a histogram of characters of the string;
converting, using one or more processors associated with the computer system, the histogram of characters into a histogram of bins using a character to bin mapping;
appending, using one or more processors associated with the computer system, a length of the string as a last bin of the histogram of bins;
calculating, using one or more processors associated with the computer system, a particular number of nearest neighbors for the histogram of bins in a kd-tree;
calculating, using one or more processors associated with the computer system, a string edit distance between the histogram of bins and a particular neighbor of the calculated nearest neighbors;
if the string edit distance is greater than a preset threshold for the particular neighbor, designating a match between the first user account and a second user account that is associated with the particular neighbor, to create a matched user account pair, where the designating is performed using one or more processors associated with the computer system; and
scoring, using one or more processors associated with the computer system, the matched user account pair.
(Dependent claims: 2, 3, 4, 5, 6, 7, 29, 30.)
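The first steps of claim 1 turn an account string into a fixed-length numeric vector. A minimal Python sketch of that featurization follows. The claims do not define the character to bin mapping, so `char_to_bin`, `NUM_BINS`, and the letter/digit bin layout below are hypothetical choices made only for illustration.

```python
from collections import Counter
from typing import List, Optional

NUM_BINS = 5  # hypothetical layout: 4 letter bins + 1 digit bin


def char_to_bin(ch: str) -> Optional[int]:
    """Hypothetical character-to-bin mapping; returns None for unmapped characters."""
    if ch.isascii() and ch.isalpha():
        # a-z (case-insensitive) spread over bins 0..3 by alphabet position
        return (ord(ch.lower()) - ord("a")) * 4 // 26
    if ch.isascii() and ch.isdigit():
        return 4
    return None  # punctuation, whitespace, non-ASCII: not counted in any bin


def histogram_of_characters(s: str) -> Counter:
    """Count how often each character occurs in the string."""
    return Counter(s)


def histogram_of_bins(char_hist: Counter) -> List[int]:
    """Fold per-character counts into per-bin counts using char_to_bin."""
    bins = [0] * NUM_BINS
    for ch, count in char_hist.items():
        b = char_to_bin(ch)
        if b is not None:
            bins[b] += count
    return bins


def featurize(s: str) -> List[int]:
    """String -> histogram of characters -> histogram of bins, plus a length bin."""
    bins = histogram_of_bins(histogram_of_characters(s))
    bins.append(len(s))  # the string length becomes the last bin
    return bins
```

With this mapping, `featurize("Bob99")` gives `[2, 0, 1, 0, 2, 5]` and `featurize("bob.99")` gives `[2, 0, 1, 0, 2, 6]`. The period is not mapped to any bin, so in this sketch only the appended length bin tells the two strings apart.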
8. A computer system comprising:
one or more server devices comprising:
a backend matching unit that:
receives a string related to a first account,
converts the string into a histogram of characters of the string,
converts the histogram of characters into a histogram of bins using a character to bin mapping,
appends a length of the string as a last bin to the histogram of bins,
calculates a string edit distance between the histogram of bins and a neighbor in a kd-tree,
if the string edit distance is greater than a threshold, designates a matched account pair of the first account and a second account associated with the neighbor, and
assigns scores to the matched account pair; and
a prioritization reviewing unit that receives matched account pairs and the scores of the pairs, and prioritizes the matched account pairs based on the scores.
(Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26.)
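Claim 8 adds a prioritization reviewing unit that orders matched pairs by score, but does not say how. The sketch below assumes that a higher score means a pair is more likely to be a duplicate (in line with the abstract's probability-based scoring) and should be reviewed first. `ReviewQueue` and its methods are hypothetical names, and a binary heap is just one reasonable structure for a queue that receives pairs incrementally.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class ReviewQueue:
    """Hypothetical prioritization unit: matched pairs come out highest score first."""

    def __init__(self) -> None:
        # Entries are (negated score, arrival number, account_a, account_b);
        # heapq is a min-heap, so negating the score pops the largest score first.
        self._heap: List[Tuple[float, int, str, str]] = []
        self._arrivals = itertools.count()  # equal scores pop in arrival order

    def receive(self, account_a: str, account_b: str, score: float) -> None:
        """Accept one matched pair and its score from the matching unit."""
        heapq.heappush(self._heap, (-score, next(self._arrivals), account_a, account_b))

    def next_for_review(self) -> Optional[Tuple[str, str, float]]:
        """Remove and return the highest-priority pair, or None if the queue is empty."""
        if not self._heap:
            return None
        neg_score, _, a, b = heapq.heappop(self._heap)
        return a, b, -neg_score

    def __len__(self) -> int:
        return len(self._heap)
```

A heap keeps each insertion and removal at O(log n), so pairs can keep arriving from the matching unit while reviewers pull the current highest-scoring pair.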
27. A system comprising:
one or more devices comprising:
a memory to store accounts;
means for receiving trigger events for the accounts;
means for mapping characters to bins;
means for converting a string associated with one of the accounts into a histogram of bins based on the mapping;
means for appending a length of the string to the histogram of bins as an additional bin of the histogram of bins;
means for determining a string edit distance between the histogram of bins and a neighbor in a kd-tree;
means for matching the one of the accounts to another one of the accounts based on the string edit distance being greater than a threshold where the other one of the accounts is associated with the neighbor in the kd-tree;
means for scoring the matched accounts; and
means for utilizing the matched and scored account pair to determine a duplicate account.
(Dependent claims: 31, 32, 33, 34.)
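Claim 27 compares a histogram of bins with a kd-tree neighbor by string edit distance and matches the accounts when that distance is greater than a threshold. The claims do not say how an edit distance is applied to a histogram; this sketch assumes each histogram is treated as a sequence of bin counts and uses the standard Levenshtein distance (unit-cost insertion, deletion, and substitution). `edit_distance` and `match_neighbors` are hypothetical helpers, and the `>` comparison is copied from the claim wording as written rather than inferred.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two sequences of bin counts."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,             # delete x
                         cur[j - 1] + 1,          # insert y
                         prev[j - 1] + (x != y))  # substitute x -> y (free if equal)
        prev = cur
    return prev[-1]


def match_neighbors(account: str, hist: Sequence[int],
                    neighbors: List[Tuple[str, Sequence[int]]],
                    threshold: int) -> List[Tuple[str, str, int]]:
    """Pair `account` with each neighbor whose edit distance exceeds `threshold`."""
    pairs = []
    for other, other_hist in neighbors:
        d = edit_distance(hist, other_hist)
        if d > threshold:  # "greater than a threshold", as literally claimed
            pairs.append((account, other, d))
    return pairs
```

For the query histogram `[2, 0, 1, 0, 2, 5]`, the neighbor `[2, 0, 1, 0, 2, 6]` is at distance 1, so with a threshold of 1 it is not matched under the literal `>` test.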
28. A system comprising:
a memory to store a plurality of instructions; and
a processor to execute instructions in the memory to:
match accounts based on attributes of the accounts,
retrieve strings associated with the attributes,
convert strings into histograms of characters,
convert the histograms of characters into histograms of bins based on a character-to-bin mapping,
append a histogram of bins with an extra bin representing a length of the string associated with the histogram,
calculate a string edit distance between the histogram of bins and a neighbor in a kd-tree, where the string is associated with an attribute of a first account and where the neighbor is associated with a corresponding attribute of a second account,
score the first account and the second account based on the string edit distance,
utilize the scored account pair to determine whether the first account pair and the second account pair comprise duplicate accounts, and
terminate at least one of the first account or the second account.
(Dependent claims: 35, 36.)
37. One or more memory devices storing instructions executable by one or more processors, the one or more memory devices comprising:
one or more instructions to receive a trigger event for a first user account;
one or more instructions to receive a string related to the first account;
one or more instructions to convert the string into a histogram of characters of the string;
one or more instructions to convert the histogram of characters into a histogram of bins using a character to bin mapping;
one or more instructions to append a length of the string as a last bin of the histogram of bins;
one or more instructions to calculate a particular number of nearest neighbors for the histogram of bins in a kd-tree;
one or more instructions to calculate a string edit distance between the histogram of bins and a particular neighbor of the calculated nearest neighbors;
one or more instructions to, if the string edit distance is greater than a preset threshold for the particular neighbor, designate a match between the first user account and a second user account that is associated with the particular neighbor, to create a matched user account pair; and
one or more instructions to score the matched user account pair.
(Dependent claims: 38, 39.)
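Claims 1 and 37 retrieve a particular number of nearest neighbors for the histogram of bins from a kd-tree. The claims name neither the distance metric nor the tree layout, so the pure-Python sketch below assumes squared Euclidean distance over bin vectors and a median-split kd-tree; `KDNode`, `build_kdtree`, `sq_dist`, and `k_nearest` are hypothetical names. A library structure such as `scipy.spatial.KDTree` and its `query(x, k=...)` method could stand in for the hand-written tree.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

Point = Sequence[int]


class KDNode:
    """One kd-tree node: a bin histogram, its account label, and a split axis."""
    __slots__ = ("point", "label", "axis", "left", "right")

    def __init__(self, point: Point, label: str, axis: int,
                 left: "Optional[KDNode]", right: "Optional[KDNode]") -> None:
        self.point, self.label, self.axis = point, label, axis
        self.left, self.right = left, right


def build_kdtree(items: List[Tuple[Point, str]], depth: int = 0) -> Optional[KDNode]:
    """Recursively split on the median of one coordinate per tree level."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return KDNode(point, label, axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> int:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_nearest(root: Optional[KDNode], query: Point, k: int) -> List[Tuple[str, int]]:
    """Return up to k (label, squared distance) pairs, nearest first."""
    if root is None or k <= 0:
        return []
    best: List[Tuple[int, int, str]] = []  # max-heap via negated distance
    counter = 0  # unique tie-breaker so heap entries never compare labels

    def visit(node: Optional[KDNode]) -> None:
        nonlocal counter
        if node is None:
            return
        d = sq_dist(query, node.point)
        counter += 1
        entry = (-d, counter, node.label)
        if len(best) < k:
            heapq.heappush(best, entry)
        elif d < -best[0][0]:
            heapq.heapreplace(best, entry)  # evict the current k-th best
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Cross the splitting plane only if it is closer than the current k-th best.
        if len(best) < k or diff * diff < -best[0][0]:
            visit(far)

    visit(root)
    return [(label, -neg_d) for neg_d, _, label in sorted(best, reverse=True)]
```

The pruning test `diff * diff < -best[0][0]` is what makes this a kd-tree search rather than a linear scan: once k candidates are held, a subtree on the far side of a splitting plane is skipped whenever the plane itself is farther away than the current k-th best candidate.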
Specification