Duplicate account identification and scoring

  • US 7,725,421 B1
  • Filed: 07/26/2006
  • Issued: 05/25/2010
  • Est. Priority Date: 07/26/2006
  • Status: Active Grant
First Claim
1. A method performed by a computer system, the method comprising:

    receiving, using a communication interface associated with the computer system, a trigger event for a first user account;

    receiving, using a communication interface associated with the computer system, a string related to the first account;

    converting, using one or more processors associated with the computer system, the string into a histogram of characters of the string;

    converting, using one or more processors associated with the computer system, the histogram of characters into a histogram of bins using a character to bin mapping;

    appending, using one or more processors associated with the computer system, a length of the string as a last bin of the histogram of bins;

    calculating, using one or more processors associated with the computer system, a particular number of nearest neighbors for the histogram of bins in a kd-tree;

    calculating, using one or more processors associated with the computer system, a string edit distance between the histogram of bins and a particular neighbor of the calculated nearest neighbors;

    if the string edit distance is greater than a preset threshold for the particular neighbor, designating a match between the first user account and a second user account that is associated with the particular neighbor, to create a matched user account pair, where the designating is performed using one or more processors associated with the computer system; and

    scoring, using one or more processors associated with the computer system, the matched user account pair.
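
The feature-construction and comparison steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation. The character-to-bin mapping (`char_to_bin`, `NUM_BINS`) is hypothetical, since the claim does not specify one. A brute-force nearest-neighbor search stands in for the kd-tree lookup, and the edit distance is an ordinary Levenshtein distance applied to the bin sequences, which is one possible reading of the claim wording. The trigger event, threshold test, and scoring steps are left out because the claim gives no concrete values for them.

```python
from collections import Counter
import heapq

# Hypothetical character-to-bin mapping (the claim does not define one):
# vowels -> bin 0, other letters -> bin 1, digits -> bin 2, anything else -> bin 3.
NUM_BINS = 4


def char_to_bin(ch):
    ch = ch.lower()
    if ch in "aeiou":
        return 0
    if ch.isalpha():
        return 1
    if ch.isdigit():
        return 2
    return 3


def string_to_feature(s):
    """Build the histogram of bins for s, with len(s) appended as the last bin."""
    char_hist = Counter(s)                # histogram of characters
    bins = [0] * NUM_BINS
    for ch, count in char_hist.items():   # fold characters into bins
        bins[char_to_bin(ch)] += count
    bins.append(len(s))                   # string length as the last bin
    return tuple(bins)


def nearest_neighbors(query, points, k):
    """Return the k points closest to query (squared Euclidean distance).

    Brute-force stand-in for the kd-tree search named in the claim.
    """
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(query, p))
    return heapq.nsmallest(k, points, key=sq_dist)


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or tuples)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]


# Example: compare a new account string against previously indexed ones.
indexed = [string_to_feature(s) for s in ("jon.doe42", "12345678")]
query = string_to_feature("john.doe42")
for neighbor in nearest_neighbors(query, indexed, k=1):
    print(neighbor, edit_distance(query, neighbor))
```

In a larger deployment the brute-force search would normally be replaced by an actual kd-tree index, so that each lookup does not scan every stored histogram.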
