Duplicate item detection system and method

  • US 7,827,186 B2
  • Filed: 09/28/2007
  • Issued: 11/02/2010
  • Est. Priority Date: 09/28/2007
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method of detecting context-based duplicate items in an electronic catalog, the method comprising:

  • by a computer system comprising computer hardware:

    identifying a set of candidate recommendations from a plurality of items represented in an electronic catalog from which to select items to recommend to a target user;

    for each candidate recommendation in the set of candidate recommendations, identifying textual terms from a representation in the electronic catalog of the candidate recommendation, the representation comprising a product description for the candidate recommendation;

    calculating degrees of fit between the textual terms of representations of first and second candidate recommendations selected from the set of candidate recommendations, the calculated degrees of fit reflecting the contextual similarities of the textual terms of the first and second item representations, wherein calculating degrees of fit comprises:

    forming an initial matrix of values, each of the textual terms of the first and second item representations having a value represented in an initial matrix;

    calculating a singular value decomposition of the initial matrix, the singular value decomposition comprising a left matrix, a singular value matrix, and a right transpose matrix;

    reducing the dimension of one or more of the left, singular value, and right transpose matrices to create a reduced singular value decomposition; and

    multiplying the matrices of the reduced singular value decomposition to create a reduced-dimension matrix approximating the initial matrix;

    calculating a degree of similarity between the first and second item representations based at least in part on the calculated degrees of fit, the degree of similarity between the first and second item representations reflecting the similarity of the first and second item representations;

    assessing whether the first and second candidate recommendations are contextual duplicates based at least in part on the calculated degree of similarity;

    removing one of the first and second candidate recommendations from the set of candidate recommendations based at least in part on said assessing to thereby generate a modified set of candidate recommendations; and

    recommending one or more items of the modified set of candidate recommendations to the target user.
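
The matrix steps recited in claim 1 (initial matrix, singular value decomposition, dimension reduction, and multiplication back into a reduced-dimension approximation) can be illustrated with a minimal Python sketch using NumPy. This is an illustration under stated assumptions, not the patented implementation: the claim does not specify how matrix values are computed, how similarity is measured, or what cutoff marks a duplicate, so raw term counts, cosine similarity, and the `threshold` value here are assumptions, and all function names are hypothetical.

```python
import numpy as np


def term_document_matrix(descriptions):
    """Build the initial matrix: one row per term, one column per item.

    Assumption: values are raw term counts from whitespace tokenization.
    """
    vocab = sorted({t for d in descriptions for t in d.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    m = np.zeros((len(vocab), len(descriptions)))
    for j, d in enumerate(descriptions):
        for t in d.lower().split():
            m[index[t], j] += 1.0
    return m


def reduced_approximation(m, k):
    """Compute the SVD, keep the k largest singular values, and multiply back."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)  # left, singular values, right transpose
    k = min(k, len(s))
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def dedupe(candidates, descriptions, k=2, threshold=0.9):
    """Drop any candidate whose reduced-dimension column is too similar to a kept one.

    Assumption: cosine similarity of columns at or above `threshold` marks a duplicate.
    """
    approx = reduced_approximation(term_document_matrix(descriptions), k)
    kept = []
    for j in range(len(candidates)):
        if all(cosine(approx[:, j], approx[:, i]) < threshold for i in kept):
            kept.append(j)
    return [candidates[j] for j in kept]
```

For example, calling `dedupe(["A", "B", "C"], ["red cotton shirt", "red cotton shirt", "steel garden shovel"])` keeps the first of the two identically described items and the unrelated third item. The values of `k` and `threshold` would need tuning against a real catalog.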
