DUPLICATE ITEM DETECTION SYSTEM AND METHOD

  • US 20090089314A1
  • Filed: 09/28/2007
  • Published: 04/02/2009
  • Est. Priority Date: 09/28/2007
  • Status: Active Grant
First Claim

1. A computer-implemented method of detecting context-based duplicate items in an electronic catalog, the method comprising:

    identifying a plurality of representations of items in a data repository of an electronic catalog from which to select items to recommend to a target user;

    identifying one or more textual terms of each item representation, each textual term listed in a product description for a given item, the one or more textual terms describing the given item;

    calculating degrees of fit between the textual terms of representations of first and second items selected from the plurality of item representations, the calculated degrees of fit reflecting the contextual similarities of the textual terms of the first and second item representations, wherein calculating degrees of fit comprises:

    forming an initial matrix of values, each of the textual terms of the first and second item representations having a value represented in an initial matrix;

    calculating a singular value decomposition of the initial matrix, the singular value decomposition comprising a left matrix, a singular value matrix, and a right transpose matrix;

    reducing the dimension of one or more of the left, singular value, and right transpose matrices to create a reduced singular value decomposition; and

    multiplying the matrices of the reduced singular value decomposition to create a reduced-dimension matrix approximating the initial matrix;

    calculating a degree of similarity between the first and second item representations based at least in part on the calculated degrees of fit, the degree of similarity between the first and second item representations reflecting the similarity of the first and second item representations; and

    assessing whether the first and second items are contextual duplicates based at least in part on the calculated degree of similarity.
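
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices are assumptions made only for illustration: the initial matrix holds raw term counts (rows are terms, columns are items), the degree of similarity is the cosine between item columns of the reduced-dimension matrix, and the rank k=2 and threshold 0.9 are arbitrary. The decomposition uses power iteration with deflation to obtain the top-k singular triplets, a swapped-in technique chosen to keep the example dependency-free; summing the rank-one terms is equivalent to multiplying the reduced left, singular value, and right transpose matrices. All function names are hypothetical.

```python
import math

def term_item_matrix(descriptions):
    """Initial matrix: one row per term, one column per item (raw counts, an assumption)."""
    docs = [d.lower().split() for d in descriptions]
    vocab = sorted({t for doc in docs for t in doc})
    return vocab, [[float(doc.count(t)) for doc in docs] for t in vocab]

def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def truncated_svd(a, k, iters=500):
    """Top-k singular triplets (sigma, u, v) via power iteration with deflation."""
    at = [list(col) for col in zip(*a)]  # transpose of a
    n = len(at)
    triplets = []
    for i in range(k):
        v = [1.0 / (i + j + 1) for j in range(n)]  # fixed, arbitrary start vector
        for _ in range(iters):
            w = matvec(at, matvec(a, v))  # apply A^T A
            for _, _, pv in triplets:     # project out vectors already found
                c = dot(w, pv)
                w = [wi - c * pi for wi, pi in zip(w, pv)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:              # rank exhausted
                return triplets
            v = [wi / norm for wi in w]
        av = matvec(a, v)
        sigma = math.sqrt(dot(av, av))
        if sigma < 1e-12:
            break
        triplets.append((sigma, [x / sigma for x in av], v))
    return triplets

def reconstruct(triplets, m, n):
    """Reduced-dimension matrix: sum of sigma * u * v^T over the kept triplets."""
    approx = [[0.0] * n for _ in range(m)]
    for sigma, u, v in triplets:
        for r in range(m):
            for c in range(n):
                approx[r][c] += sigma * u[r] * v[c]
    return approx

def cosine(x, y):
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx and ny else 0.0

def are_contextual_duplicates(descriptions, first, second, k=2, threshold=0.9):
    """Return (similarity, is_duplicate) for two items; k and threshold are assumptions."""
    _, a = term_item_matrix(descriptions)
    approx = reconstruct(truncated_svd(a, k), len(a), len(descriptions))
    cols = list(zip(*approx))  # one column of the approximation per item
    sim = cosine(cols[first], cols[second])
    return sim, sim >= threshold
```

For example, with the hypothetical descriptions "apple ipod nano 8gb silver", "ipod nano apple 8gb silver mp3 player", and "garden hose 50 ft green", calling `are_contextual_duplicates(catalog, 0, 1)` should score the first two items close to 1 and flag them as duplicates, while the first and third items, which share no terms, should score close to 0.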
