Duplicate item detection system and method
First Claim
1. A computer-implemented method of detecting context-based duplicate items in an electronic catalog, the method comprising:
by a computer system comprising computer hardware:
identifying a set of candidate recommendations from a plurality of items represented in an electronic catalog from which to select items to recommend to a target user;
for each candidate recommendation in the set of candidate recommendations, identifying textual terms from a representation in the electronic catalog of the candidate recommendation, the representation comprising a product description for the candidate recommendation;
calculating degrees of fit between the textual terms of representations of first and second candidate recommendations selected from the set of candidate recommendations, the calculated degrees of fit reflecting the contextual similarities of the textual terms of the first and second item representations, wherein calculating degrees of fit comprises:
forming an initial matrix of values, each of the textual terms of the first and second item representations having a value represented in an initial matrix;
calculating a singular value decomposition of the initial matrix, the singular value decomposition comprising a left matrix, a singular value matrix, and a right transpose matrix;
reducing the dimension of one or more of the left, singular value, and right transpose matrices to create a reduced singular value decomposition; and
multiplying the matrices of the reduced singular value decomposition to create a reduced-dimension matrix approximating the initial matrix;
calculating a degree of similarity between the first and second item representations based at least in part on the calculated degrees of fit, the degree of similarity between the first and second item representations reflecting the similarity of the first and second item representations;
assessing whether the first and second candidate recommendations are contextual duplicates based at least in part on the calculated degree of similarity;
removing one of the first and second candidate recommendations from the set of candidate recommendations based at least in part on said assessing to thereby generate a modified set of candidate recommendations; and
recommending one or more items of the modified set of candidate recommendations to the target user.
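The matrix steps recited in claim 1 (form an initial matrix, decompose it into left, singular value, and right transpose matrices, reduce their dimension, and multiply them back together) can be sketched in Python with NumPy's `numpy.linalg.svd`. This is a minimal sketch, not the patented implementation. It assumes the initial matrix holds term counts with one row per textual term and one column per candidate item, and it assumes "reducing the dimension" means keeping only the k largest singular values, as in standard latent semantic analysis. The helper name `reduced_rank_approximation`, the value k = 2, and the toy counts are hypothetical.

```python
import numpy as np


def reduced_rank_approximation(initial: np.ndarray, k: int) -> np.ndarray:
    """Rank-k approximation of `initial` from a truncated SVD (hypothetical helper)."""
    # Decompose: initial == U @ diag(s) @ Vt (left, singular values, right transpose)
    U, s, Vt = np.linalg.svd(initial, full_matrices=False)
    # Reduce the dimension: keep the k largest singular values and their vectors
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Multiply the reduced matrices back into a matrix shaped like `initial`
    return U_k @ np.diag(s_k) @ Vt_k


# Hypothetical term-by-item counts: 5 terms (rows) x 3 candidate items (columns)
initial = np.array([
    [2.0, 1.0, 0.0],  # "cotton"
    [1.0, 2.0, 0.0],  # "shirt"
    [0.0, 1.0, 0.0],  # "crew"
    [0.0, 0.0, 3.0],  # "ceramic"
    [0.0, 0.0, 1.0],  # "mug"
])
approx = reduced_rank_approximation(initial, k=2)
print(np.round(approx, 2))
```

In this reading the "reduced-dimension matrix" has the same shape as the initial matrix but rank at most k. By the Eckart-Young theorem it is the closest rank-k matrix in the Frobenius norm, which is why discarding the smallest singular values is the usual choice in LSA.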
1 Assignment
0 Petitions
Abstract
A method of detecting contextual duplicate items can include identifying a plurality of representations of items in a data repository, each item representation including one or more textual attributes. A degree of fit between an item representation's attributes and other items can be calculated. The degree of fit can reflect the relevance of the attributes of one item to the other item. A degree of association between the two item representations can be calculated based at least in part on the calculated degree of fit. The degree of association between the two item representations can reflect the similarity of the two items. The degree of association between the two item representations can be assessed to determine whether the items are contextual duplicates.
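The abstract's final steps, scoring a degree of association between two item representations and assessing it to decide whether the items are contextual duplicates, might look like the following sketch. It assumes each item is already represented by a vector in the reduced latent space (for example, its column of the reduced-dimension matrix), uses cosine similarity as the degree of association, and flags a duplicate when the score exceeds a fixed threshold. The abstract does not name the similarity measure; the function names, the 0.9 threshold, and the example vectors are hypothetical.

```python
import numpy as np


def degree_of_association(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two item vectors (assumed association measure)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Treat an all-zero vector as unrelated to everything
    return float(a @ b / denom) if denom else 0.0


def are_contextual_duplicates(a: np.ndarray, b: np.ndarray, threshold: float = 0.9) -> bool:
    """Assess duplication: association must exceed a (hypothetical) threshold."""
    return degree_of_association(a, b) > threshold


# Hypothetical latent-space vectors for three catalog items
shirt_a = np.array([1.9, 1.6, 0.4])
shirt_b = np.array([1.7, 1.8, 0.5])
mug = np.array([0.0, 0.1, 3.2])

print(are_contextual_duplicates(shirt_a, shirt_b))
print(are_contextual_duplicates(shirt_a, mug))
```

Cosine similarity ignores vector length, so a terse listing and a verbose listing of the same product can still score as near-identical; that property is the main reason it is a common default in LSA-style comparisons.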
38 Citations
14 Claims
1. A computer-implemented method of detecting context-based duplicate items in an electronic catalog, as set out in full under First Claim above.
Dependent claims: 2, 3
4. A computer-implemented method of detecting context-based duplicate items, the method comprising:
by a computer system comprising computer hardware: identifying a plurality of representations of items in a data repository; identifying one or more attributes of each item representation, each attribute comprising one or more textual terms; calculating degrees of fit between the attributes of representations of first and second items selected from the plurality of item representations, wherein calculating degrees of fit comprises:
forming an initial matrix of values corresponding to the attributes of the first and second item representations; and
using a singular value decomposition to reduce the dimension of the initial matrix to form a reduced-dimension matrix approximating the initial matrix, the calculated degrees of fit reflecting the contextual similarities of the attributes of the first and second item representations;
calculating a degree of similarity between the first and second item representations based at least in part on the calculated degrees of fit, the degree of similarity between the first and second item representations reflecting the similarity of the first and second item representations; assessing whether the first and second items are contextual duplicates based at least in part on the calculated degree of similarity; identifying a set of candidate recommendations for a target user, the candidate recommendations comprising the first and second items; and excluding one of the first and second items from the set of candidate recommendations based at least in part on determining that the first and second items are contextual duplicates.
Dependent claims: 5, 6, 7, 8, 9
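Claim 4 ends by excluding one member of each duplicate pair from the candidate recommendations. One plausible policy, sketched below under the assumption that candidates arrive in ranked order, keeps the higher-ranked item and drops any later item judged a duplicate of one already kept. The claim does not say which item of a pair to exclude; the `is_duplicate` callback, the duplicate-pair set, and the item identifiers are hypothetical stand-ins for the similarity assessment.

```python
from typing import Callable, Hashable, List, Sequence


def exclude_contextual_duplicates(
    candidates: Sequence[Hashable],
    is_duplicate: Callable[[Hashable, Hashable], bool],
) -> List[Hashable]:
    """Keep candidates in ranked order, dropping any item that duplicates one
    already kept (so the lower-ranked item of each duplicate pair is excluded)."""
    kept: List[Hashable] = []
    for item in candidates:
        if not any(is_duplicate(item, prior) for prior in kept):
            kept.append(item)
    return kept


# Hypothetical duplicate pairs, e.g. produced by a similarity threshold
duplicate_pairs = {frozenset({"shirt-A", "shirt-B"})}


def is_dup(x: Hashable, y: Hashable) -> bool:
    # Order-insensitive lookup of a precomputed duplicate pair
    return frozenset({x, y}) in duplicate_pairs


ranked = ["shirt-A", "mug-1", "shirt-B", "lamp-7"]
print(exclude_contextual_duplicates(ranked, is_dup))
```

The greedy pass compares each candidate only against items already kept, so it makes O(n^2) similarity checks in the worst case; for a short candidate list that is usually acceptable.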
10. A system for detecting similarities between items represented in a data repository, the system comprising:
a computer system comprising computer hardware programmed to implement: an item attributes analysis component configured to: identify a plurality of representations of items in a data repository; identify one or more attributes of each item representation, each attribute comprising one or more textual terms; and calculate degrees of fit between the attributes of representations of first and second items selected from the plurality of item representations, the calculated degrees of fit reflecting the contextual similarities of the attributes of the first and second item representations; and an association analysis component configured to: calculate a degree of similarity between the first and second item representations based at least in part on the calculated degrees of fit by forming an initial matrix of values corresponding to the attributes of the first and second item representations and by using a singular value decomposition to reduce the dimension of the initial matrix to form a reduced-dimension matrix approximating the initial matrix, the degree of similarity between the first and second item representations reflecting the similarity of the first and second item representations; assess whether the first and second items are contextual duplicates based at least in part on the calculated degree of similarity; and exclude one of the first and second items from a set of recommendations for a target user based at least in part on determining that the first and second items are contextual duplicates.
Dependent claims: 11, 12, 13
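The item attributes analysis component of claim 10 identifies the attributes of each item representation and the textual terms within them, which then populate the initial matrix. The sketch below assumes each item is a dict mapping attribute names to text, that terms are lowercase alphanumeric runs, and that raw term counts (no weighting) fill the matrix. The tokenizer, the function names, and the sample items are assumptions for illustration; a real system might add stemming, stop-word removal, or a weighting such as tf-idf.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

import numpy as np


def extract_terms(item: Dict[str, str]) -> List[str]:
    """Collect lowercase alphanumeric terms from every textual attribute of one item."""
    terms: List[str] = []
    for value in item.values():
        terms.extend(re.findall(r"[a-z0-9]+", value.lower()))
    return terms


def term_item_matrix(items: List[Dict[str, str]]) -> Tuple[List[str], np.ndarray]:
    """Build a term-by-item count matrix: one row per term, one column per item."""
    counts = [Counter(extract_terms(item)) for item in items]
    # Sorted union of all terms seen across the items
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for absent terms, so every cell is defined
    matrix = np.array([[c[term] for c in counts] for term in vocabulary], dtype=float)
    return vocabulary, matrix


# Hypothetical item representations with two textual attributes each
items = [
    {"title": "Cotton Crew Shirt", "color": "Navy"},
    {"title": "Cotton crew neck tee", "color": "navy blue"},
]
vocab, M = term_item_matrix(items)
print(vocab)
print(M)
```

Keeping attribute extraction separate from the SVD and similarity steps mirrors the claim's split between an attributes component and an association component, so either side can change without touching the other.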
14. A computer-implemented method of assessing a degree of similarity between a first representation of an apparel item having a first textual attribute and a second representation of an apparel item having a second textual attribute, the method comprising:
by a computer system comprising computer hardware: calculating a first degree of contextual similarity between the first apparel item representation and the second textual attribute, wherein calculating a first degree of contextual similarity comprises applying latent semantic analysis techniques to the first textual attribute and the second apparel item representation; calculating a second degree of contextual similarity between the second apparel item representation and the first textual attribute; assessing a degree of similarity between the first and second apparel items based, at least in part, on the first and second calculated degrees of contextual similarity; determining that the first and second apparel items are contextual duplicates in response to determining that the degree of contextual similarity between the first and second apparel items exceeds a threshold; and excluding the first apparel item from a set of recommendations for a target user based at least partly on said determining that the first and second apparel items are contextual duplicates and on one or more of the following additional factors: determining that the first and second apparel items are associated with a same browse node in an electronic catalog, and determining that the first and second representations of the first and second apparel items in the electronic catalog originated from the same originating vendor.
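Claim 14 calculates contextual similarity in both directions (the first item's representation against the second item's textual attribute, and vice versa) using latent semantic analysis, and excludes the first item only when the items are duplicates and at least one additional factor holds. The sketch below is one way to read that, not the patented implementation. It assumes the LSA technique is the standard fold-in of a term vector into a rank-k latent space (multiply by U_k transposed, then divide by the singular values), averages the two directional cosines into one degree of similarity, and uses a hypothetical threshold of 0.8. The `ApparelItem` fields, the toy vocabulary, and k = 2 are hypothetical; k must not exceed the rank of the term-by-document matrix, or the fold-in divides by a zero singular value.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ApparelItem:
    """Hypothetical item record: term counts plus catalog metadata."""
    representation: np.ndarray  # term counts over the whole item representation
    attribute: np.ndarray       # term counts for one textual attribute (e.g. title)
    browse_node: str
    vendor: str


def lsa_basis(term_doc: np.ndarray, k: int):
    """Truncated SVD of a term-by-document matrix; returns U_k and the top k singular values."""
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k]


def fold_in(vec: np.ndarray, U_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    """Map a term-count vector into the latent space: diag(s_k)^-1 @ U_k.T @ vec."""
    return (U_k.T @ vec) / s_k


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def exclude_first(first, second, U_k, s_k, threshold: float = 0.8) -> bool:
    """True when the items are contextual duplicates AND an additional factor holds."""
    def lat(v):
        return fold_in(v, U_k, s_k)
    # First degree: first item representation vs. second item's attribute
    d1 = cosine(lat(first.representation), lat(second.attribute))
    # Second degree: second item representation vs. first item's attribute
    d2 = cosine(lat(second.representation), lat(first.attribute))
    duplicates = (d1 + d2) / 2 > threshold  # assumed combination rule: simple mean
    extra = first.browse_node == second.browse_node or first.vendor == second.vendor
    return duplicates and extra


# Toy vocabulary: cotton, crew, shirt, tee, navy, wool, scarf
shirt1 = ApparelItem(np.array([1., 1, 1, 0, 1, 0, 0]), np.array([1., 1, 1, 0, 0, 0, 0]), "mens-tees", "v1")
shirt2 = ApparelItem(np.array([1., 1, 0, 1, 1, 0, 0]), np.array([1., 1, 0, 1, 0, 0, 0]), "mens-tees", "v2")
scarf = ApparelItem(np.array([0., 0, 0, 0, 0, 1, 1]), np.array([0., 0, 0, 0, 0, 1, 1]), "accessories", "v1")

corpus = np.column_stack([shirt1.representation, shirt2.representation, scarf.representation])
U_k, s_k = lsa_basis(corpus, k=2)
print(exclude_first(shirt1, shirt2, U_k, s_k))
print(exclude_first(shirt1, scarf, U_k, s_k))
```

In this toy corpus the dropped third singular direction is the one that separates "shirt" from "tee", so after truncation the two shirt listings land on the same latent direction; that collapsing of near-synonymous wording is what lets LSA flag contextual rather than exact duplicates.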
Specification