DUPLICATE ITEM DETECTION SYSTEM AND METHOD
First Claim
1. A computer-implemented method of detecting context-based duplicate items in an electronic catalog, the method comprising:
identifying a plurality of representations of items in a data repository of an electronic catalog from which to select items to recommend to a target user;
identifying one or more textual terms of each item representation, each textual term listed in a product description for a given item, the one or more textual terms describing the given item;
calculating degrees of fit between the textual terms of representations of first and second items selected from the plurality of item representations, the calculated degrees of fit reflecting the contextual similarities of the textual terms of the first and second item representations, wherein calculating degrees of fit comprises:
forming an initial matrix of values, each of the textual terms of the first and second item representations having a value represented in an initial matrix;
calculating a singular value decomposition of the initial matrix, the singular value decomposition comprising a left matrix, a singular value matrix, and a right transpose matrix;
reducing the dimension of one or more of the left, singular value, and right transpose matrices to create a reduced singular value decomposition; and
multiplying the matrices of the reduced singular value decomposition to create a reduced-dimension matrix approximating the initial matrix;
calculating a degree of similarity between the first and second item representations based at least in part on the calculated degrees of fit, the degree of similarity between the first and second item representations reflecting the similarity of the first and second item representations; and
assessing whether the first and second items are contextual duplicates based at least in part on the calculated degree of similarity.
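The matrix steps recited in claim 1 (form an initial matrix, decompose it, reduce the dimension, multiply the reduced factors back together) can be illustrated with a minimal sketch. Nothing below is taken from the patent: the function names, the power-iteration routine used to obtain the singular triplets, and the sample term-by-item matrix are all assumptions chosen so the example runs with the Python standard library alone. A production system would more likely call a numerical library's SVD routine.

```python
import math

def matmul(a, b):
    """Plain-Python product of two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def top_singular_triplet(a, iters=500):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    a_t = [list(col) for col in zip(*a)]
    # Assumption: this non-uniform start vector is not orthogonal to the leading singular vector.
    v = [1.0 / (j + 1) for j in range(len(a[0]))]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        av = [sum(x * y for x, y in zip(row, v)) for row in a]
        w = [sum(x * y for x, y in zip(row, av)) for row in a_t]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    av = [sum(x * y for x, y in zip(row, v)) for row in a]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av] if sigma > 0.0 else [0.0] * len(a)
    return sigma, u, v

def reduced_svd(a, k):
    """Return (left, singular, right_t) keeping only the k largest singular values."""
    residual = [row[:] for row in a]
    triplets = []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        triplets.append((sigma, u, v))
        # Deflate: subtract the rank-one component just found.
        residual = [[residual[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
                    for i in range(len(u))]
    left = [[t[1][i] for t in triplets] for i in range(len(a))]        # m x k
    singular = [[triplets[r][0] if r == c else 0.0 for c in range(k)]  # k x k diagonal
                for r in range(k)]
    right_t = [t[2] for t in triplets]                                 # k x n
    return left, singular, right_t

def low_rank_approximation(a, k):
    """Multiply the reduced factors to approximate the initial matrix."""
    left, singular, right_t = reduced_svd(a, k)
    return matmul(matmul(left, singular), right_t)

# Hypothetical initial matrix: rows are textual terms, columns are item representations.
term_item = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
approximation = low_rank_approximation(term_item, 1)
```

Keeping only one singular value here retains the structure shared by the first two items and suppresses the third. How many dimensions to keep is a tuning choice that the claim leaves open.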
Abstract
A method of detecting contextual duplicate items can include identifying a plurality of representations of items in a data repository, each item representation including one or more textual attributes. A degree of fit between an item representation's attributes and other items can be calculated. The degree of fit can reflect the relevance of the attributes of one item to the other item. A degree of association between the two item representations can be calculated based at least in part on the calculated degree of fit. The degree of association between the two item representations can reflect the similarity of the two items. The degree of association between the two item representations can be assessed to determine whether the items are contextual duplicates.
26 Citations
25 Claims
5. A computer-implemented method of detecting context-based duplicate items, the method comprising:
identifying a plurality of representations of items in a data repository;
identifying one or more attributes of each item representation, each attribute comprising one or more textual terms;
calculating degrees of fit between the attributes of representations of first and second items selected from the plurality of item representations, the calculated degrees of fit reflecting the contextual similarities of the attributes of the first and second item representations;
calculating a degree of similarity between the first and second item representations based at least in part on the calculated degrees of fit, the degree of similarity between the first and second item representations reflecting the similarity of the first and second item representations; and
assessing whether the first and second items are contextual duplicates based at least in part on the calculated degree of similarity.
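The last two steps of claim 5 (derive a degree of similarity from the degrees of fit, then assess whether the items are contextual duplicates) could be sketched as follows. The claim does not name a similarity measure or a decision rule, so the cosine measure, the function names, and the 0.9 threshold are all assumptions made for illustration. Each input vector is taken to hold one item's degrees of fit, one value per textual term.

```python
import math

def cosine_similarity(fit_first, fit_second):
    """Cosine of the angle between two equal-length vectors of degrees of fit."""
    dot = sum(x * y for x, y in zip(fit_first, fit_second))
    norm_first = math.sqrt(sum(x * x for x in fit_first))
    norm_second = math.sqrt(sum(y * y for y in fit_second))
    if norm_first == 0.0 or norm_second == 0.0:
        return 0.0  # an item with no fitted terms is treated as dissimilar to everything
    return dot / (norm_first * norm_second)

def are_contextual_duplicates(fit_first, fit_second, threshold=0.9):
    """Hypothetical decision rule: flag the pair when similarity meets the threshold."""
    return cosine_similarity(fit_first, fit_second) >= threshold
```

With this rule, two items whose fit vectors point in nearly the same direction are flagged, while items that share no fitted terms are not. The threshold would need tuning against labelled catalog data.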
11. A computer system for detecting similarities between items represented in a data repository, the system comprising:
an item attributes analysis component configured to:
identify a plurality of representations of items in a data repository;
identify one or more attributes of each item representation, each attribute comprising one or more textual terms; and
calculate degrees of fit between the attributes of representations of first and second items selected from the plurality of item representations, the calculated degrees of fit reflecting the contextual similarities of the attributes of the first and second item representations; and
an association analysis component configured to:
calculate a degree of similarity between the first and second item representations based at least in part on the calculated degrees of fit, the degree of similarity between the first and second item representations reflecting the similarity of the first and second item representations; and
assess whether the first and second items are contextual duplicates based at least in part on the calculated degree of similarity.
15. A computer-implemented method of detecting context-based similarities, the method comprising:
identifying a plurality of representations of items in a data repository;
identifying one or more attributes of each item representation, each attribute comprising one or more textual terms describing an item;
calculating degrees of fit between the attributes of representations of first and second items selected from the plurality of item representations, the calculated degrees of fit reflecting the contextual similarities of the attributes of the first and second item representations;
calculating a degree of similarity between the first and second item representations based at least in part on the calculated degrees of fit, the degree of similarity between the first and second item representations reflecting the similarity of the first and second items; and
storing the calculated degree of similarity in computer storage.
22. A computer-implemented method of assessing a degree of similarity between a first representation of an apparel item having a first textual attribute and a second representation of an apparel item having a second textual attribute, the method comprising:
calculating a first degree of contextual similarity between the first apparel item representation and the second textual attribute;
calculating a second degree of contextual similarity between the second apparel item representation and the first textual attribute; and
assessing a degree of similarity between the first and second apparel items based, at least in part, on the first and second calculated degrees of contextual similarity.
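Claim 22 pairs each item with the other item's attribute. A minimal sketch of that cross-lookup is given below, assuming the degrees of contextual similarity are already available in a nested mapping from item to attribute. The mapping layout, the function name, and the use of a simple average to combine the two degrees are illustrative assumptions; the claim only requires that the assessment be based at least in part on both degrees.

```python
def cross_similarity(fit, first_item, first_attribute, second_item, second_attribute):
    """Combine how well each item fits the other item's textual attribute."""
    # First degree: the first item representation against the second item's attribute.
    first_degree = fit[first_item][second_attribute]
    # Second degree: the second item representation against the first item's attribute.
    second_degree = fit[second_item][first_attribute]
    # Assumption: an unweighted mean; other combinations would also fit the claim language.
    return (first_degree + second_degree) / 2.0

# Hypothetical degrees of fit, for example read from a reduced-dimension matrix.
example_fit = {
    "item_a": {"denim": 0.9, "jeans": 0.7},
    "item_b": {"denim": 0.5, "jeans": 0.8},
}
score = cross_similarity(example_fit, "item_a", "denim", "item_b", "jeans")
```

Because the lookup is symmetric, swapping the two items along with their attributes yields the same score.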
Specification