Duplicate item detection system and method

US 7,827,186 B2
Filed: 09/28/2007
Issued: 11/02/2010
Est. Priority Date: 09/28/2007
Status: Expired due to Fees

Abstract

A method of detecting contextual duplicate items can include identifying a plurality of representations of items in a data repository, each item representation including one or more textual attributes. A degree of fit between an item representation's attributes and other items can be calculated. The degree of fit can reflect the relevance of the attributes of one item to the other item. A degree of association between the two item representations can be calculated based at least in part on the calculated degree of fit. The degree of association between the two item representations can reflect the similarity of the two items. The degree of association between the two item representations can be assessed to determine whether the items are contextual duplicates.

14 Claims

1. A computer-implemented method of detecting context-based duplicate items in an electronic catalog, the method comprising:
- by a computer system comprising computer hardware:
  
  identifying a set of candidate recommendations from a plurality of items represented in an electronic catalog from which to select items to recommend to a target user;
  
  for each candidate recommendation in the set of candidate recommendations, identifying textual terms from a representation in the electronic catalog of the candidate recommendation, the representation comprising a product description for the candidate recommendation;
  
  calculating degrees of fit between the textual terms of representations of first and second candidate recommendations selected from the set of candidate recommendations, the calculated degrees of fit reflecting the contextual similarities of the textual terms of the first and second item representations, wherein calculating degrees of fit comprises:

    forming an initial matrix of values, each of the textual terms of the first and second item representations having a value represented in an initial matrix;

    calculating a singular value decomposition of the initial matrix, the singular value decomposition comprising a left matrix, a singular value matrix, and a right transpose matrix;

    reducing the dimension of one or more of the left, singular value, and right transpose matrices to create a reduced singular value decomposition; and

    multiplying the matrices of the reduced singular value decomposition to create a reduced-dimension matrix approximating the initial matrix;
  
  calculating a degree of similarity between the first and second item representations based at least in part on the calculated degrees of fit, the degree of similarity between the first and second item representations reflecting the similarity of the first and second item representations;
  
  assessing whether the first and second candidate recommendations are contextual duplicates based at least in part on the calculated degree of similarity;
  
  removing one of the first and second candidate recommendations from the set of candidate recommendations based at least in part on said assessing to thereby generate a modified set of candidate recommendations; and
  
  recommending one or more items of the modified set of candidate recommendations to the target user.
- Dependent claims (2, 3)
  - 2. The method of claim 1, wherein the representations comprise product descriptions.
  - 3. The method of claim 1, wherein at least some of the items are apparel items.
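
The matrix steps recited in claim 1 follow the general shape of latent semantic analysis. The following is a minimal sketch in Python with NumPy, not the patented implementation. It assumes that the initial matrix holds raw term counts, that similarity is the cosine between item columns of the reduced-dimension matrix, and that a cutoff of 0.9 with `k=2` retained dimensions is reasonable. The claim does not fix a weighting scheme, a similarity measure, or a cutoff, and all function names here are hypothetical.

```python
import numpy as np

def term_item_matrix(descriptions):
    """Build a term-by-item count matrix (rows: terms, columns: items)."""
    vocab = sorted({t for d in descriptions for t in d.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    m = np.zeros((len(vocab), len(descriptions)))
    for j, d in enumerate(descriptions):
        for t in d.lower().split():
            m[index[t], j] += 1.0  # assumption: raw counts as matrix values
    return vocab, m

def reduced_approximation(m, k):
    """Truncate the SVD to k dimensions and multiply the factors back."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)  # left, singular values, right transpose
    k = min(k, len(s))
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is (near) zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom < 1e-12:
        return 0.0
    return float(a @ b / denom)

def drop_contextual_duplicates(descriptions, k=2, threshold=0.9):
    """Return indices of candidates kept after removing near-duplicates."""
    _, m = term_item_matrix(descriptions)
    approx = reduced_approximation(m, k)
    kept = []
    for j in range(approx.shape[1]):
        # Keep item j only if it is not too similar to any item already kept.
        if all(cosine(approx[:, j], approx[:, i]) < threshold for i in kept):
            kept.append(j)
    return kept
```

For the three descriptions `"blue cotton shirt"`, `"blue cotton shirt"`, and `"steel kitchen knife"`, `drop_contextual_duplicates` keeps indices `[0, 2]`: the second item is dropped as a duplicate of the first. Keeping the earlier item of a duplicate pair is itself an arbitrary choice; the claim only requires removing one of the two.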

4. A computer-implemented method of detecting context-based duplicate items, the method comprising:
- by a computer system comprising computer hardware:
  
  identifying a plurality of representations of items in a data repository;
  
  identifying one or more attributes of each item representation, each attribute comprising one or more textual terms;
  
  calculating degrees of fit between the attributes of representations of first and second items selected from the plurality of item representations, wherein calculating degrees of fit comprises:

    forming an initial matrix of values corresponding to the attributes of the first and second item representations; and

    using a singular value decomposition to reduce the dimension of the initial matrix to form a reduced-dimension matrix approximating the initial matrix, the calculated degrees of fit reflecting the contextual similarities of the attributes of the first and second item representations;
  
  calculating a degree of similarity between the first and second item representations based at least in part on the calculated degrees of fit, the degree of similarity between the first and second item representations reflecting the similarity of the first and second item representations;
  
  assessing whether the first and second items are contextual duplicates based at least in part on the calculated degree of similarity;
  
  identifying a set of candidate recommendations for a target user, the candidate recommendations comprising the first and second items; and
  
  excluding one of the first and second items from the set of candidate recommendations based at least in part on determining that the first and second items are contextual duplicates.
- Dependent claims (5, 6, 7, 8, 9)
  - 5. The method of claim 4, further comprising removing one of the first and second item representations from an electronic catalog in response to determining that the first and second items are contextual duplicates.
  - 6. The method of claim 4, wherein assessing whether the first and second items are contextual duplicates comprises comparing the calculated degree of similarity to a threshold.
  - 7. The method of claim 4, further comprising assessing one or more additional factors to determine whether to exclude one of the first and second items from the set of candidate recommendations.
  - 8. The method of claim 7, wherein said assessing the one or more additional factors comprises determining whether the first and second items are associated with a same browse node in the electronic catalog.
  - 9. The method of claim 8, wherein said excluding comprises excluding one of the first and second items in response to determining that the first and second items are associated with the same browse node in the electronic catalog.
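
The exclusion logic of claims 6 through 9 (a threshold comparison combined with a same-browse-node check) can be illustrated with a short Python sketch. It assumes a similarity function is supplied by the caller, that "comparing to a threshold" means greater-than-or-equal, and that the earlier candidate of a duplicate pair is the one retained. The `Item` type, its fields, and `exclude_duplicates` are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    item_id: str
    browse_node: str  # hypothetical catalog category identifier

def exclude_duplicates(candidates, similarity, threshold=0.9):
    """Filter candidate recommendations, dropping contextual duplicates.

    An item is excluded only if it is at least `threshold` similar to an
    already-kept item AND shares that item's browse node (the additional
    factor of claims 8 and 9).
    """
    kept = []
    for item in candidates:
        duplicate_of_kept = any(
            similarity(item, other) >= threshold
            and item.browse_node == other.browse_node
            for other in kept
        )
        if not duplicate_of_kept:
            kept.append(item)
    return kept
```

Requiring both conditions means two highly similar items in different browse nodes both remain in the candidate set, which is one plausible reading of how the additional factor narrows the exclusion.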

10. A system for detecting similarities between items represented in a data repository, the system comprising:
- a computer system comprising computer hardware programmed to implement:

  an item attributes analysis component configured to:

    identify a plurality of representations of items in a data repository;

    identify one or more attributes of each item representation, each attribute comprising one or more textual terms; and

    calculate degrees of fit between the attributes of representations of first and second items selected from the plurality of item representations, the calculated degrees of fit reflecting the contextual similarities of the attributes of the first and second item representations; and

  an association analysis component configured to:

    calculate a degree of similarity between the first and second item representations based at least in part on the calculated degrees of fit by forming an initial matrix of values corresponding to the attributes of the first and second item representations and by using a singular value decomposition to reduce the dimension of the initial matrix to form a reduced-dimension matrix approximating the initial matrix, the degree of similarity between the first and second item representations reflecting the similarity of the first and second item representations;

    assess whether the first and second items are contextual duplicates based at least in part on the calculated degree of similarity; and

    exclude one of the first and second items from a set of recommendations for a target user based at least in part on determining that the first and second items are contextual duplicates.
- Dependent claims (11, 12, 13)
  - 11. The system of claim 10, wherein the textual terms are listed in a product description of each item representation.
  - 12. The system of claim 10, wherein at least some of the items are apparel items.
  - 13. The system of claim 10, wherein the computer system comprises a plurality of physical computers.

14. A computer-implemented method of assessing a degree of similarity between a first representation of an apparel item having a first textual attribute and a second representation of an apparel item having a second textual attribute, the method comprising:
- by a computer system comprising computer hardware:
  
  calculating a first degree of contextual similarity between the first apparel item representation and the second textual attribute, wherein calculating a first degree of contextual similarity comprises applying latent semantic analysis techniques to the first textual attribute and the second apparel item representation;
  
  calculating a second degree of contextual similarity between the second apparel item representation and the first textual attribute;
  
  assessing a degree of similarity between the first and second apparel items based, at least in part, on the first and second calculated degrees of contextual similarity;
  
  determining that the first and second apparel items are contextual duplicates in response to determining that the degree of contextual similarity between the first and second apparel items exceeds a threshold; and
  
  excluding the first apparel item from a set of recommendations for a target user based at least partly on said determining that the first and second apparel items are contextual duplicates and on one or more of the following additional factors:

    determining that the first and second apparel items are associated with a same browse node in an electronic catalog, and

    determining that the first and second representations of the first and second apparel items in the electronic catalog originated from the same originating vendor.
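
Claim 14 combines two directional scores and then gates the exclusion on at least one additional factor. A minimal Python sketch of that decision follows. It assumes the two directional degrees of contextual similarity have already been computed as numbers, that they are combined by a simple average, and that 0.8 is a usable threshold. The claim specifies none of these, and the function names are hypothetical.

```python
def combined_similarity(first_to_second, second_to_first):
    """Combine the two directional scores.

    Assumption: a plain average. The claim only says the overall degree of
    similarity is based at least in part on both scores.
    """
    return (first_to_second + second_to_first) / 2

def should_exclude_first(first_to_second, second_to_first,
                         same_browse_node, same_vendor, threshold=0.8):
    """Decide whether to exclude the first item from the recommendations.

    The items count as contextual duplicates when the combined score
    exceeds the threshold; exclusion additionally requires at least one of
    the two factors (same browse node, same originating vendor).
    """
    is_duplicate = combined_similarity(first_to_second, second_to_first) > threshold
    return is_duplicate and (same_browse_node or same_vendor)
```

With these assumptions, scores of 0.9 in both directions plus a shared browse node lead to exclusion, while the same scores with neither factor present do not.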

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Hicks, Cory
Primary Examiner(s): Ali; Mohammad
Assistant Examiner(s): Shanmugasundaram; Kannan
Application Number: US11/863,987
Publication Number: US 20090089314A1
Time in Patent Office: 1,131 Days
Field of Search: None
US Class Current: 707/749
CPC Class Codes: G06Q 30/0603 Catalogue ordering
