Search clustering

US 8,131,722 B2
Filed: 06/29/2007
Issued: 03/06/2012
Est. Priority Date: 11/20/2006
Status: Active Grant

First Claim

1. A method comprising:

calculating a demand factor based on relationships of items and categories to query terms of search queries, the relationships established from user actions resulting from the search queries;

calculating a relevance score using the demand factor, the relevance score calculated, in part, based on a comparison of a similarity of a demand category histogram and a supply category histogram;

identifying noise data using the demand factor;

retrieving, from a plurality of listings, item data filtered from the noise data;

constructing, using a processor, at least one base cluster having at least one document with common item data stored in a suffix ordering;

compacting the at least one base cluster to create a compacted cluster representation having a reduced duplicate suffix ordering amongst the clusters; and

merging the compact cluster representation to generate a merged cluster, the merging based upon a first overlap value applied to the at least one document with common item data, the merged cluster being based at least in part on the demand factor.
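
The claim does not say how the demand category histogram and the supply category histogram are compared. As a minimal sketch, assuming cosine similarity as the comparison and using hypothetical category data, the relevance component could look like the following. The function names `category_histogram` and `histogram_similarity` are illustrative and do not come from the patent.

```python
import math
from collections import Counter


def category_histogram(categories):
    """Turn a list of category labels into a normalized histogram."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


def histogram_similarity(demand, supply):
    """Cosine similarity between two category histograms (assumed measure)."""
    dot = sum(w * supply.get(cat, 0.0) for cat, w in demand.items())
    norm_d = math.sqrt(sum(w * w for w in demand.values()))
    norm_s = math.sqrt(sum(w * w for w in supply.values()))
    if norm_d == 0 or norm_s == 0:
        return 0.0
    return dot / (norm_d * norm_s)


# Hypothetical data: categories users acted on after a query (demand)
# versus categories of the listings matching that query (supply).
demand = category_histogram(
    ["mp3 players", "mp3 players", "mp3 players", "accessories"]
)
supply = category_histogram(
    ["mp3 players", "accessories", "accessories", "accessories"]
)
score = histogram_similarity(demand, supply)
```

A score near 1 would suggest that the listings on offer fall into the same categories users act on; a score near 0 would suggest a mismatch between supply and demand for that query.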

Abstract

In one example embodiment, a method is illustrated as including retrieving item data from a plurality of listings, the item data filtered from noise data, constructing at least one base cluster having at least one document with common item data stored in a suffix ordering, compacting the at least one base cluster to create a compacted cluster representation having a reduced duplicate suffix ordering amongst the clusters, and merging the compact cluster representation to generate a merged cluster, the merging based upon a first overlap value applied to the at least one document with common item data.
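
The construct, compact, and merge steps in the abstract resemble phrase-based clustering in the style of suffix tree clustering. The sketch below is an illustration under that assumption, not the patented implementation: it uses a plain dictionary of word phrases instead of a suffix tree, treats "compacting" as keeping one phrase per distinct document set, and assumes a mutual-overlap threshold of 0.5. All function names and sample titles are hypothetical.

```python
from collections import defaultdict


def base_clusters(titles, min_docs=2):
    """Map each phrase shared by at least `min_docs` titles to their indices."""
    phrase_docs = defaultdict(set)
    for doc_id, title in enumerate(titles):
        words = title.lower().split()
        for start in range(len(words)):  # every word-level suffix
            for end in range(start + 1, len(words) + 1):  # each prefix of it
                phrase_docs[tuple(words[start:end])].add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}


def compact(clusters):
    """Keep only the longest phrase for each distinct set of documents."""
    best = {}
    for phrase, docs in clusters.items():
        key = frozenset(docs)
        if key not in best or len(phrase) > len(best[key]):
            best[key] = phrase
    return {phrase: set(docs) for docs, phrase in best.items()}


def merge(clusters, overlap=0.5):
    """Merge clusters whose shared documents exceed `overlap` of both sides."""
    merged = []
    for phrase, docs in clusters.items():
        for group in merged:
            shared = len(docs & group["docs"])
            if (shared / len(docs) > overlap
                    and shared / len(group["docs"]) > overlap):
                group["docs"] |= docs
                group["labels"].append(" ".join(phrase))
                break
        else:  # no existing group overlapped enough
            merged.append({"docs": set(docs), "labels": [" ".join(phrase)]})
    return merged


titles = [
    "apple ipod nano 8gb",
    "apple ipod nano 4gb",
    "ipod nano case",
    "leather case",
]
compacted = compact(base_clusters(titles))
merged = merge(compacted)
```

With these sample titles, the phrases "apple", "apple ipod", and "apple ipod nano" all cover the same two listings, so compaction keeps only the longest one. The merge step then folds that cluster into the "ipod nano" cluster, while the "case" cluster stays separate because it shares only one of its two listings.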

23 Claims

1. A method comprising:
- calculating a demand factor based on relationships of items and categories to query terms of search queries, the relationships established from user actions resulting from the search queries;
  
  calculating a relevance score using the demand factor, the relevance score calculated, in part, based on a comparison of a similarity of a demand category histogram and a supply category histogram;
  
  identifying noise data using the demand factor;
  
  retrieving, from a plurality of listings, item data filtered from the noise data;
  
  constructing, using a processor, at least one base cluster having at least one document with common item data stored in a suffix ordering;
  
  compacting the at least one base cluster to create a compacted cluster representation having a reduced duplicate suffix ordering amongst the clusters; and
  
  merging the compact cluster representation to generate a merged cluster, the merging based upon a first overlap value applied to the at least one document with common item data, the merged cluster being based at least in part on the demand factor.
- - 2. The method of claim 1, wherein the item data includes at least one of an item title, an item category, or seller information.
  - 3. The method of claim 1, wherein the item data is part of a plurality of merged clusters organized into a hierarchy of merged clusters.
  - 4. The method of claim 1, further comprising filtering the item data from the noise data based upon a frequency with which a word is used in a search as compared to a frequency another word is used in the search.
  - 5. The method of claim 1, wherein the suffix ordering is stored in a data structure that includes at least one of a trie, a hash table, a binary search tree, a red-black tree, or a heap.
  - 6. The method of claim 1, further comprising merging the compact cluster representation based upon one or more of a relevance-weight factor, a seller factor, a price factor, a category factor, or an image factor.
  - 7. The method of claim 1, further comprising labeling the at least one base cluster.
  - 8. The method of claim 1, further comprising evaluating the merged cluster to determine a coverage value for the merged cluster, and a second overlap value relating to the at least one document contained within the merged cluster.
  - 9. The method of claim 8, further comprising merging the compacted cluster representation to generate a merged cluster based upon the coverage value and the second overlap value.
  - 10. The method of claim 1, further comprising:
    - receiving a search query, the search query relating to item data; and
      
      extracting the item data from the merged cluster as a search result, the search result extracted based upon a similarity between the search query and the item data.
  - 11. The method of claim 1, further comprising removing the noise data identified based on the demand factor from the search queries.
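
Claim 4 describes filtering based on how often one word is used in searches compared with another word. One possible reading, sketched here with hypothetical queries and an assumed threshold `ratio`, is to treat a title word as noise when users search for it far less often than for the most-searched word in the same title. The names `query_term_frequencies` and `split_noise` are illustrative only.

```python
from collections import Counter


def query_term_frequencies(queries):
    """Count how often each word appears across a log of search queries."""
    counts = Counter()
    for query in queries:
        counts.update(query.lower().split())
    return counts


def split_noise(title, term_freq, ratio=0.25):
    """Split title words into (kept, noise).

    A word is kept when its query frequency is at least `ratio` times the
    frequency of the most-searched word in the title (assumed rule). If no
    word in the title was ever searched, every word is treated as noise.
    """
    words = title.lower().split()
    top = max((term_freq[w] for w in words), default=0)
    kept, noise = [], []
    for word in words:
        if top and term_freq[word] >= ratio * top:
            kept.append(word)
        else:
            noise.append(word)
    return kept, noise


# Hypothetical query log and listing title.
queries = ["ipod nano", "ipod nano 8gb", "ipod case", "apple ipod"]
term_freq = query_term_frequencies(queries)
kept, noise = split_noise("L@@K new apple ipod nano wow", term_freq)
```

Here the promotional words in the title never appear in the query log, so they fall below the threshold and are set aside before clustering.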

12. A computer system comprising:
- at least one processor;
  
  a demand data engine to calculate a demand factor based on relationships of items and categories to query terms of search queries and further to identify noise data using the demand factor, the relationships established from user actions resulting from the search queries;
  
  a calculator to calculate a relevance score using the demand factor, the relevance score calculated, in part, based on a comparison of a similarity of a demand category histogram and a supply category histogram;
  
  a retrieving engine to retrieve, from a plurality of listings, item data filtered from the noise data;
  
  a cluster generator to construct, using the at least one processor, at least one base cluster having at least one document with common item data stored in a suffix ordering;
  
  a compacting engine to compact the at least one base cluster to create a compacted cluster representation having a reduced duplicate suffix ordering amongst the clusters; and
  
  a first merging engine to merge the compact cluster representation to generate a merged cluster, the merging based upon a first overlap value applied to the at least one document with common item data, the merged cluster being based at least in part on the demand factor.
- - 13. The computer system of claim 12, wherein the item data includes at least one of an item title, an item category, or seller information.
  - 14. The computer system of claim 12, wherein the item data is part of a plurality of merged clusters organized into a hierarchy of merged clusters.
  - 15. The computer system of claim 12, further comprising a separating engine to filter the item data from the noise data based upon a frequency with which a word is used in a search as compared to a frequency another word is used in the search.
  - 16. The computer system of claim 12, wherein the suffix ordering is stored in a data structure that includes at least one of a trie, a hash table, a binary search tree, a red-black tree, or a heap.
  - 17. The computer system of claim 12, further comprising a second merging engine to merge the compact cluster representation based upon one or more of a relevance-weight factor, a seller factor, a price factor, a category factor, or an image factor.
  - 18. The computer system of claim 12, further comprising a labeling engine to label the at least one base cluster.
  - 19. The computer system of claim 12, further comprising an evaluation engine to evaluate the merged cluster to determine a coverage value for the merged cluster, and a second overlap value relating to the at least one document contained within the merged cluster.
  - 20. The computer system of claim 19, further comprising a third merging engine to merge the compacted cluster representation to generate a merged cluster based upon the coverage value and the second overlap value.
  - 21. The computer system of claim 12, further comprising:
    - a receiver to receive a search query, the search query relating to item data; and
      
      an extractor to extract the item data from the merged cluster as a search result, the search result extracted based upon a similarity between the search query and the item data.
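
Claims 10 and 21 extract item data from a merged cluster based on a similarity between the search query and the item data, without naming the similarity measure. The following sketch assumes Jaccard similarity over lowercased word sets and a hypothetical cutoff `min_similarity`; the function names and sample data are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of two word collections (assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def extract_results(query, merged_cluster, min_similarity=0.2):
    """Return cluster titles similar enough to the query, best match first."""
    query_words = query.lower().split()
    scored = [
        (jaccard(query_words, title.lower().split()), title)
        for title in merged_cluster
    ]
    scored.sort(key=lambda pair: -pair[0])  # stable sort, highest score first
    return [title for score, title in scored if score >= min_similarity]


# Hypothetical merged cluster of listing titles.
cluster = ["apple ipod nano 8gb", "ipod nano case", "leather case"]
results = extract_results("ipod nano", cluster)
```

In this example the title that shares no words with the query is dropped, and the remaining titles are ordered by how much of their wording the query covers.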

22. An apparatus comprising:
- at least one processor;
  
  means for calculating a demand factor based on relationships of items and categories to query terms of search queries, the relationships established from user actions resulting from the search queries;
  
  means for calculating a relevance score using the demand factor, the relevance score calculated, in part, based on a comparison of a similarity of a demand category histogram and a supply category histogram;
  
  means for identifying noise data using the demand factor;
  
  means for retrieving, from a plurality of listings, item data filtered from the noise data;
  
  means for constructing, using the at least one processor, at least one base cluster having at least one document with common item data stored in a suffix ordering;
  
  means for compacting the at least one base cluster to create a compacted cluster representation having a reduced duplicate suffix ordering amongst the clusters; and
  
  means for merging the compact cluster representation to generate a merged cluster, the merging based upon a first overlap value applied to the at least one document with common item data, the merged cluster being based at least in part on the demand factor.

23. A non-transitory machine-readable storage medium comprising instructions, which when implemented by one or more processors of a machine cause the machine to perform operations comprising:
- calculating a demand factor based on relationships of items and categories to query terms of search queries, the relationships established from user actions resulting from the search queries;
  
  calculating a relevance score using the demand factor, the relevance score calculated, in part, based on a comparison of a similarity of a demand category histogram and a supply category histogram;
  
  identifying noise data using the demand factor;
  
  retrieving, from a plurality of listings, item data filtered from the noise data;
  
  constructing at least one base cluster having at least one document with common item data stored in a suffix ordering;
  
  compacting the at least one base cluster to create a compacted cluster representation having a reduced duplicate suffix ordering amongst the clusters; and
  
  merging the compact cluster representation to generate a merged cluster, the merging based upon a first overlap value applied to the at least one document with common item data, the merged cluster being based at least in part on the demand factor.

Current Assignee: PayPal, Inc. (PayPal Holdings, Inc.)
Original Assignee: eBay Inc.
Inventors: Sundaresan, Neelakantan; Ganesan, Kavita; Grandhi, Roopnath
Primary Examiner(s): Lu, Kuen
Assistant Examiner(s): Le, Jessica N
Application Number: US11/771,464
Publication Number: US 20080120292A1
Time in Patent Office: 1,712 Days
Field of Search: 707/3-5, 707/708, 707/736, 707/737, 707/E17.089, 707/999.003
US Class Current: 707/737
CPC Class Codes:
G06F 16/35   Clustering; Classification
G06F 16/355   Class or cluster creation o...
G06F 16/951   Indexing; Web crawling tech...
