Multi-pass data organization and automatic naming

US 8,145,638 B2
Filed: 03/14/2011
Issued: 03/27/2012
Est. Priority Date: 12/28/2006
Status: Active Grant


Abstract

A method and a system to organize a data set into groups of data subsets in multiple passes using different parameters and to automatically name the groups are disclosed. For example, a data set is retrieved in accordance with a search query submitted by a user. The data set is organized into clusters based on a statistic(s) of the data set. The data set is then organized into groups of data subsets based on an attribute(s) indicated by the data set. Each of the groups is automatically named based on a property shared by data units of the group. The name(s) of a group may be mined from the data units of the group, retrieved from a structure that maps to attribute values indicated by the data units of the group, etc.
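The two passes described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the item fields (`title`, `price`, `brand`), the choice of price as the statistic, the plain 1-D k-means in `kmeans_1d`, the starting centers, and every function name below are assumptions introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical search results; the field names are illustrative only.
ITEMS = [
    {"title": "ipod nano 4gb", "price": 150.0, "brand": "Apple"},
    {"title": "ipod case", "price": 8.0, "brand": "Belkin"},
    {"title": "ipod touch 32gb", "price": 180.0, "brand": "Apple"},
    {"title": "ipod cable", "price": 5.0, "brand": "Belkin"},
    {"title": "ipod dock", "price": 10.0, "brand": "Apple"},
]


def kmeans_1d(values, centers, iterations=10):
    """Pass 1 helper: plain 1-D k-means from the given starting centers.

    Returns one cluster index per input value.
    """
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def group_by_attribute(items, attribute):
    """Pass 2 helper: split items into groups keyed by an attribute value."""
    groups = defaultdict(list)
    for item in items:
        groups[item[attribute]].append(item)
    return dict(groups)


def organize(items, centers=(0.0, 100.0), attribute="brand"):
    """Cluster on a statistic (assumed: price), then group each cluster."""
    labels = kmeans_1d([item["price"] for item in items], centers)
    clusters = defaultdict(list)
    for item, label in zip(items, labels):
        clusters[label].append(item)
    return {
        label: group_by_attribute(members, attribute)
        for label, members in clusters.items()
    }


result = organize(ITEMS)
```

In this sketch the shared attribute value doubles as the group name, so `result[0]` holds the low-priced accessories grouped under `"Belkin"` and `"Apple"`, while `result[1]` holds the players under `"Apple"`.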

19 Claims

1. A method comprising:
- identifying a first cluster of data items among a plurality of data items in response to a query, each of the plurality of data items including an attribute able to have one of a plurality of values of the attribute;
  
  subdividing the first cluster of data items into a second cluster of data items and a third cluster of data items, the subdividing of the first cluster being performed by a processor of a machine and based on a common value of the attribute, the common value being present in each data item within the second cluster and absent from each data item within the third cluster;
  
  storing the second cluster of data items as corresponding to the common value of the attribute, each data item within the second cluster representing one of a plurality of items; and
  
  naming the second cluster based on a property shared by a majority of the second cluster.
- - 2. The method of claim 1, wherein:
    - the identifying of the first cluster of data items includes:
      
      performing a statistical analysis of the plurality of data items; and
      
      clustering at least some of the plurality of data items based on the statistical analysis to form the first cluster of data items.
  - 3. The method of claim 2, wherein:
    - the performing of the statistical analysis includes processing information selected from a group consisting of:
      
      the attribute included in each of the plurality of data items, a number of attributes included in at least some of the plurality of data items, a description of an item represented by one of the plurality of data items, an amount of text in the description of the item, a taxonomy of the item, a probability that the item co-occurs with a further item represented by another one of the plurality of data items, and user behavior data.
  - 4. The method of claim 2, wherein:
    - the performing of the statistical analysis is based on a technique selected from a group consisting of:
      
      k-means clustering, fuzzy clustering, and agglomerative clustering.
  - 5. The method of claim 2, wherein:
    - the performing of the statistical analysis includes identifying a further cluster of data items among the plurality of data items, the common value of the attribute being present in each data item within the further cluster.
  - 6. The method of claim 2, wherein:
    - the performing of the statistical analysis includes identifying a further cluster of data items among the plurality of data items, the common value of the attribute being absent from each data item within the further cluster.
  - 7. The method of claim 1, wherein:
    - each data item within the second cluster represents one of a plurality of items; and
      
      the method further comprises determining a property shared by a majority of data items within the second cluster of data items.
  - 8. The method of claim 7, wherein:
    - the determining of the property is based on the common value of the attribute present in each data item within the second cluster.
  - 9. The method of claim 7, wherein:
    - the determining of the property is based on a further attribute present in a majority of data items within the second cluster.
  - 10. The method of claim 9, wherein:
    - the determining of the property includes identifying a further value of the further attribute included in a majority of data items within the second cluster.
  - 11. The method of claim 10, wherein:
    - the identifying of the further value is based on a data structure that includes at least one of a synonym of the further value or a term similar to the further value.
  - 12. The method of claim 1, wherein:
    - the naming of the second cluster is based on the common value of the attribute present in each data item within the second cluster.
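
The steps recited in claim 1, together with the majority-based property of claims 7 through 11, can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: substring matching stands in for the query step, the item fields (`title`, `brand`, `category`) and the `SYNONYMS` table are hypothetical, and "majority" is read here as more than half of the cluster.

```python
from collections import Counter

# Hypothetical data items; the field names are illustrative only.
ITEMS = [
    {"title": "iPod nano", "brand": "Apple", "category": "mp3 player"},
    {"title": "iPod touch", "brand": "Apple", "category": "media player"},
    {"title": "iPod classic case", "brand": "Apple", "category": "case"},
    {"title": "iPod armband", "brand": "Belkin", "category": "case"},
    {"title": "Zune 30", "brand": "Microsoft", "category": "mp3 player"},
]

# Hypothetical structure mapping synonyms and similar terms to one
# canonical value, in the spirit of claim 11.
SYNONYMS = {
    "mp3 player": "MP3 Players",
    "media player": "MP3 Players",
    "case": "Cases",
}


def identify_first_cluster(items, query):
    """Stand-in for the identifying step: case-insensitive title match."""
    return [item for item in items if query.lower() in item["title"].lower()]


def subdivide(cluster, attribute, common_value):
    """Split a cluster into items that have the common value and items that lack it."""
    second = [item for item in cluster if item.get(attribute) == common_value]
    third = [item for item in cluster if item.get(attribute) != common_value]
    return second, third


def name_cluster(cluster, attribute):
    """Return the canonical value of `attribute` shared by more than half
    of the cluster, or None when no value reaches a majority."""
    counts = Counter(
        SYNONYMS.get(item[attribute].lower(), item[attribute])
        for item in cluster
        if attribute in item
    )
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    return value if count > len(cluster) / 2 else None


first = identify_first_cluster(ITEMS, "ipod")
second, third = subdivide(first, "brand", "Apple")
stored = {"Apple": second}  # stored as corresponding to the common value
name = name_cluster(second, "category")
```

With this sample data, two of the three Apple items map to the same canonical category through the synonym table, so that category becomes the name of the second cluster. Under claim 12 the common value itself (`"Apple"` in this sketch) could serve as the name instead.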

13. A system comprising:
- a navigation module configured to access a plurality of data items, each of the plurality of data items including an attribute able to have one of a plurality of values of the attribute; and
  
  a processor configured by an organizing module communicatively coupled to the navigation module, the organizing module configured to:
  
  identify a first cluster of data items among the plurality of data items in response to a query;
  
  subdivide the first cluster of data items into a second cluster of data items and a third cluster of data items, the subdividing of the first cluster being based on a common value of the attribute, the common value being present in each data item within the second cluster and absent from each data item within the third cluster;
  
  store the second cluster of data items as corresponding to the common value of the attribute, each data item within the second cluster representing one of a plurality of items; and
  
  name the second cluster based on a property shared by a majority of the second cluster.
- - 14. The system of claim 13, wherein the organizing module is further configured to:
    - identify the first cluster of data items by performing a statistical analysis of the plurality of data items; and
      
      cluster at least some of the plurality of data items based on the statistical analysis to form the first cluster of data items.
  - 15. The system of claim 13, wherein:
    - each data item within the second cluster represents one of a plurality of items; and
      
      the organizing module is further configured to determine a property shared by a majority of data items within the second cluster of data items.
  - 16. The system of claim 15, wherein the organizing module is further configured to:
    - determine the property based on the common value of the attribute present in each data item within the second cluster.
  - 17. The system of claim 15, wherein the organizing module is further configured to:
    - determine the property based on a further attribute present in a majority of data items within the second cluster.
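
One way to picture the module arrangement of claim 13 is a navigation component that supplies the data items and an organizing component that holds a reference to it. The class names mirror the claim language, but the methods, fields, in-memory storage, and sample items below are assumptions made for this sketch, which names the cluster by its common value as in claim 16.

```python
from dataclasses import dataclass, field


@dataclass
class NavigationModule:
    """Provides access to the data items (assumed: an in-memory list)."""
    items: list

    def access(self) -> list:
        return list(self.items)


@dataclass
class OrganizingModule:
    """Coupled to a navigation module; identifies, subdivides, stores, names."""
    navigation: NavigationModule
    stored: dict = field(default_factory=dict)

    def organize(self, query: str, attribute: str, common_value: str):
        # Identify the first cluster (assumed: case-insensitive title match).
        first = [
            item for item in self.navigation.access()
            if query.lower() in item["title"].lower()
        ]
        # Subdivide on presence or absence of the common value.
        second = [i for i in first if i.get(attribute) == common_value]
        third = [i for i in first if i.get(attribute) != common_value]
        # Store the second cluster as corresponding to the common value.
        self.stored[common_value] = second
        # Every item in the second cluster shares the common value,
        # so it is trivially a property shared by a majority.
        name = common_value
        return name, second, third


catalog = NavigationModule([
    {"title": "iPod nano", "brand": "Apple"},
    {"title": "iPod touch", "brand": "Apple"},
    {"title": "iPod armband", "brand": "Belkin"},
])
organizer = OrganizingModule(catalog)
name, second, third = organizer.organize("ipod", "brand", "Apple")
```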

18. A non-transitory machine-readable storage medium comprising instructions that, when executed by one or more processors of a machine, cause the machine to perform operations comprising:
- identifying a first cluster of data items among a plurality of data items in response to a query, each of the plurality of data items including an attribute able to have one of a plurality of values of the attribute;
  
  subdividing the first cluster of data items into a second cluster of data items and a third cluster of data items, the subdividing of the first cluster being based on a common value of the attribute, the common value being present in each data item within the second cluster and absent from each data item within the third cluster;
  
  storing the second cluster of data items as corresponding to the common value of the attribute, each data item within the second cluster representing one of a plurality of items; and
  
  naming the second cluster based on a property shared by a majority of the second cluster.

19. A system comprising:
- means for accessing a plurality of data items, each of the plurality of data items including an attribute able to have one of a plurality of values of the attribute; and
  
  means for:
  
  identifying a first cluster of data items among the plurality of data items in response to a query;
  
  subdividing the first cluster of data items into a second cluster of data items and a third cluster of data items, the subdividing of the first cluster being based on a common value of the attribute, the common value being present in each data item within the second cluster and absent from each data item within the third cluster;
  
  storing the second cluster of data items as corresponding to the common value of the attribute, each data item within the second cluster representing one of a plurality of items; and
  
  naming the second cluster based on a property shared by a majority of the second cluster.

Current Assignee: eBay Inc.
Original Assignee: eBay Inc.
Inventors: Sarwar, Badrul M.; Mount, John A.
Primary Examiner(s): Corrielus, Jean M
Assistant Examiner(s): Ly, Anh
Application Number: US13/047,646
Publication Number: US 20110179033A1
Time in Patent Office: 379 Days
Field of Search: 707/737, 707/736, 707/769, 707/770, 707/688, 709/220, 709/217, 709/223, 709/227, 709/270
US Class Current: 707/737
CPC Class Codes:
G06F 16/248   Presentation of query results
G06F 16/285   Clustering or classification
G06Q 10/06   Resources, workflows, human...
