Multi-pass data organization and automatic naming

US 7,739,247 B2
Filed: 12/28/2006
Issued: 06/15/2010
Est. Priority Date: 12/28/2006
Status: Active Grant

Abstract

A method and a system are disclosed for organizing a data set into groups of data subsets in multiple passes using different parameters, and for automatically naming the groups. For example, a data set is retrieved in accordance with a search query submitted by a user. The data set is organized into clusters based on a statistic(s) of the data set. The data set is then organized into groups of data subsets based on an attribute(s) indicated by the data set. Each of the groups is automatically named based on a property shared by data units of the group. The name(s) of a group may be mined from the data units of the group, retrieved from a structure that maps to attribute values indicated by the data units of the group, etc.
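The flow described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the sample items, the `views` statistic, the `brand` attribute, and the gap-based splitting rule are all assumptions chosen to keep the example small.

```python
from collections import Counter, defaultdict

# Hypothetical data units: one statistic ("views") and one attribute ("brand").
ITEMS = [
    {"title": "iPod nano 4GB", "views": 950, "brand": "Apple"},
    {"title": "iPod shuffle", "views": 900, "brand": "Apple"},
    {"title": "Zune 30GB", "views": 880, "brand": "Microsoft"},
    {"title": "iPod case", "views": 40, "brand": "Belkin"},
    {"title": "iPod dock", "views": 35, "brand": "Belkin"},
    {"title": "Earbuds", "views": 30, "brand": "Sony"},
]

def first_pass(items, stat, gap):
    """Statistic-driven pass: sort by the statistic and start a new
    cluster wherever consecutive values differ by more than `gap`."""
    if not items:
        return []
    ordered = sorted(items, key=lambda it: it[stat])
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[stat] - prev[stat] > gap:
            clusters.append([])
        clusters[-1].append(cur)
    return clusters

def second_pass(cluster, attribute):
    """Attribute-driven pass: partition one cluster by an attribute value."""
    groups = defaultdict(list)
    for it in cluster:
        groups[it[attribute]].append(it)
    return list(groups.values())

def name_group(group, attribute):
    """Name a group after the attribute value shared by a majority of its units."""
    value, count = Counter(it[attribute] for it in group).most_common(1)[0]
    return value if count * 2 > len(group) else None

def organize(items, stat, attribute, gap):
    """Run both passes and return (name, titles) pairs for each group."""
    named = []
    for cluster in first_pass(items, stat, gap):
        for group in second_pass(cluster, attribute):
            titles = [it["title"] for it in group]
            named.append((name_group(group, attribute), titles))
    return named

for name, titles in organize(ITEMS, stat="views", attribute="brand", gap=100):
    print(name, titles)
```

Because the second pass here partitions on exact attribute values, every group trivially shares its naming property; a looser second pass would make the majority check do real work.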

81 Citations

22 Claims

1. A system comprising:
- a server at least in selective communication with a client machine, the server configured to:
  
  receive a query from the client machine;
  
  retrieve a data set based on the query, and
  
  organize the data set into subsets with at least a first pass and a second pass, wherein the first pass is statistic driven and the second pass is attribute driven, wherein the statistic driven first pass is selected from a set consisting essentially of organizational clustering and hierarchical clustering, and wherein the second pass is to partition a subset of the data set that results from the first pass, and
  
  name each of the subsets, based at least in part on a property shared by at least a majority of the data units of the subset.
- Dependent claims (2, 3):
  - 2. The system of claim 1, wherein the server is further configured to access a name structure to name each of the data units.
  - 3. The system of claim 1, wherein the server is further configured to use text from each of the subsets to name respective ones of the subsets.

4. A method comprising the acts of:
- receiving, at a server, a query from a client machine;
  
  retrieving a data set based on the query, the data set containing a plurality of data units;
  
  organizing the plurality of data units into clusters, based at least in part on one or more statistics of the plurality of data units;
  
  organizing the organized plurality of data units into at least a first group and a second group based on at least one attribute indicated by the plurality of data units, wherein the data units of the first group share a first similarity with respect to the at least one attribute and the data units of the second group share a second similarity with respect to the at least one attribute; and
  
  automatically naming the first group based, at least in part, on a first property shared by at least a majority of the data units of the first group and automatically naming the second group based, at least in part, on a second property shared by at least a majority of the data units of the second group.
- Dependent claims (5, 6, 7, 8, 9, 10, 11):
  - 5. The method of claim 4, wherein said act of organizing the organized plurality of data units partitions a first of the clusters into the second group and a third group.
  - 6. The method of claim 4, wherein a data unit comprises at least one of a set consisting essentially of text, metric, image, and price.
  - 7. The method of claim 4, wherein said act of automatically naming the first group comprises accessing a structure to retrieve a name that maps directly or indirectly to the first property.
  - 8. The method of claim 4, wherein said act of automatically naming the first group comprises identifying the first property and using the identified first property as a name for the first group.
  - 9. The method of claim 4, wherein said act of organizing the plurality of data units into clusters employs a technique selected from a set consisting essentially of agglomerative clustering, k-means clustering, and fuzzy clustering.
  - 10. The method of claim 4, wherein the set of one or more statistics are at least one of a set consisting essentially of user behavior statistics for the plurality of data units and statistics that correspond to the first and the second properties.
  - 11. The method of claim 4, further comprising supplying a name of the first group.
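Claims 7 and 8 describe two ways to name a group from its majority property: look the property up in a structure, or use the property itself as the name. A minimal sketch of both, assuming a plain dictionary as the name structure and made-up category identifiers (`cat_293`, `cat_11450`), might look like this:

```python
from collections import Counter

# Hypothetical name structure mapping attribute values to display names.
NAME_STRUCTURE = {"cat_293": "Consumer Electronics", "cat_11450": "Clothing"}

def majority_property(group, attribute):
    """Return the attribute value held by more than half of the group's
    data units, or None when no value reaches a majority."""
    values = [unit[attribute] for unit in group if attribute in unit]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count * 2 > len(group) else None

def name_group(group, attribute, name_structure=None):
    """Name a group from its majority property.

    With a name structure, the property is mapped to a name (claim 7 style);
    without one, the property itself is used as the name (claim 8 style)."""
    prop = majority_property(group, attribute)
    if prop is None:
        return None
    if name_structure is not None:
        # Fall back to the raw property when the structure has no entry.
        return name_structure.get(prop, prop)
    return prop

group = [
    {"title": "MP3 player", "category": "cat_293"},
    {"title": "Headphones", "category": "cat_293"},
    {"title": "Band T-shirt", "category": "cat_11450"},
]
print(name_group(group, "category", NAME_STRUCTURE))  # mapped name
print(name_group(group, "category"))                  # raw property
```

Returning `None` when no value reaches a majority leaves it to the caller to decide how to label such a group; the claims do not say what should happen in that case.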

12. A method of organizing data, the method comprising the acts of:
- receiving, at a server, a query from a client machine;
  
  retrieving a data set based on the query;
  
  organizing the data set into groups of data subsets with at least a first pass and a second pass over the data set, wherein the first pass employs statistic driven clustering and the second pass employs attribute driven clustering;
  
  wherein the statistic driven first pass is selected from a set consisting essentially of organizational clustering and hierarchical clustering, and wherein the second pass is to partition a subset of the data set that results from the first pass; and
  
  automatically naming each of the groups of data subsets based, at least in part, on similarity of the data subsets in each group.
- Dependent claims (13):
  - 13. The method of claim 12, further comprising examining the data set and extrapolating one or more statistics for the statistic driven clustering.
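As a rough illustration of claims 12 and 13, the sketch below first derives a statistic from the data set and then applies hierarchical (agglomerative) clustering to it. The choice of statistic (each unit's price expressed as a z-score), the single-linkage rule, and the `max_distance` stopping threshold are assumptions for the example; the claims do not specify them.

```python
def extrapolate_statistic(data_set, field):
    """Examine the data set and derive one statistic per data unit:
    the unit's value as a z-score relative to the whole set."""
    values = [unit[field] for unit in data_set]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5 or 1.0  # avoid dividing by zero for constant data
    return [(v - mean) / std for v in values]

def agglomerative(points, max_distance):
    """Single-linkage agglomerative clustering of 1-D points.

    Starts with one cluster per point and repeatedly merges the two
    closest clusters until the closest pair is farther apart than
    `max_distance`. Returns clusters as lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return min(abs(points[i] - points[j]) for i in a for j in b)

    while len(clusters) > 1:
        distance, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if distance > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical data set: three cheap items and two expensive ones.
data_set = [{"price": p} for p in (10, 12, 11, 200, 210)]
scores = extrapolate_statistic(data_set, "price")
print(agglomerative(scores, max_distance=0.5))
```

Each resulting cluster could then be handed to an attribute-driven second pass, which partitions it further.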

14. A set of instructions encoded on one or more machine-readable storage media, the set of instructions comprising:
- a first sequence of instructions executable to employ statistical clustering to organize a plurality of data units; and
  
  a second sequence of instructions executable to employ structural clustering to organize the plurality of data units organized by the first sequence of instructions into groups with respect to an attribute indicated by the data set; and
  
  a third sequence of instructions executable to indicate one or more names for each of the groups, and to access a structure for the one or more names; and
  
  a fourth sequence of instructions executable to generate the one or more names for each group based, at least in part, on at least one shared across at least a majority of data units within each group.
- Dependent claims (15, 16):
  - 15. The set of instructions of claim 14, further comprising a fourth sequence of instructions executable to supply a name of a first of the groups named by the third sequence of instructions.
  - 16. The set of instructions of claim 14, further comprising a fourth sequence of instructions executable to examine the plurality of data units to generate at least one statistic for driving the statistical clustering employed by the first sequence of instructions.

17. An apparatus comprising:
- a memory operable to host a set of data;
  
  means for grouping the set of data into plural groups based on one or more statistics for the set of data and based on similarities among the set of data with respect to an attribute indicated by the set of data;
  
  wherein the grouping includes at least a first pass and a second pass, wherein the first pass is selected from a set consisting essentially of organizational clustering and hierarchical clustering, and wherein the second pass is to partition a subset of the data set that results from the first pass; and
  
  means for automatically naming the plural groups.
- Dependent claims (18, 19):
  - 18. The apparatus of claim 17, further comprising means for extrapolating at least some of the one or more statistics from the set of data.
  - 19. The apparatus of claim 17, further comprising means for presenting the groups with names.

20. An apparatus comprising:
- a memory operable to host a plurality of data units;
  
  a navigation module operable to retrieve a plurality of data units responsive to a query;
  
  an organizing module coupled with the navigation module, the organizing module operable to organize the plurality of retrieved data units in accordance with a set of one or more statistics for the plurality of data units and to then organize the plurality of data units into groups in accordance with at least one attribute indicated by the plurality of data units; and
  
  a naming module coupled with the organizing module, the naming module operable to name each of the groups, based at least in part on a property shared by at least a majority of the data units within the group.
- Dependent claims (21, 22):
  - 21. The apparatus of claim 20, wherein the navigation module is further operable to present the retrieved plurality of data units in accordance with group organization by the organizing module.
  - 22. The apparatus of claim 20, further comprising a statistic module coupled with the organizing module, the statistic module operable to examine the plurality of data units and to extrapolate a set of one or more statistics.
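The module coupling in claims 20 through 22 could be wired together along the following lines. All class names, method names, and the median-price statistic are hypothetical; the sketch only shows how a navigation, statistic, organizing, and naming module might depend on one another.

```python
from collections import Counter, defaultdict

class NavigationModule:
    """Retrieves data units responsive to a query."""
    def __init__(self, store):
        self.store = store  # stands in for the memory hosting the data units

    def retrieve(self, query):
        q = query.lower()
        return [u for u in self.store if q in u["title"].lower()]

class StatisticModule:
    """Examines data units and extrapolates statistics (assumed: median price)."""
    def extrapolate(self, units):
        prices = sorted(u["price"] for u in units)
        return {"median_price": prices[len(prices) // 2]}

class OrganizingModule:
    """Organizes units by a statistic first, then by an attribute."""
    def __init__(self, statistic_module):
        self.statistic_module = statistic_module

    def organize(self, units, attribute):
        if not units:
            return []
        stats = self.statistic_module.extrapolate(units)
        groups = defaultdict(list)
        for u in units:
            tier = "high" if u["price"] >= stats["median_price"] else "low"
            groups[(tier, u[attribute])].append(u)
        return list(groups.values())

class NamingModule:
    """Names a group after a property shared by a majority of its units."""
    def name(self, group, attribute):
        value, count = Counter(u[attribute] for u in group).most_common(1)[0]
        return value if count * 2 > len(group) else None

def run(store, query, attribute):
    """Wire the modules together and return (name, titles) pairs."""
    navigation = NavigationModule(store)
    organizing = OrganizingModule(StatisticModule())
    naming = NamingModule()
    groups = organizing.organize(navigation.retrieve(query), attribute)
    return [(naming.name(g, attribute), [u["title"] for u in g]) for g in groups]
```

Splitting at the median is only a stand-in for organizing "in accordance with a set of one or more statistics"; a clustering technique could replace it without changing how the modules are coupled.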

Current Assignee: eBay Inc.
Original Assignee: eBay Inc.
Inventors: Sarwar, Badrul M.; Mount, John A.
Primary Examiner(s): Fleurantin, Jean B.
Assistant Examiner(s): Ly, Anh
Application Number: US11/646,905
Publication Number: US 20080162533A1
Time in Patent Office: 1,265 Days
Field of Search: 707/1, 707/10, 707/100-102, 707/104.1, 707/688, 707/736, 707/637
US Class Current: 707/688
CPC Class Codes:
G06F 16/248   Presentation of query results
G06F 16/285   Clustering or classification
G06Q 10/06   Resources, workflows, human...
