Clustering data objects

US 8,055,592 B2
Filed: 07/26/2007
Issued: 11/08/2011
Est. Priority Date: 08/01/2006
Status: Expired due to Fees

Abstract

A system for clustering data objects includes a module for calculating an importance value of at least one member in a first data object represented as a variable length vector of 0 to N members and a clustering module for dynamically forming a plurality of clusters containing one or more data objects. The clustering module is configured to associate the first data object with at least one of the plurality of clusters in dependence upon the at least one member's similarity value in comparison to members in other data objects. The clustering module may be configured to cluster the first data object into a plurality of clusters if it has at least two members and each member belongs to a different cluster.
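
The clustering behaviour summarized in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names `cluster` and `exact_match`, the dictionary layout of a cluster, and the 0.5 threshold are all assumptions made for illustration, and the similarity function is a placeholder rather than the dictionary-based measure recited in the claims. The sketch only shows how an object may start its own cluster when nothing matches, or join several clusters when its members match different ones.

```python
def exact_match(a, b):
    """Placeholder similarity (an assumption): 1.0 for identical words, else 0.0."""
    return 1.0 if a == b else 0.0


def cluster(objects, similarity, threshold=0.5):
    """Incrementally assign each object (a list of member words) to clusters.

    Returns a list of clusters. Each cluster is a dict holding the member
    words seen so far and the indices of the objects linked to it.
    """
    clusters = []
    for index, members in enumerate(objects):
        # Find every existing cluster that at least one member is similar to.
        matched = [
            c for c in clusters
            if any(similarity(m, seen) >= threshold
                   for m in members for seen in c["members"])
        ]
        if not matched:
            # No association found: the object forms a new cluster by itself.
            clusters.append({"members": set(members), "objects": [index]})
            continue
        for c in matched:
            # The object may be linked to several clusters at once.
            c["objects"].append(index)
            close = [m for m in members
                     if any(similarity(m, seen) >= threshold for seen in c["members"])]
            c["members"].update(close)
    return clusters


sentences = [["printer"], ["dog"], ["printer", "dog"], ["scanner"]]
for c in cluster(sentences, exact_match):
    print(sorted(c["members"]), c["objects"])
```

In this toy run the third object has two members that fall into different clusters, so it is linked to both, which mirrors the last sentence of the abstract.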

7 Claims

1. A method for unsupervised clustering data objects, comprising:
- calculating, with a processor, based on a relative depth in a semantic hierarchical tree of a dictionary, an importance value of at least one member in a first data object represented as a variable length vector of 0 to N members, said vector further comprising a subset of said members having an importance value above a designated importance threshold, wherein the data objects comprise sentences and said members comprise words, therein;
  
  calculating, with said processor, based on a path distance in said semantic hierarchical tree of a dictionary, a member similarity value for each member of said subset of said members to at least a second data object;
  
  when none of said subset of said members of said first data object are associated with at least one of a subset of members of said at least a second data object, in dependence upon a comparison of similarity values, dynamically form, with a clustering module, a first cluster comprising said first data object; and
  
  when at least one of said subset of said members of said first data object is associated with at least one of a subset of members of said at least a second data object, in dependence upon a comparison of similarity values, dynamically form, with said clustering module, at least a second cluster comprising said first data object and said at least a second data object.
- Dependent claims (2, 7):
  - 2. The method of claim 1, wherein said sentences comprise textual messages, and the method further comprises parsing said first data object utilizing a natural language parsing engine.
  - 7. The method of claim 1, further comprising linking each cluster to a group of data objects associated with the cluster.
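
The two calculations recited in claim 1 (importance from relative depth, similarity from path distance) can be sketched in Python as follows. Everything here is a hedged illustration under stated assumptions: the `PARENT` tree is a hypothetical toy hierarchy standing in for a real dictionary, and the formulas (depth divided by maximum depth, and 1 / (1 + path distance)) are assumed scorings chosen for simplicity; the claim text does not specify them.

```python
# Hypothetical toy hierarchy (child -> parent); a stand-in for a real dictionary tree.
PARENT = {
    "entity": None,
    "object": "entity",
    "device": "object",
    "printer": "device",
    "scanner": "device",
    "animal": "object",
    "dog": "animal",
}


def path_to_root(word):
    """Return the nodes from word up to the root, inclusive."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def depth(word):
    """Number of edges between word and the root."""
    return len(path_to_root(word)) - 1


MAX_DEPTH = max(depth(w) for w in PARENT)


def importance(word):
    """Assumed scoring: relative depth in [0, 1]; words outside the tree score 0."""
    if word not in PARENT:
        return 0.0
    return depth(word) / MAX_DEPTH


def path_distance(a, b):
    """Edges on the path between a and b through their lowest common ancestor."""
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a))}
    for j, node in enumerate(path_to_root(b)):
        if node in steps_from_a:
            return steps_from_a[node] + j
    raise ValueError("words do not share a root")


def similarity(a, b):
    """Assumed scoring: 1 / (1 + path distance), so identical words score 1.0."""
    return 1.0 / (1.0 + path_distance(a, b))


def important_members(sentence, threshold=0.5):
    """Keep words whose importance exceeds the threshold; the result may be empty."""
    return [w for w in sentence.lower().split() if importance(w) > threshold]


print(important_members("the printer is an object"))
print(path_distance("printer", "scanner"), path_distance("printer", "dog"))
```

Deeper (more specific) words score higher under this assumed importance measure, so generic terms and words missing from the tree drop out of the vector, leaving a variable-length subset of 0 to N members for the similarity comparison.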

3. A computer program product for unsupervised clustering of data objects, the computer program product comprising:
- a computer usable medium having computer usable program code embodied therewith, the computer usable program code comprising:
  
  computer usable program code configured to calculate, based on a relative depth in a semantic hierarchical tree of a dictionary, an importance value of at least one member in a first data object represented as a variable length vector of 0 to N members, said vector further comprising a subset of said members having an importance value above a designated importance threshold, wherein the data objects comprise sentences and said members comprise words, therein;
  
  computer usable program code configured to calculate, based on a path distance in said semantic hierarchical tree of a dictionary, a member similarity value for each member of said subset of said members to at least a second data object;
  
  when none of said subset of said members of said first data object are associated with at least one of a subset of members of said at least a second data object, in dependence upon a comparison of similarity values, computer usable program code configured to dynamically form, with a clustering module, a first cluster comprising said first data object; and
  
  when at least one of said subset of said members of said first data object is associated with at least one of a subset of members of said at least a second data object, in dependence upon a comparison of similarity values, computer usable program code configured to dynamically form, with said clustering module, at least a second cluster comprising said first data object and said at least a second data object.
- Dependent claims (4):
  - 4. The computer program product of claim 3 wherein said sentences comprise textual messages, and the computer program product further comprises computer usable program code configured to parse said first data object utilizing a natural language parsing engine.

5. A method for unsupervised clustering of data objects, comprising:
- calculating, with a processor, based on a relative depth in a semantic hierarchical tree of a dictionary, an importance value of at least one member in a first data object represented as a variable length vector of 0 to N members, said vector further comprising a subset of said members having an importance value above a designated importance threshold, wherein the data objects comprise sentences of an electronic messaging system and said members comprise words, therein;
  
  calculating, with said processor, based on a path distance in said semantic hierarchical tree of a dictionary, a member similarity value for each member of said subset of said members to at least a second data object;
  
  when none of said subset of said members of said first data object are associated with at least one of a subset of members of said at least a second data object, in dependence upon a comparison of similarity values, dynamically form, with a clustering module, a first cluster comprising said first data object; and
  
  when at least one of said subset of said members of said first data object is associated with at least one of a subset of members of said at least a second data object, in dependence upon a comparison of similarity values, dynamically form, with said clustering module, at least a second cluster comprising said first data object and said at least a second data object.
- Dependent claims (6):
  - 6. The computer system of claim 5, wherein said sentences of an electronic messaging system comprise sentences of an electronic chat system, and the processor is further programmed to parse the data objects with a natural language parsing engine.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Boyle, Peter Currie; Zhang, Yu
Primary Examiner(s): Gaffin, Jeffrey A
Assistant Examiner(s): Kennedy, Adrian L
Application Number: US11/828,416
Publication Number: US 20080077572A1
Time in Patent Office: 1,566 Days
Field of Search: None
US Class Current: 706/12
CPC Class Codes:
- G06F 40/205 Parsing
- G06F 40/284 Lexical analysis, e.g. toke...
- G06F 40/30 Semantic analysis
