System And Method For Clustering Nodes Of A Tree Structure

US 20080154926A1
Filed: 03/03/2008
Published: 06/26/2008
Est. Priority Date: 12/16/2002
Status: Active Grant

Abstract

A system and method for clustering nodes of a tree structure is provided. A plurality of messages is maintained. Each message is represented as a node in a tree structure. A word vector is assigned to each message. Pairs of the nodes are identified based on relationships in the tree structure. The nodes of one or more of the pairs are combined into clusters. Boundaries of each cluster are adjusted, including at least one of placing a root node into one such cluster having a closest related child node, separating children nodes into distinct groups and retaining a relationship between a parent node and one such group including a nearest child node, and transferring a parent node to one such cluster having all children of the parent node. A digest of the messages, including one or more of the clusters, is formed.
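
The pair-identification step in the abstract can be illustrated with a minimal sketch. It assumes the tree is given as a child-to-parent mapping and that the relevant relationships are parent-child and sibling, as claims 15 and 20 describe; the function name `candidate_pairs` and the message ids are hypothetical.

```python
from itertools import combinations

def candidate_pairs(parent_of):
    """Return parent-child and sibling pairs for a tree given as child -> parent.

    Assumption: the root maps to None. Names here are illustrative only.
    """
    children = {}
    for child, parent in parent_of.items():
        if parent is not None:
            children.setdefault(parent, []).append(child)

    pairs = set()
    for parent, kids in children.items():
        for kid in kids:
            pairs.add((parent, kid))            # parent-child pair
        for a, b in combinations(sorted(kids), 2):
            pairs.add((a, b))                   # sibling pair
    return pairs

# Hypothetical four-message thread: m2 and m3 reply to m1, m4 replies to m2.
thread = {"m1": None, "m2": "m1", "m3": "m1", "m4": "m2"}
print(sorted(candidate_pairs(thread)))
# [('m1', 'm2'), ('m1', 'm3'), ('m2', 'm3'), ('m2', 'm4')]
```

Restricting candidates to these pairs keeps clusters contiguous in the thread, since unrelated messages are never compared directly.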

24 Claims

1. A system for clustering nodes of a tree structure, comprising:
- a storage module to maintain a plurality of messages, each message represented as a node in a tree structure;
  
  a word vector module to assign a word vector to each message;
  
  a node pair module to identify pairs of the nodes based on relationships in the tree structure and to combine the nodes of one or more of the pairs into clusters;
  
  a cluster boundary module to adjust boundaries of each cluster, comprising at least one of:
  
  a placement module to place a root node into one such cluster having a closest related child node;
  
  a retention module to separate children nodes into distinct groups and to retain a relationship between a parent node and one such group comprising a nearest child node; and
  
  a transfer module to transfer a parent node to one such cluster having all children of the parent node; and
  
  a digest module to form a digest of the messages comprising one or more of the clusters.
  - 2. A system according to claim 1, further comprising:
    - a parameter module to determine parameters comprising one or more of a halting cluster size, maximum cluster size, and halting distance; and
      
      a comparison module to apply the parameters to each pair of the nodes.
  - 3. A system according to claim 2, wherein the maximum cluster size is larger than the halting cluster size.
  - 4. A system according to claim 2, wherein the nodes of the one or more pairs are combined into the clusters based on the parameters.
  - 5. A system according to claim 1, further comprising:
    - a primary assignment module to assign a primary status to those clusters that are larger than a minimum size;
      
      a secondary assignment module to assign a secondary status to the clusters that are smaller than the minimum size; and
      
      a presentation module to present the primary and secondary clusters.
  - 6. A system according to claim 5, wherein the minimum size is determined by one of a predetermined number of the messages in one such primary cluster and a function of a number of the messages in the tree structure.
  - 7. A system according to claim 1, further comprising:
    - a distance module to determine a distance between the nodes of one or more pairs; and
      
      a cluster module to combine the nodes into the clusters based on the distance.
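
How the parameters of claims 2 to 4 and the distance of claim 7 might drive merging can be sketched as follows. The claims name a halting cluster size, a maximum cluster size, and a halting distance but do not define how they are applied, so the rules below are assumptions: merging stops once the pair distance exceeds the halting distance, a cluster that has reached the halting size stops absorbing others, and no merge may exceed the maximum size. The function `merge_pairs` and its inputs are hypothetical.

```python
def merge_pairs(pairs_with_distance, halting_size, max_size, halting_distance):
    """Greedily merge candidate pairs, closest first (assumed semantics).

    pairs_with_distance: list of (distance, node_a, node_b) tuples.
    """
    parent, size = {}, {}
    for _, a, b in pairs_with_distance:
        for node in (a, b):
            parent[node] = node   # every node starts as its own cluster
            size[node] = 1

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for dist, a, b in sorted(pairs_with_distance):
        if dist > halting_distance:
            break                            # remaining pairs are too far apart
        ra, rb = find(a), find(b)
        if ra == rb:
            continue                         # already in the same cluster
        if size[ra] >= halting_size or size[rb] >= halting_size:
            continue                         # a cluster at halting size stops growing
        if size[ra] + size[rb] > max_size:
            continue                         # merge would exceed the maximum size
        parent[rb] = ra
        size[ra] += size[rb]

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in clusters.values())

pairs = [(0.1, "m1", "m2"), (0.2, "m2", "m3"), (0.9, "m3", "m4")]
print(merge_pairs(pairs, halting_size=3, max_size=4, halting_distance=0.5))
# [['m1', 'm2', 'm3'], ['m4']]
```

Under this reading, a maximum size larger than the halting size (claim 3) lets two clusters that are each still below the halting size combine into one that exceeds it.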

8. A method for clustering nodes of a tree structure, comprising:
- maintaining a plurality of messages, each message represented as a node in a tree structure;
  
  assigning a word vector to each message;
  
  identifying pairs of the nodes based on relationships in the tree structure and combining the nodes of one or more of the pairs into clusters;
  
  adjusting boundaries of each cluster, comprising at least one of:
  
  placing a root node into one such cluster having a closest related child node;
  
  separating children nodes into distinct groups and retaining a relationship between a parent node and one such group comprising a nearest child node; and
  
  transferring a parent node to one such cluster having all children of the parent node; and
  
  forming a digest of the messages comprising one or more of the clusters.
  - 9. A method according to claim 8, further comprising:
    - determining parameters comprising one or more of a halting cluster size, maximum cluster size, and halting distance; and
      
      applying the parameters to each pair of the nodes.
  - 10. A method according to claim 9, wherein the maximum cluster size is larger than the halting cluster size.
  - 11. A method according to claim 9, wherein the nodes of the one or more pairs are combined into the clusters based on the parameters.
  - 12. A method according to claim 8, further comprising:
    - assigning a primary status to those clusters that are larger than a minimum size;
      
      assigning a secondary status to the clusters that are smaller than the minimum size; and
      
      presenting the primary and secondary clusters.
  - 13. A method according to claim 12, wherein the minimum size is determined by one of a predetermined number of the messages in one such primary cluster and a function of a number of the messages in the tree structure.
  - 14. A method according to claim 8, further comprising:
    - determining a distance between the nodes of one or more pairs; and
      
      combining the nodes into the clusters based on the distance.
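
The primary and secondary status assignment of claims 12 and 13 can be sketched briefly. The claims do not say how a cluster exactly at the minimum size is treated, nor which function of the message count sets the threshold; this sketch assumes such clusters count as primary and uses an integer square root as a purely hypothetical default. The name `split_by_status` is illustrative.

```python
import math

def split_by_status(clusters, total_messages, min_size=None):
    """Partition clusters into (primary, secondary) lists by size."""
    if min_size is None:
        # Hypothetical rule: derive the threshold from the size of the tree.
        min_size = max(2, math.isqrt(total_messages))
    # Assumption: a cluster exactly at min_size is treated as primary.
    primary = [c for c in clusters if len(c) >= min_size]
    secondary = [c for c in clusters if len(c) < min_size]
    return primary, secondary

clusters = [["m1", "m2", "m3"], ["m4"]]
primary, secondary = split_by_status(clusters, total_messages=4)
print(primary)    # [['m1', 'm2', 'm3']]
print(secondary)  # [['m4']]
```

Passing an explicit `min_size` corresponds to the "predetermined number" alternative in claim 13.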

15. A system for normalizing quoting styles, comprising:
- a message module to receive a plurality of parent and child messages, each message comprising at least one of quoted and unquoted text;
  
  a quoting pair module to identify quoting pairs of the messages, each pair comprising one such parent message and one such child message;
  
  a text removal module to remove from the child message, the quoted text that covers all of the parent message;
  
  a text addition module to add at least a part of the unquoted text of the parent message to the child message when the part of the unquoted text describes a common issue of the child message;
  
  a vector module to represent each message as a node in a tree structure and to determine a word vector for each node, comprising at least one of:
  
  a word vector processor to process the quoted and unquoted text in each message by performing word stemming, removing stop words, and expressing the word vectors as an inverse document frequency; and
  
  a word vector module to apply probabilistic latent semantic indexing to the quoted and unquoted text in each message to determine the word vector;
  
  a cluster pair module to identify cluster pairs of the nodes, each cluster pair based on at least one of a parent-child and sibling relationship determined by the tree structure;
  
  a distance module to assign a node distance to each cluster pair based on the word vector for each node, comprising at least one of:
  
  a distance determination module to determine a cosine distance for those of the word vectors containing one of raw and weighted counts; and
  
  a distance similarity module to determine a Hellinger similarity for those of the word vectors comprising the probabilistic latent semantic indexing; and
  
  a cluster module to group the cluster pairs that are closely related into clusters.
  - 16. A system according to claim 15, further comprising:
    - a string representation module to associate each cluster with a formatted string representing text; and
      
      a digest module to combine the formatted string for each cluster into a digest.
  - 17. A system according to claim 16, wherein the digest comprises one of an overview and summary of the text.
  - 18. A system according to claim 15, further comprising:
    - a parameter module to group the clusters based on parameters comprising one or more of a halting cluster size, maximum cluster size, and halting distance.
  - 19. A system according to claim 15, further comprising:
    - an issue module to determine a presence of the common issue based on a size of the parent message and the child message.
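
The two node-distance measures named in claim 15 can be written out as a small sketch. Cosine distance is taken here as one minus cosine similarity over count vectors. The claims do not give a formula for Hellinger similarity; this sketch assumes the common form, the sum of the square roots of the element-wise products of two probability vectors (the Bhattacharyya coefficient). Function names are illustrative.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity, for raw or weighted count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Assumption: an all-zero vector is treated as maximally distant.
    return 1.0 - dot / norm if norm else 1.0

def hellinger_similarity(p, q):
    """Sum of sqrt(p_i * q_i); assumes p and q are probability vectors."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

print(cosine_distance([1, 0], [0, 1]))               # 1.0 (orthogonal vectors)
print(hellinger_similarity([0.5, 0.5], [0.5, 0.5]))  # 1.0 (identical distributions)
```

Cosine distance suits count vectors because it ignores message length, while the Hellinger form applies when each word vector is a probability distribution, as with probabilistic latent semantic indexing.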

20. A method for normalizing quoting styles, comprising:
- receiving a plurality of parent and child messages, each message comprising at least one of quoted and unquoted text;
  
  identifying quoting pairs of the messages, each pair comprising one such parent message and one such child message;
  
  removing from the child message, the quoted text that covers all of the parent message;
  
  adding at least a part of the unquoted text of the parent message to the child message when the part of the unquoted text describes a common issue of the child message;
  
  representing each message as a node in a tree structure and determining a word vector for each node, comprising at least one of:
  
  processing the quoted and unquoted text in each message by performing word stemming, removing stop words, and expressing the word vectors as an inverse document frequency; and
  
  applying probabilistic latent semantic indexing to the quoted and unquoted text in each message to determine the word vector;
  
  identifying cluster pairs of the nodes, each cluster pair based on at least one of a parent-child and sibling relationship determined by the tree structure;
  
  assigning a node distance to each cluster pair based on the word vector for each node, comprising at least one of:
  
  determining a cosine distance for those of the word vectors containing one of raw and weighted counts; and
  
  determining a Hellinger similarity for those of the word vectors comprising the probabilistic latent semantic indexing; and
  
  grouping the cluster pairs that are closely related into clusters.
  - 21. A method according to claim 20, further comprising:
    - associating each cluster with a formatted string representing text; and
      
      combining the formatted string for each cluster into a digest.
  - 22. A method according to claim 21, wherein the digest comprises one of an overview and summary of the text.
  - 23. A method according to claim 20, further comprising:
    - grouping the clusters based on parameters comprising one or more of a halting cluster size, maximum cluster size, and halting distance.
  - 24. A method according to claim 20, further comprising:
    - determining a presence of the common issue based on a size of the parent message and the child message.
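
The removal step of claim 20, dropping quoted text that covers all of the parent message, might look like the sketch below. The claims do not specify how quoted text is detected or compared; this sketch assumes quoted lines start with a `>` marker and that "covers all" means the quoted lines equal the parent's non-empty lines. The function name `strip_full_quote` and the sample messages are hypothetical.

```python
def strip_full_quote(parent_text, child_text, marker=">"):
    """Remove the quoted text from a child message when it reproduces the whole parent."""
    parent_lines = [ln.strip() for ln in parent_text.splitlines() if ln.strip()]

    quoted, unquoted = [], []
    for line in child_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(marker):
            quoted.append(stripped[len(marker):].strip())  # drop the quote marker
        else:
            unquoted.append(line)
    quoted = [q for q in quoted if q]

    # Assumption: a full quote is an exact line-by-line copy of the parent.
    if quoted and quoted == parent_lines:
        return "\n".join(unquoted).strip()
    return child_text  # partial quotes are left in place

parent = "Does the build fail on Linux?\nI see errors."
child = "> Does the build fail on Linux?\n> I see errors.\nYes, same here."
print(strip_full_quote(parent, child))  # Yes, same here.
```

A child that quotes only part of the parent is returned unchanged, since a selective quote carries information about which point the reply addresses.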

Current Assignee: Paula S. Newman
Original Assignee: Paula S. Newman
Inventors: Newman, Paula S.
Granted Patent: US 8,156,430 B2
US Class Current: 707/100
CPC Class Codes:
G06Q 10/107 Computer-aided management o...
H04L 69/22 Parsing or analysis of headers
