System And Method For Clustering Nodes Of A Tree Structure
First Claim
1. A system for clustering nodes of a tree structure, comprising:
a storage module to maintain a plurality of messages, each message represented as a node in a tree structure;
a word vector module to assign a word vector to each message;
a node pair module to identify pairs of the nodes based on relationships in the tree structure and to combine the nodes of one or more of the pairs into clusters;
a cluster boundary module to adjust boundaries of each cluster, comprising at least one of:
a placement module to place a root node into one such cluster having a closest related child node;
a retention module to separate children nodes into distinct groups and to retain a relationship between a parent node and one such group comprising a nearest child node; and
a transfer module to transfer a parent node to one such cluster having all children of the parent node; and
a digest module to form a digest of the messages comprising one or more of the clusters.
Abstract
A system and method for clustering nodes of a tree structure is provided. A plurality of messages is maintained. Each message is represented as a node in a tree structure. A word vector is assigned to each message. Pairs of the nodes are identified based on relationships in the tree structure. The nodes of one or more of the pairs are combined into clusters. Boundaries of each cluster are adjusted, including at least one of placing a root node into one such cluster having a closest related child node, separating children nodes into distinct groups and retaining a relationship between a parent node and one such group including a nearest child node, and transferring a parent node to one such cluster having all children of the parent node. A digest of the messages, including one or more of the clusters, is formed.
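The pairing and combining steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the raw-count word vectors, the restriction to parent-child pairs, the fixed `threshold`, and the names `word_vector`, `cosine_distance`, and `cluster_tree` are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_vector(text):
    """Raw term counts for one message (assumption: whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return 1.0 - dot / norm if norm else 1.0


def cluster_tree(messages, parent, threshold=0.5):
    """Combine parent-child pairs whose word vectors are close.

    messages: message id -> text
    parent:   message id -> parent id (None for the root)
    threshold is a hypothetical cut-off, not a value from the source.
    """
    vectors = {m: word_vector(t) for m, t in messages.items()}
    leader = {m: m for m in messages}  # union-find forest over nodes

    def find(m):
        while leader[m] != m:
            leader[m] = leader[leader[m]]  # path halving
            m = leader[m]
        return m

    for child, par in parent.items():
        if par is None:
            continue
        if cosine_distance(vectors[child], vectors[par]) <= threshold:
            leader[find(child)] = find(par)  # merge the pair's clusters

    clusters = {}
    for m in messages:
        clusters.setdefault(find(m), set()).add(m)
    return list(clusters.values())
```

With three messages in which one reply repeats the root's vocabulary and another is off-topic, the sketch groups the root with the on-topic reply and leaves the other reply as its own cluster.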
24 Claims
1. A system for clustering nodes of a tree structure, comprising:
a storage module to maintain a plurality of messages, each message represented as a node in a tree structure;
a word vector module to assign a word vector to each message;
a node pair module to identify pairs of the nodes based on relationships in the tree structure and to combine the nodes of one or more of the pairs into clusters;
a cluster boundary module to adjust boundaries of each cluster, comprising at least one of:
a placement module to place a root node into one such cluster having a closest related child node;
a retention module to separate children nodes into distinct groups and to retain a relationship between a parent node and one such group comprising a nearest child node; and
a transfer module to transfer a parent node to one such cluster having all children of the parent node; and
a digest module to form a digest of the messages comprising one or more of the clusters.
8. A method for clustering nodes of a tree structure, comprising:
maintaining a plurality of messages, each message represented as a node in a tree structure;
assigning a word vector to each message;
identifying pairs of the nodes based on relationships in the tree structure and combining the nodes of one or more of the pairs into clusters;
adjusting boundaries of each cluster, comprising at least one of:
placing a root node into one such cluster having a closest related child node;
separating children nodes into distinct groups and retaining a relationship between a parent node and one such group comprising a nearest child node; and
transferring a parent node to one such cluster having all children of the parent node; and
forming a digest of the messages comprising one or more of the clusters.
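Two of the boundary adjustments recited in claim 8 can be sketched as follows. This is only an illustration under stated assumptions: cluster membership is held in a hypothetical `cluster_of` mapping from node id to cluster label, `distance` is any caller-supplied node distance, and the function names are invented here. The retention step (separating children into groups) is not shown.

```python
def place_root(root, children, cluster_of, distance):
    """Place the root into the cluster of its closest related child.

    cluster_of: node id -> cluster label (hypothetical representation)
    distance:   callable (node, node) -> float, supplied by the caller
    """
    updated = dict(cluster_of)  # leave the caller's mapping untouched
    if children:
        nearest = min(children, key=lambda c: distance(root, c))
        updated[root] = cluster_of[nearest]
    return updated


def transfer_parent(parent, children, cluster_of):
    """Transfer a parent to the cluster that holds all of its children.

    If the children are spread over several clusters, nothing changes.
    """
    updated = dict(cluster_of)
    labels = {cluster_of[c] for c in children}
    if len(labels) == 1:
        updated[parent] = labels.pop()
    return updated
```

Both functions return a new mapping rather than mutating the input, which keeps each adjustment easy to test in isolation.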
15. A system for normalizing quoting styles, comprising:
a message module to receive a plurality of parent and child messages, each message comprising at least one of quoted and unquoted text;
a quoting pair module to identify quoting pairs of the messages, each pair comprising one such parent message and one such child message;
a text removal module to remove from the child message, the quoted text that covers all of the parent message;
a text addition module to add at least a part of the unquoted text of the parent message to the child message when the part of the unquoted text describes a common issue of the child message;
a vector module to represent each message as a node in a tree structure and to determine a word vector for each node, comprising at least one of:
a word vector processor to process the quoted and unquoted text in each message by performing word stemming, removing stop words, and expressing the word vectors as an inverse document frequency; and
a word vector module to apply probabilistic latent semantic indexing to the quoted and unquoted text in each message to determine the word vector;
a cluster pair module to identify cluster pairs of the nodes, each cluster pair based on at least one of a parent-child and sibling relationship determined by the tree structure;
a distance module to assign a node distance to each cluster pair based on the word vector for each node, comprising at least one of:
a distance determination module to determine a cosine distance for those of the word vectors containing one of raw and weighted counts; and
a distance similarity module to determine a Hellinger similarity for those of the word vectors comprising the probabilistic latent semantic indexing; and
a cluster module to group the cluster pairs that are closely related into clusters.
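The word vector and similarity options named in claim 15 can be illustrated with a short Python sketch. Several points are assumptions rather than details from the claims: the tiny `STOP_WORDS` set, the omission of stemming, the `log(N / df)` form of inverse document frequency, and the choice to define Hellinger similarity as one minus the Hellinger distance between two probability distributions (the specification may define it differently).

```python
import math
from collections import Counter

STOP_WORDS = {"the", "is", "a", "an", "of", "to"}  # illustrative only


def idf_vectors(docs):
    """Stop-word-filtered term counts weighted by inverse document frequency.

    Stemming is omitted here; a real pipeline would stem before counting.
    """
    tokenized = [[w for w in d.lower().split() if w not in STOP_WORDS]
                 for d in docs]
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    n_docs = len(docs)
    return [{w: count * math.log(n_docs / doc_freq[w])
             for w, count in Counter(tokens).items()}
            for tokens in tokenized]


def hellinger_similarity(p, q):
    """1 - Hellinger distance for two discrete distributions (dicts).

    p and q map topics (or terms) to probabilities that sum to 1, as a
    probabilistic latent semantic indexing model would produce.
    """
    keys = set(p) | set(q)
    total = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2
                for k in keys)
    return 1.0 - math.sqrt(total) / math.sqrt(2)
```

Identical distributions score 1 and distributions with no shared support score 0, so the value can be used directly as a closeness measure when grouping cluster pairs.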
20. A method for normalizing quoting styles, comprising:
receiving a plurality of parent and child messages, each message comprising at least one of quoted and unquoted text;
identifying quoting pairs of the messages, each pair comprising one such parent message and one such child message;
removing from the child message, the quoted text that covers all of the parent message;
adding at least a part of the unquoted text of the parent message to the child message when the part of the unquoted text describes a common issue of the child message;
representing each message as a node in a tree structure and determining a word vector for each node, comprising at least one of:
processing the quoted and unquoted text in each message by performing word stemming, removing stop words, and expressing the word vectors as an inverse document frequency; and
applying probabilistic latent semantic indexing to the quoted and unquoted text in each message to determine the word vector;
identifying cluster pairs of the nodes, each cluster pair based on at least one of a parent-child and sibling relationship determined by the tree structure;
assigning a node distance to each cluster pair based on the word vector for each node, comprising at least one of:
determining a cosine distance for those of the word vectors containing one of raw and weighted counts; and
determining a Hellinger similarity for those of the word vectors comprising the probabilistic latent semantic indexing; and
grouping the cluster pairs that are closely related into clusters.
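The text removal step of claim 20 can be sketched in Python as below. The sketch assumes that quoted lines carry a leading `>` marker, as in many e-mail and newsgroup clients, and that "covers all of the parent message" means every unquoted line of the parent appears among the child's quoted lines. Both are illustrative assumptions, and `normalize_child` is a hypothetical name. The text addition step is not shown.

```python
def normalize_child(parent_text, child_text, marker=">"):
    """Drop a full quote of the parent from the child message.

    Returns the child unchanged when it quotes only part of the parent.
    """
    # Unquoted lines of the parent form the text a full quote must cover.
    parent_body = {line.strip() for line in parent_text.splitlines()
                   if line.strip() and not line.strip().startswith(marker)}

    kept, matched = [], set()
    for line in child_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            content = stripped.lstrip(marker + " ")
            if content in parent_body:
                matched.add(content)
                continue  # candidate for removal
        kept.append(line)

    if parent_body and matched == parent_body:
        return "\n".join(kept).strip()  # the quote covered the whole parent
    return child_text
```

A child that quotes the parent in full is reduced to its own text, while a child that quotes a single line keeps that line, since a partial quote may carry information about which part of the parent is being answered.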
Specification