Method and apparatus for generating overview information for hierarchically related information
First Claim
1. A method of forming an overview for hierarchically related information, comprising:
forming a vector for each document of hierarchically related information that comprises one position for each unique word contained in the document;
representing each document as a node in a cluster that initially comprises the node for only that document;
successively combining pairs of the clusters by matching the vectors for the documents of the nodes in each cluster pair that are most similar, until an application-specific criteria is met;
identifying lexically central nodes in each of the clusters, comprising:
  determining a lexical centroid of the vectors for the documents of the nodes in the cluster;
  selecting those nodes for the documents having vectors in the lexical centroid as the lexically central nodes; and
  determining a remainder set comprising the nodes in the cluster that are not one of the lexically central nodes,
identifying auxiliary nodes, comprising:
  selecting the nodes from the remainder set that are for a document that is a parent of a document for one of the lexically central nodes for the cluster; and
  selecting the nodes from the remainder set that are for a document that is a parent of a specified plurality of documents for the nodes in the cluster,
combining the lexically central nodes and the auxiliary nodes to form an extraction node set,
identifying quoting interactions in the document for each node in the extraction node set and selecting an overview string from the document based on the quoting interactions, and
combining the overview strings to form an overview for the hierarchically related information.
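The clustering and node-selection steps of claim 1 can be illustrated with a minimal Python sketch. Several details here are assumptions, not taken from the claim: word counts as vector values, cosine similarity between cluster centroids as the "most similar" test, a similarity threshold (`min_similarity`) standing in for the application-specific criteria, and "closest to the centroid" (`top_k`) as the reading of "vectors in the lexical centroid". All function and parameter names are hypothetical.

```python
import math
from collections import Counter

def word_vector(text):
    """Sparse vector: one position (key) per unique word, valued by its count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Mean of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({w: c / len(vectors) for w, c in total.items()})

def cluster(docs, min_similarity=0.3):
    """Start with one cluster per document, then repeatedly merge the most
    similar pair. Assumed stopping rule: best similarity < min_similarity."""
    vectors = {n: word_vector(t) for n, t in docs.items()}
    clusters = [[n] for n in docs]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid([vectors[n] for n in clusters[i]]),
                             centroid([vectors[n] for n in clusters[j]]))
                if sim > best:
                    best, pair = sim, (i, j)
        if best < min_similarity:
            break
        i, j = pair
        clusters[i] += clusters.pop(j)
    return clusters, vectors

def central_nodes(members, vectors, top_k=1):
    """Assumed reading: the top_k nodes whose vectors lie closest to the centroid."""
    c = centroid([vectors[n] for n in members])
    return sorted(members, key=lambda n: cosine(vectors[n], c), reverse=True)[:top_k]

def auxiliary_nodes(members, central, parent, min_children=2):
    """Remainder nodes that are a parent of a central node, or a parent of at
    least min_children nodes in the cluster."""
    remainder = [n for n in members if n not in central]
    central_parents = {parent.get(n) for n in central}
    child_count = Counter(parent.get(n) for n in members)
    return [n for n in remainder
            if n in central_parents or child_count[n] >= min_children]

# Hypothetical four-message thread: "a" is the root, the others reply to it.
docs = {
    "a": "which python web framework should I use",
    "b": "use the flask web framework for python",
    "c": "flask is a good python web framework",
    "d": "my cat sleeps all day",
}
parent = {"b": "a", "c": "a", "d": "a"}

clusters, vectors = cluster(docs)
central = central_nodes(clusters[0], vectors)
auxiliary = auxiliary_nodes(clusters[0], central, parent)
extraction_set = central + auxiliary
```

In this toy thread the off-topic message ends up in its own cluster, and the root message joins the extraction node set as an auxiliary node because it is the parent of the lexically central reply.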
Abstract
A method is provided for digesting the content of hierarchically related information. The method, which obtains relatively short overviews, selects a proportion of representative nodes and then extracts and organizes one or more sentences from the text associated with each selected node. For text trees representing archived discussions, the selection of nodes and sentences is from comment/response sequences drawn from lexically central nodes which will capture those aspects of the discussion considered most important to discussion participants.
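As a rough illustration of drawing sentences from comment/response sequences, the sketch below pairs quoted text with the reply that follows it and keeps the first sentence of each. It assumes the email/Usenet convention that quoted lines begin with `>`, which the text above does not specify; the function names and the output format are hypothetical.

```python
import re

def quoting_interactions(text):
    """Return (quoted, response) pairs: a run of '>' lines followed by reply text."""
    pairs, quoted, reply = [], [], []
    for line in text.splitlines():
        if line.lstrip().startswith(">"):
            # A new quoted run after a reply closes the previous pair.
            if quoted and reply:
                pairs.append((" ".join(quoted), " ".join(reply)))
                quoted, reply = [], []
            quoted.append(line.lstrip().lstrip(">").strip())
        elif line.strip() and quoted:
            reply.append(line.strip())
    if quoted and reply:
        pairs.append((" ".join(quoted), " ".join(reply)))
    return pairs

def first_sentence(text):
    """Text up to and including the first sentence-ending punctuation mark."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]

def overview_string(text):
    """First sentence of the first quoted passage plus first sentence of its reply.
    Falls back to the first sentence of the message when nothing is quoted."""
    pairs = quoting_interactions(text)
    if not pairs:
        return first_sentence(text)
    quoted, reply = pairs[0]
    return f"> {first_sentence(quoted)}\n{first_sentence(reply)}"

# Hypothetical reply message that quotes its parent.
message = (
    "> Which framework should I use? I am new.\n"
    "> Thanks.\n"
    "Try Flask first. It is small.\n"
)
summary = overview_string(message)
```

Here `summary` holds the quoted question on one line and the opening sentence of the response on the next, which is one simple way to form an overview string from a quoting interaction.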
14 Claims
9. The method of claim 1, further comprising:
determining whether the nodes of a cluster include a conversation root node, which is the node for a root document for a plurality of hierarchically related documents of a tree representing a stored conversation, and selecting the root node and a proportion of its children nodes in the cluster as the lexically central nodes.
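A minimal Python sketch of the root-node rule in claim 9 follows. It assumes a node is a conversation root when it has no entry in a `parent` map, and that the "proportion" of children is taken in cluster order and rounded up; the claim does not say which children are chosen, so both choices, along with all names, are hypothetical.

```python
import math

def root_based_central_nodes(members, parent, proportion=0.5):
    """If the cluster contains a conversation root, return the root plus a
    proportion of its children that are in the cluster; otherwise None."""
    roots = [n for n in members if parent.get(n) is None]
    if not roots:
        return None  # no root in this cluster; fall back to another rule
    root = roots[0]
    children = [n for n in members if parent.get(n) == root]
    # Assumption: keep the first ceil(proportion * count) children in cluster order.
    keep = math.ceil(proportion * len(children))
    return [root] + children[:keep]

# Hypothetical thread: "a" is the root, "b" and "c" reply to it, "d" replies to "b".
parent = {"b": "a", "c": "a", "d": "b"}
with_root = root_based_central_nodes(["a", "b", "c", "d"], parent)
without_root = root_based_central_nodes(["b", "d"], parent)
```

With half of the root's two children kept, `with_root` contains the root and its first child, while a cluster lacking the root yields `None`.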
Specification