Method and apparatus for generating summary information for hierarchically related information
Abstract
A method is provided for digesting the content of hierarchically related information. The method chooses a set of extracted sentences representing a proportion of the text associated with a subtopic, by a combination of features resting on inherent properties of the sentences, and on the content of a developing summary.
11 Claims
1. A method of forming a summary for hierarchically related information, where the information can be represented as a set of nodes wherein each node is associated with a portion of the information and that portion of the information contains at least one sentence, and wherein the nodes are connected by directed edges such that each node has at most one incoming edge, a parent node is the source of an incoming edge, and a child node is the target of an outgoing edge, the method comprising:
a) determining a sentence vector for each sentence associated with each node,
b) determining a centroid vector of the sentence vectors,
c) determining an intrinsic score for each sentence using the centroid vector,
d) selecting the sentence with the highest intrinsic score as a summary sentence to form a summary,
e) determining an extract score for each of the remaining sentences using the intrinsic score for the remaining sentence and at least one summary sentence included in the summary,
f) selecting the sentence having the highest extract score as a summary sentence and adding the summary sentence to the summary, and
g) repeating steps e) and f) until a desired number of sentences from the remaining sentences are selected as summary sentences for inclusion in the summary.
Dependent claims: 2, 3, 4, 5, 6, 7
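Steps a) through g) of claim 1 can be read as a greedy selection loop. The following is a minimal sketch under stated assumptions, not the patented implementation: the claim does not fix a term weighting, a similarity measure, or the form of the extract score, so the sketch assumes raw term counts for the sentence vectors, cosine similarity to the centroid for the intrinsic score, and an extract score equal to the intrinsic score minus a redundancy penalty against the sentences already in the summary. The names `sentence_vector`, `cosine`, `centroid`, `summarize`, and the `penalty` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def sentence_vector(sentence):
    """Step a): bag-of-words term counts (the weighting is an assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
        math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Step b): component-wise mean of all sentence vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def summarize(nodes, length, penalty=0.5):
    """Greedy selection following steps a) to g) of claim 1.

    `nodes` maps a node id to {"parent": id or None, "sentences": [...]}.
    The parent link records the hierarchy but is not used by this sketch.
    """
    sentences = [s for node in nodes.values() for s in node["sentences"]]
    if not sentences:
        return []
    vectors = [sentence_vector(s) for s in sentences]
    center = centroid(vectors)
    # Step c): intrinsic score, assumed to be cosine similarity to the centroid.
    intrinsic = [cosine(v, center) for v in vectors]

    remaining = list(range(len(sentences)))
    # Step d): the sentence with the highest intrinsic score starts the summary.
    first = max(remaining, key=lambda i: intrinsic[i])
    summary = [first]
    remaining.remove(first)

    # Step g): repeat e) and f) until the desired number of sentences is reached.
    while remaining and len(summary) < length:
        def extract(i):
            # Step e): assumed form, intrinsic score minus a redundancy penalty.
            overlap = max(cosine(vectors[i], vectors[j]) for j in summary)
            return intrinsic[i] - penalty * overlap

        best = max(remaining, key=extract)  # step f)
        summary.append(best)
        remaining.remove(best)
    return [sentences[i] for i in summary]
```

Calling `summarize(nodes, 2)` on a two-node thread returns the most central sentence followed by the best-scoring remaining sentence. The redundancy penalty is only one possible way to make the extract score depend on the developing summary; claim 8 recites a different set of context features.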
8. A method of forming a summary for hierarchically related information represented as a set of nodes wherein each node is associated with a portion of the information containing at least one sentence, and wherein the nodes are connected by directed edges such that each node has at most one incoming edge, a parent node is the source of an incoming edge, and a child node is the target of an outgoing edge, the method comprising:
selecting a sentence from all of the sentences associated with all of the nodes as a first summary sentence to form a summary;
the first summary sentence being selected by computing an intrinsic score indicating a lexical centrality of the sentence to the information represented by the set of nodes;
the first summary sentence having the highest intrinsic score of all of the sentences associated with all of the nodes; and
for each additional summary sentence to be added to the summary until a desired summary length is reached, selecting a remaining sentence from the plurality of remaining sentences associated with all of the nodes;
the selected remaining sentence having the highest extract score of all extract scores computed for the remaining sentences;
computing the extract score including
determining the number of non-quoting sentences in the summary which are adjacent to the selected remaining sentence;
determining the number of sentences in the summary whose associated nodes are either a parent node or a child node of the node associated with the selected remaining sentence;
determining the number of sentences in the summary which are quoted immediately before the selected remaining sentence; and
determining the number of sentences in the summary that appear immediately after a quote of the selected remaining sentence.
Dependent claims: 9, 10, 11
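The four counts recited in claim 8 can be illustrated with a small data model. This is a sketch under assumptions, not the patented implementation: it assumes every sentence records its node, its position within that node, and, for a quoting sentence, the id of the sentence it quotes; it reads "adjacent" as neighbouring positions within the same node; and it assumes the counts are combined with the intrinsic score as a weighted sum, which the claim does not state. The names `Sentence`, `context_counts`, `extract_score`, and the default weights are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sentence:
    sid: int                        # unique sentence id
    node: str                       # id of the node that holds the sentence
    pos: int                        # position of the sentence inside its node
    quote_of: Optional[int] = None  # sid of the quoted sentence, if this is a quote


def context_counts(cand, summary, all_sentences, parent):
    """Return the four counts of claim 8 for candidate `cand`.

    `parent` maps a node id to its parent node id (None for a root).
    """
    by_place = {(s.node, s.pos): s for s in all_sentences}

    def previous(s):
        # Sentence immediately before `s` in the same node, or None.
        return by_place.get((s.node, s.pos - 1))

    # 1) non-quoting summary sentences adjacent to the candidate
    adjacent = sum(1 for s in summary
                   if s.quote_of is None and s.node == cand.node
                   and abs(s.pos - cand.pos) == 1)
    # 2) summary sentences whose node is a parent or child of the candidate's node
    related = sum(1 for s in summary
                  if parent.get(s.node) == cand.node
                  or parent.get(cand.node) == s.node)
    # 3) summary sentences quoted immediately before the candidate
    before = previous(cand)
    quoted_before = sum(1 for s in summary
                        if before is not None and before.quote_of == s.sid)
    # 4) summary sentences appearing immediately after a quote of the candidate
    after_quote = sum(1 for s in summary
                      if previous(s) is not None
                      and previous(s).quote_of == cand.sid)
    return adjacent, related, quoted_before, after_quote


def extract_score(intrinsic, counts, weights=(0.1, 0.1, 0.1, 0.1)):
    """Assumed combination: intrinsic score plus a weighted sum of the counts."""
    return intrinsic + sum(w * c for w, c in zip(weights, counts))
```

For example, if a reply node quotes a sentence of its parent node and the quoted sentence is already in the summary, the sentence that directly follows the quote in the reply receives credit for the second and third counts.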
Specification