Systems and methods for hybrid text summarization
First Claim
1. A method of determining a hybrid text summary by a hybrid summarization system having a processor, a relevance score determination module, a structural representation of the discourse determination module and a percolation module, the method comprising:
determining discourse constituents for a text by the processor;
determining a structural representation of discourse for the text by the structural representation of the discourse determination module;
determining relevance scores for discourse constituents based on at least one non-structural measure of relevance by the relevance score determination module;
percolating relevance scores based on the structural representation of discourse by the percolation module; and
determining a hybrid text summary, by the processor, based on discourse constituents with relevance scores compared to a threshold relevance score,
wherein percolating the relevance scores comprises:
for each child discourse constituent node in the structural representation, assigning the relevance score of the child discourse constituent node to the parent discourse constituent node if the child discourse constituent node is more relevant;
for any subordinating nodes, assigning the relevance scores of the subordinated discourse constituent to the subordinating discourse constituent if the subordinated discourse constituent is more relevant; and
for any coordination nodes, assigning the relevance score of the most relevant child to other child discourse constituent nodes.
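The percolation steps of the first claim can be illustrated with a minimal sketch. It is not the patented implementation. The `Node` class, the `kind` labels, the convention that a subordination node lists its subordinating child first and its subordinated child second, the rule that a higher score means "more relevant", and the `>=` comparison against the threshold are all assumptions made for this example. It runs a single bottom-up pass only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical discourse-tree node; none of these names come from the patent."""
    label: str
    score: float = 0.0
    kind: str = "leaf"  # "leaf", "subordination", "coordination", or "other"
    children: List["Node"] = field(default_factory=list)


def percolate(node: Node) -> None:
    """One bottom-up pass over the tree (assumes higher score = more relevant)."""
    for child in node.children:
        percolate(child)
    if node.kind == "subordination" and len(node.children) >= 2:
        # Assumed layout: children[0] is subordinating, children[1] is subordinated.
        subordinating, subordinated = node.children[0], node.children[1]
        subordinating.score = max(subordinating.score, subordinated.score)
    elif node.kind == "coordination" and node.children:
        # Every coordinated child receives the score of the most relevant child.
        best = max(c.score for c in node.children)
        for c in node.children:
            c.score = best
    # A parent takes the score of any child that is more relevant.
    for child in node.children:
        node.score = max(node.score, child.score)


def select_leaves(node: Node, threshold: float) -> List[str]:
    """Collect, in text order, the leaf labels whose score meets the threshold."""
    if not node.children:
        return [node.label] if node.score >= threshold else []
    selected: List[str] = []
    for child in node.children:
        selected.extend(select_leaves(child, threshold))
    return selected


tree = Node("root", kind="other", children=[
    Node("S", kind="subordination", children=[Node("A", 0.2), Node("B", 0.9)]),
    Node("C", kind="coordination",
         children=[Node("X", 0.1), Node("Y", 0.4), Node("Z", 0.3)]),
])
percolate(tree)
print(select_leaves(tree, 0.5))  # ['A', 'B']
```

In this toy tree the low-scoring constituent A is pulled into the summary because it subordinates the highly relevant constituent B, while the coordinated constituents X, Y and Z all end at 0.4 and stay below the threshold.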
Abstract
Techniques are provided for segmenting text into categorized discourse constituents and attaching discourse constituents into a structural representation of discourse. Techniques for determining hybrid structural and non-structural summaries of a text are also provided. A text is segmented based on a theory of discourse analysis into at least a main discourse constituent containing spatio-temporal information about a single event in a possible world view. The discourse constituents are then inserted into a structural representation of discourse. Non-structural techniques are used to determine relevance scores, and important discourse constituents are identified. Relevance scores are percolated through the structural representation of discourse to determine supporting preceding discourse constituents that preserve grammaticality. A hybrid text summary is then determined based on the structural representation of the discourse and relevance scores.
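The abstract does not say which non-structural technique produces the initial relevance scores. As an illustration only, the sketch below assumes a plain term-frequency measure. The function name `tf_relevance`, the tokenizer, and the normalization are hypothetical choices for this example, not something the patent specifies.

```python
import re
from collections import Counter
from typing import List


def tf_relevance(constituents: List[str]) -> List[float]:
    """Score each constituent by the mean document frequency of its words.

    Assumption: term frequency stands in for the unspecified
    "non-structural measure of relevance". Scores are normalized so that
    a constituent made only of the most frequent word scores 1.0.
    """
    tokenized = [re.findall(r"[a-z']+", text.lower()) for text in constituents]
    counts = Counter(tok for toks in tokenized for tok in toks)
    top = max(counts.values(), default=1)
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append(0.0)  # empty constituent: no evidence of relevance
        else:
            scores.append(sum(counts[t] for t in toks) / (len(toks) * top))
    return scores


example_scores = tf_relevance(["the cat sat", "the cat", "dogs bark"])
```

A practical scorer would likely drop stop words and weight terms (for example with inverse document frequency). This version keeps only the core idea that the score depends on word statistics and not on discourse structure.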
21 Claims
1. A method of determining a hybrid text summary by a hybrid summarization system having a processor, a relevance score determination module, a structural representation of the discourse determination module and a percolation module, the method comprising:
determining discourse constituents for a text by the processor;
determining a structural representation of discourse for the text by the structural representation of the discourse determination module;
determining relevance scores for discourse constituents based on at least one non-structural measure of relevance by the relevance score determination module;
percolating relevance scores based on the structural representation of discourse by the percolation module; and
determining a hybrid text summary, by the processor, based on discourse constituents with relevance scores compared to a threshold relevance score,
wherein percolating the relevance scores comprises:
for each child discourse constituent node in the structural representation, assigning the relevance score of the child discourse constituent node to the parent discourse constituent node if the child discourse constituent node is more relevant;
for any subordinating nodes, assigning the relevance scores of the subordinated discourse constituent to the subordinating discourse constituent if the subordinated discourse constituent is more relevant; and
for any coordination nodes, assigning the relevance score of the most relevant child to other child discourse constituent nodes.
Dependent Claims: 2, 3, 4, 5
6. A method of determining a hybrid text summary by a hybrid summarization system having a processor, a relevance score determination module, a structural representation of the discourse determination module and a percolation module, the method comprising:
determining discourse constituents for a text by the processor;
determining a structural representation of discourse for the text by the structural representation of the discourse determination module;
determining relevance scores for discourse constituents based on at least one non-structural measure of relevance by the relevance score determination module;
percolating relevance scores based on the structural representation of discourse by the percolation module; and
determining a hybrid text summary, by the processor, based on discourse constituents with relevance scores compared to a threshold relevance score,
wherein percolating the relevance scores comprises:
for each child discourse constituent node in the structural representation, assigning the relevance score of the child discourse constituent node to the parent discourse constituent node if the child discourse constituent node is more relevant than its parent;
for each coordinated discourse constituent node, assigning the relevance score of the coordinated discourse constituent node to each preceding less relevant sibling node;
for each child discourse constituent node that is not a coordinated discourse constituent node and is not a subordinated discourse constituent node, assigning the relevance score of the parent discourse constituent node to the child discourse constituent node if the parent discourse constituent is more relevant than the child;
for each coordinated discourse constituent node, assigning the relevance score of the parent discourse constituent node to the coordinated discourse constituent node, if the coordinated discourse node and all its siblings are less relevant than the parent node;
for each subordinated discourse constituent node, assigning the relevance score of the subordinated discourse constituent node to the subordinating discourse constituent if the subordinated discourse constituent is more relevant than the subordinating node; and
for each node, repeating these steps, until no node can be found whose relevance score is changed to the relevance score of another node.
Dependent Claims: 7
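The repeated percolation of claim 6 can be read as a fixed-point iteration. The sketch below is one possible interpretation, not the patented implementation. The `DNode` class, the `role` labels, applying each rule to all nodes before moving to the next rule, and evaluating the "all siblings less relevant" condition on a snapshot are assumptions of this example. It also assumes that a higher score means "more relevant" and that the subordinating constituent is the sibling whose role is `"subordinating"`.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # identity comparison; parent links make field-wise equality unsafe
class DNode:
    label: str
    score: float = 0.0
    # Role relative to the parent (hypothetical labels):
    # "coordinated", "subordinating", "subordinated", or "other".
    role: str = "other"
    children: List["DNode"] = field(default_factory=list)
    parent: Optional["DNode"] = None

    def add(self, *kids: "DNode") -> "DNode":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def walk(node: DNode) -> Iterator[DNode]:
    """Yield the node and all of its descendants in text (pre-)order."""
    yield node
    for child in node.children:
        yield from walk(child)


def lift(target: DNode, value: float) -> bool:
    """Raise target.score to value if value is higher; report whether it changed."""
    if value > target.score:
        target.score = value
        return True
    return False


def percolate_to_fixed_point(root: DNode, max_rounds: int = 1000) -> None:
    nodes = [n for n in walk(root) if n.parent is not None]
    for _ in range(max_rounds):
        changed = False
        for n in nodes:  # child -> parent, if the child is more relevant
            changed |= lift(n.parent, n.score)
        for n in nodes:  # coordinated node -> each preceding, less relevant sibling
            if n.role == "coordinated":
                sibs = n.parent.children
                for s in sibs[:sibs.index(n)]:
                    changed |= lift(s, n.score)
        for n in nodes:  # parent -> child that is neither coordinated nor subordinated
            if n.role not in ("coordinated", "subordinated"):
                changed |= lift(n, n.parent.score)
        # parent -> coordinated nodes, only if all siblings are below the parent
        eligible = [n for n in nodes if n.role == "coordinated"
                    and all(s.score < n.parent.score for s in n.parent.children)]
        for n in eligible:
            changed |= lift(n, n.parent.score)
        for n in nodes:  # subordinated -> subordinating sibling, if more relevant
            if n.role == "subordinated":
                for s in n.parent.children:
                    if s.role == "subordinating":
                        changed |= lift(s, n.score)
        if not changed:  # fixed point: no score was replaced in this round
            return


root = DNode("root").add(
    DNode("A", 0.9, "subordinating"),
    DNode("C", 0.0, "subordinated").add(
        DNode("X", 0.1, "coordinated"),
        DNode("Y", 0.6, "coordinated"),
        DNode("Z", 0.3, "coordinated"),
    ),
)
percolate_to_fixed_point(root)
final = {n.label: n.score for n in walk(root)}
```

Because scores only ever increase and are always copied from another node, the loop must stop after finitely many rounds; `max_rounds` is only a safety bound. In the toy tree, X rises from 0.1 to 0.6 because it precedes the more relevant coordinated sibling Y, while the later sibling Z keeps its score of 0.3.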
8. A method of determining a hybrid text summary by a hybrid summarization system having a processor, a relevance score determination module, a structural representation of the discourse determination module and a percolation module, the method comprising:
determining discourse constituents for a text by the processor;
determining a structural representation of discourse for the text by the structural representation of the discourse determination module;
determining relevance scores for discourse constituents by the relevance score determination module;
percolating relevance scores by the percolation module based on the structural representation of discourse comprising:
for each discourse constituent leaf node, determining the number of subordinated edges plus one;
determining a score based on the inverse of the number of subordinated edges +1;
for each discourse constituent node, assigning the score of a child discourse constituent node to the parent discourse constituent node, if the score is less relevant;
for any subordination discourse constituent node, assigning the score of the subordinated discourse constituent node to the subordinating discourse constituent node if the subordinated discourse constituent score is lower;
assigning the relevance scores of any coordination discourse constituent node to each child discourse constituent of the coordination if it is lower;
determining an adjusted relevance score based on the score and the subordination level; and
determining a hybrid text summary, by the processor, based on discourse constituents with relevance scores compared to a threshold relevance score.
Dependent Claims: 9
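The leaf-scoring step of claim 8 (the inverse of the number of subordinated edges plus one) can be sketched as follows. This covers only that step. The `Constituent` class, the `subordinated` flag marking the edge from a node's parent, and counting edges along the path from the root are assumptions of this example. The later percolation and the adjusted relevance score are not specified in enough detail here to sketch.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Dict, List


@dataclass
class Constituent:
    """Hypothetical tree node; `subordinated` marks the edge from its parent."""
    label: str
    subordinated: bool = False
    children: List["Constituent"] = field(default_factory=list)


def leaf_structural_scores(root: Constituent) -> Dict[str, Fraction]:
    """Map each leaf label to 1 / (subordinated edges on its root path + 1)."""
    scores: Dict[str, Fraction] = {}

    def visit(node: Constituent, sub_edges: int) -> None:
        if node.subordinated:
            sub_edges += 1  # the edge leading into this node is a subordinated edge
        if not node.children:
            scores[node.label] = Fraction(1, sub_edges + 1)
        for child in node.children:
            visit(child, sub_edges)

    visit(root, 0)
    return scores


doc = Constituent("root", children=[
    Constituent("A"),
    Constituent("S", subordinated=True, children=[
        Constituent("B"),
        Constituent("C", subordinated=True),
    ]),
])
depth_scores = leaf_structural_scores(doc)
# A has 0 subordinated edges -> 1, B has 1 -> 1/2, C has 2 -> 1/3
```

`Fraction` is used so the inverse stays exact; a float would work equally well if the scores are later combined with floating-point relevance values.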
10. A system for determining hybrid text summaries comprising:
an input/output circuit for retrieving a text;
a processor for determining discourse constituents for the text and attaching the discourse constituents into a structural representation of discourse;
a relevance score determination circuit for determining relevance scores for the discourse constituents based on at least one non-structural measure of relevance; and
a percolation circuit for percolating discourse constituent relevance scores based on the structural representation of discourse and where the processor determines a hybrid text summary based on the discourse constituents with relevance scores exceeding a threshold relevance score,
wherein, for each child discourse constituent node in the structural representation, the percolation circuit assigns the relevance score of the child discourse constituent node to the parent discourse constituent node if the child discourse constituent node is more relevant;
for any subordinating nodes, the percolation circuit assigns the relevance scores of the subordinated discourse constituent to the subordinating discourse constituent if the subordinated discourse constituent is more relevant; and
for any coordination nodes, the percolation circuit assigns the relevance score of the most relevant child to other child discourse constituent nodes.
Dependent Claims: 11, 12, 13, 14, 15, 16
17. A system for determining hybrid text summaries comprising:
an input/output circuit for retrieving a text;
a processor for determining discourse constituents for the text and attaching the discourse constituents into a structural representation of discourse;
a relevance score determination circuit for determining relevance scores for the discourse constituents based on at least one non-structural measure of relevance; and
a percolation circuit for percolating discourse constituent relevance scores based on the structural representation of discourse and where the processor determines a hybrid text summary based on the discourse constituents with relevance scores exceeding a threshold relevance score,
wherein for each child discourse constituent node in the structural representation, the percolation circuit assigns the relevance score of the child discourse constituent node to the parent discourse constituent node if the child discourse constituent node is more relevant than its parent;
for each coordinated discourse constituent node, the percolation circuit assigns the relevance score of the coordinated discourse constituent node to each preceding less relevant sibling node;
for each child discourse constituent node that is not a coordinated discourse constituent node and is not a subordinated discourse constituent node, the percolation circuit assigns the relevance score of the parent discourse constituent node to the child discourse constituent node if the parent discourse constituent is more relevant than the child;
for each coordinated discourse constituent node, the percolation circuit assigns the relevance score of the parent discourse constituent node to the coordinated discourse constituent node, if the coordinated discourse node and all its siblings are less relevant than the parent node;
for each subordinated discourse constituent node, the percolation circuit assigns the relevance score of the subordinated discourse constituent node to the subordinating discourse constituent if the subordinated discourse constituent is more relevant than the subordinating node; and
for each node, repeating these steps, until the percolation circuit can find no node whose relevance score is changed to the relevance score of another node.
Dependent Claims: 18
19. A system for determining hybrid text summaries comprising:
an input/output circuit for retrieving a text;
a processor for determining discourse constituents for the text and attaching the discourse constituents into a structural representation of discourse;
a relevance score determination circuit for determining relevance scores for the discourse constituents based on at least one non-structural measure of relevance;
a percolation circuit for percolating discourse constituent relevance scores based on the structural representation of discourse;
wherein for each discourse constituent leaf node, the percolation circuit determines the number of subordinated edges plus one and a score based on the inverse of the number of subordinated edges +1;
for each discourse constituent node, the percolation circuit assigns the score of a child discourse constituent node to the parent discourse constituent, if the score is less relevant;
for any subordination discourse constituent node, the percolation circuit assigns the score of the subordinated discourse constituent node to the subordinating discourse constituent node if the subordinated discourse constituent score is lower;
the percolation circuit assigns the scores of any coordination discourse constituent node to each child discourse constituent of the coordination if it is lower; and
the processor determines an adjusted relevance score based on the score and the subordination level; and a hybrid text summary based on the discourse constituents with relevance scores exceeding a threshold relevance score.
Dependent Claims: 20
21. A hybrid text summarization system comprising:
means for determining discourse constituents for a text;
means for determining a structural representation of discourse for the text;
means for determining relevance scores for discourse constituents;
means for percolating relevance scores based on the structural representation of discourse comprising the steps of:
means for each discourse constituent leaf node, determining the number of subordinated edges plus one;
means for determining a score based on the inverse of the number of subordinated edges +1;
means for each discourse constituent node, assigning the score of a child discourse constituent node to the parent discourse constituent node, if the score is less relevant;
means for any subordination discourse constituent node, assigning the score of the subordinated discourse constituent node to the subordinating discourse constituent node if the subordinated discourse constituent score is lower;
means for assigning the relevance scores of any coordination discourse constituent node to each child discourse constituent of the coordination if it is lower;
means for determining an adjusted relevance score based on the score and the subordination level; and
means for determining a hybrid text summary based on discourse constituents with relevance scores compared to a threshold relevance score.
Specification