Systems and methods for hybrid text summarization
Abstract
Techniques are provided for segmenting text into categorized discourse constituents and attaching discourse constituents into a structural representation of discourse. Techniques for determining hybrid structural and non-structural summaries of a text are also provided. A text is segmented based on a theory of discourse analysis into at least a main discourse constituent containing spatio-temporal information about a single event in a possible world view. The discourse constituents are then inserted into a structural representation of discourse. Non-structural techniques are used to determine relevance scores and important discourse constituents are determined. Relevance scores are percolated through the structural representation of discourse to determine supporting preceding discourse constituents that preserve grammaticality. A hybrid text summary is then determined based on the structural representation of the discourse and relevance scores.
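The flow described in the abstract (score constituents non-structurally, percolate scores through the discourse structure, then select against a threshold) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `Constituent` class, the parent-index encoding of the structure, the term-frequency measure, and the rule that a parent is raised to its dependents' score.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Constituent:
    text: str
    parent: Optional[int] = None  # hypothetical encoding: index of the constituent this one depends on
    score: float = 0.0


def term_frequency_scores(units: List[Constituent]) -> None:
    """Assumed non-structural measure: mean corpus frequency of a unit's words, scaled to 0..1."""
    words = [u.text.lower().split() for u in units]
    counts = Counter(w for ws in words for w in ws)
    top = max(counts.values(), default=1)
    for u, ws in zip(units, words):
        u.score = sum(counts[w] for w in ws) / (max(len(ws), 1) * top)


def percolate_to_context(units: List[Constituent]) -> None:
    """Raise each parent to at least its dependents' score so selected text keeps its context.

    Assumes every parent precedes its dependents, so one backward pass suffices.
    """
    for i in range(len(units) - 1, -1, -1):
        p = units[i].parent
        if p is not None:
            units[p].score = max(units[p].score, units[i].score)


def hybrid_summary(units: List[Constituent], threshold: float) -> List[str]:
    """Score, percolate, then keep constituents at or above the threshold, in text order."""
    term_frequency_scores(units)
    percolate_to_context(units)
    return [u.text for u in units if u.score >= threshold]
```

In this sketch a low-scoring constituent is still selected when a constituent that depends on it scores highly, which is one simple way to keep the preceding context that a selected constituent needs.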
39 Claims
1. A method of determining a hybrid text summary comprising the steps of:
determining discourse constituents for a text;
determining a structural representation of discourse for the text;
determining relevance scores for discourse constituents based on at least one non-structural measure of relevance;
percolating relevance scores based on the structural representation of discourse;
determining a hybrid text summary based on discourse constituents with relevance scores compared to a threshold relevance score.
8. A method of determining a hybrid text summary comprising the steps of:
determining discourse constituents for a text;
determining a structural representation of discourse for the text;
determining relevance scores for discourse constituents;
percolating relevance scores based on the structural representation of discourse comprising the steps of:
for each discourse constituent leaf node, determining the number of subordinated edges plus one;
determining a score based on the inverse of the number of subordinated edges +1;
for each discourse constituent node, assigning the score of a child discourse constituent node to the parent discourse constituent node, if the score is less relevant;
for any subordination discourse constituent node, assigning the score of the subordinated discourse constituent node to the subordinating discourse constituent node if the subordinated discourse constituent score is lower;
assigning the relevance scores of any coordination discourse constituent node to each child discourse constituent of the coordination if it is lower;
determining an adjusted relevance score based on the score and the subordination level; and
determining a hybrid text summary based on discourse constituents with relevance scores compared to a threshold relevance score.
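The first two percolation steps of claim 8 (count the subordinated edges above each leaf, add one, and take the inverse) can be sketched as follows. This is a sketch under stated assumptions, not the patented procedure: the `Node` class is hypothetical, child 0 of a subordination node is assumed to be the subordinating constituent, and leaf texts are assumed unique. The later steps of the claim are left out because their direction depends on the score convention defined in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    kind: str  # "leaf", "coord" or "subord" (hypothetical labels)
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def leaf_structural_scores(root: Node) -> Dict[str, float]:
    """Map each leaf text to 1 / (number of subordinated edges above it + 1)."""
    scores: Dict[str, float] = {}

    def walk(node: Node, subordinated_edges: int) -> None:
        if node.kind == "leaf":
            scores[node.text] = 1.0 / (subordinated_edges + 1)
            return
        for i, child in enumerate(node.children):
            # Assumption: in a subordination node, child 0 is the subordinating
            # constituent; every later child hangs off a subordinated edge.
            extra = 1 if node.kind == "subord" and i > 0 else 0
            walk(child, subordinated_edges + extra)

    walk(root, 0)
    return scores


def structural_summary(root: Node, threshold: float) -> List[str]:
    """Keep leaves whose structural score is at or above the threshold, in text order."""
    return [t for t, s in leaf_structural_scores(root).items() if s >= threshold]
```

Under these assumptions a top-level leaf scores 1, a leaf one subordination deep scores 1/2, and a leaf two deep scores 1/3, so raising the threshold removes the most deeply embedded material first.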
11. A system for determining hybrid text summaries comprising:
an input/output circuit for retrieving a text;
a processor for determining discourse constituents for the text and attaching the discourse constituents into a structural representation of discourse;
a relevance score determination circuit for determining relevance scores for the discourse constituents based on at least one non-structural measure of relevance;
a percolation circuit for percolating discourse constituent relevance scores based on the structural representation of discourse; and
where the processor determines a hybrid text summary based on the discourse constituents with relevance scores exceeding a threshold relevance score.
18. A system for determining hybrid text summaries comprising:
an input/output circuit for retrieving a text;
a processor for determining discourse constituents for the text and attaching the discourse constituents into a structural representation of discourse;
a relevance score determination circuit for determining relevance scores for the discourse constituents based on at least one non-structural measure of relevance;
a percolation circuit for percolating discourse constituent relevance scores based on the structural representation of discourse;
wherein for each discourse constituent leaf node, the percolation circuit determines the number of subordinated edges plus one and a score based on the inverse of the number of subordinated edges +1;
for each discourse constituent node, the percolation circuit assigns the score of a child discourse constituent node to the parent discourse constituent, if the score is less relevant;
for any subordination discourse constituent node, the percolation circuit assigns the score of the subordinated discourse constituent node to the subordinating discourse constituent node if the subordinated discourse constituent score is lower;
the percolation circuit assigns the scores of any coordination discourse constituent node to each child discourse constituent of the coordination if it is lower; and
the processor determines an adjusted relevance score based on the score and the subordination level; and
a hybrid text summary based on the discourse constituents with relevance scores exceeding a threshold relevance score.
22. A carrier wave encoded to transmit a control program, useable to program a computer to determine hybrid text summary, to a device for executing the program, the control program comprising:
instructions for determining discourse constituents for a text;
instructions for determining a structural representation of discourse for the text;
instructions for determining relevance scores for discourse constituents based on at least one non-structural measure of relevance;
instructions for percolating relevance scores based on the structural representation of discourse;
instructions for determining a hybrid text summary based on discourse constituents with relevance scores compared to a threshold relevance score.
23. Computer readable storage medium comprising:
computer readable program code embodied on the computer readable storage medium, the computer readable program code usable to program a computer to determine hybrid text summary comprising the steps of:
determining discourse constituents for a text;
determining a structural representation of discourse for the text;
determining relevance scores for discourse constituents based on at least one non-structural measure of relevance;
percolating relevance scores based on the structural representation of discourse;
determining a hybrid text summary based on discourse constituents with relevance scores compared to a threshold relevance score.
24. A method for discourse parsing comprising the steps of:
determining a structural representation of discourse based on a theory of discourse analysis;
determining at least one sentence of a text;
determining sentential-level parse features for the at least one sentence;
determining a mapping between the sentential-level parse features and discourse-level parse features;
determining a discourse-level parse tree of the at least one sentence based on the mapping;
determining a main discourse constituent for the at least one sentence;
determining an attachment of the discourse level parse tree to the structural representation of discourse by the determined main discourse constituent based on attachment rules for the theory of discourse.
25. A method of segmenting text into discourse constituents comprising the steps of:
determining a theory of discourse analysis;
determining candidate segments;
determining attributes of candidate segments associated with continuing the discourse;
determining if the candidate segment is a discourse constituent based on the theory of discourse analysis and the determined attributes.
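The segmentation steps of claim 25 (propose candidate segments, determine their attributes, then decide which candidates are discourse constituents) can be sketched as follows. The claim does not fix the candidate rule or the attributes, so both are assumptions here: candidates are split at commas, semicolons and a few subordinating conjunctions, and the single attribute is the presence of a finite verb, looked up in a tiny hypothetical word list that stands in for a real tagger.

```python
import re
from typing import List

# Hypothetical stand-in for a real part-of-speech tagger.
FINITE_VERBS = {"is", "was", "failed", "rose", "closed", "said"}


def candidate_segments(sentence: str) -> List[str]:
    """Split at commas/semicolons and before a few subordinating conjunctions."""
    body = sentence.strip().rstrip(".")
    parts = re.split(r"\s*[,;]\s*|\s+(?=because\b|although\b|while\b)", body)
    return [p for p in parts if p]


def has_finite_verb(segment: str) -> bool:
    """Assumed attribute: the candidate reports an event or state of its own."""
    return any(w.lower() in FINITE_VERBS for w in segment.split())


def segment(sentence: str) -> List[str]:
    """Keep candidates with a finite verb; fold the others into a neighbour."""
    units: List[str] = []
    pending = ""
    for cand in candidate_segments(sentence):
        if has_finite_verb(cand):
            units.append((pending + " " + cand).strip())
            pending = ""
        elif units:
            units[-1] += ", " + cand
        else:
            pending = cand + ","
    if pending:
        units.append(pending.rstrip(","))
    return units
```

For example, `segment("In the morning, the river rose because the dam failed.")` keeps "In the morning" attached to the clause that follows it and treats the "because" clause as a constituent of its own.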
27. A method of determining a structural representation of discourse comprising the steps of:
determining discourse constituents for a text; and
conjoining the discourse constituents into a structural representation of discourse based on theory of discourse analysis classifications of the discourse constituents and at least one of a syntactic, a semantic and a lexical-semantic constraint.
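The conjoining step of claim 27 can be illustrated with a small sketch that uses only a lexical constraint. The cue list, the nested-tuple tree, and the rule "attach a cue-initial constituent under the most recent one, otherwise coordinate" are assumptions made for illustration; the claim also allows syntactic and semantic constraints, which are not modeled here.

```python
from typing import List, Tuple

# Hypothetical lexical cue list for subordinated constituents.
SUBORDINATORS = {"because", "although", "while", "which"}


def attach(units: List[str]) -> Tuple[str, list]:
    """Build ("coord", [...]) whose items are strings or ("subord", [head, dependent])."""
    root: Tuple[str, list] = ("coord", [])
    children = root[1]
    for unit in units:
        words = unit.split()
        if children and words and words[0].lower() in SUBORDINATORS:
            # Lexical constraint met: subordinate to the most recent constituent.
            children[-1] = ("subord", [children[-1], unit])
        else:
            # Default: coordinate with what has been attached so far.
            children.append(unit)
    return root
```

A call such as `attach(["the river rose", "because the dam failed", "the bridge closed"])` places the "because" clause under the first constituent and coordinates the third constituent with that pair.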
29. A system for discourse parsing comprising:
an input/output circuit;
a processor which determines a structural representation of discourse based on a theory of discourse analysis;
determines at least one sentence of a text;
determines sentential-level parse features for the at least one sentence;
determines a mapping between the sentential-level parse features and discourse-level parse features;
determines a discourse-level parse tree of the at least one sentence based on the mapping;
determines a main discourse constituent for the at least one sentence;
determines an attachment of the discourse level parse tree to the structural representation of discourse by the determined main discourse constituent based on attachment rules for the theory of discourse.
30. A system for segmenting text into discourse constituents comprising:
an input/output circuit;
a processor which determines a theory of discourse analysis;
determines candidate segments;
determines attributes of candidate segments associated with continuing the discourse;
determines if the candidate segment is a discourse constituent based on the theory of discourse analysis and the determined attributes.
32. A system of determining a structural representation of discourse comprising:
an input/output circuit;
a processor which determines discourse constituents for a text; and
conjoins the discourse constituents into a structural representation of discourse based on theory of discourse analysis classifications of the discourse constituents, and at least one of a syntactic, a semantic and a lexical-semantic constraint.
36. A hybrid text summarization system comprising:
means for determining discourse constituents for a text;
means for determining a structural representation of discourse for the text;
means for determining relevance scores for discourse constituents based on at least one non-structural measure of relevance;
means for percolating relevance scores based on the structural representation of discourse; and
means for determining a hybrid text summary based on discourse constituents with relevance scores compared to a threshold relevance score.
37. A hybrid text summarization system comprising:
means for determining discourse constituents for a text;
means for determining a structural representation of discourse for the text;
means for determining relevance scores for discourse constituents;
means for percolating relevance scores based on the structural representation of discourse comprising the steps of:
means for each discourse constituent leaf node, determining the number of subordinated edges plus one;
means for determining a score based on the inverse of the number of subordinated edges +1;
means for each discourse constituent node, assigning the score of a child discourse constituent node to the parent discourse constituent node, if the score is less relevant;
means for any subordination discourse constituent node, assigning the score of the subordinated discourse constituent node to the subordinating discourse constituent node if the subordinated discourse constituent score is lower;
means for assigning the relevance scores of any coordination discourse constituent node to each child discourse constituent of the coordination if it is lower;
means for determining an adjusted relevance score based on the score and the subordination level; and
means for determining a hybrid text summary based on discourse constituents with relevance scores compared to a threshold relevance score.
38. A discourse parsing system comprising:
means for determining a structural representation of discourse based on a theory of discourse analysis;
means for determining at least one sentence of a text;
means for determining sentential-level parse features for the at least one sentence;
means for determining a mapping between the sentential-level parse features and discourse-level parse features;
means for determining a discourse-level parse tree of the at least one sentence based on the mapping;
means for determining a main discourse constituent for the at least one sentence; and
means for determining an attachment of the discourse level parse tree to the structural representation of discourse by the determined main discourse constituent based on attachment rules for the theory of discourse.
39. A text segmenting system comprising:
means for determining a theory of discourse analysis;
means for determining candidate segments;
means for determining attributes of candidate segments associated with continuing the discourse; and
means for determining if the candidate segment is a discourse constituent based on the theory of discourse analysis and the determined attributes.
Specification