Summarizing text documents by resolving co-referentiality among actors or objects around which a story unfolds
Abstract
A method and system for characterizing the content of a document is disclosed. The method and system comprise identifying a plurality of discourse referents in the document, dividing the document into topically relevant document segments, and resolving co-referentiality among the discourse referents within, and across, the document segments. The method and system also comprise calculating salience values for the discourse referents based upon the resolving step, and determining topic stamps for the document segments based upon the salience values of the associated discourse referents. Finally, the method and system comprise providing summary-like abstractions in the form of capsule overviews of each segment, derived from its topic stamps. In so doing, a capsule overview is derived for the entire document that depicts the core content of an average-length article more accurately and representatively than conventional techniques.
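Step (b), dividing the document into topically relevant segments, can be approximated with a lexical-cohesion segmenter. The sketch below is a much-simplified stand-in in the spirit of TextTiling, not the segmentation procedure of this patent. It assumes the document is already split into sentences, compares each pair of adjacent sentences by the cosine similarity of their content-word counts, and starts a new segment when that similarity falls below a threshold. The stop list, `threshold=0.1`, and all function names are illustrative assumptions.

```python
import re
from collections import Counter
from math import sqrt

# Tiny illustrative stop list (assumption); a real segmenter would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "from", "may", "now"}

def bag_of_words(sentence: str) -> Counter:
    """Lowercased content-word counts for one sentence."""
    return Counter(w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    num = sum(a[w] * b[w] for w in a.keys() & b.keys())
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def segment(sentences: list[str], threshold: float = 0.1) -> list[list[str]]:
    """Open a new segment wherever adjacent sentences share too little vocabulary."""
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            segments.append([cur])  # lexical cohesion drops: treat as a topic shift
        else:
            segments[-1].append(cur)
    return segments

sentences = [
    "The senate passed the budget bill.",
    "The budget bill now goes to the president.",
    "Rain is expected across the region.",
    "The region may see flooding from the rain.",
]
segments = segment(sentences)
```

Here the budget sentences and the weather sentences land in separate segments, because the two sentences on either side of the boundary share no content words. Comparing single sentences is noisy on real text; comparing blocks of text and smoothing the similarity curve, as TextTiling does, is more robust.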
20 Claims
1. A method for characterizing the content of a document comprising the steps of:
a) identifying a plurality of discourse referents in the document;
b) dividing the document into topically relevant document segments;
c) resolving co-referentiality among the discourse referents within, and across, the document segments;
d) calculating salience values for the discourse referents based upon the resolving step;
e) determining topic stamps for the document segments based upon discourse salience values of the associated discourse referents; and
f) providing a capsule overview of the document, constructed from the topic stamps.
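The six steps of claim 1 can be traced end to end in a few lines once co-reference has been resolved. This sketch assumes a hypothetical input in which every mention already carries a referent identifier (the output of step c), and it substitutes plain mention frequency for the salience calculation of step d, which the claim does not spell out. Topic stamps (step e) are taken to be each segment's k most salient referents, and the capsule overview (step f) lists them per segment under each referent's first surface form. The function names, the `Mention` alias, and `k=2` are assumptions.

```python
from collections import Counter

# Hypothetical input format: each segment is a list of (referent_id, surface_form)
# mentions, with co-reference already resolved so that "IBM", "the company" and
# "it" share one referent_id when they denote the same entity.
Mention = tuple[str, str]

def discourse_salience(segments: list[list[Mention]]) -> Counter:
    """Frequency-only stand-in: count each referent's mentions over the whole document."""
    return Counter(ref for seg in segments for ref, _ in seg)

def first_mentions(segments: list[list[Mention]]) -> dict[str, str]:
    """Use each referent's first surface form as its readable label."""
    labels: dict[str, str] = {}
    for seg in segments:
        for ref, text in seg:
            labels.setdefault(ref, text)
    return labels

def topic_stamps(segments: list[list[Mention]], salience: Counter, k: int = 2) -> list[list[str]]:
    """For each segment, keep the k referents present in it with the highest discourse salience."""
    stamps = []
    for seg in segments:
        present = {ref for ref, _ in seg}
        stamps.append(sorted(present, key=lambda r: (-salience[r], r))[:k])
    return stamps

def capsule_overview(segments: list[list[Mention]], k: int = 2) -> list[list[str]]:
    """One list of readable topic stamps per segment, in document order."""
    salience = discourse_salience(segments)
    labels = first_mentions(segments)
    return [[labels[r] for r in stamp] for stamp in topic_stamps(segments, salience, k)]

doc = [
    [("ibm", "IBM"), ("chip", "a new chip"), ("ibm", "the company"), ("chip", "it"), ("ibm", "it")],
    [("chip", "the processor"), ("intel", "Intel"), ("ibm", "IBM"), ("chip", "it"), ("ibm", "the firm")],
    [("intel", "Intel"), ("market", "the server market"), ("intel", "its rival")],
]
overview = capsule_overview(doc)
```

Intel appears in the second segment but does not make its stamps, since IBM (5 mentions) and the chip (4) outrank it (3) at the document level; that is the effect of basing topic stamps on discourse-wide rather than purely local counts.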
10. A system for characterizing the content of a document comprising:
means for identifying a plurality of discourse referents in the document;
means for dividing the document into topically relevant document segments;
means for resolving co-referentiality among the discourse referents within, and across, the document segments;
means for calculating salience values for the discourse referents based upon the resolving step;
means for determining topic stamps for the document segments based upon discourse salience values of the associated discourse referents; and
means for providing a capsule overview of the document, constructed from the topic stamps.
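Claim 10 recites the same steps as claim 1 in means-plus-function form. One way to picture such a system in software is as a set of interchangeable stages wired together; the sketch below illustrates that reading and is not the patented implementation. The class `CapsuleSystem`, its field names, and the toy stages (a fixed regular expression for referents, an alias table standing in for co-reference resolution, and raw mention counts as salience) are all hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable

@dataclass
class CapsuleSystem:
    """Hypothetical wiring of the claimed 'means' as pluggable stages."""
    identify: Callable[[str], list[str]]                   # means for identifying referents
    segment: Callable[[str], list[str]]                    # means for dividing into segments
    resolve: Callable[[list[str]], dict[str, str]]         # mention -> canonical referent
    score: Callable[[list[list[str]]], dict[str, float]]   # canonical referent -> salience
    k: int = 2                                             # topic stamps per segment

    def overview(self, document: str) -> list[list[str]]:
        segments = self.segment(document)
        mentions = [self.identify(s) for s in segments]
        # Resolve over all mentions at once, so links hold within and across segments.
        canon = self.resolve([m for seg in mentions for m in seg])
        resolved = [[canon[m] for m in seg] for seg in mentions]
        salience = self.score(resolved)
        # Topic stamps: the k most salient referents present in each segment.
        return [sorted(set(seg), key=lambda r: (-salience[r], r))[: self.k] for seg in resolved]

# Toy stages, for illustration only.
ALIASES = {"Big Blue": "IBM"}
system = CapsuleSystem(
    identify=lambda s: re.findall(r"IBM|Big Blue|Intel|chip", s),
    segment=lambda d: [p for p in d.split("\n\n") if p.strip()],
    resolve=lambda ms: {m: ALIASES.get(m, m) for m in ms},
    score=lambda segs: Counter(r for seg in segs for r in seg),
)
result = system.overview("IBM unveiled a chip. Big Blue says the chip is fast.\n\nIntel answered IBM.")
```

For the toy document, the alias table folds "Big Blue" into IBM, which raises IBM's count to three and puts it first in both segments. Keeping each stage behind its own callable lets a real segmenter, resolver, or salience model replace a toy one without touching the rest of the pipeline.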
19. A method for characterizing the content of a document comprising the steps of:
a) identifying a plurality of discourse referents in the document;
b) dividing the document into topically relevant document segments;
c) resolving co-referentiality among the discourse referents within, and across, the document segments, wherein the resolving step comprises linking the discourse referents by co-referentiality with each other to assess a frequency with which they appear within a document and to establish prominence;
d) calculating salience values for the discourse referents based upon the resolving step;
e) determining topic stamps for the document segments based upon discourse salience values of the associated discourse referents; and
f) providing a capsule overview of the document, constructed from the topic stamps.
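Claim 19 narrows step (c): referents are linked by co-referentiality so that their frequency, and hence their prominence, can be assessed. A natural data structure for that linking is a disjoint-set (union-find) forest over mention indices, since co-reference is transitive: linking "Mr. Smith" to "he" and "he" to "Smith" puts all three in one class. The sketch assumes a hypothetical upstream resolver that emits pairwise links, and it takes prominence to mean rank by class size; neither assumption comes from the claim. `UnionFind` and `rank_referents` are illustrative names.

```python
class UnionFind:
    """Disjoint sets over mention indices; each set is one co-reference class."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smaller index as root, so each class is rooted at its first mention.
            self.parent[max(ra, rb)] = min(ra, rb)

def rank_referents(mentions: list[str], links: list[tuple[int, int]]) -> list[tuple[str, int]]:
    """Link co-referent mentions, then rank referents by mention frequency (prominence)."""
    uf = UnionFind(len(mentions))
    for a, b in links:
        uf.union(a, b)
    classes: dict[int, list[int]] = {}
    for i in range(len(mentions)):
        classes.setdefault(uf.find(i), []).append(i)
    counts = [(mentions[root], len(members)) for root, members in classes.items()]
    return sorted(counts, key=lambda c: (-c[1], c[0]))

mentions = ["Mr. Smith", "the bank", "he", "Smith", "its", "a loan"]
links = [(0, 2), (2, 3), (1, 4)]  # hypothetical resolver output: pairs of co-referent mentions
ranking = rank_referents(mentions, links)
```

Here `ranking` is `[("Mr. Smith", 3), ("the bank", 2), ("a loan", 1)]`. Rooting each class at its earliest mention gives a readable label for free, since first mentions tend to be full noun phrases rather than pronouns.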
20. A system for characterizing the content of a document comprising:
means for identifying a plurality of discourse referents in the document;
means for dividing the document into topically relevant document segments;
means for resolving co-referentiality among the discourse referents within, and across, the document segments wherein the resolving means comprises means for linking the discourse referents by co-referentiality with each other to assess a frequency with which they appear within a document and to establish prominence;
means for calculating salience values for the discourse referents based upon the resolving step;
means for determining topic stamps for the document segments based upon discourse salience values of the associated discourse referents; and
means for providing a capsule overview of the document, constructed from the topic stamps.
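The claims leave the salience calculation open beyond requiring that it build on the resolved co-reference classes. One possible weighting, shown here purely as an assumption, rewards referents that appear in prominent grammatical roles and that recur across many segments. The role labels, the weights in `ROLE_WEIGHTS`, and `SEGMENT_BONUS` are hypothetical values chosen for illustration, not figures from this patent.

```python
from collections import defaultdict

# Hypothetical weights; the claims do not specify any.
ROLE_WEIGHTS = {"subject": 80, "object": 50}
SEGMENT_BONUS = 100  # added once for each distinct segment a referent appears in

def salience(mentions: list[tuple[str, int, str]]) -> dict[str, int]:
    """Score resolved mentions given as (referent_id, segment_index, grammatical_role)."""
    score: dict[str, int] = defaultdict(int)
    segments_seen: dict[str, set[int]] = defaultdict(set)
    for ref, seg, role in mentions:
        score[ref] += ROLE_WEIGHTS.get(role, 0)  # unlisted roles contribute nothing
        segments_seen[ref].add(seg)
    for ref, segs in segments_seen.items():
        score[ref] += SEGMENT_BONUS * len(segs)
    return dict(score)

resolved_mentions = [
    ("smith", 0, "subject"),
    ("bank", 0, "object"),
    ("smith", 1, "subject"),
    ("loan", 1, "object"),
    ("bank", 1, "other"),
]
scores = salience(resolved_mentions)
```

Smith scores 360 (two subject mentions at 80 each, plus 100 for each of two segments), the bank 250, and the loan 150. Rewarding spread across segments favors referents that run through the whole story over ones that cluster in a single passage.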
Specification