AUTOMATIC GENERATION OF DOCUMENT SUMMARIES THROUGH USE OF STRUCTURED TEXT
Abstract
A summarization system generates summaries from documents. Text structure tags, in conformance with the Text Encoding Initiative (TEI), are inserted into the documents to generate encoded documents. The text structure tags, when associated with portions of the document, identify text types. A text type, such as an argumentative text type, provides meta-information about the associated portion of text. The documents are also encoded, via a document type definition (“DTD”) in the eXtensible Markup Language (“XML”), to generate a tree structure that depicts the text types and hierarchical relationships among the text types in the tree structure. The summarization system generates a summary of the documents by extracting portions of the document, associated with the text type tags, using the tree structure in accordance with user input. The summarization system may be used to generate summaries from multiple documents.
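As a rough illustration of the encoding described in the abstract, the following Python sketch parses a small tagged document and lists its text types together with their nesting depth. The element name `seg`, the attribute `type`, the sample sentences, and the helper `outline` are assumptions made for illustration only; they are loosely modeled on TEI-style markup and are not taken from the patent.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoded document. Element names, attribute names, and the
# sample text are illustrative assumptions, not the patent's actual tag set.
ENCODED = """<text>
  <seg type="conclusion">The bridge should be closed for repairs.
    <seg type="premise">Inspectors found cracks in two support beams.</seg>
    <seg type="premise">Traffic load has doubled since the last retrofit.</seg>
  </seg>
</text>"""


def outline(node, depth=0):
    """Return (depth, text_type) pairs, in document order, for tagged nodes."""
    rows = []
    text_type = node.get("type")
    if text_type is not None:
        rows.append((depth, text_type))
        depth += 1  # children of a tagged node sit one level deeper
    for child in node:
        rows.extend(outline(child, depth))
    return rows


root = ET.fromstring(ENCODED)
tree_outline = outline(root)
for depth, text_type in tree_outline:
    print("  " * depth + text_type)
```

In this sketch the premises are nested inside the conclusion they support, so the hierarchy of the parsed tree stands in for the argumentative relationship among the text types.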
32 Claims
1-15. (canceled)
16. A computer-implemented method for generating summaries from at least one document comprising text, said method comprising the steps of:
a) generating text structure tags for at least one document, said text structure tags identifying a plurality of argumentative text types, wherein a text type comprises a type of argumentative content for an associated portion of a document, said types of argumentative content comprising an argument premise giving support, evidence, or reasoning for or against a conclusion or the conclusion comprising a resulting determination made using one or more argument premises;
b) encoding said document to generate a tree structure comprising a plurality of nodes, wherein said nodes correspond with said text types and hierarchical relationships among said nodes reflect argumentative relationships among said text types; and
c) generating a summary of said document by:
i) receiving from a user a selection of one or more particular text types for summarization; and
ii) extracting, based on said text structure tags, portions of text from said document that correspond to nodes corresponding to said one or more selected text types.
(Dependent claims: 17, 18, 19, 26, 27, 28, 29)
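Step c) of claim 16 can be sketched in Python as follows: a caller supplies the selected text types, and the text of the matching nodes is extracted in document order. This is a minimal sketch under stated assumptions, not the claimed implementation. The `seg` element, the `type` attribute, the sample document, and the function name `summarize` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged document; tag and attribute names are assumptions.
ENCODED = """<text>
  <seg type="conclusion">The bridge should be closed for repairs.
    <seg type="premise">Inspectors found cracks in two support beams.</seg>
    <seg type="premise">Traffic load has doubled since the last retrofit.</seg>
  </seg>
</text>"""


def summarize(xml_text, selected_types):
    """Collect the text of nodes whose text type was selected by the user.

    Only the text directly inside a node is taken, so selecting
    "conclusion" does not also pull in the premises nested under it.
    """
    root = ET.fromstring(xml_text)
    parts = []
    for node in root.iter("seg"):  # depth-first, document order
        if node.get("type") in selected_types:
            own_text = " ".join((node.text or "").split())
            if own_text:
                parts.append(own_text)
    return parts


premises_only = summarize(ENCODED, {"premise"})
conclusion_only = summarize(ENCODED, {"conclusion"})
print(premises_only)
print(conclusion_only)
```

Passing a set of types keeps the selection step separate from the extraction step, which mirrors the split between sub-steps i) and ii).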
20. A computer readable medium comprising a plurality of instructions stored thereon which, when executed, generate summaries from at least one document comprising text, the computer readable medium comprising sets of instructions for:
a) generating text structure tags for at least one document, said text structure tags identifying a plurality of argumentative text types, wherein a text type comprises a type of argumentative content for an associated portion of a document, said types of argumentative content comprising an argument premise giving support, evidence, or reasoning for or against a conclusion or the conclusion comprising a resulting determination made using one or more argument premises;
b) encoding said document to generate a tree structure comprising a plurality of nodes, wherein said nodes correspond with said text types and hierarchical relationships among said nodes reflect argumentative relationships among said text types; and
c) generating a summary of said document by:
i) receiving from a user a selection of one or more particular text types for summarization; and
ii) extracting, based on said text structure tags, portions of text from said document that correspond to nodes corresponding to said one or more selected text types.
(Dependent claims: 21, 22, 23, 30)
24. A computer system for generating summaries of documents comprising text, said computer system comprising:
a memory for storing at least one document; and
a processing unit, coupled to said memory, configured for:
generating text structure tags from at least one document, said text structure tags identifying a plurality of argumentative text types, wherein a text type comprises a type of argumentative content for an associated portion of a document, said types of argumentative content comprising an argument premise giving support, evidence, or reasoning for or against a conclusion or the conclusion comprising a resulting determination made using one or more argument premises,
encoding said document to generate a tree structure comprising a plurality of nodes, wherein said nodes correspond with said text types and hierarchical relationships among said nodes reflect argumentative relationships among said text types, and
generating a summary of said document by receiving from a user a selection of one or more particular text types for summarization and by extracting, based on said text structure tags, portions of text from said document that correspond to nodes corresponding to said one or more selected text types.
(Dependent claims: 25, 31)
32. A computer-implemented method for generating summaries for a plurality of documents comprising text, the method comprising:
a) for each document in the plurality of documents, generating text structure tags for the document including generating text structure tags in accordance with Text Encoding Initiative (TEI), the text structure tags identifying a plurality of argumentative text types, wherein a text type comprises a type of argumentative content for an associated portion of a document, the types of argumentative content comprising an argument premise giving support, evidence, or reasoning for or against a conclusion or the conclusion comprising a resulting determination made using one or more argument premises;
b) for each document in the plurality of documents, encoding the document to generate a tree structure comprising a plurality of nodes, wherein the nodes correspond with the text types and hierarchical relationships among the nodes reflect argumentative relationships among the text types;
c) selecting a plurality of tree structures for the plurality of documents;
d) combining, as a single logical tree structure, the plurality of tree structures; and
e) generating a summary for the plurality of documents by:
i) receiving from a user a selection of one or more particular text types for summarization, the one or more particular text types comprising the argument premise text type; and
ii) identifying, based upon the text type tags, a set of nodes from the plurality of tree structures corresponding to the one or more selected text types including one or more nodes corresponding to the argument premise text type; and
iii) extracting portions of text from the plurality of documents that correspond to the identified set of nodes selected to form a summary of the plurality of documents.
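Steps c) through e) of claim 32 can be sketched in Python as follows. Each per-document tree is attached under one synthetic root to form a single logical tree, and premise nodes are then collected across all documents. The `corpus` root element, the `source` attribute, the sample documents, and the function names `combine` and `summarize` are hypothetical choices for this sketch and do not come from the patent.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoded documents; tag and attribute names are assumptions.
DOCS = {
    "doc1": '<text><seg type="conclusion">Close the bridge.'
            '<seg type="premise">Two beams are cracked.</seg></seg></text>',
    "doc2": '<text><seg type="conclusion">Delay the closure.'
            '<seg type="premise">No detour route exists yet.</seg></seg></text>',
}


def combine(docs):
    """Attach every document tree under one synthetic 'corpus' root."""
    corpus = ET.Element("corpus")
    for doc_id, xml_text in docs.items():
        doc_root = ET.fromstring(xml_text)
        doc_root.set("source", doc_id)  # remember which document a subtree came from
        corpus.append(doc_root)
    return corpus


def summarize(corpus, selected_types):
    """Return (source, text) pairs for nodes of the selected text types."""
    parts = []
    for doc_root in corpus:
        for node in doc_root.iter("seg"):
            if node.get("type") in selected_types:
                own_text = " ".join((node.text or "").split())
                if own_text:
                    parts.append((doc_root.get("source"), own_text))
    return parts


combined = combine(DOCS)
summary = summarize(combined, {"premise"})
for source, text in summary:
    print(f"{source}: {text}")
```

Recording the source document on each subtree is one possible way to keep extracted passages traceable after the trees are merged; the claim itself does not prescribe how the combined tree is represented.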
Specification