Automatic generation of document summaries through use of structured text
Abstract
A summarization system generates summaries from documents. Text structure tags, in conformance with the Text Encoding Initiative (TEI), are inserted into the documents to generate encoded documents. The text structure tags, when associated with portions of the document, identify text types. A text type, such as an argumentative text type, provides meta-information about the associated portion of text. The documents are also encoded, via document type declaration (“DTD”) in the eXtensible mark-up language (“XML”), to generate a tree structure that depicts the text types and hierarchical relationships among the text types in the tree structure. The summarization system generates a summary of the documents by extracting portions of the document, associated with the text type tags, using the tree structure in accordance with user input. The summarization system may be used to generate summaries from multiple documents.
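The extraction step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the element names (`document`, `argument`, `premise`, `conclusion`), the sample text, and the helper `extract_by_type` are all assumptions made for this example and are not taken from TEI or from the patent. The sketch parses one tagged document into a tree and pulls out the text of the nodes whose text type the user selected.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged document. The element names are illustrative only;
# they are not actual TEI elements and do not come from the patent.
TAGGED_DOC = """
<document id="doc1">
  <argument>
    <conclusion>The bridge should be closed for repairs.</conclusion>
    <premise>Inspectors found cracks in two support beams.</premise>
    <premise>Traffic load has doubled since the last survey.</premise>
  </argument>
</document>
"""


def extract_by_type(root, selected_types):
    """Return the text of every node whose tag is one of the selected text types."""
    return [
        (node.text or "").strip()
        for node in root.iter()
        if node.tag in selected_types
    ]


# Parsing the markup yields the tree structure; each element is a node.
root = ET.fromstring(TAGGED_DOC)

# The user's selection of text types drives which portions are extracted.
summary = extract_by_type(root, {"conclusion"})
print(summary)
```

Passing `{"premise"}` instead would return the two supporting statements, so the same tree can serve different summaries depending on the user's selection.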
14 Claims
1. A method comprising the steps of:
a) for each document in a plurality of documents, generating, by a computer system, text structure tags for the document, said text structure tags identifying a plurality of argumentative text types, wherein a text type comprises a type of argumentative content for an associated portion of a document, said types of argumentative content comprising an argument premise giving support, evidence, or reasoning for or against a conclusion or the conclusion comprising a resulting determination made using one or more argument premises;
b) for each document in the plurality of documents, encoding, by the computer system, said document to generate a tree structure comprising a plurality of nodes, wherein said nodes correspond with said text types and hierarchical relationships among said nodes reflect argumentative relationships among said text types;
c) selecting, by the computer system, a plurality of tree structures for the plurality of documents;
d) combining, by the computer system, the plurality of tree structures as a single logical tree structure; and
e) generating, by the computer system, a summary of the plurality of documents by:
i) receiving from a user a selection of one or more particular text types for summarization; and
ii) extracting, based on said text structure tags, portions of text from the plurality of documents that correspond to nodes from the plurality of tree structures to form a summary of the plurality of documents, the nodes corresponding to said one or more selected text types.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A non-transitory computer readable medium having stored thereon a plurality of instructions executable by a computer system, the plurality of instructions including instructions for:
a) for each document in a plurality of documents, generating text structure tags for the document, said text structure tags identifying a plurality of argumentative text types, wherein a text type comprises a type of argumentative content for an associated portion of a document, said types of argumentative content comprising an argument premise giving support, evidence, or reasoning for or against a conclusion or the conclusion comprising a resulting determination made using one or more argument premises;
b) for each document in the plurality of documents, encoding said document to generate a tree structure comprising a plurality of nodes, wherein said nodes correspond with said text types and hierarchical relationships among said nodes reflect argumentative relationships among said text types;
c) selecting a plurality of tree structures for the plurality of documents;
d) combining the plurality of tree structures as a single logical tree structure; and
e) generating a summary of the plurality of documents by:
i) receiving from a user a selection of one or more particular text types for summarization; and
ii) extracting, based on said text structure tags, portions of text from the plurality of documents that correspond to nodes from the plurality of tree structures to form a summary of the plurality of documents, the nodes corresponding to said one or more selected text types.
Dependent claims: 9, 10, 11.
12. A computer system comprising:
a memory for storing a plurality of documents; and
a processing unit, coupled to said memory, configured to:
for each document in the plurality of documents, generate text structure tags for the document, said text structure tags identifying a plurality of argumentative text types, wherein a text type comprises a type of argumentative content for an associated portion of a document, said types of argumentative content comprising an argument premise giving support, evidence, or reasoning for or against a conclusion or the conclusion comprising a resulting determination made using one or more argument premises;
for each document in the plurality of documents, encode said document to generate a tree structure comprising a plurality of nodes, wherein said nodes correspond with said text types and hierarchical relationships among said nodes reflect argumentative relationships among said text types;
select a plurality of tree structures for the plurality of documents;
combine the plurality of tree structures as a single logical tree structure; and
generate a summary of the plurality of documents by:
receiving from a user a selection of one or more particular text types for summarization; and
extracting, based on said text structure tags, portions of text from the plurality of documents that correspond to nodes from the plurality of tree structures to form a summary of the plurality of documents, the nodes corresponding to said one or more selected text types.
Dependent claims: 13.
14. A computer-implemented method for generating summaries for a plurality of documents comprising text, the method comprising:
a) for each document in the plurality of documents, generating text structure tags for the document including generating text structure tags in accordance with Text Encoding Initiative (TEI), the text structure tags identifying a plurality of argumentative text types, wherein a text type comprises a type of argumentative content for an associated portion of a document, the types of argumentative content comprising an argument premise giving support, evidence, or reasoning for or against a conclusion or the conclusion comprising a resulting determination made using one or more argument premises;
b) for each document in the plurality of documents, encoding the document to generate a tree structure comprising a plurality of nodes, wherein the nodes correspond with the text types and hierarchical relationships among the nodes reflect argumentative relationships among the text types;
c) selecting a plurality of tree structures for the plurality of documents;
d) combining, as a single logical tree structure, the plurality of tree structures; and
e) generating a summary for the plurality of documents by:
i) receiving from a user a selection of one or more particular text types for summarization, the one or more particular text types comprising the argument premise text type; and
ii) identifying, based upon the text type tags, a set of nodes from the plurality of tree structures corresponding to the one or more selected text types including one or more nodes corresponding to the argument premise text type; and
iii) extracting portions of text from the plurality of documents that correspond to the identified set of nodes selected to form a summary of the plurality of documents.
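Steps (c) through (e) of the independent claims, which combine per-document trees into a single logical tree and then extract the nodes of a user-selected text type, can be sketched as follows. This is one possible reading under stated assumptions, not the claimed implementation: the `corpus` root element, the element names, the sample documents, and the helpers `combine_trees` and `summarize` are all hypothetical names introduced for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged documents; element names and text are illustrative only.
DOCS = {
    "doc1": (
        "<document><argument>"
        "<premise>Sales fell 10% last quarter.</premise>"
        "<conclusion>The product needs a redesign.</conclusion>"
        "</argument></document>"
    ),
    "doc2": (
        "<document><argument>"
        "<premise>Users report the interface is confusing.</premise>"
        "<conclusion>Onboarding should be simplified.</conclusion>"
        "</argument></document>"
    ),
}


def combine_trees(docs):
    """Attach each document's tree under one shared root (assumed 'single logical tree')."""
    corpus = ET.Element("corpus")
    for doc_id, xml_text in docs.items():
        tree_root = ET.fromstring(xml_text)
        tree_root.set("id", doc_id)  # remember which document each subtree came from
        corpus.append(tree_root)
    return corpus


def summarize(corpus, selected_types):
    """Collect (document id, text type, text) for nodes matching the user's selection."""
    rows = []
    for doc in corpus.findall("document"):
        for node in doc.iter():
            if node.tag in selected_types:
                rows.append((doc.get("id"), node.tag, (node.text or "").strip()))
    return rows


corpus = combine_trees(DOCS)
for doc_id, text_type, text in summarize(corpus, {"premise"}):
    print(f"[{doc_id}] {text_type}: {text}")
```

Keeping the source document id on each subtree is a design choice of this sketch; it lets a multi-document summary attribute every extracted portion to the document it came from.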
Specification