Method and apparatus for summarizing multiple documents using a subsumption model
Abstract
A method and apparatus for parsing a plurality of documents, selecting paragraphs from the documents through subsuming relation calculation, and rewriting the selected paragraphs into a summary is disclosed.
15 Claims
1. A computer-implemented method comprising:
- parsing a plurality of paragraphs in a plurality of computer documents stored on a computer-readable medium, each document with one or more of the paragraphs;
- selecting paragraphs from the documents through a subsuming relation calculation including,
  - creating a link from terms in each paragraph to identical terms in substantially all of the other paragraphs, wherein terms include noun phrases, verb phrases or entity names,
  - counting for each paragraph the number of links from the terms in the paragraph to the terms in other paragraphs,
  - denoting for each paragraph the number of links counted for that paragraph as the significant score of that paragraph,
  - ranking the paragraphs by the significant score,
  - selecting paragraphs based on the ranking, wherein paragraphs in the ranking that subsume the highest number of other paragraphs are selected prior to other paragraphs in the ranking, and wherein a first paragraph subsumes a second paragraph if all noun phrases, verb phrases, and entity names contained in the second paragraph are also contained in the first paragraph;
- aggregating the selected paragraphs into a summary and outputting the summary.
Dependent claims: 2, 3, 4, 5
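The link-counting and ranking steps recited in the claim can be illustrated with a minimal Python sketch. It is not taken from the patent. It assumes the terms (noun phrases, verb phrases, entity names) have already been extracted and that each paragraph is represented as a set of term strings. The function names `significance_scores` and `rank_by_score` are hypothetical, and breaking ties by original paragraph order is an assumption.

```python
from collections import Counter

def significance_scores(paragraph_terms):
    """Return one link count per paragraph.

    paragraph_terms: list of sets of term strings, one set per paragraph
    (assumed to come from an upstream parser that is not shown here).
    """
    # For every term, count how many paragraphs contain it.
    containing = Counter(term for terms in paragraph_terms for term in terms)
    # A term in a paragraph links to the identical term in each *other*
    # paragraph that contains it, which gives containing[term] - 1 links.
    return [sum(containing[t] - 1 for t in terms) for terms in paragraph_terms]

def rank_by_score(paragraph_terms):
    """Return paragraph indices ordered by descending score.

    Ties keep their original order (an assumption; the claim does not say).
    """
    scores = significance_scores(paragraph_terms)
    return sorted(range(len(paragraph_terms)), key=lambda i: scores[i], reverse=True)

# Illustrative input only; these term sets are made up.
paragraph_terms = [
    {"acme corp", "merger"},
    {"acme corp", "merger", "announce"},
    {"acme corp", "shareholders", "vote"},
    {"shareholders", "vote", "deadline"},
    {"shareholders", "vote", "merger"},
]
print(significance_scores(paragraph_terms))  # [4, 4, 6, 4, 6]
print(rank_by_score(paragraph_terms))        # [2, 4, 0, 1, 3]
```

Counting how many paragraphs contain each term once, and then summing per paragraph, avoids comparing every pair of paragraphs directly while producing the same link counts.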
6. A computer-readable medium having stored thereon sequences of instructions which are executable by a processor, and which, when executed by the processor, cause the processor to perform operations comprising:
- parsing a plurality of paragraphs in a plurality of computer documents, each document with one or more of the paragraphs;
- selecting paragraphs from the documents through a subsuming relation calculation including,
  - creating a link from terms in each paragraph to identical terms in substantially all of the other paragraphs, wherein terms include noun phrases, verb phrases or entity names,
  - counting for each paragraph the number of links from the terms in the paragraph to the terms in other paragraphs,
  - denoting for each paragraph the number of links counted for that paragraph as the significant score of that paragraph,
  - ranking the paragraphs by the significant score,
  - selecting paragraphs based on the ranking, wherein paragraphs in the ranking that subsume the highest number of other paragraphs are selected prior to other paragraphs in the ranking, and wherein a first paragraph subsumes a second paragraph if all noun phrases, verb phrases, and entity names contained in the second paragraph are also contained in the first paragraph;
- aggregating the selected paragraphs into a summary; and
- outputting the summary.
Dependent claims: 7, 8, 9, 10
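The subsumption test and the selection order recited in the claim can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation. Each paragraph is assumed to be a set of already-extracted terms, and `scores` is assumed to hold the per-paragraph link counts computed beforehand. The names `subsumes`, `select_paragraphs`, `aggregate`, and the `limit` parameter are hypothetical. Reading "selected prior to" as a primary sort key, with the score as a tie-breaker, is one possible interpretation.

```python
def subsumes(first, second):
    """True if every term of `second` is also contained in `first`."""
    return second <= first  # set inclusion

def select_paragraphs(paragraph_terms, scores, limit):
    """Pick up to `limit` paragraph indices.

    Paragraphs that subsume more of the other paragraphs come first;
    the score breaks ties (an assumed reading of the claim).
    """
    n = len(paragraph_terms)
    # For each paragraph, count how many other paragraphs it subsumes.
    subsumed = [
        sum(1 for j in range(n)
            if j != i and subsumes(paragraph_terms[i], paragraph_terms[j]))
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: (subsumed[i], scores[i]), reverse=True)
    return order[:limit]

def aggregate(paragraph_texts, selected):
    """Join the selected paragraph texts into one summary string."""
    return "\n\n".join(paragraph_texts[i] for i in selected)

# Illustrative input only; terms, scores, and texts are made up.
paragraph_terms = [
    {"acme corp", "merger"},
    {"acme corp", "merger", "announce"},
    {"acme corp", "shareholders", "vote"},
    {"shareholders", "vote", "deadline"},
    {"shareholders", "vote", "merger"},
]
scores = [4, 4, 6, 4, 6]  # link counts for the term sets above
texts = ["P0 text", "P1 text", "P2 text", "P3 text", "P4 text"]

selected = select_paragraphs(paragraph_terms, scores, limit=2)
print(selected)  # [1, 2]
print(aggregate(texts, selected))
```

In this example paragraph 1 is selected first even though paragraphs 2 and 4 have higher scores, because it is the only paragraph that subsumes another one (paragraph 0). Note that under plain set inclusion a paragraph with no terms would count as subsumed by every other paragraph.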
11. A system comprising:
- a processor;
- a bus coupled to the processor; and
- a unit coupled to the bus to:
  - parse a plurality of paragraphs in a plurality of computer documents, each document including one or more of the paragraphs,
  - select paragraphs from the documents through a subsuming relation calculation including:
    - creating a link from terms in each paragraph to identical terms in substantially all of the other paragraphs, wherein terms include noun phrases, verb phrases or entity names,
    - counting for each paragraph the number of links from the terms in the paragraph to the terms in other paragraphs,
    - denoting for each paragraph the number of links counted for that paragraph as the significant score of that paragraph,
    - ranking the paragraphs by the significant score,
    - selecting paragraphs based on the ranking, wherein paragraphs in the ranking that subsume the highest number of other paragraphs are selected prior to other paragraphs in the ranking, and wherein a first paragraph subsumes a second paragraph if all noun phrases, verb phrases, and entity names contained in the second paragraph are also contained in the first paragraph;
  - aggregate the selected paragraphs into a summary; and
  - outputting the summary.
Dependent claims: 12, 13, 14, 15
Specification