Apparatus and method for generating a summary according to hierarchical structure of topic
Abstract
A text summarizer detects the hierarchical structure of topics in a document and extracts boundary sentences corresponding to the turning point of a topic from a candidate section of a topic boundary using the relation degree between a sentence and a topic passage. Then, the text summarizer extracts topic introductory sentences serving the purpose of introducing a topic from the introductory part of the topic beginning with this boundary sentence, and generates a summary using both the boundary sentences and topic introductory sentences.
16 Claims
1. A text summarization apparatus, comprising:
a structure detection device setting up window widths of several sizes, measuring a cohesion degree indicating the strength of lexical cohesion in each window width, obtaining both global cohesion due to words repeated at long intervals and local cohesion due to words repeated at short intervals based on the cohesion degree, and detecting a hierarchical structure of topics of a given document based on the global cohesion and local cohesion;
a leading sentence extraction device detecting an introductory part of each topic and extracting one or more sentences which directly indicate topic content from the introductory part as leading sentences in a concentrated manner based on the hierarchical structure of topics detected by said structure detection device; and
a summary composition device grouping the leading sentences for each topic extracted by said leading sentence extraction device, and generating a summary.
Dependent claims: 2, 3, 4, 5, 6, 7.
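The cohesion measurement recited in claim 1 can be illustrated with a minimal sketch. The claim does not fix how the cohesion degree is computed; the code below assumes, purely for illustration, the cosine similarity between the word-frequency vectors of two adjacent windows of equal width. The function names and the default window widths are hypothetical.

```python
from collections import Counter
from math import sqrt


def cohesion(left, right):
    """Cosine similarity of the word-frequency vectors of two windows (assumed measure)."""
    a, b = Counter(left), Counter(right)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cohesion_series(words, width):
    """Cohesion degree at every position where both adjacent windows fit in the text."""
    return {
        p: cohesion(words[p - width:p], words[p:p + width])
        for p in range(width, len(words) - width + 1)
    }


def multi_scale_cohesion(words, widths=(8, 4, 2)):
    """One cohesion series per window width (widths are illustrative values)."""
    return {w: cohesion_series(words, w) for w in widths}


words = ["cat"] * 4 + ["dog"] * 4
series = multi_scale_cohesion(words, widths=(4, 2))
print(series[2][4])  # cohesion across the cat/dog change, narrow window
```

In this sketch a wide window responds to words repeated at long intervals (global cohesion) and a narrow window to words repeated at short intervals (local cohesion); a low value at a position suggests a topic change there.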

8. A text summarization apparatus, comprising:
a leading sentence extraction device detecting an introductory part of a topic for each given topic boundary and extracting one or more sentences which directly indicate topic content, from the introductory part by identifying and extracting a boundary sentence corresponding to a turning point of a topic, from sentences located in a vicinity of a topic boundary, based on a difference between a forward relation degree indicating a relation degree between the boundary sentence and a following topic passage immediately after the topic boundary, and a backward relation degree indicating a relation degree between the boundary sentence and a preceding topic passage immediately before the topic boundary; and
a summary composition device generating a summary using the extracted sentences.
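The boundary-sentence selection of claim 8 can be sketched as follows. The claim only requires that the choice rest on the difference between the forward and backward relation degrees; the relation degree used here (the share of a sentence's words that also occur in the passage) and all names are assumptions made for illustration.

```python
def relation_degree(sentence, passage):
    """Share of the sentence's words that also occur in the passage (assumed measure)."""
    if not sentence:
        return 0.0
    vocab = {w for s in passage for w in s}
    return sum(w in vocab for w in sentence) / len(sentence)


def find_boundary_sentence(sentences, boundary, span=2):
    """Among sentences within `span` of `boundary`, return the index of the one
    whose forward relation degree most exceeds its backward relation degree."""
    best, best_gap = None, float("-inf")
    lo, hi = max(0, boundary - span), min(len(sentences), boundary + span)
    for i in range(lo, hi):
        # The candidate itself is left out of both passages.
        before = [s for j, s in enumerate(sentences[:boundary]) if j != i]
        after = [s for j, s in enumerate(sentences[boundary:], start=boundary) if j != i]
        gap = relation_degree(sentences[i], after) - relation_degree(sentences[i], before)
        if gap > best_gap:
            best, best_gap = i, gap
    return best


sentences = [
    ["cats", "sleep"],
    ["cats", "purr"],
    ["dogs", "bark", "unlike", "cats"],
    ["dogs", "fetch"],
    ["dogs", "bark"],
]
print(find_boundary_sentence(sentences, boundary=3))  # → 2
```

The detected boundary (index 3) is deliberately one sentence late in this example; the sentence at index 2 is still selected because it relates more strongly to what follows than to what precedes.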

9. A computer-readable storage medium on which is recorded a program enabling a computer to execute a process, said process comprising:
setting up window widths of several sizes;
measuring a cohesion degree indicating a strength of lexical cohesion in each window width;
obtaining both global cohesion due to words repeated at long intervals and local cohesion due to words repeated at short intervals based on the cohesion degree;
detecting a hierarchical structure of topics in a given document based on the global and local cohesion;
detecting an introductory part of each topic based on the hierarchical structure of topics that was detected;
extracting one or more sentences directly indicating topic content from the introductory part in a concentrated manner based on the hierarchical structure of topics that was detected; and
grouping the one or more sentences for each topic to generate a summary.

10. A text summarization method, comprising:
setting up window widths of several sizes;
measuring a cohesion degree indicating a strength of lexical cohesion in each window width;
obtaining both global cohesion due to words repeated at long intervals and local cohesion due to words repeated at short intervals based on the cohesion degree;
detecting a hierarchical structure of topics in a given document based on the global and local cohesion;
detecting an introductory part of each topic based on the hierarchical structure of topics that was detected;
extracting one or more sentences directly indicating a topic content from the introductory part in a concentrated manner based on the hierarchical structure of topics that was detected; and
grouping the one or more sentences for each topic to generate a summary.
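The step of detecting a hierarchical structure from global and local cohesion can be sketched under two assumptions that the claim itself does not state: candidate topic boundaries are taken to be local minima of each cohesion series, and each boundary found with a wide window is linked to the nearest boundary found with the next narrower window. All function names are hypothetical.

```python
def local_minima(series):
    """Positions whose cohesion is lower than at both neighbouring positions."""
    pos = sorted(series)
    return [
        p for prev, p, nxt in zip(pos, pos[1:], pos[2:])
        if series[p] < series[prev] and series[p] < series[nxt]
    ]


def build_hierarchy(minima_by_width):
    """Link each wide-window boundary to the nearest boundary of the next narrower width.

    `minima_by_width` maps a window width to a list of boundary positions.
    Returns a list of (wide_width, narrow_width, {wide_pos: narrow_pos}) tuples.
    """
    widths = sorted(minima_by_width, reverse=True)
    levels = []
    for wide, narrow in zip(widths, widths[1:]):
        candidates = minima_by_width[narrow]
        links = {}
        for b in minima_by_width[wide]:
            if candidates:
                links[b] = min(candidates, key=lambda c: abs(c - b))
        levels.append((wide, narrow, links))
    return levels


print(local_minima({1: 0.9, 2: 0.5, 3: 0.2, 4: 0.6, 5: 0.8}))  # → [3]
print(build_hierarchy({8: [40], 4: [12, 38, 60]}))             # → [(8, 4, {40: 38})]
```

Boundaries found only with narrow windows (12 and 60 above) would then mark subtopics inside the larger topics delimited by the wide-window boundaries.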

11. A text summarization apparatus, comprising:
structure detection means for setting up window widths of several sizes;
measuring a cohesion degree indicating a strength of lexical cohesion in each window width;
obtaining both global cohesion mainly due to words repeated at long intervals and local cohesion due to words repeated at short intervals based on the cohesion degree, and detecting a hierarchical structure of topics in a given document based on the global and local cohesion;
leading sentence extraction means for detecting an introductory part of each topic and extracting one or more sentences which directly indicate topic content from the introductory part as leading sentences in a concentrated manner based on the hierarchical structure of topics detected by said structure detection means; and
summary composition means for grouping the one or more sentences for each topic and for generating a summary.

12. A text summarization apparatus, comprising:
leading sentence extraction means for detecting an introductory part of a topic for each given topic boundary and extracting one or more sentences which directly indicate topic content, from the introductory part by identifying and extracting a boundary sentence corresponding to a turning point of a topic from sentences located in a vicinity of the topic boundary, based on a difference between a forward relation degree indicating a relation degree between a sentence and a following topic passage immediately after the topic boundary and a backward relation degree indicating a relation degree between the sentence and a preceding topic passage immediately before the topic boundary; and
summary composition means for generating a summary using the extracted sentences.

13. A text summarization apparatus, comprising:
a structure detection device to detect a hierarchical structure of topics of a given document;
a leading sentence extraction device to detect an introductory part of each topic and extract at least one leading sentence which directly indicates topic content of the introductory part, by extracting a boundary sentence, corresponding to a turning point of a topic, from sentences located in a vicinity of a topic boundary detected by said structure detection device, based on a difference between a forward relation degree indicating a relation degree between the boundary sentence and a following topic passage immediately after the topic boundary, and a backward relation degree indicating a relation degree between the boundary sentence and a preceding topic passage immediately before the topic boundary; and
a summary composition device to group the leading sentences for each topic extracted by said leading sentence extraction device, and to generate a summary.

14. A text summarization apparatus, comprising:
a structure detection device to detect a hierarchical structure of topics of a given document;
a leading sentence extraction device to detect an introductory part of each topic and extract at least one leading sentence which directly indicates topic content of the introductory part, by extracting a boundary sentence, corresponding to a turning point of a topic, from sentences located in a vicinity of a topic boundary detected by said structure detection device, based on a difference between a forward relation degree indicating a relation degree between the boundary sentence and a following topic passage immediately after the topic boundary, and a backward relation degree indicating a relation degree between the boundary sentence and a preceding topic passage immediately before the topic boundary, and by further extracting a topic introductory sentence serving a purpose of introducing a topic from sentences of the introductory part beginning with the boundary sentence as a leading sentence based on the forward relation degree; and
a summary composition device to group the leading sentences for each topic extracted by said leading sentence extraction device, and to generate a summary.
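The additional step of claim 14, extracting a topic introductory sentence based on the forward relation degree, might look like the sketch below. It assumes that the introductory part is a fixed number of sentences starting at the boundary sentence and that the forward relation degree is the share of a sentence's words found in the rest of the topic passage; neither assumption comes from the claim, and the names are hypothetical.

```python
def forward_relation(sentence, passage):
    """Share of the sentence's words that also occur in the passage (assumed measure)."""
    if not sentence:
        return 0.0
    vocab = {w for s in passage for w in s}
    return sum(w in vocab for w in sentence) / len(sentence)


def topic_introductory_sentence(topic, intro_len=3):
    """`topic` holds the tokenised sentences of one topic passage, with topic[0]
    being the boundary sentence. Returns the index of the sentence in the
    introductory part (after the boundary sentence) with the highest forward
    relation degree, or None if no candidate shares a word with the passage."""
    best, best_score = None, 0.0
    for i in range(1, min(intro_len, len(topic))):
        rest = topic[:i] + topic[i + 1:]  # passage without the candidate itself
        score = forward_relation(topic[i], rest)
        if score > best_score:
            best, best_score = i, score
    return best


topic = [
    ["dogs", "bark", "unlike", "cats"],                           # boundary sentence
    ["this", "section", "covers", "dogs", "training", "diet"],
    ["however", "some", "exceptions", "exist"],
    ["dogs", "training", "uses", "rewards"],
    ["dogs", "diet", "needs", "protein"],
]
print(topic_introductory_sentence(topic))  # → 1
```

The boundary sentence and the selected introductory sentence together would then serve as the leading sentences for this topic.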

15. A text summarization apparatus, comprising:
a structure detection device to detect a hierarchical structure of topics of a given document;
a leading sentence extraction device to detect an introductory part of each topic and extract leading sentences which directly indicate topic content of the introductory part in a concentrated manner; and
a summary composition device to group the leading sentences for each topic extracted by said leading sentence extraction device, to remove from the leading sentences order label information of a heading included by said leading sentence extraction device, and to generate a summary including the leading sentences without the order label information.
Dependent claims: 16.
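The summary composition step of claim 15 can be sketched as follows. The claim does not define what counts as order label information; the regular expression below is an assumption covering a few common heading labels ("1.", "2.3", "(a)", "Chapter 4:"), and the function names are hypothetical.

```python
import re

# Assumed label shapes; real headings may need a broader pattern.
ORDER_LABEL = re.compile(
    r"""^\s*(?:
        (?:chapter|section)\s+\d+(?:\.\d+)*[.:]?   # Chapter 4, Section 2.1:
      | \d+(?:\.\d+)*[.)]?                         # 1.  2.3  4)
      | \([0-9a-z]+\)                              # (1)  (a)
    )\s+""",
    re.IGNORECASE | re.VERBOSE,
)


def strip_order_label(sentence):
    """Remove a leading order label, if any, from a heading or sentence."""
    return ORDER_LABEL.sub("", sentence, count=1)


def compose_summary(leading_by_topic):
    """Group leading sentences by topic: one paragraph per topic, labels removed."""
    return "\n\n".join(
        " ".join(strip_order_label(s) for s in sentences)
        for sentences in leading_by_topic
    )


print(compose_summary([
    ["1. Introduction", "This paper studies X."],
    ["2.3 Method", "We use Y."],
]))
```

A pattern this simple also strips a leading number that is not a label (for example a sentence starting with a year), so a practical version would likely restrict it to sentences already identified as headings.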
Specification