Apparatus and method for generating digest according to hierarchical structure of topic
Abstract
A digest generator apparatus calculates a lexical cohesion degree at each position in a document using a plurality of windows having different sizes, and calculates a candidate section of a topic boundary for each topic level corresponding to the size of a window. Then, by unifying the candidate sections of different levels, the digest generator apparatus detects the topic boundary for each level. Then, based on the relation between a summarization-target topic passage and a long topic passage containing the summarization-target topic passage, the digest generator apparatus extracts key sentences and generates a digest.
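The multi-window cohesion calculation described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: cohesion is measured as the cosine similarity of term frequencies in the windows before and after a position, a boundary candidate is a strict local minimum of that score, and the function names and window sizes are hypothetical. The unification of candidates across levels is not shown.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cohesion_series(tokens: list[str], window: int) -> list[float]:
    """Cohesion at each position p: similarity of the `window` tokens
    before p and the `window` tokens starting at p (assumed measure)."""
    scores = []
    for p in range(window, len(tokens) - window + 1):
        left = Counter(tokens[p - window:p])
        right = Counter(tokens[p:p + window])
        scores.append(cosine(left, right))
    return scores


def boundary_candidates(tokens: list[str], window: int) -> list[int]:
    """Token positions where cohesion is a strict local minimum."""
    s = cohesion_series(tokens, window)
    return [i + window for i in range(1, len(s) - 1)
            if s[i] < s[i - 1] and s[i] < s[i + 1]]


def multi_level_candidates(tokens: list[str], windows=(8, 4, 2)) -> dict[int, list[int]]:
    """One candidate list per window size; larger windows stand in for
    higher (coarser) topic levels."""
    return {w: boundary_candidates(tokens, w) for w in windows}


if __name__ == "__main__":
    doc = ["a", "b"] * 4 + ["c", "d"] * 4  # vocabulary changes at token 8
    print(multi_level_candidates(doc, windows=(4, 2)))
```

In this toy input the vocabulary changes completely at token 8, so both window sizes report that position as the only candidate.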
16 Claims
1. A digest generator apparatus comprising:
a structure detecting unit, by detecting a set of topic boundaries of a document based on a lexical cohesion degree, to detect a hierarchical structure of topic passages in the document, in which each of the topic passages corresponds to a part of the document describing a common topic, the hierarchical structure includes a plurality of levels, the topic passages in each of the levels compose the document, and each topic passage in a higher level includes one or more shorter topic passages in a lower level;
a keyword extracting unit extracting a plurality of keywords using the hierarchical structure;
a sentence selecting unit selecting a plurality of key sentences from one of the topic passages in the hierarchical structure depending on a use condition of the keywords; and
an outputting unit outputting the key sentences as a digest.
2. The digest generator apparatus according to claim 1, wherein said keyword extracting unit evaluates whether or not a term used in a range of one of the topic passages in the hierarchical structure is characteristic of said one of the topic passages, and extracts one of the plurality of keywords from said one of the topic passages based on an evaluation result.
3. The digest generator apparatus according to claim 2, wherein said keyword extracting unit obtains said evaluation result using both a use frequency of an evaluation target term in said one of the topic passages and a use frequency of the evaluation target term in a long topic passage including said one of the topic passages.
4. The digest generator apparatus according to claim 1, wherein said keyword extracting unit extracts a local keyword from a summarization-target topic passage, and extracts a global keyword from a longer topic passage including the summarization-target topic passage; and
said sentence selecting unit selects said plurality of key sentences from the summarization-target topic passage based on the use condition of the local keyword and the global keyword.
5. The digest generator apparatus according to claim 1, further comprising:
a determining unit determining the number of topics to be extracted for the digest from both a size of the document and a desired size of the digest.
6. A digest generator apparatus comprising:
a structure detecting unit detecting a hierarchical structure of topic passages in a document, in which each of the topic passages corresponds to a part of the document describing a common topic, the hierarchical structure includes a plurality of levels, the topic passages in each of the levels compose the document, and each topic passage in a higher level includes one or more shorter topic passages in a lower level, wherein said structure detecting unit calculates a lexical cohesion degree in a vicinity area of each position in said document, detects a set of the topic boundaries that separate said document into the topic passages of almost the same size based on the cohesion degree, and by repeating detection of the set of the topic boundaries while reducing the size of the vicinity area, detects the hierarchical structure of the topic passages, wherein the topic passages range from a size of about a fraction of the document to about one paragraph;
a keyword extracting unit extracting a plurality of keywords using the hierarchical structure;
a sentence selecting unit selecting a plurality of key sentences from one of the topic passages in the hierarchical structure depending on a use condition of the keywords; and
an outputting unit outputting the key sentences as a digest.
Dependent claims: 7, 8, 9
a major part specifying unit removing a document portion having a lower cohesion degree and extracting a document portion having a higher cohesion degree as a major part, and wherein said sentence selecting unit selects said key sentences from a topic passage corresponding to the major part.
10. A digest generator apparatus, comprising:
a keyword extracting unit evaluating whether or not a word is characteristic of a process target topic passage in a document by calculating a likelihood ratio based on a comparison of a use frequency of the word in the process target topic passage with a use frequency of the word in a longer topic passage including the process target topic passage and comparing the likelihood ratio to a predetermined threshold value, and extracting a keyword from the process target topic passage when the likelihood ratio is greater than the predetermined threshold value;
a generating unit generating a digest according to a use condition of said keyword; and
an outputting unit outputting said digest.
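The keyword test in claim 10 can be illustrated with a short sketch. The claim does not give a formula, so the statistic used here is an assumption: a binomial log-likelihood ratio that compares a word's frequency in the target passage with its frequency in the remainder of the longer passage. The names `likelihood_ratio` and `extract_keywords` and the default threshold of 3.84 are hypothetical choices for illustration, not values from the patent.

```python
from collections import Counter
from math import log


def _ll(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood of k hits in n trials; 0 * log(0) counts as 0."""
    total = 0.0
    if k > 0:
        total += k * log(p)
    if n - k > 0:
        total += (n - k) * log(1 - p)
    return total


def likelihood_ratio(k1: int, n1: int, k2: int, n2: int) -> float:
    """Log-likelihood ratio statistic: separate rates vs. one pooled rate.
    k1/n1 describes the target passage, k2/n2 the rest of the longer passage."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)
    return 2 * (_ll(k1, n1, p1) + _ll(k2, n2, p2) - _ll(k1, n1, p) - _ll(k2, n2, p))


def extract_keywords(passage: list[str], longer_passage: list[str],
                     threshold: float = 3.84) -> list[str]:
    """Words over-represented in `passage` relative to the rest of
    `longer_passage` (assumed to contain `passage`), with ratio > threshold."""
    inner, outer = Counter(passage), Counter(longer_passage)
    n1 = len(passage)
    n2 = len(longer_passage) - n1
    if n1 == 0 or n2 <= 0:
        return []
    keywords = []
    for word, k1 in inner.items():
        k2 = outer[word] - k1
        if k1 / n1 > k2 / n2 and likelihood_ratio(k1, n1, k2, n2) > threshold:
            keywords.append(word)
    return sorted(keywords)


if __name__ == "__main__":
    target = ["topic", "the"] * 3
    longer = target + ["the", "other"] * 6
    print(extract_keywords(target, longer))
```

In the toy input, "topic" occurs only in the target passage and passes the threshold, while "the" occurs at the same rate inside and outside the passage and is rejected.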
11. A digest generator apparatus, comprising:
a unit repeatedly calculating a lexical cohesion degree in a vicinity of each position in a document to define topic passages in the document while varying a size of the vicinity;
a major part specifying unit specifying a major part of the document to be summarized by removing one or more topic passages having a lower cohesion degree from the document;
a generating unit generating a digest using said major part; and
an outputting unit outputting said digest.
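The major-part step in claim 11 can be sketched as a simple filter. The claim says only that topic passages with a lower cohesion degree are removed; the cutoff rule used here (a fraction of the mean cohesion) and the names `TopicPassage`, `major_part`, and `min_ratio` are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class TopicPassage:
    start: int       # index of the first token in the passage
    end: int         # index one past the last token
    cohesion: float  # cohesion degree assigned to the passage


def major_part(passages: list[TopicPassage], min_ratio: float = 0.5) -> list[TopicPassage]:
    """Keep passages whose cohesion is at least `min_ratio` times the mean
    cohesion; the remaining passages form the assumed 'major part'."""
    if not passages:
        return []
    mean = sum(p.cohesion for p in passages) / len(passages)
    return [p for p in passages if p.cohesion >= min_ratio * mean]


if __name__ == "__main__":
    doc = [TopicPassage(0, 100, 0.8), TopicPassage(100, 130, 0.1), TopicPassage(130, 300, 0.6)]
    for p in major_part(doc):
        print(p.start, p.end)
```

A digest would then be generated only from the passages that survive the filter.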
12. A computer-readable storage medium for storing a program which enables a computer to perform:
detecting a set of topic boundaries of a document based on a lexical cohesion degree to detect a hierarchical structure of topic passages in the document, in which each of the topic passages corresponds to a part of the document describing a common topic, the hierarchical structure includes a plurality of levels, the topic passages in each of the levels compose the document, and each of the topic passages in a higher level includes one or more shorter topic passages in a lower level;
extracting a plurality of keywords using the hierarchical structure;
selecting a plurality of key sentences from one of the topic passages in the hierarchical structure depending on a use condition of the keywords; and
generating a digest which includes the key sentences.
Dependent claims: 13
extracting local keywords from a summarization-target topic passage;
extracting global keywords from a longer topic passage including said summarization-target topic passage;
selecting both sentences regarding a local topic and sentences regarding a global topic based on a use condition of said local keywords and said global keywords; and
generating a digest in which the sentences regarding the local topic and the sentences regarding the global topic are balanced.
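The balancing of local and global sentences described above can be sketched as follows. The scoring rule (number of distinct keywords a sentence contains) and the alternating pick order are assumptions chosen for illustration; the claim text does not specify how the balance is achieved, and `select_key_sentences` and `limit` are hypothetical names.

```python
from itertools import zip_longest


def select_key_sentences(sentences: list[list[str]],
                         local_keywords: set[str],
                         global_keywords: set[str],
                         limit: int = 4) -> list[int]:
    """Return indices of selected sentences in document order, alternating
    between the best local-topic and the best global-topic sentences."""
    def hits(i: int, keywords: set[str]) -> int:
        return len(set(sentences[i]) & keywords)

    def ranked(keywords: set[str]) -> list[int]:
        # Stable sort: ties keep document order; sentences with no hits are dropped.
        order = sorted(range(len(sentences)), key=lambda i: -hits(i, keywords))
        return [i for i in order if hits(i, keywords) > 0]

    chosen: list[int] = []
    for pair in zip_longest(ranked(local_keywords), ranked(global_keywords)):
        for i in pair:
            if i is not None and i not in chosen and len(chosen) < limit:
                chosen.append(i)
    return sorted(chosen)


if __name__ == "__main__":
    sents = [
        ["parser", "reads", "tokens"],
        ["the", "compiler", "has", "stages"],
        ["weather", "is", "nice"],
        ["parser", "builds", "tree", "for", "compiler"],
    ]
    print(select_key_sentences(sents, {"parser", "tokens", "tree"},
                               {"compiler", "stages"}, limit=2))
```

With `limit=2` the sketch picks one sentence for the local topic and one for the global topic, which is the simplest form of balance.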
14. A computer-readable storage medium for storing a program which enables a computer to perform:
calculating a likelihood ratio based on a comparison of a use frequency of a word in a process target topic passage in a document with a use frequency of said word in a longer topic passage including said process target topic passage;
evaluating whether or not said word is characteristic of said process target topic passage by comparing the likelihood ratio to a predetermined threshold value;
extracting a keyword from the process target topic passage when the likelihood ratio is greater than the predetermined threshold value; and
generating a digest according to a use condition of said keyword.
15. A computer-readable storage medium for storing a program which enables a computer to perform:
repeatedly calculating a lexical cohesion degree in a vicinity of each position in a document to define topic passages in the document while varying a size of the vicinity;
specifying a major part of the document to be summarized by removing one or more topic passages having a lower cohesion from the document; and
generating a digest using said major part.
16. A method of generating a digest, comprising:
detecting a set of topic boundaries of a document based on a lexical cohesion degree to detect a hierarchical structure of topic passages in the document, in which each of the topic passages corresponds to a part of the document describing a common topic, the hierarchical structure includes a plurality of levels, the topic passages in each of the levels compose the document, and each of the topic passages in a higher level includes one or more shorter topic passages in a lower level;
extracting a plurality of keywords using the hierarchical structure;
selecting a plurality of key sentences from one of the topic passages in the hierarchical structure depending on a use condition of the keywords; and
generating a digest which includes the key sentences.
Specification