Summarization apparatus and method
Abstract
A document summarization apparatus or method summarizes an electronic document written in a natural language, and generates an appropriate summary depending on the user's focus and the user's knowledge. The document summarization apparatus according to the present invention includes, for example, a focused information relevant portion extraction unit, a summary readability improvement unit, and a summary generation unit. The focused information relevant portion extraction unit extracts a portion related to two types of focused information in a document to be summarized based on the two types of focused information, that is, user-focused information as information focused by a user who uses a summary, and author-focused information as information emphasized by an author of the document to be summarized. In the document to be summarized, the summary readability improvement unit distinguishes user known information already known to a user, and information known through an access log regarded as already known to a user based on a document previously presented to the user when a summary is generated, from other information than these two types of information, and selects an important portion in the document to be summarized. The summary generation unit generates the summary of the document to be summarized based on the selection result of the summary readability improvement unit. Thus, a summary that includes both user-focused information and author-focused information can be generated depending on the knowledge level of the user.
29 Claims
-
1. An apparatus for summarizing a document in support of selection, access, edition, and management of the document readable by a computer, comprising:
-
a focused information relevant portion extraction unit extracting a portion related to two types of focused information in a document to be summarized based on the two types of focused information comprising user-focused information as information focused by a user who uses a summary, and author-focused information as information emphasized by an author of the document to be summarized; and
a summary generation unit generating the summary of the document to be summarized based on an extraction result from said focused information relevant portion extraction unit.
-
2. The document summarization apparatus according to claim 1, wherein
said user-focused information equals contents of a query sentence input by the user to search said document to be summarized.
-
3. The document summarization apparatus according to claim 1, wherein
said user-focused information and/or author-focused information are formatted in a word list or a weighted word list; said focused information relevant portion extraction unit extracts a portion related to the two types of focused information depending on an occurrence frequency of a word in the word list in the document to be summarized.
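The frequency-based extraction recited in claim 3 can be illustrated with a minimal sketch. It assumes that both kinds of focused information are given as weighted word lists (plain dictionaries), that the "portion" is a sentence, and that the two scores are simply added; the function names, the tokenizer, and the `top_n` cutoff are hypothetical and are not taken from the patent.

```python
import re
from collections import Counter


def score_sentences(sentences, user_words, author_words):
    """Score each sentence by weighted occurrence counts of focus words.

    user_words / author_words: dict mapping a lowercase word to a weight
    (an assumed encoding of the claimed "weighted word list").
    """
    scores = []
    for sentence in sentences:
        counts = Counter(re.findall(r"[a-z0-9]+", sentence.lower()))
        user_score = sum(w * counts[t] for t, w in user_words.items())
        author_score = sum(w * counts[t] for t, w in author_words.items())
        scores.append(user_score + author_score)
    return scores


def extract_relevant(sentences, user_words, author_words, top_n=2):
    """Return the top_n highest-scoring sentences, in document order."""
    scores = score_sentences(sentences, user_words, author_words)
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    keep = sorted(i for i in ranked[:top_n] if scores[i] > 0)
    return [sentences[i] for i in keep]
```

For example, with `user_words={"turbine": 1.0}` and `author_words={"costs": 2.0}`, a sentence mentioning both words outranks a sentence mentioning only one of them, and sentences mentioning neither are never extracted.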
-
4. The document summarization apparatus according to claim 1, further comprising:
-
a user's preference accumulation unit preliminarily accumulating a proposition in which the user is interested as user's preference, wherein said focused information relevant portion extraction unit uses accumulated contents of said user's preference accumulation unit as the user-focused information.
-
-
5. The document summarization apparatus according to claim 4, further comprising:
-
an other user's preference usage unit providing said focused information relevant portion extraction unit with information including other users' preferences as the user-focused information of the user who uses the summary in a predetermined access control system, making said focused information relevant portion extraction unit extract the two types of focused information, wherein said user's preference accumulation unit accumulates the user's preferences for each of a plurality of users.
-
-
6. The document summarization apparatus according to claim 1, wherein
said author-focused information, refers to a title of the document, a header of a chapter, a section, and a figure, a table of contents, and indices of words and topics, which is contained in a normally distributed document and, by which the author presents important points of the document.
-
7. The document summarization apparatus according to claim 1, further comprising:
-
an author-focused information merge unit merging each piece of author-focused information for a plurality of documents to be summarized, wherein said focused information relevant portion extraction unit extracts a portion related to the two types of focused information in the plurality of documents to be summarized according to the merged author-focused information; and
said summary generation unit generates a summary of the plurality of documents to be summarized.
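One way to picture the merge step of claim 7 is sketched below. It assumes each document's author-focused information is a weighted word list and that merging means summing the weights per word; the claim does not specify a merge rule, so both the representation and the function name `merge_author_focus` are illustrative assumptions.

```python
from collections import Counter


def merge_author_focus(per_document_words):
    """Merge per-document weighted word lists by summing weights per word.

    per_document_words: iterable of dicts (word -> weight), one dict per
    document to be summarized. Summation is an assumed merge rule.
    """
    merged = Counter()
    for words in per_document_words:
        for word, weight in words.items():
            merged[word] += weight
    return dict(merged)
```

The merged list can then be used as the single author-focused word list when extracting relevant portions from all of the documents together.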
-
-
8. The document summarization apparatus according to claim 1, further comprising:
-
a document storage unit storing author-focused information specified by an author of a document or a document manager, after the document is generated, together with a document corresponding to the author-focused information, wherein said focused information relevant portion extraction unit uses the author-focused information stored in said document storage unit.
-
-
-
9. An apparatus for summarizing a document in support of selection, access, edition, and management of the document readable by a computer, comprising:
-
a summary readability improvement unit improving readability of a summary by distinguishing user known information already known to a user, and/or information known through an access log regarded as already known to a user based on a document previously presented to the user when a summary is generated, from other information than these two types of information, and by selecting an important portion in a document to be summarized; and
a summary generation unit generating the summary of the document to be summarized based on a selection result from said summary readability improvement unit.
-
10. The document summarization apparatus according to claim 9, wherein
the readability of the summary is improved by preparing the user known information and/or information known through an access log using a known concept and a known proposition and by selecting the important portion in such a way that unknown concepts in the summary are reduced and that a proposition known less among the unknown propositions can be prioritized.
-
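The selection rule of claim 10 can be read in more than one way; the sketch below shows one possible interpretation. It assumes that each candidate portion carries a list of concepts (words) and a single proposition (a tuple), that portions whose proposition is already known are skipped, and that the remaining portions are preferred when they introduce fewer unknown concepts. The data layout and the name `select_portions` are hypothetical.

```python
def select_portions(candidates, known_concepts, known_propositions, limit=2):
    """Pick portions that add new propositions with few unknown concepts.

    candidates: list of dicts with keys "text", "concepts", "proposition"
    (an assumed representation, not taken from the patent).
    """
    # Skip portions that only restate a proposition the user already knows.
    fresh = [c for c in candidates if c["proposition"] not in known_propositions]

    def unknown_count(candidate):
        # Number of concepts in the portion that the user does not know yet.
        return len(set(candidate["concepts"]) - known_concepts)

    # sorted() is stable, so ties keep their original document order.
    ranked = sorted(fresh, key=unknown_count)
    return [c["text"] for c in ranked[:limit]]
```

Under this reading, a portion stating a new fact in familiar vocabulary is chosen ahead of one that states a new fact and also introduces several unfamiliar terms.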
11. The document summarization apparatus according to claim 10, further comprising:
-
a word recognition unit recognizing a word in a document; and
a word knowledge determination unit determining knowledge of a word recognized by said word recognition unit, wherein knowledge of said known concept is knowledge of a word appearing in a document.
-
-
12. The document summarization apparatus according to claim 10, further comprising:
-
a word combination recognition unit recognizing a combination of words appearing in a document; and
a word combination knowledge determination unit determining knowledge of combination of the words recognized by said word combination recognition unit, wherein knowledge of said known proposition is knowledge of a combination of words appearing in a document.
-
-
13. The document summarization apparatus according to claim 10, further comprising:
-
a word-predicate combination recognition unit recognizing a combination of a word and a predicate appearing in a document; and
a word-predicate combination knowledge determination unit determining knowledge of combination of the word and the predicate recognized by said word-predicate combination recognition unit, wherein knowledge of said known proposition is knowledge of a combination of a word and a predicate appearing in a document.
-
-
14. The document summarization apparatus according to claim 9, further comprising:
-
a user's knowledge accumulation unit preliminarily accumulating a proposition known to the user as user's knowledge, wherein said summary readability improvement unit uses the user's knowledge accumulated in said user's knowledge accumulation unit as the user known information.
-
-
15. The document summarization apparatus according to claim 14, further comprising:
-
an other user's knowledge usage unit allowing said summary readability improvement unit to use information including other users' knowledge as user known information of a user who uses the summary in a predetermined access control system, wherein said user's knowledge accumulation unit accumulates user's knowledge for each of a plurality of users.
-
-
16. The document summarization apparatus according to claim 9, further comprising:
-
a document access log storage unit storing as a user's document access log a document and a summary presented to a user during an operation of said document summarization apparatus and a system including said document summarization apparatus, and for providing the document access log for said summary readability improvement unit as a base of the information known through an access log; and
a document cross-reference unit making cross-reference between the document and the summary stored by said document access log storage unit and the document to be summarized.
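The two units of claim 16 can be sketched as a per-user log plus a lookup against it. The sketch assumes that "information known through an access log" is approximated by the set of words in texts previously presented to the user, and that cross-referencing means intersecting that set with the words of the document to be summarized. The class `AccessLog`, the function `cross_reference`, and the tokenizer are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict


def _words(text):
    """Lowercase word set of a text (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class AccessLog:
    """Stores, per user, the documents and summaries already presented."""

    def __init__(self):
        self._presented = defaultdict(list)

    def record(self, user, text):
        self._presented[user].append(text)

    def known_words(self, user):
        known = set()
        for text in self._presented.get(user, []):
            known |= _words(text)
        return known


def cross_reference(log, user, document):
    """Words of `document` that already appeared in the user's access log."""
    return _words(document) & log.known_words(user)
```

The result of `cross_reference` could serve as the "known" side of the distinction made by the summary readability improvement unit, with everything else in the document treated as new to that user.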
-
-
17. The document summarization apparatus according to claim 16, wherein said document access log storage unit stores for each user the document access log of a plurality of users covering a long time including the operation.
-
18. The document summarization apparatus according to claim 17, further comprising:
an other user's document access log usage unit allowing said summary readability improvement unit to use information including information known through an access log based on other users' document access log as information known through an access log of a user who uses the summary in a predetermined access control system.
-
19. The document summarization apparatus according to claim 9, wherein
said summary readability improvement unit comprises: a discourse structure analyzer for dividing each sentence in a document to be summarized into a predicate of the sentence and a predicate phrase basically including nouns depending on the predicate, defining a predicate phrase, among predicate phrases, independent of other predicate phrases as a main predicate phrase, isolating a topic phrase when the main predicate phrase contains the topic phrase, and setting a dependence between a topic phrase and a main predicate phrase and between a main predicate phrase and another predicate phrase according to a syntactic dependency structure in a sentence or between sentences.
-
-
20. An apparatus for summarizing a document in support of selection, access, edition, and management of the document readable by a computer, comprising:
-
a focused information relevant portion extraction unit extracting a portion related to two types of focused information in a document to be summarized based on the two types of focused information, that is, user-focused information as information focused by a user who uses a summary, and author-focused information as information emphasized by an author of the document to be summarized;
a summary readability improvement unit improving, corresponding to an extraction result from said focused information relevant portion extraction unit, readability of a summary by distinguishing user known information already known to a user, and/or information known through an access log regarded as already known to a user based on a document previously presented to the user when a summary is generated, from other information than these two types of information, and by selecting an important portion in a document to be summarized; and
a summary generation unit generating the summary of the document to be summarized based on the selection result from said summary readability improvement unit.
-
-
21. A method for summarizing a document in support of selection, access, edition, and management of the document readable by a computer, comprising:
-
extracting a portion related to two types of focused information in a document to be summarized based on the two types of focused information, that is, user-focused information as information focused by a user who uses a summary, and author-focused information as information emphasized by an author of the document to be summarized; and
generating the summary of the document to be summarized based on an extraction result of a portion related to the two types of focused information.
-
-
22. A method for summarizing a document in support of selection, access, edition, and management of the document readable by a computer, comprising:
-
distinguishing user known information already known to a user, and/or information known through an access log regarded as already known to a user based on a document previously presented to the user when a summary is generated, from other information than these two types of information, and selecting an important portion in a document to be summarized; and
generating the summary of the document to be summarized based on a selection result of the important portion.
-
-
23. A method for summarizing a document in support of selection, access, edition, and management of the document readable by a computer, comprising:
-
extracting a portion related to two types of focused information in a document to be summarized based on the two types of focused information, that is, user-focused information as information focused by a user who uses a summary, and author-focused information as information emphasized by an author of the document to be summarized;
distinguishing, corresponding to an extraction result, user known information already known to a user, and/or information known through an access log regarded as already known to a user based on a document previously presented to the user when a summary is generated, from other information than these two types of information, and selecting an important portion in a document to be summarized; and
generating the summary of the document to be summarized based on a selection result of the important portion.
-
-
24. A computer-readable storage medium storing a program used to direct a computer to perform, in summarizing a document in support of selection, access, edition, and management of the document readable by a computer, the following:
-
extracting a portion related to two types of focused information in a document to be summarized based on the two types of focused information, that is, user-focused information as information focused by a user who uses a summary, and author-focused information as information emphasized by an author of the document to be summarized; and
generating the summary of the document to be summarized based on an extraction result of a portion related to the two types of focused information.
-
-
25. A computer-readable storage medium storing a program used to direct a computer to perform, in summarizing a document in support of selection, access, edition, and management of the document readable by a computer, the following:
-
distinguishing user known information already known to a user, and/or information known through an access log regarded as already known to a user based on a document previously presented to the user when a summary is generated, from other information than these two types of information, and selecting an important portion in a document to be summarized; and
generating the summary of the document to be summarized based on a selection result of the important portion.
-
-
26. A computer-readable storage medium storing a program used to direct a computer to perform, in summarizing a document in support of selection, access, edition, and management of the document readable by a computer, the following:
-
extracting a portion related to two types of focused information in a document to be summarized based on the two types of focused information, that is, user-focused information as information focused by a user who uses a summary, and author-focused information as information emphasized by an author of the document to be summarized;
distinguishing, corresponding to an extraction result, user known information already known to a user, and/or information known through an access log regarded as already known to a user based on a document previously presented to the user when a summary is generated, from other information than these two types of information, and selecting an important portion in a document to be summarized; and
generating the summary of the document to be summarized based on a selection result of the important portion.
-
-
27. An apparatus for summarizing a document in support of selection, access, edition, and management of the document readable by a computer, comprising:
-
a focused information relevant portion extraction unit extracting a portion related to two types of focused information in a document to be summarized based on the two types of focused information comprising user-focused information as information focused by a user who uses a summary, and author-focused information as information emphasized by an author of the document to be summarized;
a summary generation unit generating the summary of the document to be summarized based on an extraction result from said focused information relevant portion extraction unit; and
said author-focused information, refers to a title of the document, a header of a chapter, a section, and a figure, a table of contents, and indices of words and topics, which is contained in a normally distributed document and, by which the author presents important points of the document.
-
-
28. An apparatus for summarizing a document in support of selection, access, edition, and management of the document readable by a computer, comprising:
-
focused information relevant portion extraction means for extracting a portion related to two types of focused information in a document to be summarized based on the two types of focused information comprising user-focused information as information focused by a user who uses a summary, and author-focused information as information emphasized by an author of the document to be summarized; and
summary generation means for generating the summary of the document to be summarized based on an extraction result from said focused information relevant portion extraction means.
-
-
29. An apparatus for summarizing a document in support of selection, access, edition, and management of the document readable by a computer, comprising:
-
summary readability improvement means for improving readability of a summary by distinguishing user known information already known to a user, and/or information known through an access log regarded as already known to a user based on a document previously presented to the user when a summary is generated, from other information than these two types of information, and by selecting an important portion in a document to be summarized; and
summary generation means for generating the summary of the document to be summarized based on a selection result from said summary readability improvement means.
-
Specification