Method for producing a document summary
First Claim
1. A method for producing a document summary from a document, said document including a plurality of words and being segmentable into a plurality of text segments, each text segment including at least one word, said document being classifiable as belonging to a category selected from a set of predetermined categories and each text segment being classifiable as belonging to a theme selected from a set of predetermined themes, said method comprising:
associating with said document a specific category from said set of predetermined categories;
performing a thematic segmentation of said document to produce a segmented document, said segmented document including said plurality of text segments;
associating with each text segment from said plurality of text segments a theme selected from said set of predetermined themes; and
summarizing said segmented document to produce said document summary by processing each text segment from said plurality of text segments to either
- select at least one summary textual unit from said text segment, said at least one summary textual unit including at least one of said words, said at least one summary textual unit being a textual unit considered important in summarizing said document; or
- extract no textual unit from said text segment;
said summary textual units being used to form said document summary;
wherein said thematic segmentation is dependent on said category to which said document is associated and said summary textual units are selected for each text segment depending on said theme with which said text segment is associated.
Abstract
A method for producing a document summary from a document. The method includes:
associating with the document a specific category from a set of predetermined categories;
performing a thematic segmentation of the document to produce a segmented document, the segmented document including a plurality of text segments;
associating with each text segment from the plurality of text segments a theme selected from a set of predetermined themes; and
summarizing the segmented document to produce the document summary by processing each text segment from the plurality of text segments to either
- select at least one summary textual unit from the text segment, the at least one summary textual unit including at least one word and being a textual unit considered important in summarizing the document; or
- extract no textual unit from the text segment.
The summary textual units are used to form the document summary. The thematic segmentation is dependent on the category to which the document is associated and the summary textual units are selected for each text segment depending on the theme with which the text segment is associated.
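The steps described in the abstract can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions: the source does not say how categories, themes, or importance are determined, so the keyword tables, the regex-based segmentation rules, and all function names below are hypothetical stand-ins. What the sketch does preserve is the dependency structure: segmentation is chosen by the document's category, and unit selection is chosen by each segment's theme, with some themes yielding no units at all.

```python
import re

# Hypothetical lookup tables (assumptions, not taken from the source).
CATEGORY_KEYWORDS = {"judgment": ["court", "appeal"], "memo": ["memo", "attention"]}
# Per-category segmentation rule: a regex on which the document is split.
SEGMENT_SPLITTERS = {"judgment": r"\n\s*\n", "memo": r"\n"}
THEME_KEYWORDS = {"facts": ["filed", "occurred"], "conclusion": ["therefore", "dismissed"]}
# Per-theme cue words marking a sentence as important.
# An empty list means "extract no textual unit" for that theme.
THEME_CUES = {"facts": ["filed"], "conclusion": ["therefore", "dismissed"], "other": []}


def _best_label(text, keyword_table, default):
    """Return the label whose keywords occur most often, or default if none occur."""
    lowered = text.lower()
    scores = {label: sum(lowered.count(k) for k in kws)
              for label, kws in keyword_table.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def categorize(document):
    """Associate the document with one predetermined category."""
    return _best_label(document, CATEGORY_KEYWORDS, default="memo")


def segment(document, category):
    """Segment the document using the rule selected by its category."""
    parts = re.split(SEGMENT_SPLITTERS[category], document)
    return [p.strip() for p in parts if p.strip()]


def assign_theme(segment_text):
    """Associate a text segment with one predetermined theme."""
    return _best_label(segment_text, THEME_KEYWORDS, default="other")


def select_units(segment_text, theme):
    """Select summary sentences using the cues for the segment's theme (possibly none)."""
    sentences = re.split(r"(?<=[.!?])\s+", segment_text)
    cues = THEME_CUES[theme]
    return [s for s in sentences if any(c in s.lower() for c in cues)]


def summarize(document):
    """Run categorization, segmentation, theme assignment, and selection in order."""
    category = categorize(document)
    summary_units = []
    for seg in segment(document, category):
        theme = assign_theme(seg)
        summary_units.extend(select_units(seg, theme))
    return " ".join(summary_units)
```

In this toy setup, a segment whose theme falls back to "other" contributes nothing to the summary, which corresponds to the "extract no textual unit" branch.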
23 Claims
1. A method for producing a document summary from a document, said document including a plurality of words and being segmentable into a plurality of text segments, each text segment including at least one word, said document being classifiable as belonging to a category selected from a set of predetermined categories and each text segment being classifiable as belonging to a theme selected from a set of predetermined themes, said method comprising:
associating with said document a specific category from said set of predetermined categories;
performing a thematic segmentation of said document to produce a segmented document, said segmented document including said plurality of text segments;
associating with each text segment from said plurality of text segments a theme selected from said set of predetermined themes; and
summarizing said segmented document to produce said document summary by processing each text segment from said plurality of text segments to either
- select at least one summary textual unit from said text segment, said at least one summary textual unit including at least one of said words, said at least one summary textual unit being a textual unit considered important in summarizing said document; or
- extract no textual unit from said text segment;
said summary textual units being used to form said document summary;
wherein said thematic segmentation is dependent on said category to which said document is associated and said summary textual units are selected for each text segment depending on said theme with which said text segment is associated.
Dependent claims: 2 to 22.
23. A computer readable storage medium containing a program element for execution by a computing device, said program element being able to produce a document summary from a document, said document including a plurality of words and being segmentable into a plurality of text segments, each text segment including at least one word, said document being classifiable as belonging to a category selected from a set of predetermined categories and each text segment being classifiable as belonging to a theme selected from a set of predetermined themes, said program element comprising:
an input module operative for receiving the document;
a categorization module operative for associating with said document a specific category from said set of predetermined categories;
a segmentation module operative for performing a thematic segmentation of said document to produce a segmented document, said segmented document including said plurality of text segments; and associating with each text segment from said plurality of text segments a theme selected from said set of predetermined themes;
a summarization module operative for summarizing said segmented document to produce said document summary by processing each text segment from said plurality of text segments to either
- select at least one summary textual unit from said text segment, said at least one summary textual unit including at least one of said words, said at least one summary textual unit being a textual unit considered important in summarizing said document; or
- extract no textual unit from said text segment;
said summary textual units being used to form said document summary; and
an output module operative for releasing the summarized document;
wherein said thematic segmentation is dependent on said category to which said document is associated and said summary textual units are selected for each text segment depending on said theme with which said text segment is associated.
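The module arrangement recited in claim 23 can be pictured as five pluggable components wired in sequence. The sketch below is an assumption-laden illustration, not the patented program element: the class name `SummarizerProgram`, its field names, and the toy letter/report logic are all hypothetical. It also adopts one possible reading of the claim, in which the segmentation module both segments the document and associates a theme with each segment.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

ThemedSegment = Tuple[str, str]  # (segment text, theme)


@dataclass
class SummarizerProgram:
    """Hypothetical container wiring the five modules named in the claim."""
    read_input: Callable[[str], str]                         # input module
    categorize: Callable[[str], str]                         # categorization module
    segment: Callable[[str, str], List[ThemedSegment]]       # segmentation module
    summarize: Callable[[List[ThemedSegment]], List[str]]    # summarization module
    write_output: Callable[[List[str]], str]                 # output module

    def run(self, source: str) -> str:
        document = self.read_input(source)
        category = self.categorize(document)
        # Segmentation receives the category, so it can depend on it.
        themed_segments = self.segment(document, category)
        units = self.summarize(themed_segments)
        return self.write_output(units)


# Toy stand-ins for each module (assumptions for illustration only).
def toy_categorize(document: str) -> str:
    return "letter" if document.lower().startswith("dear") else "report"


def toy_segment(document: str, category: str) -> List[ThemedSegment]:
    # Letters are split per line, reports per blank-line-separated paragraph.
    separator = "\n" if category == "letter" else "\n\n"
    themed = []
    for part in document.split(separator):
        text = part.strip()
        if not text:
            continue
        lowered = text.lower()
        if lowered.startswith("dear"):
            theme = "greeting"
        elif lowered.startswith("regards"):
            theme = "closing"
        else:
            theme = "body"
        themed.append((text, theme))
    return themed


def toy_summarize(themed_segments: List[ThemedSegment]) -> List[str]:
    # Only "body" segments yield a unit; other themes yield nothing.
    return [text for text, theme in themed_segments if theme == "body"]


program = SummarizerProgram(
    read_input=str.strip,
    categorize=toy_categorize,
    segment=toy_segment,
    summarize=toy_summarize,
    write_output=" ".join,
)
```

Keeping each module as a separate callable means a different categorizer or theme-specific selector could be swapped in without touching the rest of the chain.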
Specification