Extraction of key sections from texts using automatic indexing techniques
Abstract
A document condensation method and apparatus for producing a document synopsis are provided. Automatic indexing techniques analyze an input document to determine a list of words and phrases characteristic of its subject matter. Sections of the document are compared with this list to determine which sections are most similar to the overall document in subject matter. A predetermined number of the sections found most similar to the whole document in content are provided as a condensed version of the document.
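The indexing step described in the abstract can be approximated with the standard library alone. The sketch below is hypothetical and is not the patented method. True noun-phrase extraction would normally need a part-of-speech tagger. Instead, this sketch treats every non-stopword, and every adjacent pair of non-stopwords within a clause, as an index term. It also assumes that a term's weight is simply its frequency in the document. The stopword list, `index_terms`, `weighted_term_list`, and the `min_count` threshold are illustrative names only.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; a real indexer would use a
# larger list plus part-of-speech tagging to isolate genuine noun-phrases.
STOPWORDS = frozenset({
    "a", "an", "the", "of", "to", "in", "on", "for", "and", "or", "is", "are",
    "was", "were", "be", "by", "with", "as", "that", "this", "it", "at", "from",
})


def index_terms(text: str) -> list[str]:
    """Return every content word and every adjacent content-word pair.

    Punctuation splits the text into clauses so that no pair spans a
    sentence or clause boundary.
    """
    terms: list[str] = []
    for clause in re.split(r"[.,;:!?()\n]+", text.lower()):
        words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", clause)
        for i, word in enumerate(words):
            if word in STOPWORDS:
                continue
            terms.append(word)
            # Two consecutive content words form a candidate phrase.
            if i + 1 < len(words) and words[i + 1] not in STOPWORDS:
                terms.append(f"{word} {words[i + 1]}")
    return terms


def weighted_term_list(text: str, min_count: int = 2) -> dict[str, int]:
    """Weight each term by its frequency; drop terms seen fewer than min_count times."""
    counts = Counter(index_terms(text))
    return {term: n for term, n in counts.items() if n >= min_count}
```

For example, applied to "Automatic indexing finds key phrases. The synopsis uses key phrases. Automatic indexing is fast.", `weighted_term_list` keeps only terms that occur at least twice. These include `automatic indexing` and `key phrases`, while one-off words such as `synopsis` and `fast` are dropped.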
14 Claims
1. A method for automatically condensing a document containing noun-phrases, to produce a synopsis of the document comprising the steps of:
automatically extracting from said document a list of noun-phrases appearing in said document;
assigning a weight to each noun-phrase occurring in said document noun-phrase list;
storing said document noun-phrase list including said corresponding weights in a memory;
dividing said document by using user input into a plurality of identifiable document-sections;
comparing words in each one of said plurality of identifiable document-sections with said document noun-phrase list;
providing a count associated with each of said plurality of identifiable document-sections;
ranking each one of said plurality of identifiable document-sections in a descending order by said count;
storing said ranks in said memory;
providing as output, using said ranks, a first n number of identifiable document-sections from said ranks where n is a predetermined number; and
producing a synopsis of said document wherein said number of identified document-sections in the synopsis are in a sequence unchanged from how they existed in the document.
Dependent claims: 2, 3, 4, 5, 6, 7
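The comparison, counting, ranking, and selection steps of claim 1 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patentee's implementation. The claim says that each section's words are compared with the weighted noun-phrase list to produce a count, but it does not say here how the weights enter that count. `section_count` therefore assumes that each whole-word occurrence of a listed phrase contributes that phrase's weight. The names `section_count` and `synopsis` are hypothetical. The sections are assumed to have been divided already, for example by the user.

```python
import re


def section_count(section: str, phrase_weights: dict[str, int]) -> int:
    """Sum the weight of every whole-word occurrence of a listed phrase (assumed scoring)."""
    text = section.lower()
    total = 0
    for phrase, weight in phrase_weights.items():
        # \b anchors stop "index" from matching inside "indexing".
        hits = re.findall(r"\b" + re.escape(phrase.lower()) + r"\b", text)
        total += weight * len(hits)
    return total


def synopsis(sections: list[str], phrase_weights: dict[str, int], n: int) -> list[str]:
    """Return the n highest-count sections, restored to their original order."""
    counts = [section_count(s, phrase_weights) for s in sections]
    # Rank section indices by count, highest first.
    ranks = sorted(range(len(sections)), key=lambda i: counts[i], reverse=True)
    # Re-sort the chosen indices so the synopsis keeps document order.
    return [sections[i] for i in sorted(ranks[:n])]
```

The second sort over the top-n indices keeps the selected sections "in a sequence unchanged from how they existed in the document". Python's `sorted` remains stable with `reverse=True`, so sections with equal counts are also ranked in document order.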
8. An apparatus for automatically condensing a document which includes noun-phrases, to produce a synopsis of the document, said apparatus comprising:
means for automatically extracting from said document a list of noun-phrases appearing in said document;
means for assigning a weight to each noun-phrase occurring in said document noun-phrase list;
means for storing said document noun-phrase list including said corresponding weights in a memory;
means for dividing said document by using user-input, into a plurality of identifiable document-sections;
means for comparing words in each one of said plurality of identifiable document-sections with said document noun-phrase list;
means for providing a count associated with each of said plurality of identifiable document-sections;
means for ranking each one of said plurality of identifiable document-sections in a descending order by said count;
means for storing said ranks in said memory; and
means for providing as output using said ranks, a first n number of identifiable document-sections from said ranks where n is a predetermined number, to produce a document synopsis wherein said n number of identifiable document-sections are in a sequence unchanged from how they existed in the document.
Dependent claims: 9, 10, 11, 12, 13, 14
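Claim 8 recites the same steps as a set of "means". One way to picture such an apparatus in software is a single object that keeps the phrase list and the ranks in memory and whose components can be swapped. In this hypothetical sketch the phrase extractor is a pluggable callable. The default, `frequency_extractor`, is only a placeholder: it weights words of five or more letters by frequency rather than extracting noun-phrases. The user input that divides the document is assumed to be a marker string chosen by the user. `Condenser`, `split_by_user_marker`, and `frequency_extractor` are illustrative names only.

```python
import re
from collections import Counter
from typing import Callable

# An extractor maps document text to {phrase: weight}.
Extractor = Callable[[str], dict[str, int]]


def frequency_extractor(document: str) -> dict[str, int]:
    """Placeholder extractor: weight each word of 5+ letters by its frequency."""
    return dict(Counter(re.findall(r"[a-z]{5,}", document.lower())))


def split_by_user_marker(document: str, marker: str) -> list[str]:
    """Divide the document wherever the user-chosen marker appears."""
    return [part.strip() for part in document.split(marker) if part.strip()]


class Condenser:
    def __init__(self, extractor: Extractor = frequency_extractor) -> None:
        self.extractor = extractor
        # "Memory": the weighted phrase list and the section ranks.
        self.phrase_weights: dict[str, int] = {}
        self.ranks: list[int] = []

    def _count(self, section: str) -> int:
        """Sum phrase weights over whole-word matches in one section (assumed scoring)."""
        text = section.lower()
        return sum(
            weight * len(re.findall(r"\b" + re.escape(phrase.lower()) + r"\b", text))
            for phrase, weight in self.phrase_weights.items()
        )

    def condense(self, document: str, marker: str, n: int) -> list[str]:
        self.phrase_weights = self.extractor(document)
        sections = split_by_user_marker(document, marker)
        counts = [self._count(s) for s in sections]
        # Store section indices ranked by count, highest first.
        self.ranks = sorted(range(len(sections)), key=lambda i: counts[i], reverse=True)
        # Output the first n ranked sections in their original order.
        return [sections[i] for i in sorted(self.ranks[:n])]
```

For example, `Condenser().condense("Intro only.|Indexing drives indexing.|Weather today.|Indexing summary.", "|", 2)` returns the two "Indexing" sections in their original order. Passing a different `extractor`, such as a real noun-phrase chunker, changes only that component. This loosely mirrors how each "means" in the claim is a separable element.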
Specification