Automatic method of generating thematic summaries
First Claim
1. A processor implemented method of generating a thematic summary of a document presented in machine readable form to the processor, the document including a first multiplicity of sentences and a second multiplicity of terms, the processor implementing the method by executing instructions stored in electronic form in a memory device coupled to the processor, the processor implemented method comprising the steps of:
a) determining a value of a first number of thematic terms based upon a value of a second number representing a length of the thematic summary, the first number being less than the second number;
b) selecting the first number of thematic terms from the second multiplicity of terms;
c) scoring each sentence of the first multiplicity of sentences based upon occurrence of thematic terms in each sentence; and
d) selecting the second number of thematic sentences from the first multiplicity of sentences based upon the score of each sentence.
Abstract
A technique for automatically generating thematic summaries for machine readable representations of documents. The technique begins with determining the number of thematic terms to be used based upon the number of thematic sentences to be extracted. To ensure some commonality of theme between extracted sentences, the number of thematic terms used should be less than the number of thematic sentences to be extracted. Having determined the appropriate number of thematic terms, the method next identifies the thematic terms within the document. Afterward, each sentence of the document is scored based upon the number of thematic terms contained within the sentence. The desired number of highest scoring sentences are selected as thematic sentences.
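The scoring and selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the regular-expression tokenizer, the choice to count each distinct thematic term once per sentence, the tie-breaking rule, and the return of selected sentences in document order are all assumptions made for illustration.

```python
import re


def score_sentence(sentence, thematic_terms):
    """Score a sentence by how many distinct thematic terms it contains.

    Assumption: each thematic term counts once per sentence, and matching
    is case-insensitive on simple word tokens.
    """
    words = set(re.findall(r"[a-z0-9']+", sentence.lower()))
    return len(words & thematic_terms)


def select_thematic_sentences(sentences, thematic_terms, summary_length):
    """Return the `summary_length` highest-scoring sentences.

    Assumptions: ties go to the earlier sentence, and the selected
    sentences are returned in their original document order.
    """
    terms = {t.lower() for t in thematic_terms}
    scored = [(score_sentence(s, terms), i) for i, s in enumerate(sentences)]
    # Highest score first; earlier position breaks ties.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:summary_length]
    # Restore document order for readability of the summary.
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]


if __name__ == "__main__":
    doc = [
        "The cat sat on the mat.",
        "Dogs bark loudly.",
        "A cat and a dog share the mat.",
        "Rain fell all day.",
    ]
    for line in select_thematic_sentences(doc, {"cat", "mat"}, 2):
        print(line)
```

With the hypothetical thematic terms "cat" and "mat" and a summary length of 2, the first and third sentences are selected because each contains both terms.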
10 Claims
1. A processor implemented method of generating a thematic summary of a document presented in machine readable form to the processor, the document including a first multiplicity of sentences and a second multiplicity of terms, the processor implementing the method by executing instructions stored in electronic form in a memory device coupled to the processor, the processor implemented method comprising the steps of:
a) determining a value of a first number of thematic terms based upon a value of a second number representing a length of the thematic summary, the first number being less than the second number;
b) selecting the first number of thematic terms from the second multiplicity of terms;
c) scoring each sentence of the first multiplicity of sentences based upon occurrence of thematic terms in each sentence; and
d) selecting the second number of thematic sentences from the first multiplicity of sentences based upon the score of each sentence.
Dependent claims: 2, 3, 4, 5
6. A processor implemented method of generating a thematic summary of a document presented in machine readable form to the processor, the document including a first multiplicity of sentences and a second multiplicity of terms, the processor implementing the method by executing instructions stored electronically in a memory device coupled to the processor, the processor implemented method comprising the steps of:
a) determining a value of a first number of thematic terms based upon a value of a second number representing a length of the thematic summary, the first number being less than the second number;
b) determining a number of times each term of the second multiplicity of terms occurs in the document;
c) selecting the first number of thematic terms from the second multiplicity of terms based upon the number of times each term occurs in the document;
d) scoring each sentence of the first multiplicity of sentences based upon occurrences of thematic terms within each sentence; and
e) selecting the second number of thematic sentences from the first multiplicity of sentences based upon the sentence scores.
Dependent claims: 7, 8, 9, 10
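Steps a) through c) of claim 6 (choosing how many thematic terms to use, counting term occurrences, and picking the most frequent terms) might be sketched in Python as follows. The claim only requires that the number of thematic terms be less than the summary length; the halving rule used here, the small stop-word list, the tokenizer, and all function names are assumptions for illustration and are not taken from the claim.

```python
import re
from collections import Counter

# Hypothetical stop list; the claim text shown here does not define one.
STOP_WORDS = frozenset(
    {"the", "a", "an", "of", "and", "to", "in", "is", "on", "it"}
)


def thematic_term_count(summary_length):
    """Step a): pick K thematic terms for a summary of Z sentences, K < Z.

    Assumption: K = Z // 2 (at least 1). The claim only states K < Z.
    """
    if summary_length < 2:
        raise ValueError("summary_length must be at least 2 so that K < Z")
    return max(1, summary_length // 2)


def select_thematic_terms(text, summary_length):
    """Steps b) and c): count term occurrences, keep the K most frequent."""
    words = [
        w for w in re.findall(r"[a-z0-9']+", text.lower())
        if w not in STOP_WORDS
    ]
    counts = Counter(words)  # step b): occurrences of each term
    k = thematic_term_count(summary_length)  # step a)
    return [term for term, _ in counts.most_common(k)]  # step c)


if __name__ == "__main__":
    text = "Cats chase mice. Mice fear cats. Cats sleep."
    print(select_thematic_terms(text, summary_length=4))
```

The resulting terms would then feed the sentence scoring and selection of steps d) and e).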
Specification