Multilingual sentence extractor
First Claim
1. A multilingual method for summarizing an article, which comprises:
an offline stage which comprises:
(a.) predefining a set of metrics;
(b.) providing a collection of documents, and providing one or more expert summaries for each article;
(c.) indexing the sentences and words within each article;
(d.) subjecting, using at least one computer processor, each sentence in said articles serially to the entire set of metrics, thereby obtaining for each sentence a plurality of sentence metrics values, each relating to one of said metrics respectively;
(e.) guessing a population of u normalized weights vectors;
(f.) for a selected weights vector u_i in said population:
(f.1.) for each sentence, calculating a sentence combined value, said combined value being a linear combination of the weights vector and said sentence metrics values;
(f.2.) for each document ranking the sentences according to their combined values;
(f.3.) for each document selecting a group of sentences having highest combined values;
(f.4.) for each document, comparing said selected group with the one or more expert summaries, and obtaining a quality score for each article, selected group, and corresponding weights vector;
(f.5.) repeating steps f.1 to f.4 for all the weights vectors in u;
(g.) based on said quality scores, calculating a total score, and checking for convergence of a total of said quality scores in respect to previous iterations;
(h.) upon convergence of said total quality scores, selecting a best weights vector which provides highest quality scores, and terminating the process;
(i.) otherwise, if convergence has not yet been obtained, selecting a group a of weights vectors out of population u that have provided highest quality scores, and by means of a genetic algorithm generating a new population u′ of weights vectors, and repeating steps f to h with the population u′ of weights vectors until convergence; and
a real time stage which comprises:
(j.) indexing sentences, and words within the document which needs summarization;
(k.) calculating each of said predefined metrics of step a with respect to each of the sentences in the document to obtain sentence metric values;
(l.) separately for each sentence, subjecting the sentence metric values to the best weights vector as selected in step h of the offline stage, and summing up all weighted values to obtain a single combined value for each sentence;
(m.) ranking the sentences according to their combined values to form a ranked list, and extracting a predetermined number of sentences from a top of the ranked list of sentences; and
(n.) combining said extracted sentences thereby forming the document summary.
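The offline stage of the claim (steps e to i) can be illustrated with a minimal Python sketch. It is not the patented implementation. The following are assumptions made only for illustration: the quality measure (fraction of expert-selected sentences recovered), the convergence test (no improvement of the best total score for a few generations), the single-point crossover with Gaussian mutation, and all function and parameter names. Each document is represented as a pair of (metric rows, set of expert-selected sentence indices), which presumes steps a to d have already been carried out.

```python
import random

def normalize(w):
    """Scale a weights vector so its absolute values sum to 1 (step e)."""
    total = sum(abs(x) for x in w) or 1.0
    return [x / total for x in w]

def select_top(weights, metric_rows, k):
    """Steps f.1 to f.3: linear combination per sentence, rank, keep the k best."""
    scores = [sum(w * m for w, m in zip(weights, row)) for row in metric_rows]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

def quality(selected, expert):
    """Step f.4 (assumed measure): share of expert sentences that were selected."""
    return len(selected & expert) / len(expert)

def total_quality(weights, corpus, k):
    """Step g: average quality of one weights vector over all documents."""
    return sum(quality(select_top(weights, rows, k), expert)
               for rows, expert in corpus) / len(corpus)

def train(corpus, n_metrics, k=2, pop_size=20, elite=5,
          patience=5, max_gens=200, seed=0):
    """Hypothetical driver; requires n_metrics >= 2 and elite >= 2."""
    rng = random.Random(seed)
    # Step e: guess an initial population of normalized weights vectors.
    population = [normalize([rng.uniform(-1, 1) for _ in range(n_metrics)])
                  for _ in range(pop_size)]
    best_w, best_q, stale = None, -1.0, 0
    for _ in range(max_gens):
        # Steps f to f.5: score every weights vector in the population.
        scored = sorted(((total_quality(w, corpus, k), w) for w in population),
                        key=lambda t: t[0], reverse=True)
        gen_q, gen_w = scored[0]
        # Step g (assumed test): converged when the best score stops improving.
        if gen_q > best_q:
            best_w, best_q, stale = gen_w, gen_q, 0
        else:
            stale += 1
        if stale >= patience or best_q >= 1.0:
            break  # step h: keep the best weights vector and stop
        # Step i: keep the best vectors and breed a new population from them.
        parents = [w for _, w in scored[:elite]]
        population = list(parents)
        while len(population) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_metrics)       # single-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_metrics)] += rng.gauss(0, 0.2)  # mutation
            population.append(normalize(child))
    return best_w, best_q
```

Calling `train(corpus, n_metrics=2)` on a small corpus returns the best weights vector found and its average quality score; that vector is what the real time stage would reuse in step l.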
Abstract
The invention relates to a multilingual method for summarizing an article. The method comprises an offline stage in which a weights vector is determined using, among other things, a plurality of predefined metrics, a collection of documents, and expert-prepared summaries. In that stage all the document sentences are subjected to all said metrics, a population of weights vectors is guessed, the population is applied to the resulting metric values, the sentences are ranked, a new population is generated using a genetic algorithm, and the process is repeated until convergence. The invention further comprises a real time stage in which the weights vector so determined, together with said metrics, is used to produce an extract of any new document.
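The real time stage described above can be sketched in Python as follows. This is an illustrative sketch only. The three metrics (sentence position, relative length, and word-frequency coverage), the regex-based sentence splitter and tokenizer, the choice to emit the extracted sentences in document order, and all function names are assumptions; the source does not specify them. The regex tokenizer relies on whitespace and punctuation, so it would need replacing for languages written without word spacing.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive sentence indexing: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercased word tokens; \\w is Unicode-aware in Python 3."""
    return re.findall(r"\w+", sentence.lower())

def metric_values(sentences):
    """One row of metric values per sentence (three assumed metrics)."""
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for t in tokens for w in t)
    max_len = max(len(t) for t in tokens) or 1
    total = sum(freq.values()) or 1
    rows = []
    for i, t in enumerate(tokens):
        position = 1.0 / (i + 1)                 # earlier sentences score higher
        length = len(t) / max_len                # relative sentence length
        coverage = sum(freq[w] for w in set(t)) / total  # frequent-word coverage
        rows.append([position, length, coverage])
    return rows

def summarize(text, weights, n_sentences=2):
    """Weight the metric values, rank the sentences, and join the top ones."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    rows = metric_values(sentences)
    # Weighted sum of metric values gives one combined value per sentence.
    combined = [sum(w * m for w, m in zip(weights, row)) for row in rows]
    ranked = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)
    # Assumption: present the extracted sentences in their original order.
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Here `weights` stands for the weights vector produced by the offline stage, with one entry per metric in the same order as the rows returned by `metric_values`.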
5 Claims
Specification