Multilingual sentence extractor

  • US 8,594,998 B2
  • Filed: 07/25/2011
  • Issued: 11/26/2013
  • Est. Priority Date: 07/30/2010
  • Status: Active Grant
First Claim

1. A multilingual method for summarizing an article, which comprises:

  • an offline stage which comprises:

    (a.) predefining a set of metrics;

    (b.) providing a collection of documents, and providing one or more expert summaries for each article;

    (c.) indexing the sentences and words within each article;

    (d.) subjecting, using at least one computer processor, each sentence in said articles serially to the entire set of metrics, thereby obtaining for each sentence a plurality of sentence metrics values, each relating to one of said metrics respectively;

    (e.) guessing a population of u normalized weights vectors;

    (f.) for a selected weights vector ui in said population;

    (f.1.) for each sentence, calculating a sentence combined value, said combined value being a linear combination of the weights vector and said sentence metrics values;

    (f.2.) for each document ranking the sentences according to their combined values;

    (f.3.) for each document selecting a group of sentences having highest combined values;

    (f.4.) for each document, comparing said selected group with the one or more expert summaries, and obtaining a quality score for each article, selected group, and corresponding weights vector;

    (f.5.) repeating steps f.1 to f.4 for all the weights vectors in u;

    (g.) based on said quality scores, calculating a total score, and checking for convergence of a total of said quality scores in respect to previous iterations;

    (h.) upon convergence of said total quality scores, selecting a best weights vector which provides highest quality scores, and terminating the process;

    (i.) otherwise, if no convergence has yet been obtained, selecting a group a of weights vectors out of population u that have provided highest quality scores, and by means of a genetic algorithm generating a new population u′ of weights vectors, and repeating steps f to h with the population u′ of weights vectors until convergence; and

  • a real time stage which comprises:

    (j.) indexing sentences, and words within the document which needs summarization;

    (k.) calculating each of said predefined metrics of step a with respect to each of the sentences in the document to obtain sentence metric values;

    (l.) separately for each sentence, subjecting the sentence metric values to the best weights vector as selected in step h of the offline stage, and summing up all weighted values to obtain a single combined value for each sentence;

    (m.) ranking the sentences according to their combined values to form a ranked list, and extracting a predetermined number of sentences from a top of the ranked list of sentences; and

    (n.) combining said extracted sentences thereby forming the document summary.
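
The two stages recited above can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: every function name is hypothetical, the quality score of step f.4 is stood in for by the fraction of expert-chosen sentences recovered, the genetic algorithm uses uniform crossover with a single Gaussian mutation, convergence is judged by the change in the population's total score, and extracted sentences are joined in document order. Each document is assumed to arrive as a list of per-sentence metric rows (steps a to d and j to k are taken as already done).

```python
import random

def combined_values(metric_rows, weights):
    """Steps f.1 and l: weighted sum of each sentence's metric values."""
    return [sum(w * m for w, m in zip(weights, row)) for row in metric_rows]

def extract_top(metric_rows, weights, k):
    """Steps f.2-f.3 and m: indices of the k highest-scoring sentences."""
    scores = combined_values(metric_rows, weights)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # assumption: report them in document order

def quality(selected, expert):
    """Stand-in for step f.4: share of expert sentences that were selected."""
    return len(set(selected) & set(expert)) / len(expert)

def normalize(vec):
    """Scale a weights vector so its absolute values sum to 1."""
    total = sum(abs(v) for v in vec) or 1.0
    return [v / total for v in vec]

def train_weights(docs, n_metrics, k, pop_size=20, keep=5,
                  max_gens=50, tol=1e-6, seed=0):
    """Offline stage, steps e-i. docs is a list of (metric_rows, expert_indices)."""
    rng = random.Random(seed)
    # Step e: initial population of normalized weights vectors.
    population = [normalize([rng.random() for _ in range(n_metrics)])
                  for _ in range(pop_size)]
    previous_total = None
    for _ in range(max_gens):
        # Steps f.1-f.5: score every weights vector against every document.
        scored = []
        for w in population:
            total = sum(quality(extract_top(rows, w, k), expert)
                        for rows, expert in docs)
            scored.append((total, w))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Steps g-h: stop once the population's total score stops changing.
        generation_total = sum(score for score, _ in scored)
        if previous_total is not None and abs(generation_total - previous_total) < tol:
            break
        previous_total = generation_total
        # Step i: keep the best vectors and breed a new population from them.
        parents = [w for _, w in scored[:keep]]
        children = list(parents)
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            j = rng.randrange(n_metrics)
            child[j] = max(0.0, child[j] + rng.gauss(0, 0.1))
            children.append(normalize(child))
        population = children
    return scored[0][1]  # best weights vector of the last scored generation

def summarize(sentences, metric_rows, weights, k):
    """Real time stage, steps l-n: rank, extract the top k, and join them."""
    return " ".join(sentences[i] for i in extract_top(metric_rows, weights, k))
```

For example, calling train_weights on a small corpus and then passing the returned vector to summarize for a new document mirrors the split between the offline and real time stages. Because the language-dependent work sits in the metric rows, the weighting and ranking code itself makes no assumption about the language of the text.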
