
Automatic correlation method for generating summaries for text documents

  • US 7,017,114 B2
  • Filed: 08/31/2001
  • Issued: 03/21/2006
  • Est. Priority Date: 09/20/2000
  • Status: Expired due to Fees
First Claim

1. An automatic method for generating summaries for text documents, comprising steps of:

  • generating a set of sentences for a set of documents by document discourse analysis and a set of words by morphologic process;

    initializing a word score for each word in the set of words, a sentence score for each sentence in the set of sentences and a score sum;

    computing an aggregated word score for said each word according to an aggregate of sentence scores of sentences containing said each word and to a degree of correlation between said each word and user related information;

    wherein said aggregated word score (SCORE[w]) has a weighted (λ) relationship with each of said aggregated sentence scores (SCORE[s]), linguistic salience of said each word to a user profile (salience(w, user summarization profile)), similarities among said each word, a query and a provided topic (salience(w, user's query or topic)), similarities among said each word and terms in titles of the documents (salience(w, title words)), a ratio of an occurrence number for said each word in a document to a total occurrence number for said each word in the set of documents (FREQUENCY(w/d)/FREQUENCY(w/D)), and a ratio of a number of documents including said each word to a total number of documents in the set of documents (NUMBER(d, d ∋ w)/NUMBER(D)), of the form
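    $$
    \begin{aligned}
    \mathrm{SCORE}[w] = {} & \lambda_1 \cdot \mathrm{salience}(w, \text{user summarization profile}) + \lambda_2 \cdot \mathrm{salience}(w, \text{user's query or topic}) \\
    & + \lambda_3 \sum_{s \ni w} \mathrm{SCORE}[s] + \lambda_4 \cdot \mathrm{salience}(w, \text{title words}) \\
    & + \lambda_5 \cdot \frac{\mathrm{FREQUENCY}(w/d)}{\mathrm{FREQUENCY}(w/D)} + \lambda_6 \cdot \frac{\mathrm{NUMBER}(d,\ d \ni w)}{\mathrm{NUMBER}(D)};
    \end{aligned}
    $$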

    computing an aggregated sentence score for said each sentence according to an aggregate of word scores composing said each sentence and a respective sentence position in a section and a paragraph;

    comparing an aggregate sum with said score sum, said aggregate sum being a sum of aggregated word scores and aggregated sentence scores; and

    if said aggregate sum is different than said score sum, returning to the step of computing the aggregated word score;

    otherwise, outputting top-ranked sentences according to sentence score as a summary of the set of documents, and top-ranked words according to word score as a keywords list of the set of documents.
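The claim describes a mutual-reinforcement loop: each word score is built partly from the scores of the sentences that contain the word, each sentence score from the scores of its words, and the loop repeats until the combined score sum stops changing. The Python sketch below illustrates that loop under several assumptions that are not in the claim. Sentence splitting and tokenization are simple regular expressions standing in for discourse analysis and morphological processing. `salience` is a hypothetical binary term-overlap test. The weights in `LAMBDAS` are arbitrary. FREQUENCY(w/d)/FREQUENCY(w/D) takes the maximum over documents. The sentence score is the mean word score plus a position bonus based only on the sentence's index within its document (the claim also mentions section and paragraph position). Both score vectors are rescaled by their maximum each round, and convergence is tested against a tolerance `tol` and capped by `max_iter` rather than by exact equality. All function and parameter names are illustrative, not taken from the patent.

```python
import re
from collections import Counter

# Hypothetical weights lambda_1..lambda_6; the claim does not fix their values.
LAMBDAS = (0.2, 0.2, 0.3, 0.1, 0.1, 0.1)


def split_sentences(doc):
    # Stand-in for discourse analysis: split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def tokenize(text):
    # Stand-in for morphological processing: lowercase alphabetic tokens, no stemming.
    return re.findall(r"[a-z]+", text.lower())


def salience(word, terms):
    # Hypothetical salience measure: 1.0 if the word is in the term set, else 0.0.
    return 1.0 if word in terms else 0.0


def scale_to_max(values):
    # Keep scores bounded by dividing by the largest one (an assumption).
    top = max(values, default=0.0)
    return [v / top for v in values] if top > 0 else list(values)


def summarize(docs, title_words, query_terms, profile_terms,
              n_sentences=2, n_keywords=5, tol=1e-9, max_iter=100):
    l1, l2, l3, l4, l5, l6 = LAMBDAS
    # Sentence records: (position within its document, text, tokens).
    sents = [(pos, text, tokenize(text))
             for doc in docs
             for pos, text in enumerate(split_sentences(doc))]
    vocab = sorted({w for _, _, toks in sents for w in toks})
    # Indices of the sentences that contain each word (the "s contains w" sum).
    containing = {w: [i for i, (_, _, toks) in enumerate(sents) if w in toks]
                  for w in vocab}
    doc_counts = [Counter(tokenize(doc)) for doc in docs]
    total = sum(doc_counts, Counter())
    # FREQUENCY(w/d)/FREQUENCY(w/D), taking the max over documents (assumption).
    freq_ratio = {w: max(c[w] for c in doc_counts) / total[w] for w in vocab}
    # NUMBER(d, d contains w)/NUMBER(D): fraction of documents containing w.
    doc_ratio = {w: sum(1 for c in doc_counts if c[w]) / len(docs) for w in vocab}

    # Initialization step: every word and sentence score starts at 1.0.
    word_scores = [1.0] * len(vocab)
    sent_scores = [1.0] * len(sents)
    score_sum = sum(word_scores) + sum(sent_scores)
    for _ in range(max_iter):
        # Word step: the six-term weighted formula, using last round's sentence scores.
        word_scores = scale_to_max([
            l1 * salience(w, profile_terms)
            + l2 * salience(w, query_terms)
            + l3 * sum(sent_scores[i] for i in containing[w])
            + l4 * salience(w, title_words)
            + l5 * freq_ratio[w]
            + l6 * doc_ratio[w]
            for w in vocab
        ])
        by_word = dict(zip(vocab, word_scores))
        # Sentence step: mean word score plus a hypothetical bonus 1/(1 + pos)
        # that favours sentences near the start of their document.
        sent_scores = scale_to_max([
            (sum(by_word[w] for w in toks) / len(toks) if toks else 0.0)
            + 1.0 / (1 + pos)
            for pos, _, toks in sents
        ])
        aggregate = sum(word_scores) + sum(sent_scores)
        # Stop once the aggregate sum stops changing (tolerance, not equality).
        if abs(aggregate - score_sum) < tol:
            break
        score_sum = aggregate

    # Rank by descending score; ties broken by sentence index or word spelling.
    top_sents = sorted(range(len(sents)), key=lambda i: (-sent_scores[i], i))
    top_words = sorted(range(len(vocab)), key=lambda j: (-word_scores[j], vocab[j]))
    summary = [sents[i][1] for i in top_sents[:n_sentences]]
    keywords = [vocab[j] for j in top_words[:n_keywords]]
    return summary, keywords
```

For example, on the two documents `"The patent covers text summaries. Each patent claim is short."` and `"A patent summary lists claims. Readers skim the patent quickly."`, calling `summarize(docs, {"patent", "summary"}, {"patent", "claims"}, {"patent"})` ranks "patent" first in the keyword list, because that word collects salience from all three term sets and sentence support from every sentence. Rescaling by the maximum preserves the ranking within each round while preventing the word and sentence scores from inflating each other without bound.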
