Generating a summary based on readability

  • US 9,727,641 B2
  • Filed: 04/25/2013
  • Issued: 08/08/2017
  • Est. Priority Date: 04/25/2013
  • Status: Active Grant
First Claim

1. A method executed by a computer system, comprising:

  • extracting a set of sentences from a digital document;

    scoring each sentence of the set of sentences using a respective informativeness measure;

    scoring each sentence of the set of sentences using a readability measure, wherein the readability measure is based at least in part on one of:

    a number of words in the sentence, a number of syllables per word, a frequency of a word based on a vocabulary frequency, a frequency of a word based on context, or if words of the sentence appear on a reading list;

    selecting selected sentences in the set of sentences based on the readability measures and informativeness measures, wherein the selecting comprises:

    determining a subset of sentences from the set of sentences, wherein the sentences in the subset of sentences have informativeness measures greater than a threshold, and

    selecting, from the subset of sentences, the selected sentences based on a ranking of the sentences in the subset of sentences according to readability measures of the sentences in the subset of sentences, wherein the selected sentences are the top ranked sentences in the subset of sentences;

    identifying a low readability, high informativeness sentence from the set of sentences, wherein:

    a low readability sentence includes at least one of fewer syllables per word, fewer words on a reading list, or a lower frequency of words associated with a vocabulary frequency list; and

    a high informativeness sentence includes greater similarity to other sentences in the set of sentences and more words having term frequency-inverse document frequency (tf-idf) values indicating that the words are key words;

    generating a concatenated sentence by concatenating at least one contextual sentence with the low readability, high informativeness sentence, wherein the concatenated sentence has a higher readability than the low readability, high informativeness sentence; and

    generating a readable summary of the digital document, the readable summary including the concatenated sentence and the selected sentences.
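
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: every function name, the `threshold` and `top_k` parameters, and the scoring formulas are assumptions made for illustration. Readability is approximated here with a Flesch-style formula over word count and syllables per word, informativeness with mean Jaccard word overlap against the other sentences (a stand-in for similarity and tf-idf), and the "contextual sentence" is assumed to be the sentence immediately preceding the hard one. The sketch does not check that the concatenated sentence actually scores as more readable.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(sentence):
    """Lowercased word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", sentence.lower())

def count_syllables(word):
    """Rough heuristic: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def readability(sentence):
    """Flesch-style score for one sentence; higher means easier to read."""
    ws = words(sentence)
    if not ws:
        return 0.0
    syllables_per_word = sum(count_syllables(w) for w in ws) / len(ws)
    return 206.835 - 1.015 * len(ws) - 84.6 * syllables_per_word

def informativeness(index, sentences):
    """Mean Jaccard word overlap between one sentence and all the others."""
    target = set(words(sentences[index]))
    others = [set(words(s)) for i, s in enumerate(sentences) if i != index]
    if not target or not others:
        return 0.0
    return sum(len(target & o) / len(target | o) for o in others) / len(others)

def summarize(text, threshold=0.1, top_k=2):
    """Hypothetical pipeline: filter by informativeness, rank by readability."""
    sentences = split_sentences(text)
    info = [informativeness(i, sentences) for i in range(len(sentences))]
    read = [readability(s) for s in sentences]

    # Subset: sentences whose informativeness exceeds the threshold.
    subset = [i for i in range(len(sentences)) if info[i] > threshold]
    # Rank the subset by readability and keep the top-ranked sentences.
    ranked = sorted(subset, key=lambda i: read[i], reverse=True)
    parts = {i: sentences[i] for i in ranked[:top_k]}

    # Low readability, high informativeness: the least readable sentence
    # in the informative subset that was not already selected.
    leftovers = ranked[top_k:]
    if leftovers:
        hard = leftovers[-1]
        if hard > 0:
            # Assumption: the preceding sentence serves as context. Drop it
            # from the selected set so it is not repeated in the summary.
            parts.pop(hard - 1, None)
            parts[hard] = sentences[hard - 1] + " " + sentences[hard]
        else:
            parts[hard] = sentences[hard]  # no preceding context available

    # Emit the summary in original document order.
    return " ".join(parts[i] for i in sorted(parts))
```

Calling `summarize(document_text, threshold=0.05, top_k=2)` returns a single string. Sentences below the informativeness threshold are dropped, and the hardest informative sentence appears joined to the sentence that precedes it in the document.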
