Generating a summary based on readability
First Claim
1. A method executed by a computer system, comprising:
extracting a set of sentences from a digital document;
scoring each sentence of the set of sentences using a respective informativeness measure;
scoring each sentence of the set of sentences using a readability measure, wherein the readability measure is based at least in part on one of:
a number of words in the sentence, a number of syllables per word, a frequency of a word based on a vocabulary frequency, a frequency of a word based on context, or if words of the sentence appear on a reading list;
selecting selected sentences in the set of sentences based on the readability measures and informativeness measures, wherein the selecting comprises:
determining a subset of sentences from the set of sentences, wherein the sentences in the subset of sentences have informativeness measures greater than a threshold, and
selecting, from the subset of sentences, the selected sentences based on a ranking of the sentences in the subset of sentences according to readability measures of the sentences in the subset of sentences, wherein the selected sentences are the top ranked sentences in the subset of sentences;
identifying a low readability, high informativeness sentence from the set of sentences, wherein:
a low readability sentence includes at least one of fewer syllables per word, fewer words on a reading list, or a lower frequency of words associated with a vocabulary frequency list; and
a high informativeness sentence includes greater similarity to other sentences in the set of sentences and more words having term frequency-inverse document frequency (tf-idf) values indicating that the words are key words;
generating a concatenated sentence by concatenating at least one contextual sentence with the low readability, high informativeness sentence, wherein the concatenated sentence has a higher readability than the low readability, high informativeness sentence; and
generating a readable summary of the digital document, the readable summary including the concatenated sentence and the selected sentences.
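The selecting step of claim 1 (threshold on informativeness, then rank by readability) can be illustrated with a minimal Python sketch. It assumes each sentence already carries both scores and that a higher value means more informative or more readable. The names `ScoredSentence`, `select_sentences`, `threshold` and `top_k` are hypothetical and do not come from the patent.

```python
from typing import List, NamedTuple


class ScoredSentence(NamedTuple):
    text: str
    informativeness: float  # assumed scale: higher = more informative
    readability: float      # assumed scale: higher = easier to read


def select_sentences(scored: List[ScoredSentence],
                     threshold: float,
                     top_k: int) -> List[ScoredSentence]:
    """Keep sentences whose informativeness exceeds the threshold, then
    return the top_k of that subset ranked by readability."""
    subset = [s for s in scored if s.informativeness > threshold]
    ranked = sorted(subset, key=lambda s: s.readability, reverse=True)
    return ranked[:top_k]


# Toy scores, invented for illustration only.
scored = [
    ScoredSentence("A", 0.9, 0.2),
    ScoredSentence("B", 0.7, 0.8),
    ScoredSentence("C", 0.3, 0.9),
    ScoredSentence("D", 0.6, 0.5),
]
selected = select_sentences(scored, threshold=0.5, top_k=2)
print([s.text for s in selected])
```

In this toy input, sentence C is the most readable but falls below the informativeness threshold, so it never reaches the ranking stage.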
Abstract
A technique to generate a summary of a set of sentences. Each sentence in the set can be evaluated based on a criterion, such as informativeness of the sentence. The sentences may also be evaluated for readability based on a readability measure. Sentences can be selected for inclusion in the summary based on the evaluations.
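As an illustration of what a readability measure might look like, the sketch below computes three of the features the claims list (number of words, syllables per word, share of words on a reading list) and combines the first two with the Flesch Reading Ease formula. The source does not name Flesch Reading Ease; it is used here only as a well-known stand-in. The syllable counter is a rough vowel-group heuristic, and all function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough estimate: count groups of consecutive vowels (heuristic only)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability_features(sentence, reading_list=frozenset()):
    """Return word count, syllables per word, and reading-list coverage."""
    words = re.findall(r"[A-Za-z']+", sentence)
    n = len(words)
    if n == 0:
        return {"num_words": 0, "syllables_per_word": 0.0, "on_reading_list": 0.0}
    syllables = sum(count_syllables(w) for w in words)
    on_list = sum(1 for w in words if w.lower() in reading_list)
    return {
        "num_words": n,
        "syllables_per_word": syllables / n,
        "on_reading_list": on_list / n,
    }


def flesch_reading_ease(sentence):
    """Flesch Reading Ease for a single sentence (assumed stand-in measure).
    Higher values indicate easier text."""
    f = readability_features(sentence)
    if f["num_words"] == 0:
        return 0.0
    return 206.835 - 1.015 * f["num_words"] - 84.6 * f["syllables_per_word"]


simple = "The cat sat."
hard = "Multidimensional characterization necessitates extraordinary computational infrastructure."
print(readability_features(simple, reading_list={"cat"}))
print(flesch_reading_ease(simple) > flesch_reading_ease(hard))
```

How the individual features are weighted, and whether frequency lists or context are used instead, is left open by the source; this sketch fixes one choice purely for demonstration.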
15 Claims
7. A system comprising:
a processor; and
a non-transitory storage medium storing instructions executable on the processor to:
extract a plurality of sentences from a digital document;
identify sentences from the plurality of sentences for inclusion in a summary of the digital document based on a criterion;
evaluate a readability of the identified sentences using respective readability measures, wherein each readability measure assigned to each sentence is based at least in part on one of:
a number of words in the sentence, a number of syllables per word, a frequency of a word based on a vocabulary frequency, a frequency of a word based on context, or if words of the sentence appear on a reading list;
select sentences based in part on the evaluated readability of the identified sentences, wherein the selecting comprises:
determining a subset of sentences from the plurality of sentences, wherein the sentences in the subset of sentences have informativeness measures greater than a threshold, and
selecting, from the subset of sentences, the selected sentences based on a ranking of the sentences in the subset of sentences according to readability measures of the sentences in the subset of sentences, wherein the selected sentences are the top ranked sentences in the subset of sentences;
add a low readability, high informativeness sentence to at least one of the selected sentences to create a concatenated sentence, wherein the concatenated sentence has a higher readability than the low readability, high informativeness sentence, and wherein:
a low readability sentence includes at least one of fewer syllables per word, fewer words on a reading list, or a lower frequency of words associated with a vocabulary frequency list; and
a high informativeness sentence includes greater similarity to other sentences in the plurality of sentences and more words having term frequency-inverse document frequency (tf-idf) values indicating that the words are key words.
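The concatenation step in claim 7 requires that the combined text be more readable than the hard sentence alone. A minimal Python sketch of that check follows. It assumes the readability measure is supplied by the caller as a function where higher means easier, and that the other sentence is placed before the hard sentence; neither detail is specified in the claim. `concatenate_for_readability`, `toy_readability` and `READING_LIST` are hypothetical names, and the toy measure (share of words on a reading list) exists only to make the example runnable.

```python
from typing import Callable, Iterable, Optional

# Hypothetical reading list, invented for the demo.
READING_LIST = {"the", "model", "is", "small", "it", "runs", "fast"}


def toy_readability(text: str) -> float:
    """Toy measure: fraction of words that appear on READING_LIST."""
    words = [w.strip(".,;:").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in READING_LIST for w in words) / len(words)


def concatenate_for_readability(
    hard_sentence: str,
    candidates: Iterable[str],
    readability: Callable[[str], float],
) -> Optional[str]:
    """Return the concatenation that most improves readability over the
    hard sentence alone, or None if no candidate improves it."""
    best_text, best_score = None, readability(hard_sentence)
    for context in candidates:
        combined = f"{context} {hard_sentence}"  # assumed order: context first
        score = readability(combined)
        if score > best_score:
            best_text, best_score = combined, score
    return best_text


hard = "Quantization minimizes latency."
result = concatenate_for_readability(
    hard, ["Bananas are yellow.", "The model is small."], toy_readability
)
print(result)
```

Returning `None` when nothing helps keeps the claim's condition explicit: a concatenation is only accepted when its readability exceeds that of the original sentence.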
11. A non-transitory computer readable storage medium storing instructions that when executed cause a computer system to:
assign a respective informativeness measure to each sentence of a set of sentences in a digital document;
assign a respective readability measure to each sentence of the set of sentences;
select selected sentences in the set of sentences based on the readability measures and informativeness measures, wherein the selecting comprises:
determining a subset of sentences from the set of sentences, wherein the sentences in the subset of sentences have informativeness measures greater than a threshold, and
selecting, from the subset of sentences, the selected sentences based on a ranking of the sentences in the subset of sentences according to readability measures of the sentences in the subset of sentences, wherein the selected sentences are the top ranked sentences in the subset of sentences;
identify a low readability, high informativeness sentence from the set of sentences, wherein:
a low readability sentence includes at least one of fewer syllables per word, fewer words on a reading list, or a lower frequency of words associated with a vocabulary frequency list; and
a high informativeness sentence includes greater similarity to other sentences in the set of sentences and more words having term frequency-inverse document frequency (tf-idf) values indicating that the words are key words;
generate a concatenated sentence by concatenating at least one contextual sentence onto the low readability, high informativeness sentence, wherein the concatenated sentence has a higher readability than the low readability, high informativeness sentence; and
generate a summary of the digital document by adding the selected sentences and the concatenated sentence to the summary.
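The claims tie informativeness to similarity with other sentences and to tf-idf values. One way this could be computed is sketched below in Python. It treats each sentence as its own document for the idf term and scores a sentence by its mean cosine similarity to the others; both choices are assumptions, and the key-word counting part of the claim is not covered. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def tfidf_vectors(sentences):
    """One sparse tf-idf vector (a dict) per sentence.
    Assumption: each sentence counts as a document when computing idf."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def informativeness(sentences):
    """Mean cosine similarity of each sentence to every other sentence."""
    vectors = tfidf_vectors(sentences)
    scores = []
    for i, u in enumerate(vectors):
        others = [cosine(u, v) for j, v in enumerate(vectors) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    return scores


sentences = [
    "The summary keeps readable sentences.",
    "Readable sentences make a good summary.",
    "Bananas are yellow.",
]
scores = informativeness(sentences)
print(scores)
```

In this toy input the third sentence shares no terms with the other two, so its score is zero, while the first two sentences score equally because each is similar only to the other.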
Specification