Creating a summary having sentences with the highest weight, and lowest length
2 Assignments
0 Petitions
Abstract
A method and system for generating a summary of a document. The summary generating system generates the summary from the sentences that form the document. It calculates a weight for each sentence in the document, the weight indicating the importance of the sentence to the document, and then selects sentences based on their calculated weights. The system creates a summary of the selected sentences such that the selected sentences appear in the summary in the same relative order as in the document. In one embodiment, the summary generating system identifies sets of sentences whose total length is less than a maximum length and then selects, as the generated summary, the identified set whose sentences have the greatest total calculated weight. The length of a sentence may be measured in characters or words. In an alternate embodiment, the summary generating system selects as the summary the sentences with the highest calculated weights, subject to the total length of the selected sentences being less than a maximum length.
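The alternate embodiment in the abstract can be sketched as a single greedy pass over the sentences in descending weight order. This is a minimal sketch under assumptions the abstract does not state: length is counted in words by splitting on whitespace, and a sentence that would bring the total to or past the maximum is skipped while lighter sentences are still considered. The function name `greedy_summary` is hypothetical.

```python
from typing import List, Sequence


def greedy_summary(
    sentences: Sequence[str],
    weights: Sequence[float],
    max_length: int,
) -> List[str]:
    """Pick the highest-weighted sentences whose total word count stays
    below max_length, then return them in document order."""
    # Visit sentence indices from highest to lowest weight.
    by_weight = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    chosen: List[int] = []
    total = 0
    for i in by_weight:
        length = len(sentences[i].split())  # assumption: length measured in words
        # Assumption: skip a sentence that does not fit and keep trying lighter ones.
        if total + length < max_length:
            chosen.append(i)
            total += length
    # The summary keeps the sentences' relative order from the document.
    return [sentences[i] for i in sorted(chosen)]
```

For example, with sentences of 3, 5, 2 and 5 words, weights 0.2, 0.9, 0.5 and 0.7, and a maximum of 10 words, the sketch keeps the second and third sentences: the fourth sentence would bring the total to exactly 10, which is not less than the maximum.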
64 Citations
31 Claims
1. A method in a computer system for generating a summary of a document, the document having sentences, the sentences being ordered, the method comprising:
providing a weight for each of a plurality of the sentences, the weight indicating importance of the sentence to the document;
identifying sets of the plurality of the sentences;
identifying a total length of the plurality of sentences in each identified set;
identifying a total of the provided weights of the plurality of sentences in each identified set;
selecting one of the identified sets of the plurality of the sentences, wherein the selected set has a greatest total of the provided weights of all identified sets having a total length less than a predefined length; and
creating a summary from the plurality of the sentences in the selected set.
Dependent claims: 2, 3, 4, 5, 9, 10, 12, 13, 26.
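Claim 1 does not say how the candidate sets are identified. One straightforward reading, sketched here in Python, is to enumerate every subset of sentences, discard those whose total length is not less than the predefined length, and keep the subset with the greatest total weight. The function name `best_set_exhaustive` and the use of integer lengths (word or character counts) are assumptions for illustration. Enumeration grows as 2^n, so this only suits short documents.

```python
from itertools import combinations
from typing import Sequence, Tuple


def best_set_exhaustive(
    lengths: Sequence[int],
    weights: Sequence[float],
    max_length: int,
) -> Tuple[int, ...]:
    """Return the sentence indices of the set with the greatest total weight
    among all sets whose total length is strictly less than max_length."""
    n = len(lengths)
    best: Tuple[int, ...] = ()
    best_weight = 0.0
    # Identify every non-empty set of sentences (O(2^n) sets in total).
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            total_length = sum(lengths[i] for i in subset)
            if total_length >= max_length:
                continue  # set is not shorter than the predefined length
            total_weight = sum(weights[i] for i in subset)
            if total_weight > best_weight:
                best, best_weight = subset, total_weight
    # combinations() yields indices in increasing order, i.e. document order.
    return best
```

With lengths 5, 3, 4, 2, weights 0.9, 0.5, 0.6, 0.3 and a predefined length of 8, the qualifying set with the greatest total weight is the first and last sentence (length 7, weight 1.2).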
estimating a number of the plurality of documents that contain the sentence based on term frequencies of the component terms, a term frequency of a term being a number of occurrences of that term in a document;
estimating a total number of times the sentence occurs in the plurality of documents based on the term frequencies of the component terms; and
combining the estimated number of documents that contain the sentence and the estimated total number of times that the sentence occurs in the plurality of documents to generate the weight for the sentence.
13. The method of claim 1 wherein the document is one of a plurality of documents, wherein each document comprises terms, wherein each sentence comprises component terms, and wherein the providing of a weight of a sentence includes:
for each term, providing a term frequency that represents the number of occurrences of that term in the plurality of documents;
estimating a document frequency for the sentence based on an estimated sentence probability of the sentence, the document frequency being the number of the plurality of the documents that contain the sentence, the estimated sentence probability being an estimation of the probability that any sentence in documents that contain each component term is the sentence, the sentence probability being derived from term probabilities of the component terms, the term probability of a component term being a ratio of an average of the provided term frequencies for the component terms per document that contains that component term to an average number of terms per document;
estimating a total sentence frequency for the sentence based on an average sentence frequency for the sentence times the estimated document frequency for the sentence, the average sentence frequency being derived from the sentence probability of the sentence and the average number of terms per document; and
combining the estimated document frequency with the estimated total sentence frequency to generate the weight of the sentence.
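Claim 13 fixes only some of the quantities in this weight: a term's frequency is its occurrence count across all documents, a term's probability is its average count per containing document divided by the average document length, and the estimated total sentence frequency is the average sentence frequency times the estimated document frequency. The Python sketch below fills the unspecified steps with hypothetical choices that the claim does not state: component terms are treated as independent (sentence probability is the product of term probabilities), the estimated document frequency is N·(1 − (1 − P)^L̄) for N documents of average length L̄, the average sentence frequency is P·L̄, and the two estimates are combined tf-idf style. The class and method names are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


class CorpusStats:
    """Corpus statistics for a hypothetical reading of claim 13's weight."""

    def __init__(self, docs: Sequence[Sequence[str]]) -> None:
        self.num_docs = len(docs)
        # Claim 13: a term's frequency is its number of occurrences in all documents.
        self.term_freq = Counter(t for doc in docs for t in doc)
        # Assumption: count the documents that contain each term, so the
        # per-containing-document average in the claim can be computed.
        self.docs_with_term = Counter(t for doc in docs for t in set(doc))
        self.avg_terms_per_doc = sum(len(doc) for doc in docs) / self.num_docs

    def term_probability(self, term: str) -> float:
        """Average occurrences per containing document / average document length."""
        if self.docs_with_term[term] == 0:
            return 0.0
        per_doc = self.term_freq[term] / self.docs_with_term[term]
        return per_doc / self.avg_terms_per_doc

    def sentence_probability(self, terms: Sequence[str]) -> float:
        """Assumption: treat component terms as independent and multiply."""
        p = 1.0
        for term in terms:
            p *= self.term_probability(term)
        return min(p, 1.0)  # clamp so later powers stay real-valued

    def sentence_weight(self, terms: Sequence[str]) -> float:
        p = self.sentence_probability(terms)
        n = self.avg_terms_per_doc
        # Assumption: estimated document frequency = number of documents times
        # the chance that a document of average length contains the sentence.
        est_doc_freq = self.num_docs * (1.0 - (1.0 - p) ** n)
        if est_doc_freq == 0.0:
            return 0.0
        # Assumption: average sentence frequency = expected occurrences per document.
        avg_sentence_freq = p * n
        # Claim 13: total sentence frequency = average frequency * estimated doc frequency.
        est_total_freq = avg_sentence_freq * est_doc_freq
        # Assumption: combine the two estimates tf-idf style.
        return est_total_freq * math.log(self.num_docs / est_doc_freq)
```

For the three documents `["a", "b", "a"]`, `["a", "c"]` and `["b", "c", "d", "d"]` the average length is 3 terms, term "a" occurs 3 times in 2 documents, and so its term probability is (3 / 2) / 3 = 0.5.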
26. A computer-readable medium having computer executable instructions stored thereon for performing the method recited in claim 1.
6. A method in a computer system for generating a summary of a document, the document having sentences, the sentences being ordered, the method comprising:
providing a weight for each of a plurality of the sentences, the weight indicating importance of the sentence to the document;
identifying sets of the plurality of sentences;
identifying a total length for the plurality of sentences in each identified set;
identifying a total of the provided weights of the plurality of sentences in each identified set;
selecting one of the identified sets of the plurality of the sentences, wherein the selected set has a greatest total of the provided weights of all identified sets having a total length less than a predefined length; and
creating a summary from the plurality of the sentences in the selected set, wherein the plurality of sentences in the selected set are ordered in the created summary in the same relative order as the plurality of sentences in the selected set appear in the document.
Dependent claims: 7, 8, 11, 27.
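Claim 6 adds that the summary keeps the selected sentences in the order in which they appear in the document, however the set was assembled. A minimal Python sketch, assuming sentences are identified by their zero-based position in the document and joined with single spaces (both assumptions; `create_summary` is a hypothetical name):

```python
from typing import Iterable, Sequence


def create_summary(sentences: Sequence[str], selected: Iterable[int]) -> str:
    """Join the selected sentences in their original document order."""
    # sorted() restores document order even if the set was built by weight.
    return " ".join(sentences[i] for i in sorted(selected))
```

Sorting the positions rather than the sentence texts is what preserves the relative order, so the selection step is free to produce the set in any order.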
14. A method in a computer system for generating a summary of a document, the document having sentences, each sentence having a length, the method comprising:
providing a weight for each of a plurality of the sentences, the weight indicating importance of the sentence to the document;
identifying sets of the plurality of sentences, each identified set of the plurality of sentences having a total length less than a maximum length; and
selecting an identified set of the plurality of sentences, wherein the selected set has a greatest total of the provided weights of all identified sets of sentences;
wherein the document is one of a plurality of documents, wherein the sentence has component terms, and wherein the providing of the weight of a sentence includes:
estimating a number of the plurality of documents that contain the sentence based on term frequencies of the component terms, a term frequency of a term being a number of occurrences of that term in a document;
estimating a total number of times the sentence occurs in the plurality of documents based on the term frequencies of the component terms; and
combining the estimated number of documents that contain the sentence and the estimated total number of times that the sentence occurs in the plurality of documents to generate the weight for the sentence.
Dependent claims: 28.
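Claim 14 recites selecting, among all sets shorter than the maximum length, the set with the greatest total weight. Enumerating every set grows as 2^n, but when lengths are integers (word or character counts) the same optimum can be found with the standard 0/1 knapsack dynamic program. That technique is swapped in here for illustration and is not recited in the claims; the sketch also assumes non-negative weights, and `best_set_knapsack` is a hypothetical name. The strict "less than" condition becomes a capacity of `max_length - 1`.

```python
from typing import List, Sequence


def best_set_knapsack(
    lengths: Sequence[int],
    weights: Sequence[float],
    max_length: int,
) -> List[int]:
    """0/1 knapsack: greatest total weight with total length < max_length."""
    capacity = max_length - 1  # total length must be strictly less than max_length
    if capacity < 0:
        return []
    n = len(lengths)
    # best[i][c] = greatest total weight using sentences 0..i-1 within length c.
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        li, wi = lengths[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: leave sentence i-1 out
            if li <= c and best[i - 1][c - li] + wi > best[i][c]:
                best[i][c] = best[i - 1][c - li] + wi  # option 2: take it
    # Walk back through the table to recover which sentences were taken.
    chosen: List[int] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)  # document order
```

The table has (n + 1) × max_length cells, so the running time is proportional to the number of sentences times the maximum length rather than to 2^n.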
15. A method in a computer system for generating a summary of a document, the document having sentences, each sentence having a length, the method comprising:
providing a weight for each of a plurality of the sentences, the weight indicating importance of the sentence to the document;
identifying sets of the plurality of sentences, each identified set of the plurality of sentences having a total length less than a maximum length; and
selecting an identified set of the plurality of sentences, wherein the selected set has a greatest total of the provided weights of all identified sets of sentences, wherein the document is one of a plurality of documents, wherein each document comprises terms, wherein each sentence comprises component terms, and wherein the providing of a weight of a sentence includes:
for each term, providing a term frequency that represents the number of occurrences of that term in the plurality of documents;
estimating a document frequency for the sentence based on an estimated sentence probability of the sentence, the document frequency being the number of the plurality of the documents that contain the sentence, the estimated sentence probability being an estimation of the probability that any sentence in documents that contain each component term is the sentence, the sentence probability being derived from term probabilities of the component terms, the term probability of a component term being a ratio of an average of the provided term frequencies for the component terms per document that contains that component term to an average number of terms per document;
estimating a total sentence frequency for the sentence based on an average sentence frequency for the sentence times the estimated document frequency for the sentence, the average sentence frequency being derived from the sentence probability of the sentence and the average number of terms per document; and
combining the estimated document frequency with the estimated total sentence frequency to generate the weight of the sentence.
16. A method in a computer system for generating a summary of a document, the document having sentences, each sentence having a length, the method comprising:
providing a weight for each of a plurality of the sentences, the weight indicating importance of the sentence to the document;
identifying sets of the plurality of sentences, each identified set of the plurality of sentences having a total length less than a maximum length;
selecting an identified set of the plurality of sentences, wherein the selected set has a greatest total of the provided weights of all identified sets of sentences; and
creating a summary from the plurality of sentences in the selected set, wherein the document is one of a plurality of documents, wherein the sentence has component terms, and wherein the providing of the weight of a sentence includes:
estimating a number of the plurality of documents that contain the sentence based on term frequencies of the component terms, a term frequency of a term being a number of occurrences of that term in a document;
estimating a total number of times the sentence occurs in the plurality of documents based on the term frequencies of the component terms; and
combining the estimated number of documents that contain the sentence and the estimated total number of times that the sentence occurs in the plurality of documents to generate the weight for the sentence.
Dependent claims: 29, 30.
17. A method in a computer system for generating a summary of a document, the document having sentences, each sentence having a length, the method comprising:
providing a weight for each of a plurality of the sentences, the weight indicating importance of the sentence to the document;
identifying sets of the plurality of sentences, each identified set of the plurality of sentences having a total length less than a maximum length;
selecting an identified set of the plurality of sentences, wherein the selected set has a greatest total of the provided weights of all identified sets of sentences; and
creating a summary from the plurality of sentences in the selected set, wherein the document is one of a plurality of documents, wherein each document comprises terms, wherein each sentence comprises component terms, and wherein the providing of a weight of a sentence includes:
for each term, providing a term frequency that represents the number of occurrences of that term in the plurality of documents;
estimating a document frequency for the sentence based on an estimated sentence probability of the sentence, the document frequency being the number of the plurality of the documents that contain the sentence, the estimated sentence probability being an estimation of the probability that any sentence in documents that contain each component term is the sentence, the sentence probability being derived from term probabilities of the component terms, the term probability of a component term being a ratio of an average of the provided term frequencies for the component terms per document that contains the component term to an average number of terms per document;
estimating a total sentence frequency for the sentence based on an average sentence frequency for the sentence times the estimated document frequency for the sentence, the average sentence frequency being derived from the sentence probability of the sentence and the average number of terms per document; and
combining the estimated document frequency with the estimated total sentence frequency to generate the weight of the sentence.
Dependent claims: 31.
18. A computer system for generating a summary of a document, the document having sentences, each sentence having a length, comprising:
a component for calculating a weight for each of a plurality of the sentences;
a component for identifying sets of the plurality of the sentences;
a component for identifying a total length of the plurality of sentences in each identified set;
a component for identifying a total of the provided weights of the plurality of sentences in each identified set;
a component for selecting one of the identified sets of the plurality of the sentences, wherein the selected set has a greatest total of the provided weights of all identified sets having a total length less than a predefined length; and
a component for creating a summary from the plurality of the sentences in the selected set.
Dependent claims: 19, 20, 21, 22, 23, 24, 25.
Specification