Method and system for summarizing a document
Abstract
A method and system for calculating the significance of a sentence within a document are provided. The summarization system calculates the significance of each sentence of a document and selects the most significant sentences as the summary of the document. It bases the significance of a sentence on the “important” words of the document that the sentence contains. To find those words, the system scores each word of the document with several scoring techniques and then combines the scores to classify the word as important or not important.
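The selection step described in the abstract can be illustrated with a minimal Python sketch. It assumes the set of important words has already been identified, and it uses the fraction of a sentence's words that are important as the significance measure. That measure, the naive sentence splitter, and the names `split_sentences`, `sentence_significance`, and `summarize` are illustrative assumptions, not details taken from the patent.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def sentence_significance(sentence, important_words):
    # Assumed measure: share of the sentence's words that are important.
    words = tokenize(sentence)
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in important_words)
    return hits / len(words)


def summarize(text, important_words, max_sentences=2):
    # Rank sentences by significance, keep the top ones,
    # and return them in their original document order.
    sentences = split_sentences(text)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_significance(sentences[i], important_words),
        reverse=True,
    )
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]
```

Calling `summarize(text, important_words, max_sentences=2)` returns the two highest-scoring sentences in document order. Other measures, such as a raw count of important words, would fit the same structure.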
39 Claims
1. A method in a computer system for identifying significant sentences of a document, the method comprising:
training a classifier to classify words as important or not important to a document based on multiple scores for a word generated from different scoring techniques, a score indicating a technique-specific importance of a word to the document;
generating scores from the different scoring techniques for words of the document;
identifying important words of the document using the trained classifier to classify from the generated scores those words that are important to the document; and
calculating significance of sentences of the document based on the identified important words contained in the sentences.
Dependent claims: 2-12.
13. A computer-readable medium containing instructions for controlling a computer system to train a classifier to classify words of a document based on their importance to the document, by a method comprising:
for training documents, generating scores using different scoring techniques indicating a technique-specific importance of words to the training document; and
designating importance of words of the training document to the training document; and
training the classifier based on the generated scores for a word along with the designation of the importance of the word.
Dependent claims: 14-22.
23. A computer-readable medium containing instructions for controlling a computer system to identify significant sentences of a document, by a method comprising:
generating scores from different scoring techniques for words of the document;
identifying important words of the document using a classifier to classify from the generated scores those words that are important to the document; and
calculating significance of sentences of the document based on the identified important words contained in the sentences.
Dependent claims: 24-35.
36. A computer system for calculating significance of a sentence of a document, comprising:
a classifier of words based on importance, wherein the classifier inputs, for a sentence that contains a word, scores generated from different sentence scoring techniques and outputs a classification of the word based on importance;
a component that for sentences of the document generates a score of significance of the sentence to the document for different sentence scoring techniques;
a component that classifies a word of the document as important using the classifier inputting the generated scores of a sentence that contains the word; and
a component that applies a scoring technique to a sentence to calculate significance of the sentence to the document based on the words classified as important that are contained in the sentence.
Dependent claim: 37.
38. A computer-readable medium containing instructions for controlling a computer system to identify important words of a document, by a method comprising:
providing a classifier that, based on a feature vector representing multiple scores of importance of a word represented by the feature vector, indicates whether the word is important or not important, each score representing a different technique used to score the importance of a word;
generating a feature vector to represent a word of a document; and
classifying the word by applying the classifier to the generated feature vector.
39. A computer-readable medium containing instructions for controlling a computer system to generate a classifier to classify a word of a document based on its importance to the document, by a method comprising:
generating feature vector and importance indicator pairs for words of a training document, features of the feature vector representing different techniques for scoring importance of a word; and
training the classifier based on the generated feature vector and importance indicator pairs.
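The word-classification steps recited in the claims (generating a feature vector of scores per word, training on feature vector and importance indicator pairs, then applying the classifier) can be sketched in Python as follows. The claims do not name specific scoring techniques or a classifier type, so everything concrete here is an assumption made for illustration: the two scores (relative term frequency and earliness of first occurrence), the logistic-regression classifier, the hand-assigned training pairs, and the names `feature_vectors`, `train`, and `is_important`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())


def feature_vectors(text):
    # One feature per (assumed) scoring technique:
    #   [0] term frequency relative to the most frequent word
    #   [1] earliness of the word's first occurrence in the document
    tokens = tokenize(text)
    counts = Counter(tokens)
    max_count = max(counts.values())
    first_pos = {}
    for i, tok in enumerate(tokens):
        first_pos.setdefault(tok, i)
    return {
        w: [counts[w] / max_count, 1.0 - first_pos[w] / len(tokens)]
        for w in counts
    }


def train(pairs, epochs=2000, lr=0.5):
    # Fit logistic-regression weights by stochastic gradient descent
    # on (feature_vector, importance_indicator) pairs.
    dim = len(pairs[0][0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in pairs:
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def is_important(x, model):
    # Classify a word from its feature vector: important if the
    # linear score is positive (predicted probability above 0.5).
    weights, bias = model
    return bias + sum(w * xi for w, xi in zip(weights, x)) > 0.0


# Hypothetical training pairs: feature vectors with hand-assigned labels
# (1 = important, 0 = not important).
training_pairs = [
    ([1.0, 1.0], 1),
    ([0.9, 0.8], 1),
    ([0.2, 0.1], 0),
    ([0.1, 0.3], 0),
]
model = train(training_pairs)
```

With a trained `model`, the words of a new document could be classified by passing each vector from `feature_vectors(document_text)` to `is_important`. Any classifier that maps a score vector to an important or not-important label would slot into the same place.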
Specification