Automatic method of generating feature probabilities for automatic extracting summarization
First Claim
1. A method of automatically generating feature probabilities from a document corpus, each document including a multiplicity of sentences, the method comprising the steps of:
a) designating as a selected document a document of the document corpus;
b) designating as a selected sentence a one of the sentences of the selected document;
c) determining a value of a location feature for the selected sentence, the location feature having a first location value, a second location value, and a third location value, the first location value indicating that the selected sentence is included within a beginning portion of the selected document, the second location value indicating that the selected sentence is included within a middle portion of the selected document, and the third location value indicating that the selected sentence is included within an ending portion of the selected document;
d) determining a value of an upper case feature for the selected sentence, the upper case feature having a first upper case value and a second upper case value, the first upper case value indicating that the selected sentence does not include any of a multiplicity of selected upper case phrases, the selected upper case phrases forming a subset of upper case phrases included within the selected document, the second upper case value indicating the selected sentence includes a one of the selected upper case phrases;
e) incrementing a location counter associated with the value of the location feature for the selected sentence;
f) incrementing an upper case counter associated with the value of the upper case feature for the selected document;
g) if all sentences of the selected document have not been designated as the selected sentence, repeating steps b) through f);
h) if all documents of the document corpus have not been designated as the selected document, repeating steps a) through g);
i) determining probabilities for each value of the location feature using the associated counter for each location feature value;
j) determining the probabilities for each value of the upper case feature using the associated counter for each upper case feature value; and
k) generating an extract for a first document presented in machine readable form to the user using the upper case feature, the location feature and the probabilities for each value of the upper case feature and the location feature.
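The counting loop in steps a) through j) can be sketched in Python. This is only an illustration under stated assumptions, not the patented implementation. The claim does not say where the beginning, middle, and ending portions start and stop, nor how the selected upper case phrases are chosen. The equal-thirds split in `location_value`, the "all-caps token seen at least twice" rule in `select_upper_case_phrases`, and every function name here are therefore hypothetical. Step k), generating the extract, is left out.

```python
import re
from collections import Counter

CAPS_TOKEN = re.compile(r"\b[A-Z]{2,}\b")  # assumption: an "upper case phrase" is an all-caps token

def location_value(index, n_sentences):
    """Step c): classify a sentence position (assumption: equal thirds)."""
    if 3 * index < n_sentences:
        return "begin"
    if 3 * index < 2 * n_sentences:
        return "middle"
    return "end"

def select_upper_case_phrases(document):
    """Assumed selection rule: keep all-caps tokens that occur at least twice."""
    counts = Counter(tok for sentence in document for tok in CAPS_TOKEN.findall(sentence))
    return {tok for tok, n in counts.items() if n >= 2}

def upper_case_value(sentence, selected_phrases):
    """Step d): True if the sentence contains any selected upper case phrase."""
    return any(tok in selected_phrases for tok in CAPS_TOKEN.findall(sentence))

def feature_probabilities(corpus):
    """Steps a) through j): count feature values, then normalize the counters."""
    location_counts = Counter()
    upper_case_counts = Counter()
    for document in corpus:                      # steps a) and h)
        phrases = select_upper_case_phrases(document)
        for i, sentence in enumerate(document):  # steps b) and g)
            location_counts[location_value(i, len(document))] += 1      # step e)
            upper_case_counts[upper_case_value(sentence, phrases)] += 1  # step f)

    def normalize(counts):                       # steps i) and j)
        total = sum(counts.values())
        return {value: n / total for value, n in counts.items()}

    return normalize(location_counts), normalize(upper_case_counts)
```

Here a document is a list of sentence strings and the corpus is a list of such documents. Each probability is the relative frequency of a feature value across all sentences in the corpus.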
Abstract
A method of automatically generating feature probabilities that allow later automatic generation of document extracts. The computer system generates the probabilities by analyzing the corpus one document at a time. First, the computer system designates one of the documents as a selected document. Next, the computer system analyzes each sentence of the selected document to determine the value of the paragraph feature and the value of the uppercase feature. The computer system repeats this process for each document of the document corpus. Afterward, the number of occurrences of each value of each feature is calculated and is used to calculate feature value probabilities for all of the features.
8 Claims
2. A method of automatically generating feature probabilities from a document corpus and a summary corpus of model summaries, each document of the document corpus being associated with a summary of the summary corpus, each document including a multiplicity of sentences, the multiplicity of sentences including a plurality of matching sentences, each matching sentence matching a sentence of the associated summary, the method comprising the steps of:
a) designating as a selected document a document of the document corpus;
b) designating as a selected sentence a one of the sentences of the selected document;
c) determining values for the selected sentence of each feature of a feature set, the feature set including a location feature and an upper case feature, the location feature having a first location value, a second location value, and a third location value, the first location value indicating that the selected sentence is included within a beginning portion of the selected document, the second location value indicating that the selected sentence is included within a middle portion of the selected document, and the third location value indicating that the selected sentence is included within an ending portion of the selected document, each value of the location feature having an associated total counter and an associated matching counter, the upper case feature having a first upper case value and a second upper case value, the first upper case value indicating that the selected sentence does not include any of a multiplicity of selected upper case phrases, the second upper case value indicating the selected sentence includes a one of the selected upper case phrases, each value of the upper case feature having an associated total counter and an associated matching counter;
d) for each feature incrementing the total counter associated with the feature value for the selected sentence;
e) if the selected sentence is a one of the plurality of matching sentences, for each feature incrementing the matching counter associated with the feature value for the selected sentence;
f) if all sentences of the selected document have not been designated as the selected sentence, repeating steps b) through e);
g) if all documents of the document corpus have not been designated as the selected document, repeating steps a) through f);
h) for each value of each feature determining a probability using the associated total counter and the associated matching counter; and
i) generating an extract for a first document presented in machine readable form to the user using the feature set and the probabilities for each value of each feature.
Dependent claims: 3, 4, 5, 6, 7
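The total and matching counters of claim 2 can be sketched in Python as follows. The claim leaves several details open, so this sketch makes assumptions throughout. A sentence counts as "matching" only when it is string-identical to a summary sentence. The probability for a feature value is taken as matching count divided by total count. The extract keeps the `k` sentences with the highest product of per-feature probabilities, which treats the features as independent. The feature functions `location` and `upper_case` are simplified stand-ins (equal thirds, any all-caps token), and all names are hypothetical.

```python
import re
from collections import defaultdict

def location(doc, i):
    """Assumed location feature: equal thirds of the document."""
    if 3 * i < len(doc):
        return "begin"
    return "middle" if 3 * i < 2 * len(doc) else "end"

def upper_case(doc, i):
    """Assumed upper case feature: sentence contains an all-caps token."""
    return bool(re.search(r"\b[A-Z]{2,}\b", doc[i]))

FEATURES = {"location": location, "upper_case": upper_case}

def count_features(corpus, summaries, features=FEATURES):
    """Steps a) through g): fill total and matching counters per feature value."""
    total = {name: defaultdict(int) for name in features}
    matching = {name: defaultdict(int) for name in features}
    for doc, summary in zip(corpus, summaries):
        summary_sentences = set(summary)
        for i, sentence in enumerate(doc):
            is_match = sentence in summary_sentences  # assumption: exact string match
            for name, feature in features.items():
                value = feature(doc, i)
                total[name][value] += 1               # step d)
                if is_match:
                    matching[name][value] += 1        # step e)
    return total, matching

def value_probabilities(total, matching):
    """Step h), assumed form: matching count / total count for each value."""
    return {name: {v: matching[name][v] / n for v, n in total[name].items()}
            for name in total}

def extract(doc, probs, k=1, features=FEATURES):
    """Step i), assumed scoring: product of per-feature probabilities, top k."""
    def score(i):
        result = 1.0
        for name, feature in features.items():
            result *= probs[name].get(feature(doc, i), 0.0)
        return result
    best = sorted(range(len(doc)), key=score, reverse=True)[:k]
    return [doc[i] for i in sorted(best)]  # keep original sentence order
```

Feature values never seen in training score 0.0 here. A real system would likely smooth the counts, but the claim does not address that.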
8. An article of manufacture comprising:
a) a memory; and
b) instructions stored in the memory, the instructions for automatically generating feature probabilities from a document corpus and a summary corpus of manually generated summaries, each document of the document corpus being associated with a summary of the summary corpus, each document including a multiplicity of sentences, the multiplicity of sentences including a plurality of matching sentences, each matching sentence matching a sentence of the associated summary, the instructions comprising the steps of:
1) designating as a selected document a document of the document corpus;
2) designating as a selected sentence a one of the sentences of the selected document;
3) determining values for the selected sentence of each feature of a feature set, the feature set including a location feature and an upper case feature, the location feature having a first location value, a second location value, and a third location value, the first location value indicating that the selected sentence is included within a beginning portion of the selected document, the second location value indicating that the selected sentence is included within a middle portion of the selected document, and the third location value indicating that the selected sentence is included within an ending portion of the selected document, each value of the location feature having an associated total counter and an associated matching counter, the upper case feature having a first upper case value and a second upper case value, the first upper case value indicating that the selected sentence does not include any of a multiplicity of selected upper case phrases, the second upper case value indicating the selected sentence includes a one of the selected upper case phrases, each value of the upper case feature having an associated total counter and an associated matching counter;
4) for each feature incrementing the total counter associated with the feature value for the selected sentence;
5) if the selected sentence is a one of the plurality of matching sentences, for each feature incrementing the matching counter associated with the feature value for the selected sentence;
6) if all sentences of the selected document have not been designated as the selected sentence, repeating steps b2) through b5);
7) if all documents of the document corpus have not been designated as the selected document, repeating steps b1) through b6); and
8) for each value of each feature determining a probability using the associated total counter and the associated matching counter.
Specification