
Automatic method of generating feature probabilities for automatic extracting summarization

  • US 5,778,397 A
  • Filed: 06/28/1995
  • Issued: 07/07/1998
  • Est. Priority Date: 06/28/1995
  • Status: Expired due to Term
First Claim

1. A method of automatically generating feature probabilities from a document corpus, each document including a multiplicity of sentences, the method comprising the steps of:

    a) designating as a selected document a document of the document corpus;

    b) designating as a selected sentence a one of the sentences of the selected document;

    c) determining a value of a location feature for the selected sentence, the location feature having a first location value, a second location value, and a third location value, the first location value indicating that the selected sentence is included within a beginning portion of the selected document, the second location value indicating that the selected sentence is included within a middle portion of the selected document, and the third location value indicating that the selected sentence is included within an ending portion of the selected document;

    d) determining a value of an upper case feature for the selected sentence, the upper case feature having a first upper case value and a second upper case value, the first upper case value indicating that the selected sentence does not include any of a multiplicity of selected upper case phrases, the selected upper case phrases forming a subset of upper case phrases included within the selected document, the second upper case value indicating the selected sentence includes a one of the selected upper case phrases;

    e) incrementing a location counter associated with the value of the location feature for the selected sentence;

    f) incrementing an upper case counter associated with the value of the upper case feature for the selected document;

    g) if all sentences of the selected document have not been designated as the selected sentence, repeating steps b) through f);

    h) if all documents of the document corpus have not been designated as the selected document, repeating steps a) through g);

    i) determining probabilities for each value of the location feature using the associated counter for each location feature value;

    j) determining the probabilities for each value of the upper case feature using the associated counter for each upper case feature value; and

    k) generating an extract for a first document presented in machine readable form to the user using the upper case feature, the location feature and the probabilities for each value of the upper case feature and the location feature.
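
The counting loop in steps a) through j) can be illustrated with a short Python sketch. This is only one possible reading of the claim, not the patented implementation. The claim does not say how the beginning, middle, and ending portions are delimited, nor how the selected upper case phrases are chosen, so the sketch assumes thirds by sentence index and treats any capitalized, non-sentence-initial word that occurs at least twice in the document as a selected phrase. All function names and the `min_count` parameter are hypothetical. Step k), generating the extract, is not covered.

```python
from collections import Counter

LOCATION_VALUES = ("beginning", "middle", "ending")
UPPER_CASE_VALUES = ("absent", "present")
PUNCT = ".,;:!?()\"'"


def location_value(index, n_sentences):
    """Step c). Assumption: portions are thirds of the document by sentence index."""
    third = n_sentences / 3
    if index < third:
        return "beginning"
    if index < 2 * third:
        return "middle"
    return "ending"


def selected_upper_case_phrases(sentences, min_count=2):
    """Assumption: a 'selected' phrase is a capitalized word that is not
    sentence-initial and occurs at least min_count times in the document."""
    counts = Counter()
    for sentence in sentences:
        for word in sentence.split()[1:]:
            token = word.strip(PUNCT)
            if token[:1].isupper():
                counts[token] += 1
    return {token for token, n in counts.items() if n >= min_count}


def upper_case_value(sentence, selected):
    """Step d): does the sentence contain one of the selected phrases?"""
    tokens = {word.strip(PUNCT) for word in sentence.split()}
    return "present" if tokens & selected else "absent"


def feature_probabilities(corpus):
    """Steps a) through j). corpus is a list of documents, each a list of sentences."""
    location_counter = Counter()
    upper_case_counter = Counter()
    for sentences in corpus:                        # steps a) and h)
        selected = selected_upper_case_phrases(sentences)
        for i, sentence in enumerate(sentences):    # steps b) and g)
            location_counter[location_value(i, len(sentences))] += 1    # c), e)
            upper_case_counter[upper_case_value(sentence, selected)] += 1  # d), f)
    total = sum(location_counter.values())
    if total == 0:
        raise ValueError("corpus contains no sentences")
    # Steps i) and j): relative frequency of each feature value.
    location_probs = {v: location_counter[v] / total for v in LOCATION_VALUES}
    upper_case_probs = {v: upper_case_counter[v] / total for v in UPPER_CASE_VALUES}
    return location_probs, upper_case_probs


# Usage with a made-up one-document corpus.
corpus = [[
    "Alpha met Xerox today.",
    "The Xerox team agreed.",
    "Nothing else happened.",
]]
location_probs, upper_case_probs = feature_probabilities(corpus)
```

Each sentence increments exactly one location counter and one upper case counter, so both sets of probabilities share the same denominator (the total number of sentences in the corpus) and each set sums to 1.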
