Method and system for classifying display pages using summaries

  • US 7,392,474 B2
  • Filed: 04/30/2004
  • Issued: 06/24/2008
  • Est. Priority Date: 04/30/2004
  • Status: Expired due to Fees
First Claim

1. A method in a computer system for classifying web pages, the method comprising:

  • retrieving a web page;

    automatically generating a summary of the retrieved web page by identifying objects of the web page, the objects having sentences;

    building a term frequency by inverted document frequency index for each object;

    calculating similarity between pairs of objects based on the term frequency by inverted document frequency indexes of the objects;

    when the calculated similarity between a pair of objects satisfies a similarity threshold, linking the pair of objects to indicate that the objects satisfy the threshold;

    selecting as a core object of the web page the object that has the most links;

    assigning high scores to sentences of the core object and to objects with links to the core object and low scores to all other sentences;

    selecting sentences to form the summary of the web page based on the assigned scores; and

    determining a classification for the retrieved web page based on the automatically generated summary.
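
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions the claim does not specify: objects are given as lists of sentences, TF-IDF weights use a plain `tf * log(N / df)` formula, similarity is cosine similarity, the scores are 1.0 (high) and 0.0 (low), and the final classifier is a hypothetical keyword-overlap stand-in. All function names, the `threshold` default, and the `max_sentences` parameter are illustrative.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(objects):
    """Build one {term: tf * idf} dict per object (object = list of sentences)."""
    term_counts = [Counter(tokenize(" ".join(obj))) for obj in objects]
    n = len(objects)
    # Document frequency: number of objects that contain each term.
    doc_freq = Counter(term for counts in term_counts for term in counts)
    return [
        {t: tf * math.log(n / doc_freq[t]) for t, tf in counts.items()}
        for counts in term_counts
    ]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts (assumed metric)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(objects, threshold=0.1, max_sentences=3):
    """Pick sentences from the core object and from objects linked to it."""
    if not objects:
        return []
    vectors = tfidf_vectors(objects)
    links = {i: set() for i in range(len(objects))}
    # Link every pair of objects whose similarity satisfies the threshold.
    for i, j in combinations(range(len(objects)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            links[i].add(j)
            links[j].add(i)
    # Core object: the one with the most links (ties go to the earliest object).
    core = max(links, key=lambda i: len(links[i]))
    high = {core} | links[core]
    # High score for sentences of the core and linked objects, low for the rest.
    scored = [
        (1.0 if i in high else 0.0, sentence)
        for i, obj in enumerate(objects)
        for sentence in obj
    ]
    # sorted() is stable, so page order is preserved among equal scores.
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [s for score, s in ranked if score > 0][:max_sentences]


def classify(summary, category_keywords):
    """Hypothetical classifier: choose the category with the most keyword hits."""
    tokens = Counter(tokenize(" ".join(summary)))
    return max(
        category_keywords,
        key=lambda c: sum(tokens[k] for k in category_keywords[c]),
    )
```

In this sketch, objects that share no distinctive terms with the rest of the page (navigation text, legal notices) receive no links, so their sentences get the low score and stay out of the summary passed to the classifier.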
