SYSTEM FOR AUTOMATICALLY EXTRACTING BY-LINE INFORMATION

  • US 20080306941A1
  • Filed: 08/15/2008
  • Published: 12/11/2008
  • Est. Priority Date: 10/25/2005
  • Status: Active Grant
First Claim

1. A processor-implemented system for automatically extracting by-line information in a crawled document, wherein said document contains a single news article, comprising:

  • a detagging module for removing formatting tags from said crawled document to create a de-tagged version of said crawled document;

  • a headline detection module for detecting a set of potential headlines of the document from a title meta-tag of the crawled document;

  • a headline evaluation module for selecting a candidate headline from the set of potential headlines; and

  • a by-line extraction module for extracting the by-line information from the de-tagged version of said crawled document using the location of the selected candidate headline.
