Method for automatically extracting by-line information

  • US 7,464,078 B2
  • Filed: 10/25/2005
  • Issued: 12/09/2008
  • Est. Priority Date: 10/25/2005
  • Status: Expired due to Fees
First Claim

1. A processor-implemented method of automatically extracting by-line information in a crawled document, wherein said document contains a single news article, comprising:

    inputting the single news article wherein the single news article comprises a single title meta-tag;

    removing formatting tags from the single news article to create a de-tagged version of the single news article;

    detecting a set of potential headlines of the single news article from among the substrings of the single title meta-tag of the single news article and their bi-grams and n-grams, the detecting further comprising constructing the set of potential headlines based on the title meta-tag and splitting the title meta-tag at all punctuation marks in the title meta-tag, resulting in a set of sub-strings of the title meta-tag;

    adding any of a plurality of bi-grams of the sub-strings and a plurality of n-grams of the sub-strings to the set of potential headlines;

    selecting a candidate headline from the set of potential headlines; and

    extracting the by-line information from the de-tagged version of the single news article using the location of the selected candidate headline.
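
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the "title meta-tag" is read from an HTML `<title>` element; "bi-grams and n-grams of the sub-strings" is interpreted as runs of adjacent sub-strings joined by a space; the candidate headline is chosen as the longest potential headline that occurs in the de-tagged text; and the by-line is taken to be a line beginning with "By" within a few lines after the headline. All function names and the `window` parameter are hypothetical.

```python
import re
import string


def title_text(html):
    """Return the contents of the <title> element (assumed to be the title meta-tag)."""
    m = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
    return m.group(1).strip() if m else ""


def detag(html):
    """Remove tags; return the non-empty text lines of the document body."""
    # Assumption: drop <head> so the title text itself is not matched later.
    body = re.sub(r"<head.*?</head>", "", html, flags=re.I | re.S)
    text = re.sub(r"<[^>]+>", "\n", body)
    lines = [ln.strip() for ln in text.splitlines()]
    return "\n".join(ln for ln in lines if ln)


def potential_headlines(title):
    """Split the title at every punctuation mark, then add n-grams of the pieces."""
    pattern = "[%s]" % re.escape(string.punctuation)
    parts = [p.strip() for p in re.split(pattern, title)]
    parts = [p for p in parts if p]
    candidates = set(parts)
    # Assumption: an n-gram here is n adjacent sub-strings joined by one space.
    for n in range(2, len(parts) + 1):
        for i in range(len(parts) - n + 1):
            candidates.add(" ".join(parts[i:i + n]))
    return candidates


def select_headline(candidates, text):
    """Assumed heuristic: the longest candidate that appears in the de-tagged text."""
    found = [c for c in candidates if c in text]
    return max(found, key=len) if found else None


def byline_after(text, headline, window=3):
    """Look for a 'By ...' line in the first `window` lines after the headline."""
    pos = text.find(headline)
    if pos < 0:
        return None
    following = text[pos + len(headline):].splitlines()
    for ln in [l for l in following if l.strip()][:window]:
        m = re.match(r"By\s+(.+)", ln.strip(), re.I)
        if m:
            return m.group(1).strip()
    return None


def extract_byline(html):
    """Run the steps in order: de-tag, build candidates, select, extract."""
    text = detag(html)
    headline = select_headline(potential_headlines(title_text(html)), text)
    return byline_after(text, headline) if headline else None
```

For a page whose title is "Storm hits coast | Example News" and whose body contains the headline "Storm hits coast" followed by the line "By Jane Doe", `extract_byline` returns "Jane Doe" under these assumptions.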
