
Extracting structured data from weblogs

  • US 10,180,986 B2
  • Filed: 10/12/2015
  • Issued: 01/15/2019
  • Est. Priority Date: 06/16/2005
  • Status: Active Grant
First Claim

1. A method of extracting weblog posts from a weblog, the method comprising:

  • retrieving a feed referenced on a webpage of the weblog; and

    in response to determining that the feed does not contain a first portion of a weblog post:

    creating, via a processor, a representation of the weblog post based on a second portion of the weblog post included in the feed;

    filtering the representation of the weblog post to remove summarization artefacts;

    searching, via the processor, the weblog for the filtered representation of the second portion of the weblog post;

    when the second portion of the weblog post is found in the weblog, identifying, via the processor, a node associated with the second portion in the webpage;

    extracting, via the processor, information from markup language contained within the node associated with the second portion of the webpage; and

    modifying, via the processor, the representation based on the information extracted from within the node to reconstruct the weblog post.
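The claimed steps can be illustrated with a minimal Python sketch that uses only the standard library. It is not the patentee's implementation, and every detail below is an assumption made for illustration. The function names (`extract_posts`, `discover_feed`, `filter_artefacts`, `find_node`) are hypothetical. The list of trailing markers treated as "summarization artefacts" is invented. Feed parsing handles RSS 2.0 only, and a non-empty `content:encoded` element is taken to mean the feed already holds the full post. Network retrieval is replaced by a caller-supplied `fetch` callable. The "node associated with the second portion" is taken to be the deepest element whose text contains the filtered summary. Only the single webpage passed in is searched, not the whole weblog.

```python
import html
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical truncation markers; "[...]" precedes "..." so it is not half-stripped.
ARTEFACTS = ("[…]", "[...]", "…", "...", "read more", "continue reading")

class Node:
    """Minimal DOM element: a tag, its attributes, and child Nodes or text strings."""
    def __init__(self, tag, attrs):
        self.tag, self.attrs, self.children = tag, dict(attrs), []
    def raw_text(self):  # concatenated text, skipping script/style content
        return "".join(c if isinstance(c, str) else c.raw_text() for c in self.children
                       if isinstance(c, str) or c.tag not in ("script", "style"))
    def text(self):  # whitespace-normalised visible text of the subtree
        return " ".join(self.raw_text().split())

class TreeBuilder(HTMLParser):
    """Builds a Node tree with the stdlib HTML parser, tolerating unclosed tags."""
    def __init__(self):
        super().__init__()
        self.open_elements = [Node("#root", [])]  # [0] is the document root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.open_elements[-1].children.append(node)
        if tag not in {"br", "hr", "img", "input", "link", "meta"}:  # void: no end tag
            self.open_elements.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag; ignore stray end tags.
        for i in range(len(self.open_elements) - 1, 0, -1):
            if self.open_elements[i].tag == tag:
                del self.open_elements[i:]
                return

    def handle_data(self, data):
        self.open_elements[-1].children.append(data)

def iter_nodes(node):
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from iter_nodes(child)

def plain_text(fragment):
    """Strip inline HTML from a feed field, decode entities, normalise whitespace."""
    return " ".join(html.unescape(re.sub(r"<[^>]+>", " ", fragment)).split())

def filter_artefacts(text):
    """Repeatedly strip trailing summarisation markers such as '...' or 'Read more'."""
    while True:
        marker = next((m for m in ARTEFACTS if text[-len(m):].lower() == m), None)
        if marker is None:
            return text
        text = text[: -len(marker)].rstrip()

def find_node(node, needle):
    """Deepest element whose whitespace-free text contains `needle`, else None."""
    if needle not in "".join(node.text().split()):
        return None
    for child in node.children:
        hit = find_node(child, needle) if isinstance(child, Node) else None
        if hit is not None:
            return hit
    return node

def discover_feed(root, page_url):
    """URL of the first <link rel="alternate" type="application/rss+xml">, or None."""
    for node in iter_nodes(root):
        rel = (node.attrs.get("rel") or "").lower().split()
        if node.tag == "link" and "alternate" in rel \
                and node.attrs.get("type") == "application/rss+xml":
            return urljoin(page_url, node.attrs.get("href") or "")

def extract_posts(page_url, page_html, fetch):
    """Yield one dict per feed item; `fetch(url) -> str` is supplied by the caller."""
    builder = TreeBuilder()
    builder.feed(page_html)
    builder.close()
    root = builder.open_elements[0]
    feed_url = discover_feed(root, page_url)
    items = ET.fromstring(fetch(feed_url)).iter("item") if feed_url else []
    for item in items:
        full = item.findtext("{http://purl.org/rss/1.0/modules/content/}encoded")
        summary = plain_text(full or item.findtext("description", ""))
        post = {"title": item.findtext("title", ""), "content": summary, "links": []}
        filtered = filter_artefacts(summary)
        # Rebuild only when the feed has no content:encoded and the summary was cut short.
        if full is None and filtered and filtered != summary:
            node = find_node(root, "".join(filtered.split()))
            if node is not None:  # take the post body from the matching node's markup
                post["content"] = node.text()
                post["links"] = [n.attrs["href"] for n in iter_nodes(node)
                                 if n.tag == "a" and n.attrs.get("href")]
        yield post
```

Given a page whose `<link rel="alternate">` points at an RSS feed whose `<description>` ends in `[...]`, `extract_posts` yields a post whose `content` is the full text of the enclosing element and whose `links` lists the `href` values found inside it. Matching ignores whitespace so that differing line breaks between the feed and the page markup do not block a match. A summary that covers only part of the first paragraph would select that paragraph alone, which is one limitation of this deepest-node heuristic.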
