Extracting structured data from weblogs
First Claim
1. A method of extracting individual posts from a weblog, comprising:
accessing a home page of the weblog;
identifying at least one feed associated with the weblog;
determining whether the at least one feed contains sufficient content for feed-guided segmentation;
if the at least one feed contains sufficient content for feed-guided segmentation, determining whether the at least one feed contains full content or partial content of the weblog;
if the at least one feed contains full content of the weblog, mapping data found in the at least one feed into a representation for weblog posts; and
if the at least one feed contains partial content of the weblog, screen scraping the weblog into a representation for weblog posts using the data.
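The decision flow recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not say how feeds are identified or how "sufficient", "full", and "partial" are decided, so everything concrete here is an assumption made for illustration: feeds are found through `<link rel="alternate">` autodiscovery tags, the feed is RSS 2.0, and the helper names (`find_feeds`, `parse_items`, `classify_feed`) and thresholds (`min_items`, `full_min_chars`) are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> feed tags on a page."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link"
                and (a.get("rel") or "").lower() == "alternate"
                and (a.get("type") or "").lower() in FEED_TYPES
                and a.get("href")):
            self.feeds.append(a["href"])


def find_feeds(home_html):
    """Identify feeds advertised by the weblog home page (assumed mechanism)."""
    finder = FeedLinkFinder()
    finder.feed(home_html)
    finder.close()
    return finder.feeds


def parse_items(feed_xml):
    """Read title, link, and description from each RSS 2.0 <item>."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "body": (item.findtext("description") or "").strip(),
        }
        for item in root.iter("item")
    ]


def classify_feed(items, min_items=1, full_min_chars=200):
    """Return 'insufficient', 'partial', or 'full'.

    Hypothetical heuristic, not taken from the patent: a feed is usable if
    enough items carry a title and a link; it is 'partial' if any usable
    item body is short or ends with a truncation marker.
    """
    usable = [i for i in items if i["title"] and i["link"]]
    if len(usable) < min_items:
        return "insufficient"
    markers = ("...", "\u2026", "[...]")
    truncated = [
        i for i in usable
        if len(i["body"]) < full_min_chars or i["body"].endswith(markers)
    ]
    return "partial" if truncated else "full"
```

Under these assumptions, a result of `"full"` would lead to mapping the feed items directly into post records, `"partial"` would lead to screen scraping guided by the feed data, and `"insufficient"` falls outside the two branches the claim describes.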
Abstract
A method of extracting individual posts from a weblog comprises the steps of: (a) providing a feed associated with the weblog; and (b) screen scraping the weblog into a representation for weblog posts using the feed data containing partial content of the weblog.
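One plausible reading of screen scraping "using the feed data containing partial content" is that the titles carried in a partial feed act as anchors for cutting the rendered page into posts. The sketch below illustrates that reading only; it is not the method described in the specification. It assumes each feed title appears verbatim in the page text, and the names `TextExtractor`, `page_text`, and `segment_by_titles` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML page."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    """Flatten a page to newline-separated text (markup is dropped)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def segment_by_titles(html, titles):
    """Split page text into posts, using feed titles as boundaries.

    Assumption: every title occurs verbatim in the page text. Titles that
    are not found are skipped. The last post runs to the end of the page,
    so it may absorb footer text in this simplified version.
    """
    text = page_text(html)
    hits = sorted(
        (text.find(t), t) for t in titles if t and text.find(t) != -1
    )
    posts = []
    for idx, (pos, title) in enumerate(hits):
        end = hits[idx + 1][0] if idx + 1 < len(hits) else len(text)
        body = text[pos + len(title):end].strip()
        posts.append({"title": title, "body": body})
    return posts
```

A short usage example with made-up page content:

```python
html = ("<html><body><h1>My Blog</h1>"
        "<div><h2>First post</h2><p>Hello world.</p></div>"
        "<div><h2>Second post</h2><p>More text here.</p></div>"
        "</body></html>")
posts = segment_by_titles(html, ["First post", "Second post"])
```

A fuller implementation would likely work on the DOM rather than flattened text, so that each post's markup, date, and permalink survive; plain text is used here only to keep the sketch short.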
27 Claims
Claim 1 is the independent claim reproduced above; claims 2-27 depend from it.
Specification