Harvesting Data From Page
First Claim
1. A computer-implemented method for obtaining data from a page, the method comprising:
initiating a harvesting process for a page available in a computer system;
identifying a feed representation that has been created for the page; and
retrieving and storing, as part of the harvesting process, at least a portion from the page based on information in the identified feed representation.
Abstract
Among other disclosure, computer-implemented methods and computer program products for obtaining data from a page. A method can include initiating a harvesting process for a page available in a computer system. The method can include identifying a feed representation that has been created for the page. The method can include retrieving and storing, as part of the harvesting process, at least a portion from the page based on information in the identified feed representation. The feed representation can include at least excerpts of content from the page. The feed representation can include at least one representation selected from: an RSS feed, an Atom feed, an XML feed, an RDF feed, a serialized data feed representation, and combinations thereof.
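As an illustration of the "identifying a feed representation that has been created for the page" step, the following is a minimal sketch only. The text above does not state how the feed is located. The sketch assumes the page advertises its RSS or Atom feed through a conventional `<link rel="alternate">` element in its HTML. The names `FeedLinkFinder` and `find_feed_links` are hypothetical and are not taken from the disclosure.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: only RSS and Atom feeds advertised by MIME type are considered.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects feed URLs advertised by <link rel="alternate"> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feed_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        href = attr.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative feed links against the page address.
            self.feed_urls.append(urljoin(self.base_url, href))


def find_feed_links(page_html, base_url):
    """Return the absolute URLs of feeds the page declares for itself."""
    finder = FeedLinkFinder(base_url)
    finder.feed(page_html)
    return finder.feed_urls
```

A harvesting process could call `find_feed_links` on the downloaded page and then fetch the first returned URL. A page that declares no feed yields an empty list, in which case this sketch identifies no feed representation.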
19 Claims
1. A computer-implemented method for obtaining data from a page, the method comprising:
initiating a harvesting process for a page available in a computer system;
identifying a feed representation that has been created for the page; and
retrieving and storing, as part of the harvesting process, at least a portion from the page based on information in the identified feed representation.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer program product tangibly embodied in a computer-readable medium and comprising instructions that when executed by a processor perform a method for obtaining data from a page, the method comprising:
initiating a harvesting process for a page available in a computer system;
identifying a feed representation that has been created for the page; and
retrieving and storing, as part of the harvesting process, at least a portion from the page based on information in the identified feed representation.
12. A computer-implemented method for obtaining data from a page, the method comprising:
identifying a page as a target for content retrieval, the page including multiple content portions;
identifying a feed representation that has been created for the identified page, the identified feed representation including multiple feed entries each corresponding to at least some of the multiple content portions;
processing each of the multiple feed entries by:
  accessing the identified page;
  identifying any of the multiple content portions that match contents of the feed entry being processed; and
  retrieving at least one of the multiple content portions based on the identified content portion; and
storing, as a result of the content retrieval, each retrieved content portion obtained from the processing of the multiple feed entries.
Dependent claims: 13, 14, 15, 16, 17, 18
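The per-entry processing recited in claim 12 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation. It assumes the feed is RSS 2.0 and that each item's `<description>` holds an excerpt of the page. It assumes the page has already been accessed and split into text content portions. It treats a portion as a match when it contains the excerpt after whitespace and case normalization. The function names and the dictionary used as storage are hypothetical.

```python
import xml.etree.ElementTree as ET


def normalize(text):
    """Collapse whitespace and lowercase so small formatting differences do not block a match."""
    return " ".join(text.split()).lower()


def parse_feed_entries(rss_xml):
    """Return (title, excerpt) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    entries = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        excerpt = item.findtext("description", default="")
        entries.append((title, excerpt))
    return entries


def harvest(rss_xml, content_portions):
    """For each feed entry, retrieve the first page portion that contains its excerpt."""
    stored = {}  # assumption: an in-memory dict stands in for the storage step
    for title, excerpt in parse_feed_entries(rss_xml):
        # Excerpts are often truncated with an ellipsis; drop it before matching.
        needle = normalize(excerpt).rstrip(". \u2026")
        if not needle:
            continue
        for portion in content_portions:
            if needle in normalize(portion):
                stored[title] = portion  # keep the full portion, not just the excerpt
                break
    return stored
```

In this sketch the feed entries act as a guide to which parts of the page are content: portions such as navigation or footers that match no entry are never stored. Substring matching is the simplest choice, and a real harvester would likely need fuzzier comparison for excerpts that were edited or stripped of markup.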
19. A computer program product tangibly embodied in a computer-readable medium and comprising instructions that when executed by a processor perform a method for obtaining data from a page, the method comprising:
identifying a page as a target for content retrieval, the page including multiple content portions;
identifying a feed representation that has been created for the identified page, the identified feed representation including multiple feed entries each corresponding to at least some of the multiple content portions;
processing each of the multiple feed entries by:
  accessing the identified page;
  identifying any of the multiple content portions that match contents of the feed entry being processed; and
  retrieving at least one of the multiple content portions based on the identified content portion; and
storing, as a result of the content retrieval, each retrieved content portion obtained from the processing of the multiple feed entries.
Specification