System and method for extracting content elements from multiple Internet sources
Abstract
A system for automatically extracting data from at least one electronic document accessible through the Internet or other computer network. The system records a sequence of actions operable to electronically navigate to a target page of the electronic document, the target page including a plurality of elements each having contents and a structural definition wherein the structural definitions interrelate the plurality of elements to specify a target pattern for a select subset of the plurality of elements. After recording the navigation path and the target pattern, the system automatically accesses the target page according to the recorded sequence. When the target page is accessed, the system automatically identifies, copies and processes selections from the plurality of elements dependent upon the target pattern.
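The record-then-replay flow described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the in-memory `SITE` dictionary stands in for network access, and the `Recording`, `LinkFinder`, `PatternExtractor` and `replay` names, the two action verbs, and the tag-path form of the target pattern are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical in-memory "site" (URL -> HTML) standing in for network access.
SITE = {
    "/home": '<a href="/products">Products</a>',
    "/products": (
        '<table>'
        '<tr><td class="name">Widget</td><td class="price">9.99</td></tr>'
        '<tr><td class="name">Gadget</td><td class="price">19.50</td></tr>'
        '</table>'
    ),
}


@dataclass(frozen=True)
class Recording:
    actions: tuple  # recorded navigation steps: ("open", url) or ("follow", link text)
    pattern: tuple  # target pattern: path of (tag, class-or-None) from the root down


class LinkFinder(HTMLParser):
    """Maps the text of each anchor on a page to its href."""

    def __init__(self):
        super().__init__()
        self.links = {}
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self.links[data.strip()] = self._href

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


class PatternExtractor(HTMLParser):
    """Collects text whose enclosing element path matches the target pattern.

    Assumes well-nested markup without void elements such as <br>.
    """

    def __init__(self, pattern):
        super().__init__()
        self.pattern = list(pattern)
        self.stack = []
        self.matches = []

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs).get("class")))

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if not data.strip() or len(self.stack) != len(self.pattern):
            return
        on_target = all(
            tag == want_tag and (want_cls is None or cls == want_cls)
            for (tag, cls), (want_tag, want_cls) in zip(self.stack, self.pattern)
        )
        if on_target:
            self.matches.append(data.strip())


def replay(recording, site):
    """Re-run the recorded navigation, then extract by the target pattern."""
    url = None
    for verb, arg in recording.actions:
        if verb == "open":
            url = arg
        elif verb == "follow":
            finder = LinkFinder()
            finder.feed(site[url])
            finder.close()
            url = finder.links[arg]
        else:
            raise ValueError(f"unknown action: {verb}")
    extractor = PatternExtractor(recording.pattern)
    extractor.feed(site[url])
    extractor.close()
    return extractor.matches


recording = Recording(
    actions=(("open", "/home"), ("follow", "Products")),
    pattern=(("table", None), ("tr", None), ("td", "price")),
)
print(replay(recording, SITE))
```

Because the recording stores both the navigation path and the structural pattern, the same `Recording` can be replayed later against updated page contents; only the cells whose position and class match the pattern are copied.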
21 Claims
1. A system for automatically extracting data from a plurality of electronic documents, each electronic document being accessible over a computer network, the system comprising at least one micro-processor based device configured to:
access through the network a first electronic document using first specifications;
receive criteria for extracting a first set of content elements of the first electronic document;
extract the first set of content elements based on the criteria;
access through the network a second electronic document using second specifications, the second specifications varying from the first specifications;
extract a second set of content elements based on the criteria; and
store the first and second set of content elements in a database.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method implemented in a system comprising a micro-processor based device coupled to a network, the method comprising:
accessing in the system through the network a first electronic document using first specifications;
receiving in the system criteria for extracting a first set of content elements of the first electronic document;
extracting in the system the first set of content elements based on the criteria;
accessing in the system through the network a second electronic document using second specifications, the second specifications varying from the first specifications;
extracting in the system a second set of content elements based on the criteria; and
storing the first and second set of content elements in a database of the system.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A method implemented in a system comprising a micro-processor based device coupled to a network, the method comprising:
accessing in the system through the network an electronic document, the electronic document comprising a plurality of content elements;
extracting in the system a subset of the plurality of content elements based on predefined criteria; and
storing the subset of the plurality of content elements in a database of the system.
Dependent claims: 20, 21
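The steps recited in claims 1 and 10 (access two documents under differing specifications, apply one set of criteria to both, store both result sets) can be pictured with a minimal sketch. Everything named here is an assumption for illustration: the `DOCUMENTS` dictionary replaces real network access, "specifications" are modeled as a URL plus query string, "criteria" as a tag and class pair matched by a simple regular expression, and the database as an in-memory SQLite table. For brevity the criteria are supplied up front rather than received after the first access.

```python
import re
import sqlite3

# Hypothetical stand-in for the network: (url, query) -> document body.
DOCUMENTS = {
    ("http://a.example/list", "page=1"):
        '<h2 class="title">Alpha</h2><h2 class="title">Beta</h2>',
    ("http://b.example/catalog", "view=all&lang=en"):
        '<div><h2 class="title">Gamma</h2></div>',
}


def access(spec):
    """Access a document using its specifications (assumed: URL and query)."""
    return DOCUMENTS[(spec["url"], spec["query"])]


def extract(document, criteria):
    """Return the text of every element matching the tag/class criteria."""
    tag = re.escape(criteria["tag"])
    cls = re.escape(criteria["cls"])
    pattern = re.compile(rf'<{tag} class="{cls}">(.*?)</{tag}>')
    return pattern.findall(document)


def run(specs, criteria, db):
    """Apply the same criteria to each document and store every result."""
    db.execute("CREATE TABLE IF NOT EXISTS elements (source TEXT, content TEXT)")
    for spec in specs:
        for content in extract(access(spec), criteria):
            db.execute("INSERT INTO elements VALUES (?, ?)", (spec["url"], content))
    db.commit()


first_spec = {"url": "http://a.example/list", "query": "page=1"}
second_spec = {"url": "http://b.example/catalog", "query": "view=all&lang=en"}
criteria = {"tag": "h2", "cls": "title"}

db = sqlite3.connect(":memory:")
run([first_spec, second_spec], criteria, db)
rows = db.execute("SELECT source, content FROM elements ORDER BY rowid").fetchall()
print(rows)
```

The point of the sketch is that the access specifications vary per document while the extraction criteria stay fixed, so content from differently addressed sources ends up in one table.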
Specification