Data extraction from world wide web pages
Abstract
A system for querying disparate, heterogeneous data sources over a network, where at least some of the data sources are World Wide Web pages or other semi-structured data sources, includes a query converter, a command transmitter, and a data retriever. The query converter produces, from at least a portion of a query, a set of commands which can be used to interact with a semi-structured data source. The query converter may accept a request in the same form as normally used to access a relational data base, thereby transparently increasing the number of data bases available to a user. The command transmitter issues the produced commands to the semi-structured data source. The data retriever then retrieves the desired data from the data source. In this manner, structured queries may be used to access both traditional, relational data bases and non-traditional, semi-structured data sources such as web sites and flat files. The system may also include a request translator and a data translator for providing data context interchange. The request translator translates a request for data having a first data context into a query having a second data context, which is passed to the query converter described above. The data translator translates data retrieved from the data source from the data context of the data source into the data context associated with the request. A related method for querying disparate data sources over a network is also described.
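The convert, issue, and retrieve flow described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the dictionary layout of the specification (`QUOTE_SPEC`), the `example.com` URL, the regular-expression extraction, and the function names `convert_query`, `retrieve`, and `run_request`. The command transmitter is represented by a caller-supplied `fetch` function so the sketch runs without network access.

```python
import re
from urllib.parse import quote

# Hypothetical specification for one web source. The field names and the
# URL are illustrative assumptions, not the format used in the patent.
QUOTE_SPEC = {
    "relation": "quotes",
    "url_template": "http://example.com/lookup?symbol={symbol}",
    "required": ["symbol"],
    "pattern": r"<td>(?P<symbol>[A-Z]+)</td>\s*<td>(?P<price>[\d.]+)</td>",
}


def convert_query(spec, where):
    """Query converter: turn equality conditions into one command (a GET URL)."""
    missing = [name for name in spec["required"] if name not in where]
    if missing:
        raise ValueError(f"missing bindings: {missing}")
    params = {name: quote(str(where[name])) for name in spec["required"]}
    return spec["url_template"].format(**params)


def retrieve(spec, page, columns):
    """Data retriever: extract the requested columns from the page text."""
    return [
        tuple(match.group(col) for col in columns)
        for match in re.finditer(spec["pattern"], page)
    ]


def run_request(spec, columns, where, fetch):
    """Convert the request, issue the command through `fetch`, extract rows."""
    command = convert_query(spec, where)
    page = fetch(command)  # `fetch` stands in for the command transmitter
    return retrieve(spec, page, columns)


def fake_fetch(url):
    # Canned page so the sketch runs offline; a real transmitter would
    # issue the command over the network.
    return "<tr><td>IBM</td><td>101.50</td></tr>"


rows = run_request(QUOTE_SPEC, ["symbol", "price"], {"symbol": "IBM"}, fake_fetch)
```

Keeping the source-specific details in the specification means that supporting another page would, under these assumptions, require only a new specification entry rather than new code.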
30 Claims
1. A system for querying heterogeneous data sources distributed over a network, said system comprising:
a request translator for translating a data request having an associated data context into a query having a second data context associated with at least one of the heterogeneous data sources;
a query converter for converting a portion of the query into at least one command which can be used to interact with a World Wide Web page by accessing a specification file associated with the data source, said specification file providing the commands necessary to access the World Wide Web page containing the requested data;
a command transmitter for issuing the at least one command over the network to a semi-structured data source;
a data retriever for extracting data from at least one of the heterogeneous data sources; and
a data translator which translates retrieved data from the data contexts associated with the data sources into the data context associated with the request.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10

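As an illustration of the request translator and data translator recited in claim 1, the following Python sketch treats a data context as a currency plus a scale factor. This is only one assumed reading of "data context"; the `Context` class, the fixed rate table `USD_PER_UNIT`, and the `convert` function are hypothetical and do not come from the patent. Exact fractions are used so the arithmetic is free of rounding error.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Context:
    """Assumed data context: a currency and a scale factor for stated values."""
    currency: str
    scale: int  # a stated value v means v * scale currency units


# Hypothetical fixed conversion rates, for illustration only.
USD_PER_UNIT = {"USD": Fraction(1), "JPY": Fraction(1, 100)}


def convert(value, src, dst):
    """Re-express `value` from context `src` in context `dst`."""
    in_usd = value * src.scale * USD_PER_UNIT[src.currency]
    return in_usd / USD_PER_UNIT[dst.currency] / dst.scale


requester = Context(currency="USD", scale=1)
source = Context(currency="JPY", scale=1000)

# Request translator direction: a threshold of 5000 USD stated by the
# requester becomes 500 (thousands of JPY) in the source's context.
threshold_at_source = convert(5000, requester, source)

# Data translator direction: a value of 750 (thousands of JPY) retrieved
# from the source becomes 7500 USD in the requester's context.
value_for_requester = convert(750, source, requester)
```

The same `convert` function serves both directions in this sketch; a fuller treatment would likely need per-attribute contexts and rates that vary over time.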
11. A method for querying heterogeneous data sources distributed over a network, said method comprising the steps of:
(a) translating a data request having an associated data context into a query having a second data context associated with at least one of the heterogeneous data sources to be queried;
(b) converting a portion of the query into at least one command which can be used to interact with a semi-structured data source;
(c) issuing the at least one command to at least one of the World Wide Web page by accessing a specification file associated with the data source, said specification file providing the commands necessary to access the World Wide Web page containing the requested data;
(d) retrieving data from at least one of the heterogeneous data sources; and
(e) translating retrieved data from the data contexts associated with the heterogeneous data sources into the data context associated with the request.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 21

22. A method for querying semi-structured data sources in response to a structured data request, the method comprising the steps of:
(a) converting a data request into one or more commands which can be used to interact with a World Wide Web page by accessing a specification file associated with the data source, said specification file providing the commands necessary to access the World Wide Web page containing the requested data;
(b) issuing at least one of the one or more commands to said semi-structured data source; and
(c) retrieving data from said semi-structured data source.
Dependent claims: 23, 24, 25, 26, 27

28. A system for retrieving data from a semi-structured data source in response to a request, the system comprising:
a request converter for converting a request into one or more commands which can be used to interact with a World Wide Web page by accessing a specification file associated with the data source, said specification file providing the commands necessary to access the World Wide Web page containing the requested data;
a command transmitter for issuing at least one of the one or more commands to said semi-structured data source; and
a data retriever for extracting data from said semi-structured data source.
Dependent claims: 29, 30

Specification