Data access system
Abstract
A method of automatically creating a database on the basis of a set of category headings uses a set of keywords provided for each category heading. The keywords are used by a processing platform to define searches to be carried out on a plurality of search engines connected to the processing platform via the Internet. The search results are processed by the processing platform to identify the URLs embedded in the search results. The URLs are then used to retrieve the pages to which they refer from remote data sources on the Internet. The processing platform then filters and scores the pages to determine which pages are the most relevant to the original categories. Internet location information for the most relevant pages is stored in the database.
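The step of identifying the URLs embedded in the search results can be illustrated with a minimal Python sketch. It assumes the results arrive as HTML pages and that only absolute http(s) links are of interest; the names `LinkCollector` and `extract_urls` are hypothetical and do not come from the patent.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects absolute http(s) hrefs from anchor tags, in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Assumption: relative links point back into the search engine
            # itself, so only absolute links are kept.
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.urls.append(value)


def extract_urls(result_pages):
    """Return the unique URLs found across several result pages, first seen first."""
    seen = set()
    ordered = []
    for page in result_pages:
        collector = LinkCollector()
        collector.feed(page)
        for url in collector.urls:
            if url not in seen:
                seen.add(url)
                ordered.append(url)
    return ordered
```

Deduplicating across result pages matters here because several search engines may return the same page for one keyword, and each page only needs to be retrieved once.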
Claims
1. An apparatus for populating a destination database, said apparatus comprising:
a destination database;
means for connecting the apparatus to a distributed source database;
a memory area for containing a set of groups of keywords, each group of keywords being related to a predetermined subject category;
means for controlling at least one search engine associated with said distributed source database, on the basis of said groups of keywords contained in said memory area, to provide search results including information relating to the location of documents containing said keywords, said documents being stored in said distributed source database;
means for scoring each of the documents identified in search results provided by said at least one search engine on the basis of the respective contents thereof in accordance with predetermined criteria;
means for selecting at least some of the documents scored by said scoring means, on the basis of their respective scores; and
means for storing, for each document selected by said selecting means, in said destination database information relating to the location of the document in said distributed source database and its predetermined subject category.
2. An apparatus as in claim 1, further comprising:
means for identifying location information contained in search results provided by said at least one search engine and storing said location information in a second memory area.
3. An apparatus as in claim 1, further comprising:
means for retrieving documents identified in search results provided by said at least one search engine and storing said retrieved documents in a document store.
4. An apparatus for populating a destination database, said apparatus comprising:
a destination database;
a memory area for containing a set of groups of keywords, each group of keywords being related to a predetermined subject category; and
a processing platform which can access said destination database and said memory area, which is connectable to a distributed source database, and which is arranged to:
control at least one search engine associated with said distributed source database, on the basis of said groups of keywords contained in said memory area, to provide search results including information relating to the location of documents containing said keywords, said documents being stored in said distributed source database;
score each of the documents identified in said search results on the basis of the respective contents thereof in accordance with predetermined criteria;
select at least some of the documents on the basis of their respective scores; and
store, for each selected document, in said destination database information relating to the location of the document in said distributed source database and its predetermined subject category.
5. An apparatus for populating a destination database, said apparatus comprising:
a) means for connecting the apparatus to a distributed source database;
b) a first memory area for storing a set of groups of keywords, each group of keywords being associated with a predetermined subject category;
c) means for reading a keyword from the first memory area and transmitting said keyword to search means, said search means having access to the distributed source database;
d) means for receiving search results from said search means and storing said results in a second memory area, said results including information relating to the location of documents stored in said source database containing said keyword;
e) means for identifying and storing said location information in a third memory area;
f) means for reading location information from the third memory area and transmitting a request to the source database to return a copy of a document associated with selected location information to the apparatus;
g) means for receiving and storing said copy of said document in a fourth memory area;
h) means for accessing and scoring each of the documents stored in the fourth memory area on the basis of the respective contents thereof in accordance with predetermined criteria; and
i) means for selecting at least some of said documents on the basis of the respective scores and, for each selected document, storing in a database information relating to the location of the document in the source database and its predetermined subject category.
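The flow of data through the four memory areas in elements b) to g) can be sketched as follows. This is only an illustration under stated assumptions: each memory area is modelled as a plain Python container, the `search` and `fetch` callables are hypothetical stand-ins for the search means and the source database, and locations are assumed to be URLs that can be picked out of the raw results with a simple pattern. Scoring and selection (elements h and i) are left out.

```python
import re


def run_stages(keyword_groups, search, fetch):
    """Move data through the four memory areas described in claim 5.

    keyword_groups: dict mapping category -> list of keywords (first area).
    search, fetch: caller-supplied callables (hypothetical stand-ins for the
    search means and for a document request to the source database).
    """
    raw_results = []   # second memory area: (category, raw result text)
    locations = []     # third memory area: (category, location)
    documents = {}     # fourth memory area: location -> (category, text)

    # c), d): read each keyword, send it to the search means, keep the results.
    for category, keywords in keyword_groups.items():
        for keyword in keywords:
            raw_results.append((category, search(keyword)))

    # e): identify location information inside the stored results.
    # Assumption: a location is any whitespace-delimited http(s) URL.
    for category, text in raw_results:
        for location in re.findall(r"https?://\S+", text):
            if (category, location) not in locations:
                locations.append((category, location))

    # f), g): request a copy of each located document and store it.
    for category, location in locations:
        if location not in documents:
            documents[location] = (category, fetch(location))

    return raw_results, locations, documents
```

Keeping the category alongside every intermediate record is one way to ensure that the final stored entry can carry its predetermined subject category, as element i) requires.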
6. A method of populating a destination database, said method comprising:
controlling at least one search engine associated with a distributed source database, on the basis of a set of groups of keywords, each group of keywords being related to a predetermined subject category, to provide search results including information relating to the location of documents containing said keywords, said documents being stored in said distributed source database;
scoring each of the documents identified in said search results on the basis of the respective contents thereof in accordance with predetermined criteria;
selecting at least some of the documents on the basis of their respective scores; and
storing, for each selected document, in said destination database information relating to the location of the document in said distributed source database and its predetermined subject category.
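The scoring, selecting and storing steps of the method can be sketched in Python as below. The claim leaves the "predetermined criteria" open, so the keyword-count score and the fixed threshold used here are assumptions made purely for illustration, as are the function names, the `entries` table and the use of SQLite as the destination database.

```python
import sqlite3


def score(text, keywords):
    """Count case-insensitive keyword occurrences (an assumed criterion)."""
    lowered = text.lower()
    return sum(lowered.count(k.lower()) for k in keywords)


def populate(conn, documents, keyword_groups, threshold=2):
    """Store (location, category) for documents scoring at or above threshold.

    documents: iterable of (location, category, text) tuples.
    keyword_groups: dict mapping category -> list of keywords.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries (location TEXT, category TEXT, "
        "PRIMARY KEY (location, category))"
    )
    for location, category, text in documents:
        # Select on the basis of the score; only the location and category
        # are stored, not the document itself.
        if score(text, keyword_groups[category]) >= threshold:
            conn.execute(
                "INSERT OR IGNORE INTO entries VALUES (?, ?)",
                (location, category),
            )
    conn.commit()
```

A possible usage, with made-up example data:

```python
conn = sqlite3.connect(":memory:")
groups = {"pets": ["cat", "dog"]}
docs = [("http://a.example/1", "pets", "Cat and dog care")]
populate(conn, docs, groups)
```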
Specification