System and method for finding information in a distributed information system using query learning and meta search
Abstract
An information retrieval system finds information in a Distributed Information System (DIS), e.g., the Internet, using query learning and meta search for adding documents to resource directories contained in the DIS. A selection means generates training data characterized as positive and negative examples of a particular class of data residing in the DIS. A learning means generates from the training data at least one query that can be submitted to any one of a plurality of search engines for searching the DIS to find “new” items of the particular class. An evaluation means determines and verifies that the new item(s) is a new subset of the particular class and adds or updates the particular class in the resource directory.
33 Claims
1. A method of adding new documents to a resource list of existing documents, executable in a computer system, comprising the steps of:
learning a rule for which the documents on the resource list are positive examples of a class of selection information which selects the documents on the resource list;
making a persistent association between the selection information and the resource list;
using the selection information independent for a meta search engine to identify data on a plurality of items characterized as positive and/or negative examples of the class of information to select a set of documents which the information specifies; and
adding new documents to the resource list, the new documents being added belonging to a subset of the selected set of documents which contains documents which are not already on the resource list.
Dependent claims: 2, 3, 4, 5, 6
interactively determining whether a document in the subset should be added to the resource list; and
adding the document only if it has been determined that the document should be added.
3. The method set forth in claim 2 further comprising the steps of:
using a document for which it has been determined that the document should not be added together with documents on the resource list to learn new selection information; and
associating the new selection information with the resource list.
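The list-update steps of claims 1 through 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `update_resource_list`, the `approve` callback standing in for the interactive decision, and the sample file names are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Set, Tuple


def update_resource_list(
    resource_list: List[str],
    selected: Iterable[str],
    approve: Callable[[str], bool],
) -> Tuple[List[str], Set[str]]:
    """Return the updated list and the set of rejected documents.

    `approve` is a hypothetical stand-in for the interactive decision of
    claim 2; in a real system it would ask the list maintainer.
    """
    known = set(resource_list)
    updated = list(resource_list)
    rejected: Set[str] = set()
    for doc in selected:
        if doc in known:
            continue  # claim 1: consider only documents not already listed
        known.add(doc)  # avoid asking twice about a repeated hit
        if approve(doc):
            updated.append(doc)  # claim 2: add only approved documents
        else:
            rejected.add(doc)  # claim 3: usable as an example for relearning
    return updated, rejected


# Usage with made-up document names; "d.html" is rejected by the reviewer.
current = ["a.html", "b.html"]
hits = ["b.html", "c.html", "d.html", "c.html"]
updated, rejected = update_resource_list(current, hits, lambda d: d != "d.html")
print(updated)
print(rejected)
```

The rejected set is returned rather than discarded so that a later learning pass can use those documents, together with the documents on the resource list, to learn new selection information.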
4. The method set forth in claim 1 wherein the step of learning the selection information comprises the steps of:
learning a rule for which the documents on the resource list are positive examples;
translating the rule into a query; and
in the step of using the selection information, using the query to select the set of documents.
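As an illustration of claim 4, the sketch below learns a very simple rule and translates it into a query string. The claims shown here do not name a learning algorithm, so the conjunctive term rule, the `AND` query syntax, and the helper names `learn_conjunctive_rule` and `rule_to_query` are assumptions chosen for brevity.

```python
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case whitespace tokenization (an assumed, simplistic choice)."""
    return set(text.lower().split())


def learn_conjunctive_rule(positives: List[str], negatives: List[str]) -> List[str]:
    """Learn a rule as a list of terms that must all be present.

    Keeps terms that occur in every positive example and in no negative
    example, so every positive example satisfies the resulting rule.
    """
    if not positives:
        return []
    shared = set.intersection(*(tokenize(p) for p in positives))
    for text in negatives:
        shared -= tokenize(text)
    return sorted(shared)


def rule_to_query(rule: List[str]) -> str:
    """Translate the rule into a boolean query (hypothetical AND syntax)."""
    return " AND ".join(rule)


positives = ["deep machine learning tutorial", "deep machine learning course"]
negatives = ["washing machine repair"]
rule = learn_conjunctive_rule(positives, negatives)
query = rule_to_query(rule)
print(query)  # deep AND learning
```

Keeping the rule and the query as separate values mirrors the claim's distinction between learning a rule and translating it into a query.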
5. The method set forth in any of claims 1 through 4 wherein:
the system in which the method is practiced has access to a plurality of searching means;
the step of learning the selection information learns a plurality of queries as required by the plurality of searching means; and
the step of using the selection information to select a set of documents uses the plurality of queries in the plurality of searching means.
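One way to picture claim 5 is a single learned rule rendered into one query per searching means, with the results merged. The sketch below assumes two made-up engines, `engine_a` and `engine_b`, with invented query syntaxes and canned results; no real search service or API is being described.

```python
from typing import Callable, Dict, List

Rule = List[str]

# Hypothetical per-engine query syntaxes (assumptions for illustration).
FORMATTERS: Dict[str, Callable[[Rule], str]] = {
    "engine_a": lambda rule: " AND ".join(rule),
    "engine_b": lambda rule: " ".join("+" + term for term in rule),
}


def queries_for_engines(rule: Rule) -> Dict[str, str]:
    """Render one rule as a query for each searching means."""
    return {name: fmt(rule) for name, fmt in FORMATTERS.items()}


def meta_search(
    queries: Dict[str, str],
    engines: Dict[str, Callable[[str], List[str]]],
) -> List[str]:
    """Send each query to its engine and merge results without duplicates."""
    merged: List[str] = []
    seen = set()
    for name, query in queries.items():
        for doc in engines[name](query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Stub engines returning canned results instead of contacting a service.
engines = {
    "engine_a": lambda q: ["x.html", "y.html"],
    "engine_b": lambda q: ["y.html", "z.html"],
}
queries = queries_for_engines(["deep", "learning"])
results = meta_search(queries, engines)
print(queries)
print(results)
```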
6. The method set forth in claim 5 wherein:
the system in which the method is practiced has access to the world wide web; and
the searching means are searching means in the world wide web.
7. An improved web page of a type which contains a list of documents, the improvement comprising:
a machine learning system for learning a rule for which the documents on the web page are positive examples of a class of selection information which selects the documents on the web page;
a meta search engine using selection information to identify data on a plurality of items characterized as positive and/or negative examples of a class of information associated with the web page which selects documents having content which is similar to the documents on the list, whereby the list of documents on the web page is updated using the selection information.
8. In a computer system, apparatus, for making a resource list of documents which have contents belonging to the same class, the apparatus comprising:
a first list of documents, all of which have contents as positive examples belonging to the class;
a second list of documents, none of which have contents as negative examples belonging to the class;
learning means responsive to the first list of documents and the second list of documents for learning a rule for which the documents on the resource list are positive examples of the class of selection information which specifies documents whose contents belong to the class;
meta search means responsive to the selection information for finding the documents whose contents belong to the class, using the documents to make the resource list, and making a persistent association between the selection information and the resource list.
Dependent claims: 9, 10, 11, 12
first interactive means for indicating whether a given document is to be added to the first list or the second list.
10. The apparatus set forth in claim 9 further comprising:
second interactive means for activating the learning means.
11. The apparatus set forth in claim 10 further comprising:
third interactive means for activating the means for finding the documents.
12. The apparatus set forth in any of claims 9 through 11 wherein:
the apparatus is used in a system which includes a document browser; and
the interactive means of the claim are implemented in the document browser.
13. In an information system which stores related data and information as items for a plurality of interconnected computers accessible by a plurality of users, a method for finding items of a particular class residing in the information system comprising the steps of:
a) identifying as training data a plurality of items characterized as positive and/or negative examples of the class;
b) using a learning technique to generate from the training data at least one query that can be submitted to any of a plurality of methods for searching the information system;
c) submitting said query to meta search means and collecting any new item(s) as a response to the query;
d) evaluating the new item(s) by a learned model with the aim of verifying that the new item(s) is indeed a new subset of the particular class; and
e) presenting the new subset of the new item(s) to a user of the system.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22
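Steps a) through e) of claim 13 can be read as a small pipeline in which the query casts a wide net and a learned model then verifies each returned item. The sketch below is only one possible reading: the term-weight model, the two-term query, the `threshold` parameter, and the stub `fake_meta_search` are assumptions, not details taken from the claims.

```python
from collections import Counter
from typing import Callable, Dict, List


def learn_weights(positives: List[str], negatives: List[str]) -> Counter:
    """Assumed model: +1 per positive and -1 per negative containing a term."""
    weights: Counter = Counter()
    for text in positives:
        weights.update(set(text.lower().split()))
    for text in negatives:
        weights.subtract(set(text.lower().split()))
    return weights


def make_query(weights: Counter, k: int = 2) -> str:
    """Use the k highest-weighted terms as the query (ties broken by name)."""
    top = sorted(weights, key=lambda t: (-weights[t], t))[:k]
    return " ".join(top)


def score(weights: Counter, text: str) -> int:
    """Sum the weights of the distinct terms in the text."""
    return sum(weights[t] for t in set(text.lower().split()))


def find_new_items(
    positives: Dict[str, str],
    negatives: Dict[str, str],
    meta_search: Callable[[str], Dict[str, str]],
    threshold: int = 1,
) -> List[str]:
    # a) the training data are the positive and negative items passed in
    weights = learn_weights(list(positives.values()), list(negatives.values()))
    query = make_query(weights)  # b) learn at least one query
    candidates = meta_search(query)  # c) submit it and collect responses
    known = set(positives) | set(negatives)
    # d) verify each unseen item with the learned model
    # e) the returned list is what would be presented to the user
    return [
        item
        for item, text in candidates.items()
        if item not in known and score(weights, text) >= threshold
    ]


def fake_meta_search(query: str) -> Dict[str, str]:
    """Stub standing in for a meta search engine; returns canned items."""
    return {
        "p1": "jazz guitar lessons",
        "c1": "jazz drum lessons",
        "c2": "piano repair shop",
    }


positives = {"p1": "jazz guitar lessons", "p2": "jazz piano lessons"}
negatives = {"n1": "guitar repair shop"}
new_items = find_new_items(positives, negatives, fake_meta_search)
print(new_items)  # ['c1']
```

In this example `c2` is returned by the stub search but rejected in step d) because its terms score below the threshold, while `p1` is skipped because it is already a training item.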
23. An information system which stores related data and information as items for a plurality of interconnected computers accessible by a plurality of users for finding items of a particular class residing in the information system using query learning and meta search, comprising:
a) means for identifying as training data in the system a plurality of items characterized as positive and/or negative examples of the class;
b) means for using a learning technique to generate from the training data at least one query that can be submitted to any of a plurality of search engines for searching the information system;
c) means for submitting said query to a meta search engine and collecting any new item(s) as a response to the query;
d) means for evaluating the new item(s) by the at least one search engine with the aim of verifying that the new item(s) is indeed a new subset of the particular class; and
e) means for presenting the new subset of the new item(s) to a user of the system.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32
33. An article of manufacture comprising:
a computer useable medium having computer readable program code means embodied therein for finding items of a particular class residing in an information system which stores related data and information as items for a plurality of interconnected computers accessible by a plurality of users, the computer readable program code means in said article of manufacture comprising:
a) program code means for identifying as training data a plurality of items characterized as positive and/or negative examples of the class;
b) program code means for using a learning technique to generate from the training data at least one query that can be submitted to any of a plurality of methods for searching the information system;
c) program code means for submitting said query to meta search means and collecting any new item(s) as a response to the query;
d) program code means for evaluating the new item(s) by meta search means with the aim of verifying that the new item(s) is indeed a new subset of the particular class; and
e) program code means for presenting the new subset of new item(s) to a user of the system.
Specification