Query routing based on feature learning of data sources
Abstract
Query routing is based on identifying the preeminent search systems and data sources for each of a number of information domains. This involves assigning a weight to each search system or data source for each of the information domains. The greater the weight, the more preeminent a search system or data source is in a particular information domain. These weights Wi (i = 0, 1, 2, . . . N) are computed through a recursive learning process employing meta processing. The meta learning process involves simultaneous interrogation of multiple search systems to take advantage of the cross correlation between the search systems and data sources. In this way, assigning a weight to a search system takes into consideration results obtained about other search systems, so that the assigned weights reflect the relative strengths of each of the systems or sources in a particular information domain. In the present process, a domain dataset is used as an input to a query generator. The query generator extracts keywords randomly from the domain dataset. Sets of the extracted keywords constitute a domain specific search query. The query is submitted to the multiple search systems or sources to be evaluated. Initially, a random average weight is assigned to each search system or source. Then, the meta learning process recursively evaluates the search results and feeds back a weight correction dWi to be applied to each system or source server by using a weight difference calculator. After a certain number of iterations, the weights Wi reach stable values. These stable values are the values assigned to the search systems under evaluation. When searches are performed, the weights are used to determine which search systems or sources are interrogated.
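The iterative weighting described in the abstract can be sketched roughly as follows. This is an illustrative Python sketch only, not the patented implementation. The in-memory `make_source` helper, the `relevance` scoring rule, the equal starting weights, and the shrinking step size 1/(t+1) are all assumptions introduced here for illustration; the abstract does not say how the correction dWi is actually computed.

```python
import random

def make_source(corpus):
    """Hypothetical in-memory 'search system': returns documents matching any keyword."""
    indexed = [(doc, set(doc.lower().split())) for doc in corpus]
    def search(keywords):
        return [doc for doc, words in indexed if any(k in words for k in keywords)]
    return search

def generate_query(vocab, k, rng):
    """Extract k keywords at random from the domain vocabulary."""
    return rng.sample(vocab, min(k, len(vocab)))

def relevance(results, vocab_set):
    """Assumed scoring rule: summed share of on-domain words across returned documents."""
    total = 0.0
    for doc in results:
        words = doc.lower().split()
        if words:
            total += sum(w in vocab_set for w in words) / len(words)
    return total

def learn_weights(sources, domain_docs, k=3, max_iters=500, tol=1e-3, seed=0):
    """Iteratively adjust one weight per source until the corrections become small."""
    rng = random.Random(seed)
    vocab = sorted({w for d in domain_docs for w in d.lower().split()})
    vocab_set = set(vocab)
    n = len(sources)
    weights = [1.0 / n] * n  # assumption: every source starts with the same weight
    for t in range(1, max_iters + 1):
        query = generate_query(vocab, k, rng)
        # The same query goes to every source, so scores are directly comparable.
        scores = [relevance(search(query), vocab_set) for search in sources]
        total = sum(scores)
        # Relative strength: each source is judged against all the others.
        targets = [s / total for s in scores] if total > 0 else [1.0 / n] * n
        # Weight correction dW_i, with a step that shrinks so the weights settle.
        deltas = [(tgt - w) / (t + 1) for tgt, w in zip(targets, weights)]
        weights = [w + d for w, d in zip(weights, deltas)]
        if t > 10 and max(abs(d) for d in deltas) < tol:
            break
    return weights

# Toy data (made up for this sketch): one medical source, one sports source.
medical = make_source(["aspirin reduces fever and pain",
                       "insulin regulates blood glucose",
                       "vaccine prevents viral infection"])
sports = make_source(["the striker scored a late goal",
                      "the team won the final match",
                      "the runner ignored knee pain"])
domain_docs = ["fever and pain respond to aspirin",
               "blood glucose and insulin therapy",
               "viral infection and vaccine response"]
weights = learn_weights([medical, sports], domain_docs)
print(weights)
```

Because each correction is computed from scores normalized across all sources, the weights always sum to one and express relative rather than absolute strength, which is the property the abstract emphasizes.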
14 Claims
1. A method of optimizing the selection of databases to be interrogated during query searching comprising the steps of:
having a plurality of training sets of documents each characterizing a data domain to be searched;
interrogating the various databases with a plurality of keyword sets generated from each of the training sets of documents;
analyzing documents obtained from the databases by the interrogations to obtain ranking information from those database documents using multiple base learners and a meta learner to rate the applicability of each of the databases to each data domain using the results to weight the databases relative to one another; and
enabling the limitation of the interrogations to the most highly rated sources in a given one of the data domains when the search terms fall within that domain.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer program product on a computer usable medium for optimizing the selection of databases to be interrogated during query searching comprising:
software for using a plurality of training sets of documents each characterizing a data domain to be searched;
software for interrogating the various databases with a plurality of keyword sets generated from each of the training sets of documents;
software for analyzing documents obtained from the databases by the interrogations to obtain ranking information from those database documents using multiple base learners and a meta learner to rate the applicability of each of the databases to each data domain using the results to weight the databases relative to one another; and
software for identifying a data domain based on the search terms of a query and limiting the interrogated data sources to the most highly rated sources of that domain when the search terms fall within the domain.
Dependent claims: 9, 10, 11, 12, 13, 14
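The final element of claims 1 and 8, limiting interrogation to the most highly rated sources of the matching domain, might look like the following at query time. This is a minimal Python sketch under stated assumptions, not the claimed implementation: `identify_domain`, `route`, the keyword-overlap rule for choosing a domain, the `top_k` cutoff, and the fallback of interrogating every source when no domain matches are all hypothetical choices made here for illustration.

```python
def identify_domain(terms, domain_vocab):
    """Assumed rule: pick the domain whose vocabulary shares the most search terms.

    Returns None when the terms overlap with no domain vocabulary at all.
    """
    wanted = {t.lower() for t in terms}
    best, best_hits = None, 0
    for name, vocab in domain_vocab.items():
        hits = len(wanted & vocab)
        if hits > best_hits:
            best, best_hits = name, hits
    return best

def route(terms, domain_vocab, domain_weights, top_k=2):
    """Return the names of the sources to interrogate for these search terms."""
    domain = identify_domain(terms, domain_vocab)
    if domain is None:
        # Assumption: with no matching domain, fall back to every known source.
        return sorted({s for w in domain_weights.values() for s in w})
    weights = domain_weights[domain]
    # Highest weight first; ties broken by source name for a stable order.
    ranked = sorted(weights, key=lambda s: (-weights[s], s))
    return ranked[:top_k]

# Made-up vocabularies and learned weights for three hypothetical sources A, B, C.
domain_vocab = {
    "medicine": {"aspirin", "insulin", "vaccine"},
    "sports": {"goal", "match", "striker"},
}
domain_weights = {
    "medicine": {"A": 0.6, "B": 0.3, "C": 0.1},
    "sports": {"A": 0.1, "B": 0.2, "C": 0.7},
}
print(route(["insulin", "dosage"], domain_vocab, domain_weights))
```

Keeping the weights in a per-domain table means the same source can be preferred in one domain and skipped in another, which is the point of weighting the databases relative to one another.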
Specification