Query routing based on feature learning of data sources

US 20040024745A1
Filed: 07/31/2002
Published: 02/05/2004
Est. Priority Date: 07/31/2002
Status: Active Grant

Abstract

Query routing is based on identifying the preeminent search systems and data sources for each of a number of information domains. This involves assigning a weight to each search system or data source for each of the information domains. The greater the weight, the more preeminent a search system or data source is in a particular information domain. These weights Wi (i = 0, 1, 2, ..., N) are computed through a recursive learning process employing meta processing. The meta learning process involves simultaneous interrogation of multiple search systems to take advantage of the cross correlation between the search systems and data sources. In this way, assigning a weight to a search system takes into consideration results obtained about other search systems, so that the assigned weights reflect the relative strengths of each of the systems or sources in a particular information domain. In the present process, a domain dataset is used as an input to a query generator. The query generator extracts keywords randomly from the domain dataset. Sets of the extracted keywords constitute a domain-specific search query. The query is submitted to the multiple search systems or sources to be evaluated. Initially, a random average weight is assigned to each search system or source. Then, the meta learning process recursively evaluates the search results and feeds back a weight correction dWi to be applied to each system or source server by using a weight difference calculator. After a certain number of iterations, the weights Wi reach stable values. These stable values are the values assigned to the search system under evaluation. When searches are performed, the weights are used to determine which search systems or sources are interrogated.
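
The feedback loop described in the abstract can be illustrated with a small sketch. It is not the patented implementation: the toy sources, the `relevance` scoring function, the equal starting weights, and the update rule dWi = rate * (target - Wi) are all assumptions chosen for illustration, and the base learners and meta learner are collapsed into a single scoring step.

```python
import random

def make_source(corpus):
    """Toy search system: returns documents that share a term with the query."""
    def search(query):
        return [doc for doc in corpus if any(term in doc for term in query)]
    return search

def relevance(docs, domain_terms):
    """Hypothetical result evaluator: mean share of each document's terms
    that belong to the domain vocabulary."""
    if not docs:
        return 0.0
    domain = set(domain_terms)
    return sum(len(doc & domain) / len(doc) for doc in docs) / len(docs)

def learn_weights(sources, domain_terms, iterations=50, rate=0.2, keywords=3, seed=0):
    """Iteratively correct per-source weights from randomly generated queries."""
    rng = random.Random(seed)
    # Assumption: every source starts with the same weight.
    weights = {name: 1.0 / len(sources) for name in sources}
    for _ in range(iterations):
        # Query generator: draw keywords at random from the domain dataset.
        query = rng.sample(domain_terms, keywords)
        scores = {name: relevance(search(query), domain_terms)
                  for name, search in sources.items()}
        total = sum(scores.values())
        if total == 0:
            continue  # no source returned anything useful for this query
        for name in weights:
            # Weight correction dWi, relative to the other sources' scores.
            d_w = rate * (scores[name] / total - weights[name])
            weights[name] += d_w
    return weights

DOMAIN = ["heart", "artery", "valve", "stent", "pulse"]
SOURCES = {
    "cardio": make_source([
        {"heart", "valve", "surgery"},
        {"artery", "stent", "pulse"},
        {"heart", "pulse", "rate"},
    ]),
    "general": make_source([
        {"heart", "love", "song"},
        {"pulse", "music", "beat"},
        {"valve", "engine", "car"},
        {"artery", "road", "traffic"},
        {"stent", "news", "market"},
    ]),
}
WEIGHTS = learn_weights(SOURCES, DOMAIN)
```

Because each target is a source's score divided by the total across all sources, a weight rises only when that source outperforms the others on the same query, which mirrors the relative weighting the abstract describes. In this toy setup the domain-focused source ends with the larger weight.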

14 Claims

1. A method of optimizing the selection of databases to be interrogated during query searching comprising the steps of:
- having a plurality of training sets of documents each characterizing a data domain to be searched;
  
  interrogating the various databases with a plurality of keyword sets generated from each of the training sets of documents;
  
  analyzing documents obtained from the databases by the interrogations to obtain ranking information from those database documents using multiple base learners and a meta learner to rate the applicability of each of the databases to each data domain using the results to weight the databases relative to one another; and
  
  enabling the limitation of the interrogations to the most highly rated sources in a given one of the data domains when the search terms fall within that domain.
  - 2. The method of claim 1 providing a result evaluator to analyze results from the documents to generate the ranking information.
  - 3. The method of claim 2 including feeding the ranking information to each of the base learners and having the base learners feed results to the meta learner.
  - 4. The method of claim 1 in which the meta learner provides results from each keyword set to a weight difference calculator to weigh the results of interrogated sources.
  - 5. The method of claim 4 including having the weight difference calculator develop an output for each of the interrogated sources indicating the ranking position of the source relative to the other sources.
  - 6. The method of claim 5 including a reranking algorithm for adjusting the ranking by obtaining the present ranking of a source from the meta search engine and the ranking information provided by the weight difference calculator.
  - 7. The method of claim 6 including having the weight differential information provided to the query generator and having the query generator adjust the queries provided to the search engine based on changes reflected by the weight calculator.
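
The final step of claim 1, limiting interrogation to the most highly rated sources of the domain that the search terms fall within, might look like the following sketch. The domain vocabularies, source names, weight values, the overlap-based domain test, and the `top_k` cutoff are all hypothetical; the claims do not specify how a domain is identified or how many sources are retained.

```python
def identify_domain(query_terms, domain_vocab):
    """Return the domain whose vocabulary overlaps the query most, or None."""
    best, best_overlap = None, 0
    for domain, vocab in domain_vocab.items():
        overlap = len(set(query_terms) & vocab)
        if overlap > best_overlap:
            best, best_overlap = domain, overlap
    return best

def route(query_terms, domain_vocab, domain_weights, top_k=2):
    """Pick the sources to interrogate for a query."""
    domain = identify_domain(query_terms, domain_vocab)
    if domain is None:
        # Assumption: with no matching domain, fall back to every known source.
        return sorted({name for w in domain_weights.values() for name in w})
    ranked = sorted(domain_weights[domain].items(),
                    key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical learned weights, one table per information domain.
DOMAIN_VOCAB = {
    "medicine": {"heart", "stent", "artery"},
    "law": {"patent", "claim", "court"},
}
DOMAIN_WEIGHTS = {
    "medicine": {"med_db": 0.6, "web_db": 0.3, "law_db": 0.1},
    "law": {"law_db": 0.7, "web_db": 0.2, "med_db": 0.1},
}

MED_ROUTE = route(["heart", "stent"], DOMAIN_VOCAB, DOMAIN_WEIGHTS)
LAW_ROUTE = route(["court", "patent"], DOMAIN_VOCAB, DOMAIN_WEIGHTS)
NO_DOMAIN_ROUTE = route(["weather"], DOMAIN_VOCAB, DOMAIN_WEIGHTS)
```

Here a query about stents is sent only to the two sources rated highest for the medicine domain, while a query outside every domain is sent to all sources.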

8. A computer program product on a computer usable medium for optimizing the selection of databases to be interrogated during query searching comprising:
- software for using a plurality of training sets of documents each characterizing a data domain to be searched;
  
  software for interrogating the various databases with a plurality of keyword sets generated from each of the training sets of documents;
  
  software for analyzing documents obtained from the databases by the interrogations to obtain ranking information from those database documents using multiple base learners and a meta learner to rate the applicability of each of the databases to each data domain using the results to weight the databases relative to one another; and
  
  software for identifying a data domain based on the search terms of a query and limiting interrogated data sources to the most highly rated sources of that domain when the search terms fall within the domain.
  - 9. The computer program product of claim 8 providing a result evaluator to analyze results from the documents to generate the ranking information.
  - 10. The computer program product of claim 8 including software for feeding the ranking information to each of the base learners and having the base learners feed results to the meta learner.
  - 11. The computer program product of claim 8 including software for having the meta learner provide results from each query to a weight difference calculator.
  - 12. The computer program product of claim 8 including software for the weight difference calculator that develops an output for each of the interrogated sources indicating the ranking position of the source relative to the other sources.
  - 13. The computer program product of claim 8 including software for a reranking algorithm for adjusting the ranking for data sources by obtaining the present ranking of a source from the meta search engine and the ranking information provided by the weight difference calculator.
  - 14. The computer program product of claim 8 including software for having the weight differential information provided to a query generator and having the query generator adjust the key datasets provided to the search engine based on changes determined by the weight calculator.

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Kozakov, Lev; Kim, Moon Ju; Drissi, Youssef; Jeng, Jun-Jang; Leon-Rodriquez, Juan

Granted Patent

US 6,886,009 B2
US Class Current

707/2
CPC Class Codes

G06F 16/951   Indexing; Web crawling tech...

G06F 16/9538   Presentation of query results

Y10S 707/99933   Query processing, i.e. sear...
