Facilitating world wide web searches utilizing a multiple search engine query clustering fusion strategy
First Claim
1. A method implemented on a computer for facilitating World Wide Web Searches, or similar searches, by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a ranked list of pages, said method comprising the steps of:
- (a) training said computer for each search engine by clustering training queries and building cluster centroids;
(b) Assign weights to each cluster reflecting the number of relevant pages expected to be obtained by this search engine for queries similar to those in that cluster(c) processing an incoming query by selecting, for each search engine, that cluster centroid that is most similar to said incoming query and returning the weight associated with the selected cluster as the weight of the current search engine; and
(d) apportioning the N slots in the retrieved set according to the weights returned by each search engine.
2 Assignments
0 Petitions
Accused Products
Abstract
A method implemented on a computer for facilitating World Wide Web Searches and like database searches by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a single document with a ranked list of pages, includes the steps of: (a) training the computer for each search engine by clustering training queries and building cluster centroids; (b) Assign weights to each cluster reflecting the number of relevant pages expected to be obtained by this search engine for queries similar to those in that cluster; (c) processing an incoming query by selecting, for each search engine, that cluster centroid that is most similar to the incoming query and returning the weight associated with the selected cluster as the weight of the current search engine; and (d) apportioning the N slots in the retrieved set according to the weights returned by each search engine.
-
Citations
15 Claims
-
1. A method implemented on a computer for facilitating World Wide Web Searches, or similar searches, by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a ranked list of pages, said method comprising the steps of:
-
(a) training said computer for each search engine by clustering training queries and building cluster centroids; (b) Assign weights to each cluster reflecting the number of relevant pages expected to be obtained by this search engine for queries similar to those in that cluster (c) processing an incoming query by selecting, for each search engine, that cluster centroid that is most similar to said incoming query and returning the weight associated with the selected cluster as the weight of the current search engine; and (d) apportioning the N slots in the retrieved set according to the weights returned by each search engine. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15)
-
-
9. A method implemented on a computer for facilitating World Wide Web Searches or similar searches by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a ranked list of pages, said method comprising the steps of:
-
(a) training for each search engine in accordance with the following steps; (1) deriving a plurality of outputs from respective search engines; (2) deriving a similarity measure from a number of documents retrieved in common between two queries; (3) creating a query vector for a current query; (4) determining the centroid of a query cluster by averaging vectors of queries contained within said cluster; and (5) assigning to a cluster a weight that reflects how effective queries in the cluster are for the corresponding search engine, whereby the larger the weight, the more effective the queries are expected to be; and (b) following said training by the following steps; (6) selecting that cluster whose centroid vector is most similar to said query vector for the query; (7) returning the weight associated with the selected cluster as the weight of the current search engine; and (8) apportioning the N slots in the retrieved set according to the weights returned by each search engine.
-
-
14. A method implemented on a computer for facilitating World Wide Web Searches or similar searches by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a ranked list of pages, said method comprising the steps of:
-
(a) training said computer for each search engine by clustering training queries and building cluster centroids by the steps of; applying a clustering algorithm, using the number of pages retrieved in common at a rank less than or equal to a parameter L as the similarity between two queries; forming clusters from hierarchy by considering all queries that cluster above a certain threshold as belonging to the same or a common cluster; and forming a centroid for a particular cluster by creating a mean vector over all query vectors in said cluster; (b) Assign weights to each cluster reflecting the umber of relevant pages expected to be obtained by this search engine for queries similar to those in that cluster, by the steps of; computing a cluster'"'"'s weight as the mean number of relevant pages retrieved at a rank less than or equal to a parameter L over all the queries in said cluster; (c) processing an incoming query by selecting, for each search engine, that cluster centroid that is most similar to said incoming query and returning the weight associated with the selected cluster as the weight of the current search engine by the steps of; creating a query vector for a current query in the vector space of the training queries; and computing a vector similarity measure, between the current query vector and each of said centroids; selecting that centroid that has the greatest similarity; and (d) apportioning the N slots in the retrieved set according to the weights returned by each search engine by the steps of; summing weights returned by search engines; selecting the top weight-of-this-engine/sum (rounded down) pages from the set retrieved by each engine; when fewer then N pages are retrieved due to rounding, selecting 1 more page from the most highly weighted engines until N pages are retrieved, any ties being broken arbitrarily; ranking pages in a set that has been retrieved probabilistically using a biased c-faced die; for selecting that document which is to be in the next rank r of the final ranking, rolling a c-faced die that is biased by the number of pages remaining to be placed in the final ranking from each of the engines; and selecting an engine whose number corresponds to that die roll resulting from the rolling of a c-faced die in the preceding step;
placing the next page from that engine'"'"'s ranking into a final ranking; andrepeating until all N pages have been placed in said final ranking.
-
Specification