Facilitating world wide web searches utilizing a multiple search engine query clustering fusion strategy

US 5,864,845 A
Filed: 06/28/1996
Issued: 01/26/1999
Est. Priority Date: 06/28/1996
Status: Expired due to Term

Abstract

A method implemented on a computer for facilitating World Wide Web Searches and like database searches by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a single document with a ranked list of pages, includes the steps of: (a) training the computer for each search engine by clustering training queries and building cluster centroids; (b) assigning weights to each cluster reflecting the number of relevant pages expected to be obtained by this search engine for queries similar to those in that cluster; (c) processing an incoming query by selecting, for each search engine, that cluster centroid that is most similar to the incoming query and returning the weight associated with the selected cluster as the weight of the current search engine; and (d) apportioning the N slots in the retrieved set according to the weights returned by each search engine.

15 Claims

1. A method implemented on a computer for facilitating World Wide Web Searches, or similar searches, by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a ranked list of pages, said method comprising the steps of:
- (a) training said computer for each search engine by clustering training queries and building cluster centroids;
  
  (b) assigning weights to each cluster reflecting the number of relevant pages expected to be obtained by this search engine for queries similar to those in that cluster;
  
  (c) processing an incoming query by selecting, for each search engine, that cluster centroid that is most similar to said incoming query and returning the weight associated with the selected cluster as the weight of the current search engine; and
  
  (d) apportioning the N slots in the retrieved set according to the weights returned by each search engine.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15)
- - 2. A method implemented on a computer in accordance with claim 1, wherein step (d) comprises the steps of:
    - summing weights returned by search engines;
      
      selecting the top weight-of-this-engine/sum (rounded down) pages from the set retrieved by each engine;
      
      when fewer than N pages are retrieved due to rounding, selecting 1 more page from the most highly weighted engines until N pages are retrieved, any ties being broken arbitrarily; and
      
      ranking pages in a set that has been retrieved probabilistically using a biased c-faced die.
  - 3. A method implemented on a computer in accordance with claim 2, wherein step (a) comprises the steps of:
    - applying Ward's clustering algorithm, using the number of pages retrieved in common at a rank less than or equal to a parameter L as the similarity between two queries;
      
      forming clusters from hierarchy by considering all queries that cluster above a certain threshold as belonging to the same or a common cluster; and
      
      forming a centroid for a particular cluster by creating a mean vector over all query vectors in said cluster.
  - 4. A method implemented on a computer in accordance with claim 1, wherein step (b) comprises a step of:
    - computing a cluster's weight as the mean number of relevant pages retrieved at a rank less than or equal to a parameter L over all the queries in said cluster.
  - 5. A method implemented on a computer in accordance with claim 4, wherein step (c) comprises the steps of:
    - creating a query vector for a current query in the vector space of the training queries;
      
      computing a vector similarity measure, between the current query vector and each of said centroids; and
      
      selecting that centroid that has the greatest similarity.
  - 6. A method implemented on a computer in accordance with claim 5, wherein said vector similarity measure is the cosine.
  - 7. A method implemented on a computer in accordance with claim 5, wherein step (d) comprises the steps of:
    - for selecting that document which is to be in the next rank r of the final ranking, rolling a c-faced die that is biased by the number of pages remaining to be placed in the final ranking from each of the engines;
      
      selecting an engine whose number corresponds to that die roll resulting from the rolling of a c-faced die in the preceding step;
      
      placing the next page from that engine's ranking into a final ranking; and
      
      repeating until all N pages have been placed in said final ranking.
  - 8. A method implemented on a computer in accordance with claim 3, wherein step (a) comprises the steps of:
    - creating query vectors from query text using standard vector processing techniques; and
      
      weighting terms using a function that is proportional to the number of times the term occurs in the query, where the weight of a term in the centroid vector is the sum of its weights in the vectors of the queries in the cluster divided by the number of queries in the cluster.
  - 10. A method implemented on a computer in accordance with claim 1, wherein step (a) comprises the steps of:
    - applying a clustering algorithm, using the number of pages retrieved in common at a rank less than or equal to a parameter L as the similarity between two queries;
      
      forming clusters from hierarchy by considering all queries that cluster above a certain threshold as belonging to the same or a common cluster; and
      
      forming a centroid for a particular cluster by creating a mean vector over all query vectors in said cluster.
  - 11. A method implemented on a computer in accordance with claim 1, wherein said clustering algorithm is Ward's clustering algorithm.
  - 12. A method implemented on a computer in accordance with claim 10, wherein step (6) comprises the steps of:
    - creating a query vector for a current query in the vector space of the training queries; and
      
      computing a vector similarity measure, between the current query vector and each of said centroids;
      
      selecting that centroid that has the greatest similarity.
  - 13. A method implemented on a computer in accordance with claim 5, wherein step (8) comprises the steps of:
    - for selecting that document which is to be in the next rank r of the final ranking, rolling a c-faced die that is biased by the number of pages remaining to be placed in the final ranking from each of the engines;
      
      selecting an engine whose number corresponds to that die roll resulting from the rolling of a c-faced die in the preceding step;
      
      placing the next page from that engine's ranking into a final ranking; and
      
      repeating until all N pages have been placed in said final ranking.
  - 15. A method implemented on a computer in accordance with claim 3, wherein step (a) comprises the steps of:
    - creating query vectors from query text using standard vector processing techniques; and
      
      weighting terms using a function that is proportional to the number of times the term occurs in the query, where the weight of a term in the centroid vector is the sum of its weights in the vectors of the queries in the cluster divided by the number of queries in the cluster.
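
The slot apportionment of claim 2 and the biased-die merge of claim 7 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names `apportion_slots` and `merge_rankings` are hypothetical, the rounded-down share is assumed to be N times weight/sum, and each engine is assumed to have returned at least as many pages as it is allotted.

```python
import random
from math import floor


def apportion_slots(weights, n):
    """Split n result slots among engines in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one engine weight must be positive")
    # Rounded-down proportional share for each engine (assumed: n * w / sum).
    slots = {e: floor(n * w / total) for e, w in weights.items()}
    # Rounding can leave slots unfilled: give one more page to the most
    # highly weighted engines. Ties fall back to sort order (arbitrary).
    by_weight = sorted(weights, key=weights.get, reverse=True)
    shortfall = n - sum(slots.values())
    for engine in by_weight[:shortfall]:
        slots[engine] += 1
    return slots


def merge_rankings(rankings, slots, rng=None):
    """Interleave per-engine rankings by rolling a biased c-faced die."""
    rng = rng or random.Random()
    remaining = dict(slots)              # pages still owed by each engine
    position = {e: 0 for e in slots}     # next unread rank in each ranking
    merged = []
    while any(remaining.values()):
        engines = [e for e, r in remaining.items() if r > 0]
        # Each face of the die is an engine, biased by its remaining pages.
        pick = rng.choices(engines, weights=[remaining[e] for e in engines])[0]
        merged.append(rankings[pick][position[pick]])
        position[pick] += 1
        remaining[pick] -= 1
    return merged
```

With weights of 2, 1 and 1 and N = 10, the rounded-down shares are 5, 2 and 2, so the one leftover slot goes to the most highly weighted engine. The merge preserves each engine's own ordering, and only the interleaving between engines is random.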

9. A method implemented on a computer for facilitating World Wide Web Searches or similar searches by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a ranked list of pages, said method comprising the steps of:
- (a) training for each search engine in accordance with the following steps;
  
  (1) deriving a plurality of outputs from respective search engines;
  
  (2) deriving a similarity measure from a number of documents retrieved in common between two queries;
  
  (3) creating a query vector for a current query;
  
  (4) determining the centroid of a query cluster by averaging vectors of queries contained within said cluster; and
  
  (5) assigning to a cluster a weight that reflects how effective queries in the cluster are for the corresponding search engine, whereby the larger the weight, the more effective the queries are expected to be; and
  
  (b) following said training by the following steps;
  
  (6) selecting that cluster whose centroid vector is most similar to said query vector for the query;
  
  (7) returning the weight associated with the selected cluster as the weight of the current search engine; and
  
  (8) apportioning the N slots in the retrieved set according to the weights returned by each search engine.
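
Steps (3), (4), (6) and (7) above, together with the cosine measure of claim 6 and the term weighting of claim 8, can be sketched as follows. This is an illustrative sketch under simplifying assumptions: whitespace tokenization and raw term counts stand in for the "standard vector processing techniques" the claims mention, and the names `query_vector`, `centroid`, `cosine` and `engine_weight` are hypothetical.

```python
from collections import Counter
from math import sqrt


def query_vector(text):
    """Term-frequency vector: a term's weight is its count in the query."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Mean vector: summed term weights divided by the number of queries."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return {term: weight / len(vectors) for term, weight in total.items()}


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def engine_weight(query, clusters):
    """Return the weight of the cluster whose centroid best matches the query.

    clusters is a list of (centroid, weight) pairs trained for one engine.
    """
    q = query_vector(query)
    _, weight = max(clusters, key=lambda cluster: cosine(q, cluster[0]))
    return weight
```

Calling `engine_weight` once per search engine yields the per-engine weights that step (8) then uses to apportion the N slots.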

14. A method implemented on a computer for facilitating World Wide Web Searches or similar searches by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a ranked list of pages, said method comprising the steps of:
- (a) training said computer for each search engine by clustering training queries and building cluster centroids by the steps of;
  
  applying a clustering algorithm, using the number of pages retrieved in common at a rank less than or equal to a parameter L as the similarity between two queries;
  
  forming clusters from hierarchy by considering all queries that cluster above a certain threshold as belonging to the same or a common cluster; and
  
  forming a centroid for a particular cluster by creating a mean vector over all query vectors in said cluster;
  
  (b) assigning weights to each cluster reflecting the number of relevant pages expected to be obtained by this search engine for queries similar to those in that cluster, by the steps of;
  
  computing a cluster's weight as the mean number of relevant pages retrieved at a rank less than or equal to a parameter L over all the queries in said cluster;
  
  (c) processing an incoming query by selecting, for each search engine, that cluster centroid that is most similar to said incoming query and returning the weight associated with the selected cluster as the weight of the current search engine by the steps of;
  
  creating a query vector for a current query in the vector space of the training queries; and
  
  computing a vector similarity measure, between the current query vector and each of said centroids;
  
  selecting that centroid that has the greatest similarity; and
  
  (d) apportioning the N slots in the retrieved set according to the weights returned by each search engine by the steps of;
  
  summing weights returned by search engines;
  
  selecting the top weight-of-this-engine/sum (rounded down) pages from the set retrieved by each engine;
  
  when fewer than N pages are retrieved due to rounding, selecting 1 more page from the most highly weighted engines until N pages are retrieved, any ties being broken arbitrarily;
  
  ranking pages in a set that has been retrieved probabilistically using a biased c-faced die;
  
  for selecting that document which is to be in the next rank r of the final ranking, rolling a c-faced die that is biased by the number of pages remaining to be placed in the final ranking from each of the engines; and
  
  selecting an engine whose number corresponds to that die roll resulting from the rolling of a c-faced die in the preceding step;
  
  placing the next page from that engine's ranking into a final ranking; and
  
  repeating until all N pages have been placed in said final ranking.
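
The two training quantities in steps (a) and (b) of claim 14 can be sketched as follows: the similarity between two training queries, and the weight of a cluster. This is a sketch only. The clustering algorithm itself is omitted, and the names `overlap_similarity` and `cluster_weight`, as well as the dictionary layout for rankings and relevance judgments, are assumptions made for illustration.

```python
def overlap_similarity(ranking_a, ranking_b, L):
    """Number of pages two queries retrieved in common at rank <= L."""
    return len(set(ranking_a[:L]) & set(ranking_b[:L]))


def cluster_weight(cluster_queries, rankings, relevant, L):
    """Mean number of relevant pages at rank <= L over a cluster's queries.

    rankings maps a query to its ranked page list from one engine;
    relevant maps a query to the set of pages judged relevant for it.
    """
    counts = [
        sum(1 for page in rankings[q][:L] if page in relevant[q])
        for q in cluster_queries
    ]
    return sum(counts) / len(counts)
```

The overlap count would feed whichever clustering algorithm is used, and the resulting weight is what the selected cluster returns for an incoming query.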

Current Assignee: Siemens Corp. (Siemens AG)
Original Assignee: Siemens Corporate Research Incorporated (Siemens AG)
Inventors: Gupta, Narendra K.; Voorhees, Ellen M.
Primary Examiner(s): Lintz, Paul R.
Application Number: US08/674,644
Time in Patent Office: 942 Days
Field of Search: 707/3, 707/4, 707/2, 707/5, 707/1
US Class Current: 1/1
CPC Class Codes:
G06F 16/951   Indexing; Web crawling tech...
G06F 16/9532   Query formulation
G06F 16/9538   Presentation of query results
Y10S 707/99931   Database or file accessing
Y10S 707/99932   Access augmentation or opti...
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99934   Query formulation, input pr...
Y10S 707/99935   Query augmenting and refini...
