System and methodology providing improved information retrieval
Abstract
System and methodology for performing Bayesian-based distributed query processing is provided that solves the problem of how to get each server participating in a Bayesian distributed search system to return the same accurate relevance score for different documents. By performing calculations in a two-step process, accurate Bayesian calculation results are obtained whilst distributing the document indexing and query processing.
30 Claims
1. An improved method which operates in a computer system comprising a central server and one or more remote servers and which performs distributed query processing, the method comprising:
receiving at the central server a search request specifying one or more terms for retrieving documents;
at each remote server where documents have been indexed, generating local Bayes calculation data for each term on that server;
at the central server, generating global Bayes calculation data by combining the local Bayes calculation data from each remote server, and allowing the remote servers to accurately score and rank documents as if they were all located on one server on a per query basis;
sending the global Bayes calculation data to each remote server as part of the distributed query processing;
at each remote server, producing a list of matching documents sorted by relevance ranking by performing a local Bayesian query calculation based at least in part on said global Bayes calculation data; and
at the central server, generating a final list of most relevant documents by merging the lists of matching relevant documents from each remote server, each document having an accurate score calculated from the local calculation data and global Bayesian statistics on a per query basis.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
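The two-step flow recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not state what the "Bayes calculation data" contains or which scoring formula is used, so everything concrete here is an assumption made for illustration: the local data is taken to be a document count plus per-term document frequencies, the weight in `term_weight` is a generic smoothed log-odds style stand-in, and the names `RemoteServer`, `local_stats`, `search`, and `distributed_search` are hypothetical.

```python
import math
from collections import Counter


def term_weight(total_docs, doc_freq):
    # Stand-in weight (assumption, not taken from the claims): a smoothed,
    # always-positive log-odds style weight that favours rarer terms.
    return math.log(1.0 + (total_docs - doc_freq + 0.5) / (doc_freq + 0.5))


class RemoteServer:
    """Hypothetical remote server holding its own slice of the indexed documents."""

    def __init__(self, docs):
        # docs maps doc_id -> text; the "index" is just per-document term counts.
        self.docs = {doc_id: Counter(text.lower().split())
                     for doc_id, text in docs.items()}

    def local_stats(self, terms):
        # Step 1: local calculation data, here the local document count and,
        # for each query term, how many local documents contain it.
        doc_freq = {t: sum(1 for tf in self.docs.values() if t in tf)
                    for t in terms}
        return len(self.docs), doc_freq

    def search(self, terms, global_n, global_df):
        # Step 2: score local documents using the *global* statistics, so the
        # scores are comparable with those produced on every other server.
        hits = []
        for doc_id, tf in self.docs.items():
            matched = [t for t in terms if t in tf]
            if matched:
                score = sum(term_weight(global_n, global_df[t]) for t in matched)
                hits.append((score, doc_id))
        hits.sort(key=lambda hit: (-hit[0], hit[1]))
        return hits


def distributed_search(servers, query, top_k=10):
    # Central server: gather local data, combine it, send it back, merge results.
    terms = list(dict.fromkeys(query.lower().split()))
    local = [server.local_stats(terms) for server in servers]
    global_n = sum(n for n, _ in local)
    global_df = {t: sum(df[t] for _, df in local) for t in terms}
    ranked = [server.search(terms, global_n, global_df) for server in servers]
    merged = sorted((hit for hits in ranked for hit in hits),
                    key=lambda hit: (-hit[0], hit[1]))
    return merged[:top_k]
```

Under these assumptions, splitting a collection across several `RemoteServer` instances yields the same scores and ordering as indexing it on a single instance, which is the "as if they were all located on one server" property the claim describes.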
12. An information retrieval system comprising:
a central hub and one or more remote document stores;
a module, operating at the central hub, for receiving a search request specifying one or more terms for retrieving documents;
a module, operating at remote document stores, for generating local Bayes calculation data for each term for documents indexed at that document store;
a module, operating at the central hub, for generating global Bayes calculation data by combining the local Bayes calculation data from each remote document store, and sending the global Bayes calculation data back to each remote document store, and allowing the remote document stores to accurately score and rank documents as if they were all located on one document store on a per query basis;
a module, operating at remote document stores, for producing a list of matching documents sorted by relevance ranking by performing a local Bayesian query calculation based at least in part on said global Bayes calculation data; and
a module, operating at the central hub, for generating a final list of most relevant documents by merging the lists of matching relevant documents from each remote document store, each document having an accurate score calculated from the local calculation data and global Bayesian statistics on a per query basis.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21
22. An improved method for performing distributed Bayesian query processing, the method comprising:
receiving at a first computer a search request for retrieving documents pursuant to one or more search terms;
based on said search request, sending a lightweight request to other computers where documents have been indexed, for determining local Bayes calculation data for each term on each of the other computers;
generating at the first computer global Bayes calculation data by combining the local Bayes calculation data from each of the other computers;
sending the global Bayes calculation data to each of the other computers as part of the distributed Bayesian query processing;
at each of the other computers, producing a list of matching documents sorted by relevance ranking, by performing a local Bayesian query calculation that takes into account said global Bayes calculation data, and allowing the other computers to accurately score and rank documents as if they were all located on one server computer on a per query basis; and
generating at the first computer a final list of most relevant documents by merging the lists of matching relevant documents from each of the other computers, each document having an accurate score calculated from the local calculation data and global Bayesian statistics on a per query basis.
Dependent claims: 23, 24, 25, 26, 27, 28, 29, 30
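The final merging step of claim 22 can be sketched as follows. Because each of the other computers scores with the same global data, its list arrives already sorted on a common scale, so the first computer can in principle do a plain k-way merge with no rescaling. The claim does not say how the merge is implemented; the function name `merge_ranked_lists` and the `(score, doc_id)` tuple layout are assumptions made for illustration.

```python
import heapq
from itertools import islice


def merge_ranked_lists(ranked_lists, top_k):
    """Merge per-computer hit lists into one final list of the top_k hits.

    Each input list is assumed to hold (score, doc_id) tuples already sorted
    by descending score, as produced by the local query calculation.
    """
    # heapq.merge walks the sorted inputs lazily, so only the first top_k
    # merged hits are ever materialised.
    merged = heapq.merge(*ranked_lists, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, top_k))
```

A lazy merge like this is one reasonable choice when each computer returns a long list but only a short final list is needed; sorting the concatenated lists would give the same result at higher cost.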
Specification