Metasearch technique that ranks documents obtained from multiple collections
Abstract
In a metasearch conducted across multiple document collections, a multi-phase approach is employed in which local and global statistics are dynamically exchanged between the search engines and the metasearch engine in response to a user's query. In the first phase, the query is transmitted to the search engines from the metasearch engine, and each search engine computes or retrieves previously-computed local statistics for those terms in its associated document collection. In the second phase, each search engine returns its local statistics. A third phase consists of computing metacollection level statistics at the metasearch engine, based upon the information contained in the local statistics. The metacollection level statistics are disseminated to the search engines. In the final phase, the search engines rank the documents in their respective collections pursuant to the metacollection level statistics, and transmit the rankings to the metasearch engine. The metasearch engine merges the results from the individual search engines, to produce a single ranked results list for the user.
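The exchange described in the abstract can be sketched in Python as follows. This is an illustrative sketch only, not the patented implementation: the `LocalEngine` class, its method names, the whitespace tokenizer, the dictionary layout of the statistics, and the simple tf * log(N/df) scoring formula are all assumptions chosen for brevity.

```python
import heapq
import math
from collections import Counter


class LocalEngine:
    """Hypothetical search engine over one collection (names are illustrative)."""

    def __init__(self, name, docs):
        self.name = name
        # docs maps a document id to its text; whitespace tokenizing is an assumption.
        self.docs = {doc_id: text.lower().split() for doc_id, text in docs.items()}

    def local_stats(self, terms):
        # Phases 1-2: compute per-collection statistics for the query terms.
        df = {t: sum(1 for toks in self.docs.values() if t in toks) for t in terms}
        return {"num_docs": len(self.docs), "doc_freq": df}

    def rank(self, terms, global_stats):
        # Final phase: score with metacollection-level statistics, not local ones.
        n = global_stats["num_docs"]
        scored = []
        for doc_id, toks in self.docs.items():
            tf = Counter(toks)
            score = sum(
                tf[t] * math.log(n / global_stats["doc_freq"][t])
                for t in terms
                if tf[t] and global_stats["doc_freq"][t]
            )
            if score > 0:
                scored.append((score, self.name, doc_id))
        return sorted(scored, reverse=True)


def metasearch(query, engines):
    terms = query.lower().split()
    local = [e.local_stats(terms) for e in engines]
    # Phase 3: combine local statistics into metacollection-level statistics.
    global_stats = {
        "num_docs": sum(s["num_docs"] for s in local),
        "doc_freq": {t: sum(s["doc_freq"][t] for s in local) for t in terms},
    }
    ranked_lists = [e.rank(terms, global_stats) for e in engines]
    # Merge the per-engine rankings into one list, highest score first.
    return list(heapq.merge(*ranked_lists, reverse=True))
```

Because every engine scores with the same metacollection-level document count and document frequencies, a given term carries the same weight in every collection, which is what makes the per-engine scores directly comparable at merge time.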
166 Citations
125 Claims
-
1. A method for identifying and ranking documents contained in a plurality of document collections that form a metacollection, comprising the steps of:
-
receiving a query string at a metasearch engine, and transmitting terms in said query to search engines associated with said document collections;
at each search engine, dynamically computing local statistics related to said terms for the documents in a collection with which said search engine is associated, including a score normalization factor that comprises a mean document length for the documents in the collection, in response to receipt of said query, and providing said local statistics to the metasearch engine;
computing at least one global statistic related to the documents in the metacollection, including a score normalization factor that comprises a mean document length for the documents in the metacollection, in response to receipt of said local statistics at the metasearch engine, and transmitting said global statistic to said search engines;
determining relevancy scores for said documents at said search engines in accordance with said global statistic;
normalizing said scores in accordance with said normalization factor for the metacollection; and
providing references to documents in said metacollection in accordance with said relevancy scores.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34)
a) transmitting a request from said metasearch engine to said search engines for at least one document reference; and
b) returning from each search engine a reference to at least the highest-ranking document reference in the sorted order within said search engine;
and wherein the step of providing references to documents across said metacollection comprises the following steps;
c) sorting the document references returned from said search engines into a metacollection-level order determined by said criteria; and
d) providing to a requestor a reference to at least the highest-ranking document in the metacollection-level order.
-
-
18. The method of claim 17, further including the steps of:
-
e) identifying the search engine that provided said highest-ranking document reference in step (d);
f) sending a request from said metasearch engine to the search engine identified in step (e) for at least one additional document reference;
g) merging the document returned in response to step (f) into said metacollection-level sorted order; and
h) providing to a requestor at least the next-highest-ranking document in the metacollection-level order after the document reference provided in step (d).
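The incremental merge recited in claims 17 and 18 resembles a lazy k-way merge. The following Python sketch is one way to illustrate it, under stated assumptions: each engine is modeled as an iterator of `(score, doc_ref)` pairs already sorted in descending order, pulling from the iterator stands in for a network request, and the function name `lazy_merge` is hypothetical.

```python
import heapq


def lazy_merge(engines, limit):
    """Return up to `limit` references in metacollection-level order.

    `engines` maps an engine name to an iterator that yields (score, doc_ref)
    pairs in that engine's own descending order. Pulling from the iterator
    stands in for a request to the engine (an assumption of this sketch).
    """
    heap = []
    for name, it in engines.items():
        first = next(it, None)  # steps (a)-(b): one reference per engine
        if first is not None:
            score, ref = first
            heapq.heappush(heap, (-score, name, ref))
    results = []
    while heap and len(results) < limit:
        neg_score, name, ref = heapq.heappop(heap)  # step (d): current best
        results.append((name, ref, -neg_score))
        nxt = next(engines[name], None)  # steps (e)-(f): ask only that engine
        if nxt is not None:
            score, ref2 = nxt
            heapq.heappush(heap, (-score, name, ref2))  # step (g): merge
    return results
```

Only the engine whose reference was just delivered is asked for another one, so each engine is contacted roughly once per reference it contributes to the delivered results.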
-
-
19. The method of claim 17 wherein step (b) comprises returning from said search engine to said metasearch engine a reference to each of the at least two highest-ranking documents in the sorted order within said search engine and storing said references in memory at the metasearch engine;
- and further including the step of providing to a requestor the next-highest-ranking document in the metacollection-level order after the document reference provided in step (d) from said memory at the metasearch engine.
-
20. The method of claim 17, wherein the document references returned at step (b) are stored in memory at the metasearch engine;
- and further including the following steps, for each said search engine;
e) determining whether said memory contains at least the next-highest-ranking document in said search engine's sorted order;
f) if said determination is negative, sending a request from said metasearch engine to said search engine for at least the next-highest-ranking document in the search engine's sorted order; and
g) providing to a requestor at least the next-highest-ranking document in the metacollection-level order after the document reference provided in step (d).
-
21. The method of claim 17, further including the steps of:
-
e) sending a request from said metasearch engine to at least one of said search engines for at least one additional document reference;
f) repeating at said search engine said step of determining relevancy scores for at least those documents not yet returned by the search engine to the metasearch engine; and
g) sending at least one additional document reference from said search engine to said metasearch engine in response to said request.
-
-
22. The method of claim 17, wherein said step of determining relevancy scores includes the step of storing in memory at each said search engine at least one of said relevancy scores, and further including the steps of:
-
e) sending a request from said metasearch engine to at least one of said search engines for at least one additional document reference;
f) at said search engine, determining whether said memory contains sufficient information to respond to said request;
g) repeating said step of determining relevancy scores at said search engine; and
h) sending at least one additional document reference from said search engine to said metasearch engine in response to said request.
-
-
23. The method of claim 17 wherein each document reference comprises a copy of the associated document.
-
24. The method of claim 17 wherein each document reference comprises a link to the location of the associated document.
-
25. The method of claim 24 wherein said document reference further includes a description of the document.
-
26. The method of claim 17 wherein each document reference comprises an identification of the associated document.
-
27. The method of claim 17 wherein said criteria include said relevancy scores.
-
28. The method of claim 17 wherein said criteria include an alphabetical ordering.
-
29. The method of claim 17 wherein said criteria include a date associated with each document.
-
30. The method of claim 1 including the further step of sorting references to the documents in a collection into an order determined by at least one sort criterion, and providing said scores to said metasearch engine according to the following steps:
-
receiving a request at said metasearch engine for M document references;
sending a request from said metasearch engine to each of said search engines for K document references;
returning from each said search engine to said metasearch engine a reference to each of the K highest ranking document references in the sorted order within said search engine;
sorting the document references returned from said search engines into a metacollection-level order determined by said criteria and storing said references in memory at the metasearch engine;
providing to a requestor a reference to at least the highest-ranking document in the sorted order within said metasearch engine;
receiving at the metasearch engine a subsequent request for N document references;
determining, for each search engine, whether references to the documents at said search engine with ranks M+1 through M+N within the metacollection-level order are present in said memory;
sending a request for K additional document references from said metasearch engine to each of said search engines for which said determining step is negative;
returning from each said search engine a reference to each of the K next-highest-ranking document references in the sorted order within said search engine;
storing said references in memory at the metasearch engine; and
providing to a requestor a reference to at least the next-highest-ranking document in the metacollection-level order within said metasearch engine that was not provided previously.
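The batched, cached retrieval of claim 30 can be illustrated with a small Python sketch. It is a simplification made under several assumptions: each engine is modeled as an in-memory list of `(score, ref)` pairs in descending order, slicing that list stands in for a request for K more references, and an engine is refilled whenever its cached references run out rather than by testing ranks M+1 through M+N explicitly. The class name `PagedMerger` is hypothetical.

```python
class PagedMerger:
    """Sketch of batch fetching: K references per request, cached at the metasearch engine."""

    def __init__(self, sources, k):
        # `sources` maps engine name -> list of (score, ref) in descending order.
        # Slicing the list stands in for a request for K more references (assumption).
        self.sources = sources
        self.k = k
        self.offset = {name: 0 for name in sources}
        self.buffer = {name: [] for name in sources}
        self.requests = 0  # counts only requests that returned something

    def _refill(self, name):
        start = self.offset[name]
        batch = self.sources[name][start:start + self.k]
        if batch:
            self.requests += 1
        self.offset[name] = start + len(batch)
        self.buffer[name].extend(batch)

    def next_page(self, count):
        page = []
        while len(page) < count:
            # An engine with an empty buffer might still hold the next-best
            # document, so fetch K more from it before choosing.
            for name in self.sources:
                if not self.buffer[name]:
                    self._refill(name)
            candidates = [(buf[0], name) for name, buf in self.buffer.items() if buf]
            if not candidates:
                break
            (score, ref), name = max(candidates)
            self.buffer[name].pop(0)
            page.append(ref)
        return page
```

A first call such as `next_page(M)` serves the initial request, and a later `next_page(N)` is served from the cache where possible, contacting only those engines whose cached references have been used up.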
-
-
31. The method of claim 1 wherein said step of transmitting terms comprises parsing the query string at the metasearch engine and transmitting each resulting term to said search engines.
-
32. The method of claim 1 wherein said step of transmitting terms comprises transmitting the entire query string to said search engines, and said step of computing local statistics includes parsing the query string at each search engine.
-
33. The method of claim 32 wherein said step of providing said local statistics to the metasearch engine includes providing a character offset and length for each term within the query string, and said step of computing global statistics includes computing a match between differently parsed terms that are co-located within the query string and subsequently combining global statistics in accordance with said match.
-
34. The method of claim 33 wherein said step of computing a match comprises identifying terms that completely enclose one another within said query string.
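The offset-and-length matching of claims 33 and 34 can be illustrated with a short Python sketch. The `TermSpan` type and the function names are hypothetical, and the sketch covers only the enclosure test; how the global statistics of matched terms are then combined is not specified in the claims and is left out.

```python
from typing import NamedTuple


class TermSpan(NamedTuple):
    text: str
    offset: int  # character offset of the term within the query string
    length: int  # number of characters in the term


def encloses(outer, inner):
    """True if `outer` completely covers `inner` within the query string."""
    return (outer.offset <= inner.offset
            and inner.offset + inner.length <= outer.offset + outer.length)


def match_spans(spans_a, spans_b):
    """Pair up terms from two parsers when one term encloses the other."""
    return [(a, b) for a in spans_a for b in spans_b
            if encloses(a, b) or encloses(b, a)]
```

For the query string `data-base systems`, one engine might report the single term `data-base` at offset 0 with length 9 while another reports `data` and `base` separately; both shorter terms fall inside the longer span and would be matched to it.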
-
35. A method for identifying and ranking documents contained in a plurality of document collections that form a metacollection, comprising the steps of:
-
receiving a query string at a metasearch engine, and transmitting terms in said query to search engines associated with said document collections;
at each search engine, dynamically computing local statistics related to said terms for the documents in a collection with which said search engine is associated, including a score normalization factor for the collection, in response to receipt of said query, and providing said local statistics to the metasearch engine, wherein said normalization factor for a collection comprises a local mean value for the number of times the most frequent term in a document appears in that document for each of the documents in the collection;
computing at least one global statistic related to the documents in the metacollection, including a score normalization factor for the metacollection, in response to receipt of said local statistics at the metasearch engine, and transmitting said global statistic to said search engines, wherein the normalization factor for the metacollection comprises a mean value for said local mean values across all of the collections in the metacollection;
determining relevancy scores for said documents at said search engines in accordance with said global statistic;
normalizing said scores in accordance with said normalization factor for the metacollection; and
providing references to documents in said metacollection in accordance with said relevancy scores.
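The normalization factor of claim 35 can be worked through in a few lines of Python. This is a sketch under assumptions: documents are represented as lists of tokens, empty documents are skipped, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def local_max_tf_mean(docs):
    """Mean, over the documents of one collection, of each document's
    highest term count. `docs` is a list of token lists."""
    return mean(max(Counter(tokens).values()) for tokens in docs if tokens)


def global_max_tf_mean(local_means):
    """Metacollection factor: the mean of the local mean values."""
    return mean(local_means)
```

Note that, as worded in the claim, the metacollection factor is an unweighted mean of the local means, so a small collection contributes as much as a large one; a document-weighted mean would generally give a different value.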
-
-
36. A system for identifying and ranking documents contained in a plurality of document collections that form a metacollection, comprising:
-
a plurality of local search engines, each being associated with at least one of said document collections, for receiving query terms and computing local statistics related to said terms for the documents in a collection with which said search engine is associated;
a plurality of intermediate level search engines, each of which receives said local statistics from a subset of said local search engines, and computes intermediate-level statistics that are based upon the collections of documents associated with the corresponding subset of local search engines; and
a metasearch engine that receives the intermediate-level statistics from said plurality of intermediate level search engines and computes global statistics related to the documents in the metacollection, in response to receipt of said intermediate-level statistics at the metasearch engine, and transmits said global statistics to said local search engines;
wherein said local search engines determine relevancy scores for the documents in an associated collection in accordance with said global statistics, and provide said scores to the corresponding intermediate level engine, the intermediate level engines combine the scores from the corresponding subset of local search engines and provide the combined scores to the metasearch engine, and the metasearch engine ranks the documents across said metacollection in accordance with relevancy scores received from said intermediate level engines. - View Dependent Claims (37, 38, 39, 40, 41, 42, 43, 44)
-
-
45. A method for identifying and ranking documents contained in a plurality of document collections that form a metacollection, comprising the steps of:
-
receiving query terms at a plurality of local search engines, each being associated with at least one of said document collections, and computing local statistics related to said terms for the documents in a collection with which said search engine is associated;
transmitting local statistics from subsets of said local search engines to respective ones of a plurality of intermediate level search engines, and computing intermediate-level statistics at each of said intermediate level search engines that are based upon the collections of documents associated with a corresponding subset of local search engines;
transmitting the intermediate-level statistics from said plurality of intermediate level search engines to a metasearch engine, computing global statistics related to the documents in the metacollection, in response to receipt of said intermediate-level statistics at the metasearch engine, and transmitting said global statistics to said local search engines;
determining relevancy scores for the documents in an associated collection in accordance with said global statistics at each of said local search engines, and providing said scores to the corresponding intermediate level engine;
combining the scores from the corresponding subset of local search engines at the intermediate level engines and providing the combined scores to the metasearch engine; and
ranking the documents across said metacollection at the metasearch engine in accordance with relevancy scores received from said intermediate level engines.
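The hierarchical aggregation of claims 36 and 45 can be sketched in Python for statistics that are simple sums. The dictionary layout (`num_docs`, `doc_freq`) and the function name `combine_stats` are assumptions of this sketch, not taken from the patent.

```python
from collections import Counter


def combine_stats(stats_list):
    """Sum document counts and per-term document frequencies.

    The same function serves the intermediate level (over a subset of local
    engines) and the top level (over intermediate results), because sums are
    associative. The dict layout is an assumption of this sketch.
    """
    total_docs = 0
    doc_freq = Counter()
    for stats in stats_list:
        total_docs += stats["num_docs"]
        doc_freq.update(stats["doc_freq"])
    return {"num_docs": total_docs, "doc_freq": dict(doc_freq)}
```

Combining in two stages gives the same totals as combining all local statistics at once, so for additive statistics like these the intermediate engines reduce fan-in at the metasearch engine without changing the global values.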
-
-
46. A system for identifying and ranking documents contained in a plurality of document collections that form a metacollection, comprising:
-
a plurality of local search engines, each being associated with at least one of said document collections, for receiving query terms and dynamically computing local statistics related to said terms for the documents in a collection with which said search engine is associated in response to receipt of a query, said local statistics including a score normalization factor that comprises a mean document length for the documents in the associated collection; and
a metasearch engine which receives said local statistics from said local search engines, and computes global statistics related to the documents in the metacollection, in response to receipt of said local statistics at the metasearch engine, and transmits said global statistics to said local search engines, said global statistics including a score normalization factor that comprises a mean document length for the documents in the metacollection;
wherein said local search engines determine relevancy scores for the documents in an associated collection in accordance with said global statistics, normalize said scores in accordance with said normalization factor for the metacollection, and provide said normalized scores to the metasearch engine, and the metasearch engine ranks the documents across said metacollection in accordance with relevancy scores received from said local search engines. - View Dependent Claims (47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78)
receiving a request from said metasearch engine for K document references; and
returning to said metasearch engine a reference to each of the K highest ranking document references in the sorted order within a local search engine;
and wherein said metasearch engine responds to the returned references as follows;
sorting the document references returned from said local search engines into a metacollection-level order determined by said criteria and storing said references in memory;
providing to a requestor a reference to at least the highest-ranking document in the sorted order within said metasearch engine in response to a request for M document references;
in response to a subsequent request for N document references, determining, for each local search engine, whether references to the documents at said local search engine with ranks M+1 through M+N within the metacollection-level order are present in said memory;
sending a request for K additional document references to each of said local search engines for which said determining step is negative;
receiving from each said local search engine a reference to each of the K next-highest-ranking document references in the sorted order within said local search engine;
storing said references in memory; and
providing to a requestor a reference to at least the next-highest-ranking document in the metacollection-level order within said metasearch engine that was not provided previously.
-
-
48. The system of claim 46 wherein said metasearch engine parses the query string and transmits each resulting term to said local search engines.
-
49. The system of claim 46 wherein the entire query string is transmitted to said local search engines, and said local search engines parse the query string.
-
50. The system of claim 49 wherein said local search engines provide a character offset and length for each term within the query string to the metasearch engine, and said metasearch engine computes a match between differently parsed terms that are co-located within the query string and subsequently combines global statistics in accordance with said match.
-
51. The system of claim 50 wherein said metasearch engine identifies terms that completely enclose one another within said query string.
-
52. The system of claim 46 wherein said metasearch engine computes said global statistics dynamically in response to receipt of said query.
-
53. The system of claim 46 wherein the relevancy scores for documents are based solely upon the terms in said query which appear in the respective documents.
-
54. The system of claim 46 wherein all global statistics required to determine the relevancy scores are computed at the metasearch engine.
-
55. The system of claim 46 wherein a further global statistic is computed at the search engines in response to receipt of said global statistics from said metasearch engine.
-
56. The system of claim 46 wherein said local search engines filter the documents in an associated collection in accordance with a Boolean condition, and wherein the local statistics are computed on the documents which result from said filtering.
-
57. The system of claim 56 wherein the documents in a collection are evaluated in accordance with said Boolean condition in response to receipt of a query.
-
58. The system of claim 57 wherein the Boolean condition is specified by a user in conjunction with a query.
-
59. The system of claim 57 wherein the query is generated by a user who has access to only a portion of the documents in a collection, and wherein each document in the collection is evaluated in accordance with said Boolean condition regardless of whether the user has access to the document.
-
60. The system of claim 46 wherein said local statistics include the size of a collection and a measure of the frequency of a given term within the collection, and wherein said global statistics include a measure of the size of the metacollection and a measure of the frequency of a given term within the metacollection.
-
61. The system of claim 60 wherein said measure of the size of the collection comprises the number of documents in the collection, said measure of the frequency of a given term within the collection comprises the number of documents in the collection that contain said given term, said measure of the size of the metacollection comprises the number of documents in the metacollection, and said measure of the frequency of a given term within the metacollection comprises the number of documents in the metacollection that contain said given term.
-
62. The system of claim 60 wherein said measure of the size of the collection comprises the number of terms in the collection, said measure of the frequency of a given term within the collection comprises the number of occurrences in the collection of said given term, said measure of the size of the metacollection comprises the number of terms in the metacollection, and said measure of the frequency of a given term within the metacollection comprises the number of occurrences in the metacollection of said given term.
-
63. The system of claim 60 wherein another global statistic comprises an inverse frequency factor that is computed from the size of the metacollection and the frequency of a given term within the metacollection.
-
64. The system of claim 63 wherein said inverse frequency factor is computed at the metasearch engine.
-
65. The system of claim 63 wherein said inverse frequency factor is computed at each of the local search engines.
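The inverse frequency factor of claims 63 to 65 can be illustrated with a short Python sketch. The logarithmic form used here is one common choice and is an assumption; the claims only require that the factor be computed from the size of the metacollection and the frequency of the term within it. The function name is hypothetical.

```python
import math


def inverse_frequency(collection_size, term_frequency):
    """log(size / frequency); one common form, assumed here for illustration.

    With the measures of claim 61, pass document counts; with those of
    claim 62, pass term counts and occurrence counts instead.
    """
    if term_frequency == 0:
        return 0.0
    return math.log(collection_size / term_frequency)
```

Since the function needs only two numbers, it can run at the metasearch engine (claim 64) or at each local engine once the global size and frequency have been received (claim 65), and every engine arrives at the same value either way.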
-
66. The system of claim 46 wherein each local search engine sorts references to the documents in a collection into an order determined by one or more criteria, and provides said scores to said metasearch engine by:
-
a) receiving a request from said metasearch engine for at least one document reference; and
b) returning a reference to at least the highest-ranking document reference in the sorted order within said local search engine;
and wherein said metasearch engine provides references to documents across said metacollection by;
c) sorting the document references returned from said local search engines into a metacollection-level order determined by said criteria; and
d) providing to a requestor a reference to at least the highest-ranking document in the metacollection-level order.
-
-
67. The system of claim 66, wherein said metasearch engine further operates to:
-
identify the local search engine that provided said highest-ranking document reference;
send a request from said metasearch engine to the identified local search engine for at least one additional document reference;
merge the document returned in response to said request into said metacollection-level sorted order; and
provide to a requestor at least the next-highest-ranking document in the metacollection-level order after the previously provided document reference.
-
-
68. The method of claim 66 wherein each local search engine returns to said metasearch engine a reference to each of the at least two highest-ranking documents in the sorted order within said search engine;
- and wherein the metasearch engine stores said references in memory and provides to a requestor the next-highest-ranking document in the metacollection-level order from said memory.
-
69. The system of claim 66, wherein the document references returned from the local search engines are stored in memory at the metasearch engine;
- and wherein said metasearch engine performs the following steps, for each local search engine;
determining whether said memory contains at least the next-highest-ranking document in said local search engine's sorted order;
if said determination is negative, sending a request to said local search engine for at least the next-highest-ranking document in the local search engine's sorted order; and
providing to a requestor at least the next-highest-ranking document in the metacollection-level order after the previously provided document reference.
-
70. The system of claim 66, wherein said metasearch engine sends a request to at least one of said local search engines for at least one additional document reference;
- and in response thereto a local search engine determines relevancy scores for at least those documents not yet returned by the search engine to the metasearch engine, and sends at least one additional document reference to said metasearch engine in response to said request.
-
71. The method of claim 66, wherein each local search engine stores in memory at least one of said relevancy scores, and wherein said metasearch engine sends a request to at least one of said local search engines for at least one additional document reference;
- said local search engine determines whether said memory contains sufficient information to respond to said request, and if not, determines relevancy scores and sends at least one additional document reference to said metasearch engine in response to said request.
-
72. The system of claim 66 wherein each document reference comprises a copy of the associated document.
-
73. The system of claim 66 wherein each document reference comprises a link to the location of the associated document.
-
74. The system of claim 73 wherein said document reference further includes a description of the document.
-
75. The system of claim 66 wherein each document reference comprises an identification of the associated document.
-
76. The system of claim 66 wherein said criteria include said relevancy scores.
-
77. The system of claim 66 wherein said criteria include an alphabetical ordering.
-
78. The system of claim 66 wherein said criteria include a date associated with each document.
-
79. A system for identifying and ranking documents contained in a plurality of document collections that form a metacollection, comprising:
-
associated in response to receipt of a query, said local statistics including a score normalization factor for a collection that comprises a local mean value for the number of times the most frequent term in a document appears in that document for each of the documents in the collection;
a metasearch engine which receives said local statistics from said local search engines, and computes global statistics related to the documents in the metacollection, in response to receipt of said local statistics at the metasearch engine, and transmits said global statistics to said local search engines, wherein the global statistics include a score normalization factor for the metacollection that comprises a mean value for said local mean values across all of the collections in the metacollection;
wherein said local search engines determine relevancy scores for the documents in an associated collection in accordance with said global statistics, normalize said scores in accordance with said normalization factor for the metacollection and provide said normalized scores to the metasearch engine, and the metasearch engine ranks the documents across said metacollection in accordance with relevancy scores received from said local search engines.
-
-
80. A method for identifying and ranking documents contained in a plurality of document collections that form a metacollection, comprising the steps of:
-
receiving a query string at a metasearch engine, and transmitting terms in said query to search engines associated with said document collections;
at each search engine, computing local statistics related to said terms for the documents in a collection with which said search engine is associated, including a score normalization factor for a collection that comprises a mean document length for the documents in the collection, and providing said local statistics to the metasearch engine;
computing at least one global statistic related to the documents in the metacollection in response to receipt of said local statistics at the metasearch engine, including a score normalization factor for the metacollection that comprises a mean document length for the documents in the metacollection, and transmitting said global statistic to said search engines;
normalizing said scores in accordance with said normalization factor for the metacollection;
determining relevancy scores for said documents at said search engines in accordance with said global statistic; and
providing references to documents in said metacollection in accordance with said relevancy scores.
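The mean-document-length factor of claim 80 can be worked through in Python. This sketch assumes that each engine reports its document count along with its local mean length, so that the metacollection mean can be recovered as a weighted average, and it assumes a simple ratio form for the normalization itself; neither detail is stated in the claim, and both function names are hypothetical.

```python
def global_mean_length(local_stats):
    """`local_stats` is a list of (num_docs, mean_doc_length) pairs."""
    total_docs = sum(n for n, _ in local_stats)
    if total_docs == 0:
        return 0.0
    return sum(n * mean_len for n, mean_len in local_stats) / total_docs


def length_normalized(raw_score, doc_length, mean_length):
    """Divide by the document's length relative to the metacollection mean (assumed form)."""
    return raw_score / (doc_length / mean_length)
```

For example, collections of 10 documents averaging 100 terms and 30 documents averaging 200 terms give a metacollection mean of 175 terms, and a 350-term document with a raw score of 2.0 would be scaled down to 1.0 under this assumed form.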
-
-
81. A method for identifying and ranking documents contained in a plurality of document collections that form a metacollection, comprising the steps of:
-
receiving a query string at a metasearch engine, and transmitting terms in said query to search engines associated with said document collections;
at each search engine, computing local statistics related to said terms for the documents in a collection with which said search engine is associated, including a score normalization factor for a collection that comprises a local mean value for the number of times the most frequent term in a document appears in that document for each of the documents in the collection, and providing said local statistics to the metasearch engine;
computing at least one global statistic related to the documents in the metacollection in response to receipt of said local statistics at the metasearch engine, including a score normalization factor for the metacollection that comprises a mean value for said local mean values across all of the collections in the metacollection, and transmitting said global statistic to said search engines;
normalizing said scores in accordance with said normalization factor for the metacollection;
determining relevancy scores for said documents at said search engines in accordance with said global statistic; and
providing references to documents in said metacollection in accordance with said relevancy scores.
-
-
82. A method for identifying and ranking documents contained in a plurality of document collections that form a metacollection, comprising the steps of:
-
receiving a query string at a metasearch engine, and transmitting terms in said query to search engines associated with said document collections;
at each search engine, computing local statistics related to said terms for the documents in a collection with which said search engine is associated, including a score normalization factor that comprises a mean document length for the documents in the collection, and providing said local statistics to the metasearch engine;
computing at least one global statistic related to the documents in the metacollection, including a score normalization factor that comprises a mean document length for the documents in the metacollection, in response to receipt of said local statistics at the metasearch engine, and transmitting said global statistic to said search engines;
determining relevancy scores for said documents at said search engines in accordance with said global statistic and sorting references to the documents in a collection into an order determined by at least one sort criterion;
normalizing said scores in accordance with said normalization factor for the metacollection;
receiving a request at said metasearch engine for M document references;
sending a request from said metasearch engine to each of said search engines for K document references;
returning from each said search engine to said metasearch engine a reference to each of the K highest ranking document references in the sorted order within said search engine;
sorting the document references returned from said search engines into a metacollection-level order determined by said criteria and storing said references in memory at the metasearch engine;
providing to a requestor a reference to at least the highest-ranking document in the sorted order within said metasearch engine;
receiving at the metasearch engine a subsequent request for N document references;
determining, for each search engine, whether references to the documents at said search engine with ranks M+1 through M+N within the metacollection-level order are present in said memory;
sending a request for K additional document references from said metasearch engine to each of said search engines for which said determining step is negative;
returning from each said search engine a reference to each of the K next-highest-ranking document references in the sorted order within said search engine;
storing said references in memory at the metasearch engine; and
providing to a requestor a reference to at least the next-highest-ranking document in the metacollection-level order within said metasearch engine that was not provided previously.
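One way the metacollection mean document length of claim 82 might be derived from local statistics is sketched here. The sketch assumes each search engine reports its document count alongside its local mean length, which the claim itself does not recite; `global_mean_doc_length` is a hypothetical name.

```python
def global_mean_doc_length(local_stats):
    """Combine per-collection (document_count, mean_length) pairs into the
    mean document length of the whole metacollection.
    Assumption: document counts are among the reported local statistics."""
    total_docs = sum(count for count, _ in local_stats)
    if total_docs == 0:
        return 0.0
    total_length = sum(count * mean for count, mean in local_stats)
    return total_length / total_docs

# Illustrative figures: 100 documents averaging 250 terms,
# and 300 documents averaging 150 terms.
local_stats = [(100, 250.0), (300, 150.0)]
metacollection_mean = global_mean_doc_length(local_stats)
```

Weighting by document count gives the same value that a single engine would compute over the combined collections, which is why the counts are assumed to be exchanged.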
a) transmitting a request from said metasearch engine to said search engines for at least one document reference; and
b) returning from each search engine a reference to at least the highest-ranking document reference in the sorted order within said search engine;
and wherein the step of providing references to documents across said metacollection comprises the following steps:
c) sorting the document references returned from said search engines into a metacollection-level order determined by said criteria; and
d) providing to a requestor a reference to at least the highest-ranking document in the metacollection-level order.
-
-
100. The method of claim 99, further including the steps of:
-
e) identifying the search engine that provided said highest-ranking document reference in step (d);
f) sending a request from said metasearch engine to the search engine identified in step (e) for at least one additional document reference;
g) merging the document returned in response to step (f) into said metacollection-level sorted order; and
h) providing to a requestor at least the next-highest-ranking document in the metacollection-level order after the document reference provided in step (d).
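Steps (a) through (h) of claims 99 and 100 resemble an incremental k-way merge. The sketch below is one possible reading, not the patented implementation: each search engine is stood in for by an in-memory list of `(score, doc_id)` pairs already sorted by descending score, and `merged_results` is a hypothetical name.

```python
import heapq

def merged_results(engines):
    """Yield (score, doc_id) pairs in metacollection-level order.

    `engines` maps a hypothetical engine name to its locally sorted results.
    One reference is requested from every engine up front; after a reference
    is provided, one more is requested only from the engine that supplied it."""
    sources = {name: iter(results) for name, results in engines.items()}
    heap = []
    for name, source in sources.items():
        first = next(source, None)          # steps (a) and (b)
        if first is not None:
            heapq.heappush(heap, (-first[0], first[1], name))
    while heap:
        neg_score, doc_id, name = heapq.heappop(heap)
        yield -neg_score, doc_id            # steps (d) and (h)
        extra = next(sources[name], None)   # steps (e) and (f)
        if extra is not None:
            heapq.heappush(heap, (-extra[0], extra[1], name))  # step (g)

engines = {
    "A": [(0.9, "a1"), (0.4, "a2")],
    "B": [(0.7, "b1"), (0.6, "b2")],
}
ranked = list(merged_results(engines))
```

Because only the engine whose document was just consumed is asked for another reference, the metasearch engine always holds the best unserved candidate from every engine.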
-
-
101. The method of claim 99 wherein step (b) comprises returning from said search engine to said metasearch engine a reference to each of the at least two highest-ranking documents in the sorted order within said search engine and storing said references in memory at the metasearch engine;
and further including the step of providing to a requestor the next-highest-ranking document in the metacollection-level order after the document reference provided in step (d) from said memory at the metasearch engine.
-
102. The method of claim 99, wherein the document references returned at step (b) are stored in memory at the metasearch engine;
and further including the following steps, for each said search engine:
e) determining whether said memory contains at least the next-highest-ranking document in said search engine's sorted order;
f) if said determination is negative, sending a request from said metasearch engine to said search engine for at least the next-highest-ranking document in the search engine's sorted order; and
g) providing to a requestor at least the next-highest-ranking document in the metacollection-level order after the document reference provided in step (d).
-
103. The method of claim 99, further including the steps of:
-
e) sending a request from said metasearch engine to at least one of said search engines for at least one additional document reference;
f) repeating at said search engine said step of determining relevancy scores for at least those documents not yet returned by the search engine to the metasearch engine; and
g) sending at least one additional document reference from said search engine to said metasearch engine in response to said request.
-
-
104. The method of claim 99, wherein said step of determining relevancy scores includes the step of storing in memory at each said search engine at least one of said relevancy scores, and further including the steps of:
-
e) sending a request from said metasearch engine to at least one of said search engines for at least one additional document reference;
f) at said search engine, determining whether said memory contains sufficient information to respond to said request;
g) repeating said step of determining relevancy scores at said search engine; and
h) sending at least one additional document reference from said search engine to said metasearch engine in response to said request.
-
-
105. The method of claim 99 wherein each document reference comprises a copy of the associated document.
-
106. The method of claim 99 wherein each document reference comprises a link to the location of the associated document.
-
107. The method of claim 106 wherein said document reference further includes a description of the document.
-
108. The method of claim 99 wherein each document reference comprises an identification of the associated document.
-
109. The method of claim 99 wherein said criteria include said relevancy scores.
-
110. The method of claim 99 wherein said criteria include an alphabetical ordering.
-
111. The method of claim 99 wherein said criteria include a date associated with each document.
-
112. The method of claim 82 wherein said step of transmitting terms comprises parsing the query string at the metasearch engine and transmitting each resulting term to said search engines.
-
113. The method of claim 82 wherein said step of transmitting terms comprises transmitting the entire query string to said search engines, and said step of computing local statistics includes parsing the query string at each search engine.
-
114. The method of claim 113 wherein said step of providing said local statistics to the metasearch engine includes providing a character offset and length for each term within the query string, and said step of computing global statistics includes computing a match between differently parsed terms that are co-located within the query string and subsequently combining global statistics in accordance with said match.
-
115. The method of claim 114 wherein said step of computing a match comprises identifying terms that completely enclose one another within said query string.
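The enclosure test of claims 114 and 115 can be sketched as follows, assuming each term is reported as an `(offset, length)` pair into the original query string. The names `encloses` and `match_terms` are hypothetical, and how the matched statistics are then combined is left out because the claims do not specify it.

```python
def encloses(outer, inner):
    """True if span `outer` completely encloses span `inner`.
    A span is an (offset, length) pair into the query string."""
    outer_start, outer_len = outer
    inner_start, inner_len = inner
    return (outer_start <= inner_start
            and inner_start + inner_len <= outer_start + outer_len)

def match_terms(spans_a, spans_b):
    """Pair terms from two differently parsed versions of the same query
    when one term's span encloses the other's, in either direction."""
    return [(a, b) for a in spans_a for b in spans_b
            if encloses(a, b) or encloses(b, a)]

query = "data-base systems"
# Engine A keeps the hyphenated word whole; engine B splits it (assumed parsers).
spans_a = [(0, 9), (10, 7)]          # "data-base", "systems"
spans_b = [(0, 4), (5, 4), (10, 7)]  # "data", "base", "systems"
matches = match_terms(spans_a, spans_b)
```

In this example the single term "data-base" from one engine is matched with both "data" and "base" from the other, so their statistics could be combined before the global values are computed.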
-
116. A method for identifying and ranking documents contained in a plurality of document collections that form a metacollection, comprising the steps of:
-
receiving a query string at a metasearch engine, and transmitting terms in said query to search engines associated with said document collections;
at each search engine, computing local statistics related to said terms for the documents in a collection with which said search engine is associated, and providing said local statistics to the metasearch engine, wherein said local statistics include a score normalization factor for a collection that comprises a local mean value for the number of times the most frequent term in a document appears in that document for each of the documents in the collection;
computing at least one global statistic related to the documents in the metacollection, in response to receipt of said local statistics at the metasearch engine, and transmitting said global statistic to said search engines, wherein the global statistic includes a score normalization factor for the metacollection that comprises a mean value for said local mean values across all of the collections in the metacollection;
determining relevancy scores for said documents at said search engines in accordance with said global statistic and sorting references to the documents in a collection into an order determined by at least one sort criterion;
normalizing said scores in accordance with said normalization factor for the metacollection;
receiving a request at said metasearch engine for M document references;
sending a request from said metasearch engine to each of said search engines for K document references;
returning from each said search engine to said metasearch engine a reference to each of the K highest ranking document references in the sorted order within said search engine;
sorting the document references returned from said search engines into a metacollection-level order determined by said criteria and storing said references in memory at the metasearch engine;
providing to a requestor a reference to at least the highest-ranking document in the sorted order within said metasearch engine;
receiving at the metasearch engine a subsequent request for N document references;
determining, for each search engine, whether references to the documents at said search engine with ranks M+1 through M+N within the metacollection-level order are present in said memory;
sending a request for K additional document references from said metasearch engine to each of said search engines for which said determining step is negative;
returning from each said search engine a reference to each of the K next-highest-ranking document references in the sorted order within said search engine;
storing said references in memory at the metasearch engine; and
providing to a requestor a reference to at least the next-highest-ranking document in the metacollection-level order within said metasearch engine that was not provided previously.
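The batched retrieval recited at the end of claim 116 (K references per engine, cached at the metasearch engine, with further requests only when the cache cannot answer) might look roughly like the sketch below. It is one simplified reading under stated assumptions: engines are in-memory sorted lists, an engine is treated as exhausted once it returns fewer than K references, and the class name `PagedMetasearch` is hypothetical.

```python
class PagedMetasearch:
    """Caches per-engine batches of K references and serves merged pages."""

    def __init__(self, engines, k):
        self.engines = engines                       # name -> sorted (score, doc_id) list
        self.k = k
        self.offset = {name: 0 for name in engines}  # references requested so far
        self.cache = {name: [] for name in engines}  # fetched but not yet served
        self.done = set()                            # engines with nothing left

    def _fetch(self, name):
        """Request the next K references from one engine and cache them."""
        start = self.offset[name]
        batch = self.engines[name][start:start + self.k]
        self.offset[name] = start + len(batch)
        self.cache[name].extend(batch)
        if len(batch) < self.k:
            self.done.add(name)

    def next_page(self, n):
        """Return up to n references not provided previously."""
        page = []
        while len(page) < n:
            # Ask only those engines whose cached references are used up.
            for name in self.engines:
                if not self.cache[name] and name not in self.done:
                    self._fetch(name)
            heads = [(refs[0], name) for name, refs in self.cache.items() if refs]
            if not heads:
                break
            _, best = max(heads)
            page.append(self.cache[best].pop(0))
        return page

meta = PagedMetasearch(
    {"A": [(0.9, "a1"), (0.8, "a2"), (0.3, "a3")],
     "B": [(0.7, "b1"), (0.2, "b2")]},
    k=2,
)
first = meta.next_page(2)    # initial request for M = 2 references
second = meta.next_page(2)   # subsequent request for N = 2 references
```

On the second request only engine A is contacted again, because engine B's cached references are still sufficient to fix the merged order.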
-
-
117. A system for identifying and ranking documents contained in a plurality of document collections that form a metacollection, comprising:
-
a plurality of local search engines, each being associated with at least one of said document collections, for receiving query terms and computing local statistics related to said terms for the documents in a collection with which said search engine is associated;
at least one additional search engine which receives said local statistics from a subset of said local search engines, and computes comprehensive statistics that are based upon the collections of documents associated with the corresponding subset of local search engines; and
a further search engine that receives the comprehensive statistics from said additional search engine and statistics from at least one other search engine, and computes global statistics for the documents to which the received statistics pertain, and transmits said global statistics to said local search engines;
wherein said local search engines determine relevancy scores for the documents in an associated collection in accordance with said global statistics, and provide said scores to the additional search engine, the additional search engine combines the scores from the corresponding subset of local search engines and provides the combined scores to the further search engine, and the further search engine ranks the documents across said metacollection in accordance with received relevancy scores.
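The hierarchical aggregation of claim 117 can be pictured with a small sketch. The claim does not name the statistics involved, so document counts and per-term document frequencies are assumed here purely for illustration; `combine` is a hypothetical helper.

```python
from collections import Counter

def combine(stats_list):
    """Sum document counts and per-term document frequencies.
    Assumption: these are the statistics being exchanged."""
    doc_freq = Counter()
    for stats in stats_list:
        doc_freq.update(stats["df"])
    return {
        "n_docs": sum(stats["n_docs"] for stats in stats_list),
        "df": dict(doc_freq),
    }

# Local statistics from three local search engines (illustrative numbers).
local_1 = {"n_docs": 10, "df": {"search": 4}}
local_2 = {"n_docs": 20, "df": {"search": 6, "engine": 3}}
local_3 = {"n_docs": 5, "df": {"engine": 2}}

# The additional search engine covers a subset of the local engines.
comprehensive = combine([local_1, local_2])
# The further search engine merges that result with another engine's statistics.
global_stats = combine([comprehensive, local_3])
```

Because these assumed statistics are additive, combining them in two levels yields the same totals as combining all three local results at once.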
-
Specification