Multi-stage query processing system and method for use with tokenspace repository
First Claim
1. A method of processing a query in a multi-stage query processing system having one or more processors and memory storing one or more programs for execution by the one or more processors to perform the method comprising:
performing a first stage processing of a query, including:
retrieving a first set of document identifiers from an index in response to one or more query terms;
generating a first set of relevancy scores for a first set of compressed documents corresponding to at least a subset of the first set of document identifiers based on one or more of:
presence of query terms, term frequency, and document popularity; and
storing the first set of relevancy scores in the memory;
performing a second stage processing of the query, including:
generating a second set of relevancy scores for the documents in the first set of compressed documents based on one or more of:
a list of token positions for one or more query terms in the query, distances between query terms in the documents, attributes of tokens in the documents, and text that appears around a query term used in a document of the first set of documents; and
storing the second set of relevancy scores in the memory;
reading the first and second set of relevancy scores from the memory, and generating an ordered list of documents for further processing based on the first and second set of relevancy scores;
automatically generating additional query terms from the documents in the ordered list of documents;
formulating a new query using the additional query terms;
processing the new query to retrieve a second set of document identifiers from the index and to generate a third set of relevancy scores based at least in part on the additional query terms; and
using the third set of relevancy scores to select a set of top documents for presentation to the user.
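The first stage of this claim can be illustrated with a minimal Python sketch. It makes several assumptions for illustration only: a toy in-memory corpus of uncompressed token lists, a hypothetical per-document popularity prior, and arbitrary weights for combining term presence, term frequency and popularity. The claim does not say how these signals are weighted, so none of these choices should be read as the patented implementation.

```python
from collections import Counter

# Hypothetical toy corpus: document id -> list of tokens. The claim scores
# compressed documents; compression is not modelled in this sketch.
DOCS = {
    1: "the quick brown fox jumps over the lazy dog".split(),
    2: "a quick guide to fox hunting".split(),
    3: "lazy afternoons with a good book".split(),
}

# Hypothetical popularity prior per document, in [0, 1].
POPULARITY = {1: 0.9, 2: 0.4, 3: 0.7}


def build_index(docs):
    """Build an inverted index mapping each term to a sorted list of document ids."""
    index = {}
    for doc_id, tokens in docs.items():
        for term in set(tokens):
            index.setdefault(term, []).append(doc_id)
    for postings in index.values():
        postings.sort()
    return index


def first_stage(query_terms, index, docs, popularity):
    """Retrieve candidate ids from the index and give each a first-stage score."""
    candidates = sorted({d for t in query_terms for d in index.get(t, [])})
    scores = {}
    for doc_id in candidates:
        counts = Counter(docs[doc_id])
        # Presence: how many distinct query terms occur in the document.
        present = sum(1 for t in query_terms if counts[t] > 0)
        # Term frequency, normalised by document length.
        tf = sum(counts[t] for t in query_terms) / len(docs[doc_id])
        # Illustrative weights only (an assumption, not from the claim).
        scores[doc_id] = 1.0 * present + 2.0 * tf + 0.5 * popularity[doc_id]
    return candidates, scores


ids, first_scores = first_stage(["quick", "fox"], build_index(DOCS), DOCS, POPULARITY)
print(ids)  # [1, 2]
```

The scores are kept in a dictionary keyed by document id. This mirrors the claim's step of storing the first set of relevancy scores in memory, so a later stage can read them back without rescoring.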
Abstract
A multi-stage query processing system and method enables multi-stage query scoring, including “snippet” generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme. At one or more stages of the multi-stage query processing system, a set of relevancy scores is used to select a subset of documents for presentation to a user as an ordered list. This set of relevancy scores can be derived in part from one or more sets of relevancy scores determined in earlier stages of the system. In some embodiments, the system can execute one or more passes on a user query, using information from each pass to expand the query for a subsequent pass and thereby improve the relevancy of the documents in the ordered list.
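The abstract states that a later stage's relevancy scores can be derived in part from scores computed in earlier stages. One simple way to do that, assumed here purely for illustration, is a linear blend of two stages followed by a sort. The `weight` and `top_k` parameters are hypothetical and are not values taken from the patent.

```python
def combine_and_rank(first_scores, second_scores, weight=0.5, top_k=10):
    """Blend first- and second-stage relevancy scores and rank documents, best first.

    A document missing from `second_scores` gets a second-stage score of 0.
    Returns the top `top_k` document ids and the full blended score map.
    """
    blended = {
        doc_id: (1.0 - weight) * s1 + weight * second_scores.get(doc_id, 0.0)
        for doc_id, s1 in first_scores.items()
    }
    # Descending score; ties broken by ascending id so the order is deterministic.
    ranked = sorted(blended, key=lambda d: (-blended[d], d))
    return ranked[:top_k], blended


ranked, blended = combine_and_rank({1: 2.0, 2: 3.0, 3: 1.0}, {1: 4.0, 2: 0.5})
print(ranked)  # [1, 2, 3]
```

With `weight=0.0` the second stage is ignored and document 2 ranks first. Comparing the two runs shows how a later stage can reorder the ranking produced by an earlier one.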
21 Claims
1. A method of processing a query in a multi-stage query processing system having one or more processors and memory storing one or more programs for execution by the one or more processors to perform the method comprising:
performing a first stage processing of a query, including:
retrieving a first set of document identifiers from an index in response to one or more query terms;
generating a first set of relevancy scores for a first set of compressed documents corresponding to at least a subset of the first set of document identifiers based on one or more of:
presence of query terms, term frequency, and document popularity; and
storing the first set of relevancy scores in the memory;
performing a second stage processing of the query, including:
generating a second set of relevancy scores for the documents in the first set of compressed documents based on one or more of:
a list of token positions for one or more query terms in the query, distances between query terms in the documents, attributes of tokens in the documents, and text that appears around a query term used in a document of the first set of documents; and
storing the second set of relevancy scores in the memory;
reading the first and second set of relevancy scores from the memory, and generating an ordered list of documents for further processing based on the first and second set of relevancy scores;
automatically generating additional query terms from the documents in the ordered list of documents;
formulating a new query using the additional query terms;
processing the new query to retrieve a second set of document identifiers from the index and to generate a third set of relevancy scores based at least in part on the additional query terms; and
using the third set of relevancy scores to select a set of top documents for presentation to the user.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
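The second stage scores documents using token positions and the distances between query terms. The following is a minimal sketch under stated assumptions. Documents are plain token lists. Proximity is rewarded as 1 / (1 + d), where d is the closest distance between each adjacent pair of query terms; this is a common heuristic chosen for illustration, not a formula given in the claim. Token attributes, which the claim also lists, are not modelled.

```python
def token_positions(tokens, query_terms):
    """Return, for each query term, the sorted list of positions where it occurs."""
    positions = {t: [] for t in query_terms}
    for pos, tok in enumerate(tokens):
        if tok in positions:
            positions[tok].append(pos)
    return positions


def min_pair_distance(pos_a, pos_b):
    """Smallest |i - j| for i in pos_a, j in pos_b, via a linear merge of two sorted lists.

    Returns None if either list is empty.
    """
    i = j = 0
    best = None
    while i < len(pos_a) and j < len(pos_b):
        d = abs(pos_a[i] - pos_b[j])
        if best is None or d < best:
            best = d
        # Advance the pointer at the smaller position to try to close the gap.
        if pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return best


def proximity_score(tokens, query_terms):
    """Sum 1 / (1 + d) over adjacent query-term pairs, d being their closest distance."""
    positions = token_positions(tokens, query_terms)
    score = 0.0
    for a, b in zip(query_terms, query_terms[1:]):
        d = min_pair_distance(positions[a], positions[b])
        if d is not None:
            score += 1.0 / (1 + d)
    return score


print(proximity_score("a quick fox".split(), ["quick", "fox"]))  # 0.5
print(proximity_score("quick brown lazy fox".split(), ["quick", "fox"]))  # 0.25
```

The merge walk relies on the position lists being sorted, which `token_positions` guarantees because it scans the document left to right.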
10. A multi-stage query processing system, comprising:
one or more processors;
memory; and
one or more programs stored in the memory, the one or more programs comprising instructions for:
performing a first stage processing of a query, including:
retrieving a first set of document identifiers from an index in response to one or more query terms;
generating a first set of relevancy scores for a first set of compressed documents corresponding to at least a subset of the first set of document identifiers based on one or more of:
presence of query terms, term frequency, and document popularity; and
storing the first set of relevancy scores in the memory;
performing a second stage processing of the query, including:
generating a second set of relevancy scores for the documents in the first set of compressed documents based on one or more of:
a list of token positions for one or more query terms in the query, distances between query terms in the documents, attributes of tokens in the documents, and text that appears around a query term used in a document of the first set of documents; and
storing the second set of relevancy scores in the memory;
reading the first and second set of relevancy scores from the memory, and generating an ordered list of documents for further processing based on the first and second set of relevancy scores;
automatically generating additional query terms from the documents in the ordered list of documents;
formulating a new query using the additional query terms;
processing the new query to retrieve a second set of document identifiers from the index and to generate a third set of relevancy scores based at least in part on the additional query terms; and
using the third set of relevancy scores to select a set of top documents for presentation to the user.
Dependent claims: 11, 12, 13, 14, 15.
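Both the claimed second-stage scoring and the snippet generation mentioned in the abstract rely on the text that appears around a query term. The sketch below treats documents as token lists and extracts a fixed window of tokens around the first matching term. The `window` size and the ellipsis markers are illustrative choices. The sketch does not attempt the incremental document reconstruction or the multi-tiered mapping scheme described in the abstract.

```python
def snippet(tokens, query_terms, window=3):
    """Return the tokens within `window` positions of the first query-term hit.

    Leading/trailing '...' mark that the document continues; returns '' if no term occurs.
    """
    terms = set(query_terms)
    for pos, tok in enumerate(tokens):
        if tok in terms:
            start = max(0, pos - window)
            end = min(len(tokens), pos + window + 1)
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(tokens) else ""
            return prefix + " ".join(tokens[start:end]) + suffix
    return ""


def window_term_count(tokens, query_terms, window=3):
    """Count distinct query terms inside the snippet window (a simple context-based score)."""
    text = snippet(tokens, query_terms, window)
    return len(set(query_terms) & set(text.split()))


doc = "the quick brown fox jumps over the lazy dog".split()
print(snippet(doc, ["fox"], window=2))  # ... quick brown fox jumps over ...
```

A count such as `window_term_count` could serve as one input to the second set of relevancy scores. Documents whose query terms cluster within one passage score higher than documents where the terms are scattered.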
16. A non-transitory computer-readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for:
performing a first stage processing of a query, including:
retrieving a first set of document identifiers from an index in response to one or more query terms;
generating a first set of relevancy scores for a first set of compressed documents corresponding to at least a subset of the first set of document identifiers based on one or more of:
presence of query terms, term frequency, and document popularity; and
storing the first set of relevancy scores in the memory;
performing a second stage processing of the query, including:
generating a second set of relevancy scores for the documents in the first set of compressed documents based on one or more of:
a list of token positions for one or more query terms in the query, distances between query terms in the documents, attributes of tokens in the documents, and text that appears around a query term used in a document of the first set of documents; and
storing the second set of relevancy scores in the memory;
reading the first and second set of relevancy scores from the memory, and generating an ordered list of documents for further processing based on the first and second set of relevancy scores;
automatically generating additional query terms from the documents in the ordered list of documents;
formulating a new query using the additional query terms;
processing the new query to retrieve a second set of document identifiers from the index and to generate a third set of relevancy scores based at least in part on the additional query terms; and
using the third set of relevancy scores to select a set of top documents for presentation to the user.
Dependent claims: 17, 18, 19, 20, 21.
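The final steps of the claims generate additional query terms from the top-ranked documents and re-run the expanded query. This resembles the technique commonly called pseudo-relevance feedback. The sketch below rests on several assumptions for illustration. It uses a hypothetical stopword list and picks the most frequent new terms from the top documents. It then re-scores candidates retrieved from an inverted index, with expansion terms down-weighted by a hypothetical `expansion_weight`. The claims do not specify how terms are selected or weighted.

```python
from collections import Counter

# Hypothetical stopword list; a real system would use a larger one.
STOPWORDS = {"a", "an", "and", "of", "the", "to", "with"}


def build_index(docs):
    """Inverted index: term -> sorted list of document ids containing it."""
    index = {}
    for doc_id, tokens in docs.items():
        for term in set(tokens):
            index.setdefault(term, []).append(doc_id)
    for postings in index.values():
        postings.sort()
    return index


def expand_query(query_terms, top_docs, n_new=2):
    """Return the n_new most frequent terms in top_docs that are neither stopwords nor query terms."""
    counts = Counter(
        tok
        for tokens in top_docs
        for tok in tokens
        if tok not in STOPWORDS and tok not in query_terms
    )
    # Highest count first; ties broken alphabetically for a deterministic result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_new]]


def second_pass(query_terms, expansion_terms, docs, index, top_k=2, expansion_weight=0.5):
    """Retrieve ids for the expanded query from the index and compute a third set of scores."""
    new_query = list(query_terms) + list(expansion_terms)
    candidates = sorted({d for t in new_query for d in index.get(t, [])})
    scores = {}
    for doc_id in candidates:
        counts = Counter(docs[doc_id])
        original = sum(counts[t] for t in query_terms)
        expanded = sum(counts[t] for t in expansion_terms)
        scores[doc_id] = original + expansion_weight * expanded
    top = sorted(scores, key=lambda d: (-scores[d], d))[:top_k]
    return top, scores


docs = {
    1: "jaguar car speed engine".split(),
    2: "jaguar cat jungle habitat".split(),
    3: "car engine repair guide".split(),
    4: "jungle trek guide".split(),
}
extra = expand_query(["jaguar"], [docs[1]])  # assume doc 1 topped the first pass
top, third_scores = second_pass(["jaguar"], extra, docs, build_index(docs))
print(extra, top)  # ['car', 'engine'] [1, 2]
```

Document 3 never mentions "jaguar". With `expansion_weight=1.0` it would score 2.0 and outrank document 2. Down-weighting the generated terms limits that kind of drift away from the user's original query.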
Specification