Multi-stage query processing system and method for use with tokenspace repository

US 20060036593A1
Filed: 08/13/2004
Published: 02/16/2006
Est. Priority Date: 08/13/2004
Status: Active Grant

Abstract

A multi-stage query processing system and method enables multi-stage query scoring, including “snippet” generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme. At one or more stages of the multi-stage query processing system, a set of relevancy scores is used to select a subset of documents for presentation as an ordered list to a user. The set of relevancy scores can be derived in part from one or more sets of relevancy scores determined in prior stages of the multi-stage query processing system. In some embodiments, the multi-stage query processing system is capable of executing one or more passes on a user query, and of using information from each pass to expand the user query for use in a subsequent pass, improving the relevancy of documents in the ordered list.
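
The multi-pass expansion described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the toy corpus, the use of `zlib` as the compression scheme, the stopword list, the term-count score, and the helper names `retrieve`, `recover_tokens`, `score`, `expand`, and `two_pass`.

```python
import zlib
from collections import Counter

# Hypothetical toy corpus; documents are kept compressed, as in the claims.
DOCS = {
    1: "jaguar speed in the wild cat family",
    2: "jaguar cat habitat and wild prey",
    3: "big cat conservation in the wild",
    4: "stock markets fell sharply on monday",
}
COMPRESSED = {d: zlib.compress(t.encode()) for d, t in DOCS.items()}

# Inverted index: term -> set of document identifiers.
INDEX = {}
for doc_id, text in DOCS.items():
    for term in set(text.split()):
        INDEX.setdefault(term, set()).add(doc_id)

STOPWORDS = {"the", "in", "and", "on"}  # assumed, illustrative only


def retrieve(terms):
    """Return identifiers of documents containing any query term."""
    ids = set()
    for term in terms:
        ids |= INDEX.get(term, set())
    return ids


def recover_tokens(doc_id):
    """Decompress a stored document and split it into tokens."""
    return zlib.decompress(COMPRESSED[doc_id]).decode().split()


def score(doc_id, terms):
    """Toy relevancy score: count of tokens that match a query term."""
    return sum(1 for tok in recover_tokens(doc_id) if tok in terms)


def expand(terms, top_ids, extra=2):
    """Pick the most frequent non-query, non-stopword tokens."""
    counts = Counter(
        tok
        for doc_id in top_ids
        for tok in recover_tokens(doc_id)
        if tok not in terms and tok not in STOPWORDS
    )
    return [tok for tok, _ in counts.most_common(extra)]


def two_pass(query_terms, top_k=2):
    """First pass ranks and expands; second pass re-retrieves and re-scores."""
    first_ids = retrieve(query_terms)
    ranked = sorted(first_ids, key=lambda d: (-score(d, query_terms), d))
    new_terms = list(query_terms) + expand(query_terms, ranked[:top_k])
    second_ids = retrieve(new_terms)
    ranking = sorted(second_ids, key=lambda d: (-score(d, new_terms), d))
    return new_terms, ranking
```

In this toy setup, a query for `jaguar` alone does not retrieve document 3; after expansion with terms recovered from the top first-pass documents, the second pass does.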

9 Claims

1. A method of processing a query in a multi-stage query processing system, comprising:
- retrieving a first set of document identifiers from an index in response to one or more query terms;
  
  generating a first set of relevancy scores for a set of compressed documents corresponding to at least a subset of the first set of document identifiers;
  
  decompressing at least a portion of the set of compressed documents to recover a first set of tokens, wherein the first set of recovered tokens are associated with positions in the set of compressed documents corresponding to the first set of document identifiers;
  
  generating additional query terms from the first set of recovered tokens;
  
  formulating a new query using the additional query terms; and
  
  processing the new query to retrieve a second set of document identifiers from the index and to generate a second set of relevancy scores based at least in part on the additional query terms.
- Dependent claims (2, 3, 4, 5, 6)
  - 2. The method of claim 1, further comprising:
    - decompressing at least a portion of the set of compressed documents to recover a second set of tokens, wherein the second set of recovered tokens are associated with positions in the set of compressed documents corresponding to the second set of document identifiers; and
      
      reconstructing one or more portions of the set of compressed documents using the second set of recovered tokens.
  - 3. The method of claim 1, further comprising:
    - presenting the reconstructed portions to a user with an ordered list of documents selected from the set of compressed documents based at least in part on the second set of relevancy scores.
  - 4. The method of claim 1, wherein the second set of relevancy scores are based on one or more positions of the query terms in the set of compressed documents corresponding to the second set of document identifiers.
  - 5. The method of claim 1, wherein the second set of relevancy scores are based on distances between query terms in the set of compressed documents corresponding to the second set of document identifiers.
  - 6. The method of claim 3, wherein the second set of relevancy scores are based on a context in which a query term is used in the set of compressed documents corresponding to the second set of document identifiers.
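
Claims 4 and 5 tie the second set of relevancy scores to query-term positions and to distances between query terms. A minimal sketch of one way such a score could be computed follows. The scoring formula, and the function names `term_positions`, `min_span`, and `proximity_score`, are assumptions for illustration; the claims do not specify a formula.

```python
from itertools import product


def term_positions(tokens, terms):
    """Map each query term found in the document to its token positions."""
    positions = {}
    for i, tok in enumerate(tokens):
        if tok in terms:
            positions.setdefault(tok, []).append(i)
    return positions


def min_span(positions, terms):
    """Smallest distance between the outermost positions of one occurrence
    of every query term, or None if some term is missing."""
    if any(t not in positions for t in terms):
        return None
    best = None
    for combo in product(*(positions[t] for t in terms)):
        span = max(combo) - min(combo)
        if best is None or span < best:
            best = span
    return best


def proximity_score(tokens, terms):
    """Assumed formula: matched-term count plus a bonus for close terms."""
    positions = term_positions(tokens, set(terms))
    span = min_span(positions, terms)
    bonus = 0.0 if span is None else 1.0 / (1 + span)
    return len(positions) + bonus


near = "red fox hunts at night".split()
far = "red barn where a fox sleeps".split()
print(proximity_score(near, ["red", "fox"]))  # terms adjacent
print(proximity_score(far, ["red", "fox"]))   # terms four tokens apart
```

Both documents contain both terms, so only the distance bonus separates them; the document with adjacent terms scores higher.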

7. A method of processing a query in a multi-stage query processing system, comprising:
- retrieving a first set of information in response to one or more query terms;
  
  generating at least one additional query term based on the first set of information;
  
  formulating a new query using the at least one additional query term, the new query having a plurality of query terms;
  
  processing the new query to retrieve a set of document identifiers from an index;
  
  generating a set of relevancy scores for a set of compressed documents corresponding to at least a subset of the set of document identifiers;
  
  decompressing at least a portion of the set of compressed documents to recover a set of tokens, wherein the set of recovered tokens are associated with positions of one or more query terms of the plurality of query terms in the set of compressed documents corresponding to the set of document identifiers; and
  
  generating a list of documents based on at least a portion of the set of document identifiers, the list including information corresponding to at least a portion of the set of recovered tokens.
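
The last two steps of claim 7 (recovering tokens at query-term positions and listing documents together with information from those tokens) resemble snippet generation. The sketch below assumes a stand-in encoding in which each document is stored as a sequence of integer token IDs and only a small window around the first hit is decoded. The encoding, the window radius, and the names `LEXICON`, `ENCODED`, `snippet`, and `result_list` are hypothetical and not taken from the patent.

```python
# Hypothetical toy corpus.
DOCS = {
    1: "the red fox is a wild animal found across europe",
    2: "stock markets fell sharply on monday",
}

# Assumed stand-in for compression: token -> integer ID, and its inverse.
LEXICON = {}
for text in DOCS.values():
    for tok in text.split():
        LEXICON.setdefault(tok, len(LEXICON))
INVERSE = {tok_id: tok for tok, tok_id in LEXICON.items()}

# Each document stored as a list of token IDs.
ENCODED = {d: [LEXICON[t] for t in text.split()] for d, text in DOCS.items()}


def snippet(doc_id, term, radius=2):
    """Decode only the tokens within `radius` positions of the first hit."""
    term_id = LEXICON.get(term)
    encoded = ENCODED[doc_id]
    if term_id is None or term_id not in encoded:
        return None
    pos = encoded.index(term_id)
    window = encoded[max(0, pos - radius): pos + radius + 1]
    return " ".join(INVERSE[t] for t in window)


def result_list(doc_ids, term):
    """Pair each matching document identifier with its decoded snippet."""
    results = []
    for doc_id in doc_ids:
        text = snippet(doc_id, term)
        if text is not None:
            results.append((doc_id, text))
    return results


print(result_list([1, 2], "fox"))
```

Decoding only the window around the hit, rather than the whole document, is the point of the sketch: the result list carries recovered text without full reconstruction.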

8. A computer-readable medium having stored thereon instructions which, when executed by a processor in a multi-stage query processing system, cause the processor to perform the operations of:
- retrieving a first set of document identifiers from an index in response to one or more query terms;
  
  generating a first set of relevancy scores for a set of compressed documents corresponding to at least a subset of the first set of document identifiers;
  
  decompressing at least a portion of the set of compressed documents to recover a first set of tokens, wherein the first set of recovered tokens are associated with positions in the set of compressed documents corresponding to the first set of document identifiers;
  
  generating additional query terms from the first set of recovered tokens;
  
  formulating a new query using the additional query terms; and
  
  processing the new query to retrieve a second set of document identifiers from the index and to generate a second set of relevancy scores based at least in part on the additional query terms.

9. A multi-stage query processing system, comprising:
- means for retrieving a first set of document identifiers from an index in response to one or more query terms;
  
  means for generating a first set of relevancy scores for a set of compressed documents corresponding to at least a subset of the first set of document identifiers;
  
  means for decompressing at least a portion of the set of compressed documents to recover a first set of tokens, wherein the first set of recovered tokens are associated with positions in the set of compressed documents corresponding to the first set of document identifiers;
  
  means for generating additional query terms from the first set of recovered tokens;
  
  means for formulating a new query using the additional query terms; and
  
  means for processing the new query to retrieve a second set of document identifiers from the index and to generate a second set of relevancy scores based at least in part on the additional query terms.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Sercinoglu, Olcan; Haahr, Paul G.; Singhal, Amitabh K.; Dean, Jeffrey Adgate

Granted Patent: US 8,407,239 B2
US Class Current: 1/1
CPC Class Codes:
G06F 16/24578   using ranking
G06F 16/30   of unstructured textual dat...
G06F 16/951   Indexing; Web crawling tech...