INDEX OPTIMIZATION FOR RANKING USING A LINEAR MODEL

US 20100121838A1
Filed: 01/19/2010
Published: 05/13/2010
Est. Priority Date: 06/27/2008
Status: Active Grant

Abstract

Technologies are described herein for providing a more efficient approach to ranking search results. An illustrative technology reduces an amount of ranking data analyzed at query time. In the technology, a term is selected, at index time, from a master index. The term corresponds to a number of documents greater than a threshold. A set of documents that includes the term is selected based on the master index. A rank is determined for each document in the set of documents that contains the term. Each document in the set of documents that contains the term is assigned to a top document list or a bottom document list based on the rank. Predefined values of at least part of the rank are stored in the top document list for documents in the top document list and are not stored in the bottom document list for documents in the bottom document list.
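
The index-time steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `build_top_bottom_lists`, the parameters `doc_threshold` and `top_size`, and the choice of a fixed-size top list are all assumptions, since the text only says documents are assigned "based on the rank".

```python
def build_top_bottom_lists(master_index, rank_of, doc_threshold, top_size):
    """Split the posting list of each common term into a top and a bottom list.

    master_index: dict mapping term -> iterable of document ids (hypothetical layout).
    rank_of:      callable (term, doc_id) -> float, the rank value to pre-store.
    """
    split = {}
    for term, doc_ids in master_index.items():
        docs = list(doc_ids)
        # Only terms that occur in more documents than the threshold are split.
        if len(docs) <= doc_threshold:
            continue
        ranked = sorted(docs, key=lambda d: rank_of(term, d), reverse=True)
        # Top list: rank values are stored alongside the document ids.
        top = {d: rank_of(term, d) for d in ranked[:top_size]}
        # Bottom list: document ids only, no rank values are stored.
        bottom = ranked[top_size:]
        split[term] = (top, bottom)
    return split


# Toy data, invented for illustration only.
master_index = {"the": [1, 2, 3, 4], "zebra": [5]}
ranks = {("the", 1): 0.9, ("the", 2): 0.2, ("the", 3): 0.7,
         ("the", 4): 0.1, ("zebra", 5): 0.5}
split = build_top_bottom_lists(master_index, lambda t, d: ranks[(t, d)],
                               doc_threshold=2, top_size=2)
```

Here only "the" exceeds the threshold, so it is the only term that gets a top list (with stored values) and a bottom list (ids only).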

20 Claims

1. A computer-implemented method for reducing an amount of ranking data analyzed at query time, the method comprising computer-implemented operations for:
- at index time, selecting a term from a master index, the term corresponding to a number of documents greater than a threshold;
  
  selecting a set of documents that includes the term based on the master index;
  
  determining a rank for each document in the set of documents that contains the term;
  
  assigning each document in the set of documents that contains the term to a top document list or a bottom document list based on the rank; and
  
  storing predefined values of at least part of the rank in the top document list for documents in the top document list and not storing the predefined values of at least part of the rank in the bottom document list for documents in the bottom document list.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The computer-implemented method of claim 1, wherein the rank comprises a linear rank.
  - 3. The computer-implemented method of claim 2, wherein the linear rank comprises a function based on a term rank associated with each term in a document and a static rank associated with the document; and wherein the at least part of the rank comprises the term rank.
  - 4. The computer-implemented method of claim 3, wherein the term rank comprises a BM25F ranking value, and wherein the static rank is based upon one or more query-independent properties.
  - 5. The computer-implemented method of claim 1, further comprising computer-implemented operations for generating a supplementary index comprising a static rank for each of the documents.
  - 6. The computer-implemented method of claim 1, further comprising computer-implemented operations for:
    - at query time, determining whether each query term in a query is common;
      
      upon determining that each query term is common, populating a document result set with at least a subset of documents from the top document list and the bottom document list that satisfy the query, each of the subset of documents occurring in the top document list for at least one term in the query;
      
      upon populating the document result set, retrieving the predefined values from the top document list;
      
      ranking the document result set according to a linear model based on the pre-computed values for documents from the top document list and zero values for documents from the bottom document list; and
      
      transmitting a reduced subset of the document result set to a ranking function adapted to re-rank the documents in the reduced subset.
  - 7. The computer-implemented method of claim 6, wherein the ranking function comprises a two-layer neural network.
  - 8. The computer-implemented method of claim 6, wherein the document result set comprises documents that include every term in the query.
  - 9. The computer-implemented method of claim 6, wherein the reduced subset comprises a highest ranking subset of the document result set, and wherein the rank comprises a simple rank or a linear rank.
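
The query-time steps of claim 6 might look like the sketch below. It assumes the hypothetical layout `term -> (top, bottom)` where `top` maps document ids to stored term-rank values and `bottom` is a plain list of ids, a separate `static_rank` mapping in the spirit of the supplementary index of claim 5, and a linear rank that is simply the sum of term ranks plus the static rank. The AND semantics follow claim 8; none of the names come from the patent.

```python
def linear_rank_candidates(query_terms, split_index, static_rank):
    """Return candidate doc ids ordered by an assumed linear rank, best first."""
    # The fast path applies only when every query term is common.
    if not query_terms or not all(t in split_index for t in query_terms):
        return None  # caller would fall back to the regular path (not shown)

    lists = [split_index[t] for t in query_terms]
    # Documents that contain every query term, whether in a top or bottom list.
    matching = set.intersection(*(set(top) | set(bottom) for top, bottom in lists))
    # Keep only documents that appear in the top list of at least one term.
    in_some_top = set().union(*(top.keys() for top, _ in lists))
    candidates = matching & in_some_top

    scored = []
    for doc in candidates:
        # Stored value when the doc is in a term's top list, zero otherwise.
        term_part = sum(top.get(doc, 0.0) for top, _ in lists)
        scored.append((term_part + static_rank.get(doc, 0.0), doc))
    # Highest score first; ties broken by ascending document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc for _, doc in scored]


# Toy data, invented for illustration only.
split_index = {
    "the": ({1: 2.0, 2: 1.5}, [3, 4]),
    "of": ({3: 1.8, 1: 1.0}, [2, 4]),
}
static_rank = {1: 0.5, 2: 0.1, 3: 0.2, 4: 0.9}
ordered = linear_rank_candidates(["the", "of"], split_index, static_rank)
```

Document 4 matches both terms but sits in the bottom list for each, so it never enters the result set even though its static rank is the highest.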

10. A computer-implemented method for reducing an amount of ranking data analyzed at query time, the method comprising computer-implemented operations for:
- at index time, selecting a term from a master index, the term corresponding to a number of documents greater than a threshold;
  
  selecting a set of documents that includes the term based on the master index;
  
  determining a rank for each document in the set of documents that contains the term;
  
  assigning each document in the set of documents that contains the term to a top document list for the term or a bottom document list for the term based on the rank;
  
  storing predefined values of at least part of the rank in the top document list for documents in the top document list and not storing the predefined values of at least part of the rank in the bottom document list for documents in the bottom document list;
  
  at query time, determining whether each query term in a query is common;
  
  upon determining that each query term is common, populating a document result set with at least a subset of documents from the top document list and the bottom document list that satisfy the query, each of the subset of documents occurring in the top document list for at least one term in the query;
  
  upon populating the document result set, retrieving the predefined values from the top document list;
  
  ranking the document result set according to a linear model based on the pre-computed values for documents from the top document list and zero values for documents from the bottom document list; and
  
  transmitting a reduced subset of the document result set having the highest linear rank to a ranking function adapted to re-rank the documents in the reduced subset.
- Dependent claims (11, 12, 13, 14, 15, 16):
  - 11. The computer-implemented method of claim 10, wherein the rank comprises a linear rank.
  - 12. The computer-implemented method of claim 11, wherein the linear rank comprises a function based on a term rank associated with each term in a document and a static rank associated with the document; and wherein the at least part of the rank comprises the term rank.
  - 13. The computer-implemented method of claim 12, wherein the term rank comprises a BM25F ranking value, and wherein the static rank is based upon one or more query-independent properties.
  - 14. The computer-implemented method of claim 10, wherein the ranking function comprises a two-layer neural network.
  - 15. The computer-implemented method of claim 14, wherein the two-layer neural network comprises a first layer based on a reduced feature set and a second layer based on a full feature set.
  - 16. The computer-implemented method of claim 10, further comprising generating a supplementary index comprising a static rank for each of the documents.
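
The final step of claim 10, handing only the highest-ranked documents to a more expensive ranking function, can be illustrated as a two-stage cascade. The sketch below is an assumption-heavy illustration: `two_layer_score` is a generic feed-forward network with one hidden layer and made-up weights, `rerank_top_n` and `features_of` are hypothetical names, and the per-layer feature split described in claim 15 is not modelled.

```python
import math


def two_layer_score(features, w_hidden, b_hidden, w_out, b_out):
    """Generic two-layer network: tanh hidden layer, then a linear output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def rerank_top_n(linear_ranked, features_of, n, score_fn):
    """Re-rank only the first n documents of a list already ordered by linear rank."""
    head = linear_ranked[:n]  # documents past position n are never scored
    return sorted(head, key=lambda doc: score_fn(features_of(doc)), reverse=True)


# Toy data and weights, invented for illustration only.
features = {1: [1.0, 0.0], 3: [0.0, 1.0], 2: [1.0, 1.0], 7: [5.0, 5.0]}
w_hidden, b_hidden = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w_out, b_out = [1.0, 2.0], 0.0


def score(feature_vector):
    return two_layer_score(feature_vector, w_hidden, b_hidden, w_out, b_out)


reranked = rerank_top_n([1, 3, 2, 7], features.get, n=3, score_fn=score)
```

The cost of the second stage is bounded by `n` rather than by the number of documents that satisfy the query, which is the point of passing on only a reduced subset.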

17. A computer-readable storage medium having stored thereon a data structure representing a key that includes a list of documents corresponding to a query term, the key being adapted to be accessed by a search engine in response to a query, the key comprising:
- a document identifier mask containing a first set of bits defining a range of documents in the list; and
  
  a bitmap containing a second set of bits indicating which of the documents from the range contain the query term, the documents corresponding to document identifiers, the document identifiers being a function of the document identifier mask and the bitmap.
- Dependent claims (18, 19, 20):
  - 18. The computer-readable storage medium of claim 17, wherein the key further comprises:
    - a header at a beginning of the key;
      
      a format version indicating a version of the key; and
      
      a bitmap size indicating a number of bits in the document identifier mask.
  - 19. The computer-readable storage medium of claim 17, wherein the key further comprises padding bits adapted to align the key to a boundary.
  - 20. The computer-readable storage medium of claim 17, wherein the document identifiers are sorted sequentially.
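
One possible reading of the key in claim 17 is sketched below. The claims do not spell out how document identifiers are derived from the mask and the bitmap, so this assumes the mask holds the high-order bits shared by every identifier in a range and that bit `i` of the bitmap marks the identifier at offset `i` within that range. The names `encode_key`, `decode_key`, and `range_bits` are hypothetical, and the header, format version, and padding of claims 18 and 19 are omitted.

```python
def encode_key(doc_ids, range_bits):
    """Group doc ids into (mask, bitmap) pairs, each covering 2**range_bits ids."""
    span = 1 << range_bits
    keys = {}
    for doc_id in doc_ids:
        mask = doc_id >> range_bits       # high-order bits identify the range
        offset = doc_id & (span - 1)      # low-order bits select a bitmap position
        keys[mask] = keys.get(mask, 0) | (1 << offset)
    return sorted(keys.items())


def decode_key(mask, bitmap, range_bits):
    """Recover the doc ids in one range, in ascending order."""
    base = mask << range_bits
    return [base + i for i in range(1 << range_bits) if (bitmap >> i) & 1]


# Toy posting list, invented for illustration only.
pairs = encode_key([3, 5, 64, 70], range_bits=6)
decoded = [doc for mask, bitmap in pairs for doc in decode_key(mask, bitmap, 6)]
```

Under this reading, a dense run of identifiers costs one bit per document in the range instead of one full identifier per document, and decoding yields identifiers in sequential order as in claim 20.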

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Meyerzon, Dmitriy; Tankovich, Vladimir; Petriuc, Mihai

Granted Patent: US 8,171,031 B2
US Class Current: 707/715
CPC Class Codes: G06F 16/951 Indexing; Web crawling tech...
