Index optimization for ranking using a linear model

US 8,161,036 B2
Filed: 06/27/2008
Issued: 04/17/2012
Est. Priority Date: 06/27/2008
Status: Active Grant

Abstract

Technologies are described herein for providing a more efficient approach to ranking search results. One method reduces an amount of ranking data analyzed at query time. In the method, a term is selected, at index time, from a master index. The term corresponds to a number of documents greater than a threshold. A set of documents that includes the term is selected based on the master index. A simple rank is determined for each document in the set of documents that contains the term. Each document in the set of documents that contains the term is assigned to a high ranking index or a low ranking index based on the simple rank.
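
The index-time split described above can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the rank is a weighted sum of a per-term rank and a per-document static rank (the claims only say it is a function of the two), and that a document whose rank equals the rank threshold goes to the low ranking index (the claims leave that case open). The function name `build_split_indexes`, the weights `w_term` and `w_static`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def build_split_indexes(master_index, term_ranks, static_ranks,
                        doc_count_threshold, rank_threshold,
                        w_term=1.0, w_static=1.0):
    """Split the postings of common terms into high and low ranking indexes.

    master_index: term -> list of document ids containing the term
    term_ranks:   (term, doc) -> term rank for that term-document pair
    static_ranks: doc -> query-independent static rank
    """
    high = defaultdict(dict)   # term -> {doc: term rank}
    low = defaultdict(dict)    # term -> {doc: term rank}
    supplementary = {}         # doc -> static rank

    for term, docs in master_index.items():
        # Only terms found in more documents than the threshold are split.
        if len(docs) <= doc_count_threshold:
            continue
        for doc in docs:
            t_rank = term_ranks[(term, doc)]
            # Assumed linear form: weighted sum of term rank and static rank.
            linear_rank = w_term * t_rank + w_static * static_ranks[doc]
            # Assumption: ties with the rank threshold fall into the low index.
            target = high if linear_rank > rank_threshold else low
            target[term][doc] = t_rank
            supplementary[doc] = static_ranks[doc]

    return dict(high), dict(low), supplementary


# Hypothetical sample data: "the" is common, "zebra" is not.
master_index = {"the": ["d1", "d2", "d3"], "zebra": ["d2"]}
term_ranks = {("the", "d1"): 0.9, ("the", "d2"): 0.2,
              ("the", "d3"): 0.5, ("zebra", "d2"): 0.8}
static_ranks = {"d1": 0.8, "d2": 0.1, "d3": 0.3}

high, low, supplementary = build_split_indexes(
    master_index, term_ranks, static_ranks,
    doc_count_threshold=2, rank_threshold=1.0)
```

With this data only `d1` clears the rank threshold for "the", so it lands in the high ranking index while `d2` and `d3` go to the low ranking index; "zebra" stays only in the master index.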

11 Claims

1. A method for reducing an amount of ranking data analyzed at query time, comprising:
- at index time, selecting a term from a master index, the term corresponding to a number of documents greater than a threshold;
  
  selecting a set of documents that includes the term based on the master index;
  
  determining a linear rank for each document in the set of documents that contains the term, the linear rank comprising a function of a term rank associated with each term in the each document and a static rank associated with the each document;
  
  generating a high ranking index containing a first set of documents in the set of documents that contains the term where the linear rank of the each document in the first set of documents is greater than a rank threshold, the rank threshold being different from the threshold;
  
  generating a low ranking index containing a second set of documents in the set of documents that contains the term where the linear rank of the each document in the second set of documents is less than the rank threshold;
  
  generating a supplementary index containing the static rank of the each document in the set of documents that contains the term; and
  
  storing the term rank corresponding to each term-document pair in the high ranking index and the low ranking index.
- Dependent Claims (2, 3, 4, 5, 6)
  - 2. The method of claim 1, wherein the term rank comprises a BM25F ranking value, and wherein the static rank is based upon one or more query-independent properties.
  - 3. The method of claim 1, further comprising:
    - at query time, determining whether each query term in a query is common, wherein the each query term is common if the each query term is contained in either one of the high ranking index or the low ranking index;
      
      upon determining whether the each query term is common, populating a top document list with at least a subset of documents that satisfy the query, wherein the subset of documents comprises only documents in the high ranking index if the query term is common, and wherein the subset of documents comprises all documents from the master index if the query term is uncommon; and
      
      upon populating the top document list, transmitting the top document list to a ranking function configured to re-rank the documents in the top document list at the query time.
  - 4. The method of claim 3, wherein the ranking function comprises a first neural network and a second neural network, the first neural network being based on a reduced feature set, and the second neural network being based on a full feature set.
  - 5. The method of claim 3, wherein the top document list comprises documents that include every term in the query.
  - 6. The method of claim 3, wherein populating a top document list with at least a subset of documents that satisfy the query comprises selecting a highest ranking subset of documents that satisfy the query.
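
The query-time selection in claims 3 and 5 can be sketched as follows. This is an illustrative reading, not the patented implementation: a query term counts as common when it appears in either split index, common terms contribute only their high ranking postings, uncommon terms contribute all of their master index postings, and the per-term sets are intersected so that every candidate contains every query term. The function name `candidate_documents` and the sample indexes are hypothetical.

```python
def candidate_documents(query_terms, master_index, high_index, low_index):
    """Return the set of documents eligible for the top document list."""
    candidates = None
    for term in query_terms:
        # A term is common if either split index contains it.
        is_common = term in high_index or term in low_index
        if is_common:
            docs = set(high_index.get(term, {}))
        else:
            docs = set(master_index.get(term, ()))
        # Intersect so that each candidate contains every query term.
        candidates = docs if candidates is None else candidates & docs
    return candidates if candidates is not None else set()


# Hypothetical sample indexes.
master_index = {"the": ["d1", "d2", "d3"], "zebra": ["d1", "d2"]}
high_index = {"the": {"d1": 0.9}}
low_index = {"the": {"d2": 0.2, "d3": 0.5}}

only_common = candidate_documents(["the"], master_index, high_index, low_index)
only_uncommon = candidate_documents(["zebra"], master_index, high_index, low_index)
mixed = candidate_documents(["the", "zebra"], master_index, high_index, low_index)
```

In the mixed query, `d2` contains both terms but is dropped because its posting for "the" sits in the low ranking index. That is the trade the method makes to reduce the data examined at query time.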

7. A method for reducing an amount of ranking data analyzed at query time, comprising:
- (a) at index time, selecting a term from a master index, where a number of documents containing the term is greater than a threshold;
  
  (b) selecting a set of documents that includes the term based on the master index;
  
  (c) determining a linear rank for each document in the set of documents that contains the term, the linear rank comprising a function of a term rank associated with each term in the each document and a static rank associated with the each document;
  
  (d) populating a high ranking index containing a first set of documents in the set of documents that contains the term where the linear rank of the each document in the first set of documents is greater than a rank threshold, the rank threshold being different from the threshold;
  
  (e) populating a low ranking index containing a second set of documents in the set of documents that contains the term where the linear rank of the each document in the second set of documents is less than the rank threshold;
  
  (f) repeating operations (b)-(e) during the index time for additional terms from the master index, where the number of documents containing each of the additional terms is greater than the threshold;
  
  (g) generating a supplementary index containing the static rank of the each document in the set of documents that contains the term;
  
  (h) storing the term rank corresponding to each term-document pair in the high ranking index and the low ranking index;
  
  (i) at query time, determining whether each query term in a query is common, wherein the each query term is common if the each query term is contained in either one of the high ranking index or the low ranking index, wherein the each query term is uncommon if the each query term is not contained in both the high ranking index and the low ranking index;
  
  (j) upon determining whether the each query term is common, populating a top document list with at least a subset of documents that satisfy the query, wherein the subset of documents comprises only documents in the high ranking index if the query term is common, and wherein the subset of documents comprises all documents from the master index if the query term is uncommon; and
  
  (k) upon populating the top document list, transmitting the top document list to a ranking function configured to re-rank the documents in the top document list.
- Dependent Claims (8, 9, 10, 11)
  - 8. The method of claim 7, wherein the term rank comprises a BM25F ranking value, and wherein the static rank is based upon one or more query-independent properties.
  - 9. The method of claim 7, wherein the ranking function comprises a first neural network and a second neural network.
  - 10. The method of claim 9, wherein the first neural network is based on a reduced feature set, and the second neural network is based on a full feature set.
  - 11. The method of claim 9, wherein the first neural network re-ranks the subset of documents based on information including the static rank provided by the supplementary index.
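
One way to picture the two-network ranking function of claims 9 to 11 is a cascade, sketched below. Several things here are assumptions rather than anything the claims state: plain Python callables stand in for the two neural networks, the second scorer is applied only to the best `keep` documents from the first pass, and the reduced feature set is taken to be the stored term rank plus the static rank from the supplementary index. All names and scores are hypothetical.

```python
def two_stage_rerank(candidates, reduced_scorer, full_scorer, keep):
    """Re-rank candidates with a cheap scorer, then refine the head of the list."""
    # First pass: stand-in for the network built on a reduced feature set.
    first_pass = sorted(candidates, key=reduced_scorer, reverse=True)
    head, tail = first_pass[:keep], first_pass[keep:]
    # Second pass: stand-in for the network built on the full feature set,
    # applied only to the top `keep` documents (an assumed cascade design).
    head = sorted(head, key=full_scorer, reverse=True)
    return head + tail


# Hypothetical inputs: stored term ranks and supplementary static ranks.
term_rank = {"d1": 0.9, "d2": 0.2, "d3": 0.5}
supplementary = {"d1": 0.8, "d2": 0.1, "d3": 0.3}
full_scores = {"d1": 0.4, "d2": 0.9, "d3": 0.7}  # made-up full-feature scores


def reduced_scorer(doc):
    # Uses only data already stored in the indexes, including the static rank.
    return term_rank[doc] + supplementary[doc]


def full_scorer(doc):
    return full_scores[doc]


ranked = two_stage_rerank(["d1", "d2", "d3"], reduced_scorer, full_scorer, keep=2)
```

Here the first pass orders the documents `d1`, `d3`, `d2`; the second pass then reorders only the top two, giving `d3`, `d1`, `d2`.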

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Tankovich, Vladimir; Meyerzon, Dmitriy; Petriuc, Mihai
Primary Examiner(s): Vy, Hung T
Assistant Examiner(s): Cao, Phuong Thao
Application Number: US12/147,666
Publication Number: US 20090327266A1
Time in Patent Office: 1,390 Days
Field of Search: 707/999.005, 707/713, 707/715, 707/741, 707/753
US Class Current: 707/715
CPC Class Codes: G06F 16/3331 Query processing
