Ranking functions using an incrementally-updatable, modified naive bayesian query classifier

US 20080028010A1
Filed: 07/31/2006
Published: 01/31/2008
Est. Priority Date: 07/31/2006
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Methods of ranking documents on a network using an incrementally-updatable system are disclosed. Computer readable media having stored thereon computer-executable instructions for performing a method of ranking documents on a network using an incrementally-updatable system are also disclosed. Further, computing systems containing at least one application module, wherein the at least one application module comprises application code for performing methods of ranking documents on a network using an incrementally-updatable system, are disclosed.

49 Citations

20 Claims

1. A computer readable medium having stored thereon computer-executable instructions for ranking documents on a network in response to a user inputted search query comprising one or more search query terms, said computer-executable instructions utilizing an incrementally-updatable query classifier model that can be updated by updating count values #(Asset), #(w_i, Asset) and Σ#(w_i, Asset), wherein #(Asset) represents a number of times that a given document on the network is selected for viewing by any user, #(w_i, Asset) represents a number of times that a given document on the network and a search query term, w_i, of the search query are matched by any user, and Σ#(w_i, Asset) represents a sum of the number of times that a given document on the network and any search query term, w_i, of the search query are matched by any user.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The computer readable medium of claim 1, wherein each document on the network is ranked based on its relevance to the search query and is provided a document relevance score using formula (I):
    - $\log [P(Asset \mid Query)] = \log [\#(Asset)] - \log [\#(T)] + \sum_{i=1}^{N_{Q}} \left( \log [\#(w_{i}, Asset)] - \log [\#(Asset)] + \frac{\#(w_{i}, Asset)}{\#(Asset)} \right) - \sum_{i=1}^{V} \frac{\#(w_{i}, Asset)}{\#(Asset)} \quad (I)$ wherein:
      
      P(Asset|Query) represents a probability of returning a given document, Asset, given a particular user inputted search query, Query;
      
      N_Q is the number of terms in the search query;
      
      V is the size of the vocabulary of the network; and
      
      #(T) is the total number of search queries that have been processed by any user.
  - 3. The computer readable medium of claim 1, wherein the incrementally-updatable query classifier model is updated at an end of each time period, said time period being equal to or less than 24 hours in length.
  - 4. The computer readable medium of claim 3, wherein updating count values #(Asset), #(w_i, Asset) and Σ#(w_i, Asset) comprises adding any new data that has been collected during said time period to previously stored count values #(Asset)(old), #(w_i, Asset)(old) and Σ#(w_i, Asset)(old).
  - 5. The computer readable medium of claim 1, wherein the incrementally-updatable query classifier model further comprises a time-decay component, wherein recent search queries and user responses are given more weight than past search queries and user responses.
  - 6. The computer readable medium of claim 5, wherein:
    - $\#(Asset) = \sum_{t=0}^{\infty} \lambda^{t} [\#(Asset)(t)]$; $\#(w_{i}, Asset) = \sum_{t=0}^{\infty} \lambda^{t} [\#(w_{i}, Asset)(t)]$; and $\Sigma \#(w_{i}, Asset) = \sum_{t=0}^{\infty} \lambda^{t} [\Sigma \#(w_{i}, Asset)(t)]$; wherein:
      
      λ is a weighing multiplier having a value of less than 1.0; and
      
      t is an integer representing an age of a count value component.
  - 7. The computer readable medium of claim 6, wherein updating count values #(Asset), #(w_i, Asset) and Σ#(w_i, Asset) comprises recalculating #(Asset), #(w_i, Asset) and Σ#(w_i, Asset) as follows:
    - #(Asset)(new) = #(Asset)(0) + λ[#(Asset)(old)];
      
      #(w_i, Asset)(new) = #(w_i, Asset)(0) + λ[#(w_i, Asset)(old)]; and
      
      Σ#(w_i, Asset)(new) = Σ#(w_i, Asset)(0) + λ[Σ#(w_i, Asset)(old)];
      
      wherein:
      
      #(Asset)(new), #(w_i, Asset)(new) and Σ#(w_i, Asset)(new) each independently represent incrementally updated values for count values #(Asset), #(w_i, Asset) and Σ#(w_i, Asset), respectively;
      
      #(Asset)(0), #(w_i, Asset)(0) and Σ#(w_i, Asset)(0) each independently represent a number of occurrences within a last time period, respectively; and
      
      #(Asset)(old), #(w_i, Asset)(old) and Σ#(w_i, Asset)(old) each independently represent cumulative count values prior to the last time period, respectively.
  - 8. The computer readable medium of claim 1, further comprising computer-executable instructions for accepting the search inquiry inputted by a user, conducting a search of the documents on the network to generate search results comprising multiple documents, ranking the multiple documents of the search results using the incrementally-updatable query classifier model to generate ranked search results, and displaying the ranked search results to the user.
  - 9. A computing system containing at least one application module usable on the computing system, wherein the at least one application module comprises application code loaded thereon from the computer readable medium of claim 1.
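
One way to see how the recursive update of claim 7 relates to the decayed sums of claim 6 is the short derivation below. It is a sketch rather than text from the claims: $C$ stands for any one of the three count values, $C(t)$ for its component of age $t$, and $C_{old}$ for the decayed sum as it stood before the last time period. The assumption is that every component ages by exactly one step when a time period closes.

```latex
\begin{aligned}
C_{new} &= \sum_{t=0}^{\infty} \lambda^{t}\, C(t) \\
        &= C(0) + \sum_{t=1}^{\infty} \lambda^{t}\, C(t) \\
        &= C(0) + \lambda \sum_{t=0}^{\infty} \lambda^{t}\, C(t+1) \\
        &= C(0) + \lambda\, C_{old},
\qquad C_{old} = \sum_{t=0}^{\infty} \lambda^{t}\, C(t+1)
\end{aligned}
```

Under that assumption, only the previous cumulative value and the counts from the last time period are needed to compute the new value, so older per-period counts do not have to be retained.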

10. A method of incrementally updating a query classifier model suitable for use as a ranking function component in a search engine, said method comprising:
- determining count values #(Asset), #(w_i, Asset) and Σ#(w_i, Asset), wherein #(Asset) represents a number of times that a given document on the network is selected for viewing by any user, #(w_i, Asset) represents a number of times that a given document on the network and a search query term, w_i, of the search query are matched by any user, and Σ#(w_i, Asset) represents a sum of the number of times that a given document on the network and any search query term, w_i, of the search query are matched by any user;
  
  storing the count values #(Asset), #(w_i, Asset) and Σ#(w_i, Asset); and
  
  updating the stored count values by adding any new data collected during a time period to the previously stored count values #(Asset), #(w_i, Asset) and Σ#(w_i, Asset).
- View Dependent Claims (12, 13, 14, 15, 16, 17)
- - 12. The method of claim 10, wherein updating stored count values #(Asset), #(w_i, Asset) and Σ#(w_i, Asset) comprises recalculating #(Asset), #(w_i, Asset) and Σ#(w_i, Asset) as follows:
    - #(Asset)(new) = #(Asset)(0) + λ[#(Asset)(old)];
      
      #(w_i, Asset)(new) = #(w_i, Asset)(0) + λ[#(w_i, Asset)(old)]; and
      
      Σ#(w_i, Asset)(new) = Σ#(w_i, Asset)(0) + λ[Σ#(w_i, Asset)(old)];
      
      wherein:
      
      #(Asset)(new), #(w_i, Asset)(new) and Σ#(w_i, Asset)(new) represent incrementally updated values for count values #(Asset), #(w_i, Asset) and Σ#(w_i, Asset), respectively;
      
      #(Asset)(0), #(w_i, Asset)(0) and Σ#(w_i, Asset)(0) represent a number of occurrences within a last time period, respectively;
      
      #(Asset)(old), #(w_i, Asset)(old) and Σ#(w_i, Asset)(old) represent cumulative count values prior to the last time period, respectively; and
      
      λ is a weighing multiplier.
  - 13. The method of claim 12, wherein λ has a value of less than 1.0.
  - 14. A method of determining a document relevance score for a document on a network relative to a user inputted search query, said method comprising the method of claim 10.
  - 15. The method of claim 14, wherein the document relevance score is determined using formula (I):
    - $\log [P(Asset \mid Query)] = \log [\#(Asset)] - \log [\#(T)] + \sum_{i=1}^{N_{Q}} \left( \log [\#(w_{i}, Asset)] - \log [\#(Asset)] + \frac{\#(w_{i}, Asset)}{\#(Asset)} \right) - \sum_{i=1}^{V} \frac{\#(w_{i}, Asset)}{\#(Asset)} \quad (I)$ wherein:
      
      P(Asset|Query) represents a probability of returning a given document, Asset, given a particular user inputted search query, Query;
      
      N_Q is the number of terms in the search query;
      
      V is the size of the vocabulary of the network; and
      
      #(T) is the total number of search queries that have been processed by any user.
  - 16. A method of ranking search results of a search query, said method comprising the steps of:
    - determining a document relevance score for each document of a network using the method of claim 14; and
      
      ranking the documents in descending order based on the document relevance scores of each document.
  - 17. A computer readable medium having stored thereon computer-executable instructions for performing the method of claim 10.

11. The method of claim 10, wherein the time period is equal to or less than 24 hours in length.
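
The update step of claims 10 to 13 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `AssetCounts` class, the `update_counts` function, the field names and the sample numbers are all hypothetical. With `lam=1.0` the update is the plain addition of claim 10; with `lam` below 1.0 it is the decayed recalculation of claims 12 and 13.

```python
from dataclasses import dataclass, field


@dataclass
class AssetCounts:
    """The three stored count values for one document (Asset)."""
    n_asset: float = 0.0                        # #(Asset)
    n_term: dict = field(default_factory=dict)  # #(w_i, Asset), keyed by term
    n_term_sum: float = 0.0                     # Σ#(w_i, Asset)


def update_counts(old: AssetCounts, last: AssetCounts, lam: float = 1.0) -> AssetCounts:
    """Return new = last + lam * old for each of the three count values.

    `old` holds the cumulative values before the last time period and
    `last` holds the occurrences within the last time period.
    """
    terms = set(old.n_term) | set(last.n_term)
    return AssetCounts(
        n_asset=last.n_asset + lam * old.n_asset,
        n_term={w: last.n_term.get(w, 0.0) + lam * old.n_term.get(w, 0.0)
                for w in terms},
        n_term_sum=last.n_term_sum + lam * old.n_term_sum,
    )


# Three per-period batches for one asset, oldest first (made-up numbers).
daily = [
    AssetCounts(8.0, {"python": 8.0}, 8.0),
    AssetCounts(4.0, {"python": 2.0, "tutorial": 2.0}, 4.0),
    AssetCounts(2.0, {"tutorial": 2.0}, 2.0),
]

stored = AssetCounts()
for batch in daily:
    # Run once at the end of each time period (for example, once a day).
    stored = update_counts(stored, batch, lam=0.5)
```

In this sketch only `stored` and the newest batch are needed at each step; the older batches are kept in the `daily` list purely so the result can be compared against the explicit sum over ages.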

18. A computing system containing at least one application module usable on the computing system, wherein the at least one application module comprises application code for performing a method of ranking documents on a network based on document relevance to a user inputted search query, said method comprising the steps of:
- utilizing formula (I) to determine a document relevance score for each document; and
  
  ranking documents in descending order based on the document relevance score for each document;
  
  wherein formula (I) comprises $\log [P(Asset \mid Query)] = \log [\#(Asset)] - \log [\#(T)] + \sum_{i=1}^{N_{Q}} \left( \log [\#(w_{i}, Asset)] - \log [\#(Asset)] + \frac{\#(w_{i}, Asset)}{\#(Asset)} \right) - \sum_{i=1}^{V} \frac{\#(w_{i}, Asset)}{\#(Asset)} \quad (I)$ wherein:
  
  P(Asset|Query) represents a probability of returning a given document, Asset, given a particular user inputted search query, Query;
  
  N_Q is the number of terms in the search query;
  
  V is the size of the vocabulary of the network;
  
  #(T) is the total number of search queries that have been processed by any user;
  
  #(Asset) represents a number of times that a given document on the network is selected for viewing by any user;
  
  #(w_i, Asset) represents a number of times that a given document on the network and a search query term, w_i, of the search query are matched by any user; and
  
  Σ#(w_i, Asset) represents a sum of the number of times that a given document on the network and any search query term, w_i, of the search query are matched by any user.
- View Dependent Claims (19, 20)
- - 19. The computing system of claim 18, wherein count values #(Asset), #(w_i, Asset) and Σ#(w_i, Asset) are incrementally updatable, and are represented by;
  - 20. The computing system of claim 19, wherein λ is less than 1.0.
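
The scoring and ranking steps of claim 18 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function names, the dictionary layout and the sample counts are hypothetical. It assumes the stored Σ#(w_i, Asset) is the sum over the whole vocabulary, so the last term of formula (I) reduces to that sum divided by #(Asset). The claims do not say how a zero count is handled, so the sketch simply treats the logarithm of zero as negative infinity, which places such documents last.

```python
import math


def relevance_score(query_terms, n_asset, n_term, n_term_sum, n_queries):
    """log P(Asset | Query) following formula (I).

    n_asset    -- #(Asset)
    n_term     -- dict mapping term w_i to #(w_i, Asset)
    n_term_sum -- Σ#(w_i, Asset), assumed to cover the whole vocabulary
    n_queries  -- #(T)
    """
    if n_asset <= 0 or n_queries <= 0:
        return float("-inf")  # assumption: no evidence for this asset
    score = math.log(n_asset) - math.log(n_queries)
    for w in query_terms:
        c = n_term.get(w, 0.0)
        if c <= 0:
            return float("-inf")  # assumption: log of a zero count
        score += math.log(c) - math.log(n_asset) + c / n_asset
    return score - n_term_sum / n_asset


def rank_assets(query_terms, counts, n_queries):
    """Return asset names in descending order of relevance score."""
    scored = [
        (relevance_score(query_terms, c["n_asset"], c["n_term"],
                         c["n_term_sum"], n_queries), name)
        for name, c in counts.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Made-up counts for three documents.
counts = {
    "A": {"n_asset": 10.0, "n_term": {"python": 5.0, "tutorial": 5.0}, "n_term_sum": 10.0},
    "B": {"n_asset": 20.0, "n_term": {"python": 2.0, "tutorial": 2.0, "snake": 16.0}, "n_term_sum": 20.0},
    "C": {"n_asset": 30.0, "n_term": {"snake": 30.0}, "n_term_sum": 30.0},
}

ranked = rank_assets(["python", "tutorial"], counts, n_queries=100.0)
print(ranked)  # ['A', 'B', 'C']
```

Because the score depends only on the stored count values and #(T), it can be recomputed directly after each incremental update without revisiting older query logs.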

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Ramsey, William D.
Granted Patent: US 7,620,634 B2
US Class Current: 707/205
CPC Class Codes:
G06F 16/33   Querying
G06F 16/903   Querying for retrieval from...
Y10S 707/99937   Sorting
