Learning a document ranking function using fidelity-based error measurements

US 7,805,438 B2
Filed: 07/31/2006
Issued: 09/28/2010
Est. Priority Date: 07/31/2006
Status: Active Grant

Abstract

A method and system for generating a ranking function using a fidelity-based loss between a target probability and a model probability for a pair of documents is provided. A fidelity ranking system generates a fidelity ranking function that ranks the relevance of documents to queries. The fidelity ranking system operates to minimize a fidelity loss between pairs of documents of training data. The fidelity loss may be derived from “fidelity” as used in the field of quantum physics. The fidelity ranking system may use a learning technique in conjunction with a fidelity loss when generating the ranking function. After the fidelity ranking system generates the fidelity ranking function, it uses the fidelity ranking function to rank the relevance of documents to queries.
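As background for the term, and not as part of the patent text: in quantum information, the fidelity between two discrete probability distributions $p$ and $q$ is commonly written as shown below (some texts use the square of this quantity). Treating the target and model pair probabilities as two-outcome distributions $(P_{ij}^{*},\, 1 - P_{ij}^{*})$ and $(P_{ij},\, 1 - P_{ij})$ is an assumption made here for illustration; under it, one minus the fidelity gives a loss of the form recited in the claims.

$$
\mathrm{Fid}(p, q) = \sum_{x} \sqrt{p_x \, q_x},
\qquad
F_{ij} = 1 - \mathrm{Fid}\big((P_{ij}^{*},\, 1 - P_{ij}^{*}),\ (P_{ij},\, 1 - P_{ij})\big)
= 1 - \left(\sqrt{P_{ij}^{*} \cdot P_{ij}} + \sqrt{(1 - P_{ij}^{*}) \cdot (1 - P_{ij})}\right)
$$

By the Cauchy-Schwarz inequality the sum lies between 0 and 1 and equals 1 only when the two distributions coincide, which is consistent with a loss that varies between 0 and 1 and is 0 when the model probability equals the target probability.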

56 Citations

View as Search Results

19 Claims

1. A method in a computing device for determining loss between a target probability and a model probability for documents when training a ranking function based on training data, the training data including documents and the target probability of relative relevance of pairs of documents to queries, the model probability being generated by a ranking function that ranks documents, the method comprising:
- training the ranking function by repeating the following until a calculated loss is below a threshold loss:
  
  selecting a new ranking function by modifying a previous ranking function to reduce the calculated loss;
  
  applying the new ranking function to the pairs of documents of the training data to provide new rankings of the documents based on the queries;
  
  calculating by the computing device a model probability from the new rankings of the documents; and
  
  calculating by the computing device a loss between the calculated model probability and the target probability to indicate a difference between the new ranking of a pair of documents represented by the calculated model probability and a ranking of the pair of documents represented by the target probability, the loss varying between 0 and 1 and the loss being 0 when the calculated model probability is the same as the target probability
  
  wherein the calculated loss is a fidelity loss and
  
  wherein the fidelity-based loss is represented by the following equation:
  
  $F_{ij} = 1 - (\sqrt{P_{ij}^{*} \cdot P_{ij}} + \sqrt{(1 - P_{ij}^{*}) \cdot (1 - P_{ij})})$
  
  where F_ij represents the fidelity loss, P_ij* represents the target probability for documents i and j, and P_ij represents the calculated model probability for documents i and j.
- Dependent claims:
  - 2. The method of claim 1 wherein the applying of the ranking function, the calculating of a model probability, and the calculating of a loss are performed when generating a ranking function.
  - 3. The method of claim 2 wherein the generating of a ranking function uses an adaptive boosting technique.
  - 4. The method of claim 2 wherein the generating of a ranking function uses a neural network technique.
  - 5. The method of claim 2 wherein the generating of a ranking function uses a support vector machine technique.
  - 6. The method of claim 1 wherein the calculating of a model probability applies a logistic function to the ranking of the documents.
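
The fidelity loss equation recited in claim 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name `fidelity_loss`, the input validation, and the clamping of floating-point rounding error are all assumptions added here.

```python
import math


def fidelity_loss(target_prob: float, model_prob: float) -> float:
    """Return F_ij = 1 - (sqrt(P* . P) + sqrt((1 - P*) . (1 - P))).

    target_prob is P_ij* and model_prob is P_ij, both expected in [0, 1].
    (Hypothetical helper; the name and checks are not from the patent.)
    """
    for p in (target_prob, model_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    overlap = (math.sqrt(target_prob * model_prob)
               + math.sqrt((1.0 - target_prob) * (1.0 - model_prob)))
    # Clamp tiny floating-point overshoot so the result stays within [0, 1].
    return min(1.0, max(0.0, 1.0 - overlap))
```

Calling `fidelity_loss(p, p)` for any valid `p` returns a value at or very near 0, and `fidelity_loss(1.0, 0.0)` returns 1, matching the bounds stated in the claim.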

7. A method in a computing device for determining loss between a target probability and a model probability for a pair of documents, the model probability being generated by a ranking function that ranks documents, the method comprising:
- applying the ranking function to the pair of documents to provide rankings of the documents;
  
  calculating a model probability from the rankings of the documents; and
  
  calculating by the computing device a fidelity loss between the calculated model probability and the target probability, the fidelity loss varying between 0 and 1 and the loss being 0 when the calculated model probability is the same as the target probability
  
  wherein the calculating of a model probability applies a logistic function to the ranking of the documents
  
  wherein the logistic function is represented by the following equation:
  
  $P_{ij} = \frac{e^{o_{ij}}}{1 + e^{o_{ij}}}$
  
  where P_ij represents the probability that document i is ranked higher than document j and o_ij represents the difference between outputs of a fidelity ranking function for document i and document j as represented by f(d_i) − f(d_j) with f(d_i) being the output of the fidelity ranking function for document i.
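
The logistic mapping in claim 7 can be sketched as follows. The function name `pairwise_probability` and the sign-based branching (used only to avoid floating-point overflow for large score differences) are assumptions of this sketch, not details taken from the patent.

```python
import math


def pairwise_probability(score_i: float, score_j: float) -> float:
    """Return P_ij = e^{o_ij} / (1 + e^{o_ij}), where o_ij = f(d_i) - f(d_j).

    score_i and score_j stand in for the ranking function outputs
    f(d_i) and f(d_j). (Hypothetical helper name.)
    """
    o_ij = score_i - score_j
    # Branch on the sign so exp() never receives a large positive argument.
    if o_ij >= 0:
        return 1.0 / (1.0 + math.exp(-o_ij))
    z = math.exp(o_ij)
    return z / (1.0 + z)
```

Equal scores give a probability of 0.5, and swapping the two documents gives the complementary probability, so `pairwise_probability(a, b) + pairwise_probability(b, a)` is 1 up to rounding.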

8. A computing device for generating a ranking function for documents, the ranking function indicating a ranking of documents based on relevance of the documents to a query, the system comprising:
- a processor; and
  
  a memory with computer-executable instructions that implement
  
  a component that provides features of documents and indications of target probabilities of relative rankings of the relevance of pairs of documents to queries;
  
  a component that calculates a fidelity loss between a model probability and a target probability for a pair of documents, the probabilities indicating a probability of relative ranking of the documents of the pair; and
  
  a component that generates the ranking function by operating to minimize the fidelity loss between the model probabilities derived from the ranking of documents and the target probabilities
  
  wherein the model probability is derived by applying a logistic function to the ranking of the documents,
  
  wherein the logistic function is represented by the following equation:
  
  $P_{ij} = \frac{e^{o_{ij}}}{1 + e^{o_{ij}}}$
  
  where P_ij represents the probability that document i is ranked higher than document j and o_ij represents the difference between outputs of a fidelity ranking function for document i and document j as represented by f(d_i) − f(d_j) with f(d_i) being the output of the fidelity ranking function for document i, and
  
  wherein the fidelity loss is represented by the following equation:
  
  $F_{ij} = 1 - (\sqrt{P_{ij}^{*} \cdot P_{ij}} + \sqrt{(1 - P_{ij}^{*}) \cdot (1 - P_{ij})})$
  
  where F_ij represents the fidelity loss, P_ij* represents the target probability for documents i and j, and P_ij represents the calculated model probability for documents i and j.
- Dependent claims:
  - 9. The computing device of claim 8 wherein the fidelity loss varies between 0 and 1 and the fidelity loss is 0 when the model probability is the same as the target probability.
  - 11. The computing device of claim 8 wherein the component that generates a ranking function uses an adaptive boosting technique.
  - 12. The computing device of claim 8 wherein the component that generates a ranking function uses a neural network technique.
  - 13. The computing device of claim 8 wherein the component that generates a ranking function uses a support vector machine technique.
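
Claim 8 combines the logistic function with the fidelity loss and recites a component that operates to minimize the loss. For a gradient-based learner (one possible choice, not one the claim requires), the derivative of the loss with respect to the score difference $o_{ij}$ follows from the chain rule. The derivation below is a worked illustration that is not part of the patent text; it uses only the two equations recited in the claim and the standard identity $\partial P_{ij} / \partial o_{ij} = P_{ij}(1 - P_{ij})$ for the logistic function.

$$
\frac{\partial F_{ij}}{\partial P_{ij}}
= -\frac{1}{2}\sqrt{\frac{P_{ij}^{*}}{P_{ij}}}
+ \frac{1}{2}\sqrt{\frac{1 - P_{ij}^{*}}{1 - P_{ij}}}
$$

$$
\frac{\partial F_{ij}}{\partial o_{ij}}
= \frac{\partial F_{ij}}{\partial P_{ij}} \cdot P_{ij}(1 - P_{ij})
= \frac{1}{2}\left[ P_{ij}\sqrt{(1 - P_{ij}^{*}) \cdot (1 - P_{ij})} - (1 - P_{ij})\sqrt{P_{ij}^{*} \cdot P_{ij}} \right]
$$

Setting $P_{ij} = P_{ij}^{*}$ makes the bracket equal to $P_{ij}(1 - P_{ij}) - (1 - P_{ij})P_{ij} = 0$, so the gradient vanishes exactly where the loss is 0.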

10. A computing device for generating a ranking function for documents, the ranking function indicating a ranking of documents based on relevance of the documents to a query, the system comprising:
- a processor; and
  
  a memory with computer-executable instructions that implement
  
  a component that provides features of documents and indications of target probabilities of relative rankings of the relevance of pairs of documents to queries;
  
  a component that calculates a fidelity loss between a model probability and a target probability for a pair of documents, the probabilities indicating a probability of relative ranking of the documents of the pair; and
  
  a component that generates the ranking function by operating to minimize the fidelity loss between the model probabilities derived from the ranking of documents and the target probabilities
  
  wherein the fidelity loss varies between 0 and 1 and the fidelity loss is 0 when the model probability is the same as the target probability and
  
  wherein the fidelity-based loss is represented by the following equation:
  
  $F_{ij} = 1 - (\sqrt{P_{ij}^{*} \cdot P_{ij}} + \sqrt{(1 - P_{ij}^{*}) \cdot (1 - P_{ij})})$
  
  where F_ij represents the fidelity loss, P_ij* represents the target probability for documents i and j, and P_ij represents the calculated model probability for documents i and j.

14. A computing device for determining loss between a target probability and a model probability for documents when training a ranking function based on training data, the training data including documents and the target probability of relative relevance of pairs of documents to queries, the model probability being generated by a ranking function that ranks documents, comprising:
- a memory storing computer-executable instructions of:
  
  a component that trains the ranking function by repeating the following until a calculated loss is below a threshold loss:
  
  selecting a new ranking function by modifying a previous ranking function to reduce the calculated loss;
  
  applying the new ranking function to the pairs of documents of the training data to provide new rankings of the documents based on the queries;
  
  calculating by the computing device a model probability from the new rankings of the documents; and
  
  calculating by the computing device a loss between the calculated model probability and the target probability to indicate a difference between the new ranking of a pair of documents represented by the calculated model probability and a ranking of the pair of documents represented by the target probability, the loss varying between 0 and 1 and the loss being 0 when the calculated model probability is the same as the target probability; and
  
  a processor for executing the computer-executable instructions stored in the memory
  
  wherein the calculated loss is a fidelity loss and
  
  wherein the fidelity-based loss is represented by the following equation:
  
  $F_{ij} = 1 - (\sqrt{P_{ij}^{*} \cdot P_{ij}} + \sqrt{(1 - P_{ij}^{*}) \cdot (1 - P_{ij})})$
  
  where F_ij represents the fidelity loss, P_ij* represents the target probability for documents i and j, and P_ij represents the calculated model probability for documents i and j.
- Dependent claims:
  - 15. The computing device of claim 14 wherein the applying of the ranking function, the calculating of a model probability, and the calculating of a loss are performed when generating a ranking function.
  - 16. The computing device of claim 15 wherein the generating of a ranking function uses an adaptive boosting technique.
  - 17. The computing device of claim 15 wherein the generating of a ranking function uses a neural network technique.
  - 18. The computing device of claim 15 wherein the generating of a ranking function uses a support vector machine technique.
  - 19. The computing device of claim 14 wherein the calculating of a model probability applies a logistic function to the ranking of the documents.
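
The training loop recited in claims 1 and 14 (modify the ranking function, apply it to the training pairs, compute model probabilities, compute the loss, and repeat until the loss falls below a threshold) can be sketched as below. This sketch makes several assumptions that the claims do not state: the ranking function is a linear scorer over feature vectors, the modification step is plain gradient descent, the stopping test uses the mean loss over all pairs, and every name (`train`, `sigmoid`, `loss_gradient_wrt_o`, the `pairs` layout, the default hyperparameters) is hypothetical. The dependent claims name adaptive boosting, neural networks, and support vector machines as learning techniques; gradient descent is swapped in here only because it is short.

```python
import math


def sigmoid(o):
    # Logistic function, branching on sign to avoid overflow in exp().
    if o >= 0:
        return 1.0 / (1.0 + math.exp(-o))
    z = math.exp(o)
    return z / (1.0 + z)


def fidelity_loss(p_star, p):
    # F = 1 - (sqrt(P* . P) + sqrt((1 - P*) . (1 - P)))
    return 1.0 - (math.sqrt(p_star * p) + math.sqrt((1.0 - p_star) * (1.0 - p)))


def loss_gradient_wrt_o(p_star, p):
    # dF/do by the chain rule, using dP/do = P (1 - P) for the logistic.
    return 0.5 * (p * math.sqrt((1.0 - p_star) * (1.0 - p))
                  - (1.0 - p) * math.sqrt(p_star * p))


def train(pairs, n_features, threshold=0.01, learning_rate=0.5, max_iters=10000):
    """Fit weights w of an assumed linear scorer f(d) = w . x.

    pairs is a list of (x_i, x_j, p_star) tuples: two feature vectors and
    the target probability that document i outranks document j.
    Returns (w, mean_loss) from the last evaluation.
    """
    w = [0.0] * n_features
    mean_loss = float("inf")
    for _ in range(max_iters):
        total = 0.0
        grad = [0.0] * n_features
        for x_i, x_j, p_star in pairs:
            # Apply the current ranking function to the pair: o = f(d_i) - f(d_j).
            o = sum(wk * (a - b) for wk, a, b in zip(w, x_i, x_j))
            p = sigmoid(o)  # model probability
            total += fidelity_loss(p_star, p)
            g = loss_gradient_wrt_o(p_star, p)
            for k in range(n_features):
                grad[k] += g * (x_i[k] - x_j[k])
        mean_loss = total / len(pairs)
        if mean_loss < threshold:
            break  # calculated loss is below the threshold loss
        # Select a new ranking function by stepping against the gradient.
        w = [wk - learning_rate * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w, mean_loss
```

A call such as `train([([1.0, 0.0], [0.0, 1.0], 1.0)], n_features=2)` returns the learned weights together with the final mean loss. The `max_iters` cap is a safeguard added in this sketch so the loop terminates even when the threshold cannot be reached, for example with contradictory targets.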

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Ma, Wei-Ying; Liu, Tie-Yan; Tsai, Ming-Feng
Primary Examiner(s): Jalil, Neveen Abel
Assistant Examiner(s): Mincey, Jermaine A
Application Number: US11/461,404
Publication Number: US 20080027912A1
Time in Patent Office: 1,520 Days
Field of Search: 707/3, 702/19
US Class Current: 707/723
CPC Class Codes: G06F 16/3346 (using probabilistic model)
