Efficiency of training for ranking systems based on pairwise training with aggregated gradients

US 7,617,164 B2
Filed: 03/17/2006
Issued: 11/10/2009
Est. Priority Date: 03/17/2006
Status: Active Grant


2 Assignments


0 Petitions


Abstract

The subject disclosure pertains to systems and methods for facilitating training of machine learning systems utilizing pairwise training. The number of computations required during pairwise training is reduced by grouping the computations. First, a score is generated for each retrieved data item. During processing of the data item pairs, the scores of the data items in the pair are retrieved and used to generate a gradient for each data item. Once all of the pairs have been processed, the gradients for each data item are aggregated and the aggregated gradients are used to update the machine learning system.
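The abstract outlines the computation saving without giving formulas. Below is a minimal Python sketch of the idea. It assumes a linear scorer and a RankNet-style pairwise cost log(1 + exp(-(s_i - s_j))) for a pair in which item i carries the higher label. Neither choice is specified above, and all function names are hypothetical.

The sketch works in three stages:
- It scores every item once.
- It accumulates one scalar aggregate gradient per item over all differently labeled pairs.
- It runs one backward step per item.

For comparison, it also includes a naive loop that rescores and backpropagates both items of every pair.

```python
import math
from itertools import product

# Assumed cost for a pair (i above j): C_ij = log(1 + exp(-(s_i - s_j))).
def pair_cost_grad(s_hi, s_lo):
    """dC/ds_hi for the assumed cost; dC/ds_lo is the negative of this value."""
    return -1.0 / (1.0 + math.exp(s_hi - s_lo))

def score(w, x):
    """Hypothetical linear scorer s = w . x."""
    return sum(w_k * x_k for w_k, x_k in zip(w, x))

def aggregated_gradient(w, items, labels):
    # Forward pass: each item is scored exactly once.
    scores = [score(w, x) for x in items]
    # Pair pass: accumulate one aggregate gradient (lambda) per item.
    lam = [0.0] * len(items)
    for i, j in product(range(len(items)), repeat=2):
        if labels[i] > labels[j]:  # each differently labeled pair visited once
            g = pair_cost_grad(scores[i], scores[j])
            lam[i] += g
            lam[j] -= g
    # Backward pass: one step per item, scaled by its lambda (ds_i/dw = x_i).
    grad_w = [0.0] * len(w)
    for lam_i, x in zip(lam, items):
        for k, x_k in enumerate(x):
            grad_w[k] += lam_i * x_k
    return lam, grad_w

def naive_gradient(w, items, labels):
    # Baseline: both items rescored and differentiated for every pair.
    grad_w = [0.0] * len(w)
    for i, j in product(range(len(items)), repeat=2):
        if labels[i] > labels[j]:
            g = pair_cost_grad(score(w, items[i]), score(w, items[j]))
            for k in range(len(w)):
                grad_w[k] += g * (items[i][k] - items[j][k])
    return grad_w

w = [0.1, -0.2]
items = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
labels = [2, 1, 0]
lam, g_agg = aggregated_gradient(w, items, labels)
g_naive = naive_gradient(w, items, labels)
```

By the chain rule, dC/dw equals the sum over items of lambda_i times ds_i/dw, so both functions return the same gradient. The aggregated version needs n scoring steps and n backward steps for n items. The naive loop instead does work proportional to the number of pairs.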

121 Citations


17 Claims

1. One or more computer-readable media storing computer-executable instructions that, when executed on one or more processors, cause the one or more processors to perform acts comprising:
   - generating a score for each of a plurality of data items during a forward propagation process, the score generated for each data item prior to a pairwise training process that compares a pair of data items having different labels, wherein the score for each of the data items is generated only once;
   - comparing the data items as data item pairs for each unique combination of data items with different labels during the pairwise training process using the scores generated from the forward propagation process;
   - generating an aggregate gradient for the data items from the scores of the pairs of data items based on the comparing the data items, the aggregate gradient being representative of gradients calculated for each data item that results from the unique combination of the data item pairs, wherein the aggregate gradient provides a direction and a local estimate of the amount that a document should move within a ranking of the data items during a training of a learning machine; and
   - updating weights used by a learning machine after generating the aggregate gradient for each of the data items using a backward propagation process.
2. The one or more computer-readable media of claim 1, wherein updating weights used by a learning machine reduces a cost function of the learning machine when ranking the data items.
3. The one or more computer-readable media of claim 2, wherein the weights are used to update parameters of a network such that a subsequent forward propagation process yields a revised cost that is less than a cost from the forward propagation.
4. The one or more computer-readable media of claim 2, wherein the acts further comprise obtaining internal parameters of the learning machine for each of the data items, the internal parameters are utilized to update the learning machine.
5. The one or more computer-readable media of claim 4, wherein the acts further comprise maintaining the internal parameters for each of the data items; and retrieving the maintained internal parameters.
6. The one or more computer-readable media of claim 1, wherein the data items include at least one of a text file, a web page, an image, audio data, video data and a word processing file.
7. The one or more computer-readable media of claim 1, wherein the learning machine is a neural network.
8. The one or more computer-readable media of claim 1, wherein the acts further comprise:
   - obtaining internal parameters of the neural network during the forward propagation for score generation;
   - maintaining the internal parameters; and
   - utilizing the internal parameters during the backward propagation.
9. The one or more computer-readable media of claim 1, wherein updating the learning machine further comprises:
   - performing a second forward propagation to obtain internal parameters of the neural network; and
   - utilizing the internal parameters in the backward propagation.
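Claims 8 and 9 describe two ways of making the network's internal parameters available to backward propagation. One keeps them from the single scoring pass; the other recomputes them with a second forward pass. The sketch below illustrates both for a hypothetical one-hidden-layer network with score s = v . tanh(Wx). It treats the hidden activations as the "internal parameters" and assumes a RankNet-style pairwise cost. The network shape, the cost, and every function name are illustrative assumptions, not details taken from the claims.

```python
import math

# Hypothetical network: score = v . tanh(W x), one hidden layer.
def forward(W, v, x):
    """Score one item; return (score, hidden activations).

    The hidden activations play the role of the 'internal parameters'."""
    h = [math.tanh(sum(w_jk * x_k for w_jk, x_k in zip(row, x))) for row in W]
    return sum(v_j * h_j for v_j, h_j in zip(v, h)), h

def aggregate_lambdas(scores, labels):
    """Per-item sum of pairwise cost derivatives (RankNet-style cost assumed)."""
    lam = [0.0] * len(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:  # item i should rank above item j
                g = -1.0 / (1.0 + math.exp(scores[i] - scores[j]))
                lam[i] += g
                lam[j] -= g
    return lam

def pairwise_cost(W, v, items, labels):
    """Sum of log(1 + exp(-(s_i - s_j))) over differently labeled pairs."""
    s = [forward(W, v, x)[0] for x in items]
    return sum(math.log1p(math.exp(-(s[i] - s[j])))
               for i in range(len(s)) for j in range(len(s)) if labels[i] > labels[j])

def backward(W, v, items, lams, cache=None):
    """Accumulate dC/dv and dC/dW as sum_i lam_i * ds_i/d(weights).

    cache: hidden activations kept from the scoring pass (claim 8 style);
    if None, they are recomputed by a second forward pass (claim 9 style)."""
    dv = [0.0] * len(v)
    dW = [[0.0] * len(W[0]) for _ in W]
    for idx, (x, lam) in enumerate(zip(items, lams)):
        h = cache[idx] if cache is not None else forward(W, v, x)[1]
        for j, (v_j, h_j) in enumerate(zip(v, h)):
            dv[j] += lam * h_j  # ds/dv_j = h_j
            for k, x_k in enumerate(x):
                dW[j][k] += lam * v_j * (1.0 - h_j * h_j) * x_k  # ds/dW_jk
    return dv, dW

W = [[0.2, -0.1], [0.05, 0.3]]
v = [0.7, -0.4]
items = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
labels = [2, 1, 0]

scored = [forward(W, v, x) for x in items]  # single scoring pass
scores = [s for s, _ in scored]
cache = [h for _, h in scored]              # maintained internal parameters
lams = aggregate_lambdas(scores, labels)
grads_cached = backward(W, v, items, lams, cache=cache)
grads_recomputed = backward(W, v, items, lams)
```

Both variants produce the same gradients. Caching spends memory (one hidden vector per item) to avoid the second forward pass; recomputation makes the opposite trade.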

10. A system for facilitating training of a learning machine utilizing a pairwise algorithm, comprising:
   - one or more processors; and
   - memory to store computer readable instructions executable by the processor, the memory used to store:
     - a scorer component that generates a score for each of a plurality of data items during a forward propagation process, the score generated for each data item prior to a pairwise training process that compares a pair of data items having different labels, wherein the score for each of the data items is generated only once;
     - a comparing component that compares the data items as data item pairs for each unique combination of data items with different labels during the pairwise training process using the scores generated from the forward propagation process;
     - a pair processor component that generates an aggregate gradient for the data items from the scores of the pairs of data items based on the comparing the data items, the aggregate gradient being representative of gradients calculated for each data item that results from the unique combination of the data item pairs, wherein the aggregate gradient provides a direction and a local estimate of the amount that a document should move within a ranking of the data items during the training of the learning machine; and
     - an update component that updates weights used by a learning machine after generating the gradient for each of the data items using a backward propagation process.
11. The system of claim 10, wherein the memory component maintains an aggregated gradient for each of the data items and the update component obtains the aggregated gradient for each of the data items.
12. The system of claim 10, wherein the memory is to further store a parameter component for obtaining internal parameters of the learning machine for each of the data items, wherein the update component updates the learning machine based at least in part upon the internal parameters.
13. The system of claim 10, wherein the learning system is a neural network.
14. The system of claim 11, wherein the aggregate gradient is representative of the gradients calculated for each data item that results from the unique combination of the data item pairs.
15. The system of claim 14, wherein the scorer component obtains internal parameters of the neural network during forward propagation, the memory component maintains the internal parameters and the update component utilizes the internal parameters during the backward propagation.
16. The system of claim 14, wherein the memory is to further store a parameter component that performs a second forward propagation to obtain internal parameters of the neural network during update, and the update component utilizes the internal parameters during backward propagation.
17. The system of claim 14, wherein the aggregate gradient provides a direction and a local estimate of the amount that a document should move within a ranking of the data items during a training of a learning machine.
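Claims 10 through 17 recast the method as cooperating components. The class below is one hypothetical way to arrange them in Python, again assuming a linear scorer and a RankNet-style pairwise cost. The class name, the method names, and the `learning_rate` parameter are illustrative only. The `memory` dictionary stands in for the memory that holds each item's aggregate gradient (claim 11). The demo at the end shows one update lowering the pairwise cost, as claims 2 and 3 describe. For a plain gradient step, that reduction is only expected when the step size is small enough.

```python
import math
from dataclasses import dataclass, field

@dataclass
class PairwiseTrainer:
    """Hypothetical layout mirroring the claimed components (linear scorer assumed)."""
    weights: list
    learning_rate: float = 0.1
    memory: dict = field(default_factory=dict)  # holds per-item aggregate gradients

    def scorer(self, items):
        # Scorer component: one forward pass per item.
        return [sum(w * x for w, x in zip(self.weights, item)) for item in items]

    def comparing(self, labels):
        # Comparing component: each unique differently labeled pair, higher label first.
        n = len(labels)
        return [(i, j) if labels[i] > labels[j] else (j, i)
                for i in range(n) for j in range(i + 1, n) if labels[i] != labels[j]]

    def pair_processor(self, scores, pairs):
        # Pair processor component: sum the pairwise cost derivatives per item.
        lam = [0.0] * len(scores)
        for hi, lo in pairs:
            g = -1.0 / (1.0 + math.exp(scores[hi] - scores[lo]))
            lam[hi] += g
            lam[lo] -= g
        self.memory["aggregate_gradients"] = lam

    def update(self, items):
        # Update component: gradient step using stored lambdas (ds_i/dw = x_i).
        lam = self.memory["aggregate_gradients"]
        for k in range(len(self.weights)):
            grad_k = sum(l * item[k] for l, item in zip(lam, items))
            self.weights[k] -= self.learning_rate * grad_k

    def cost(self, items, labels):
        # Assumed cost: sum of log(1 + exp(-(s_hi - s_lo))) over compared pairs.
        s = self.scorer(items)
        return sum(math.log1p(math.exp(-(s[hi] - s[lo]))) for hi, lo in self.comparing(labels))

    def train_step(self, items, labels):
        scores = self.scorer(items)
        self.pair_processor(scores, self.comparing(labels))
        self.update(items)

trainer = PairwiseTrainer(weights=[0.0, 0.0])
items = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
labels = [2, 1, 0]
before = trainer.cost(items, labels)
trainer.train_step(items, labels)
after = trainer.cost(items, labels)
```

Keeping the aggregate gradients in `memory` separates the pair processor from the update component. This mirrors the split in claims 10 and 11 rather than prescribing it.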


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Burges, Christopher J.; Ragno, Robert J.
Primary Examiner(s): Vincent, David R
Assistant Examiner(s): Chang, Li Wu
Application Number: US11/378,086
Publication Number: US 20070239632A1
Time in Patent Office: 1,334 Days
Field of Search: 706/15, 706/48, 707/5
US Class Current: 706/15
CPC Class Codes:
- G06N 20/00 Machine learning
- Y10S 707/99935 Query augmenting and refini...
