Efficiency of training for ranking systems based on pairwise training with aggregated gradients
First Claim
1. One or more computer-readable media storing computer-executable instructions that, when executed on one or more processors, cause the one or more processors to perform acts comprising:
- generating a score for each of a plurality of data items during a forward propagation process, the score generated for each data item prior to a pairwise training process that compares a pair of data items having different labels, wherein the score for each of the data items is generated only once;
- comparing the data items as data item pairs for each unique combination of data items with different labels during the pairwise training process using the scores generated from the forward propagation process;
- generating an aggregate gradient for the data items from the scores of the pairs of data items based on the comparing the data items, the aggregate gradient being representative of gradients calculated for each data item that results from the unique combination of the data item pairs, wherein the aggregate gradient provides a direction and a local estimate of the amount that a document should move within a ranking of the data items during a training of a learning machine; and
- updating weights used by a learning machine after generating the aggregate gradient for each of the data items using a backward propagation process.
2 Assignments
0 Petitions
Abstract
The subject disclosure pertains to systems and methods for facilitating training of machine learning systems utilizing pairwise training. The number of computations required during pairwise training is reduced by grouping the computations. First, a score is generated for each retrieved data item. During processing of the data item pairs, the scores of the data items in the pair are retrieved and used to generate a gradient for each data item. Once all of the pairs have been processed, the gradients for each data item are aggregated and the aggregated gradients are used to update the machine learning system.
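The grouping described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the text above does not name a model or a loss function, so the linear scorer, the logistic pairwise loss, and the names `score_items`, `aggregate_gradients`, and `train_step` are all assumptions made for illustration. Each item is scored once, every pair with different labels adds its gradient to a per-item running total, and the weights are updated only after all pairs have been processed.

```python
import math
from itertools import combinations


def score_items(weights, features):
    """Forward pass: score every item exactly once (assumed linear scorer)."""
    return [sum(w * x for w, x in zip(weights, row)) for row in features]


def aggregate_gradients(scores, labels):
    """Sum, per item, the gradients from every pair with different labels.

    Assumed pairwise loss (not taken from the source):
        log(1 + exp(-(s_hi - s_lo)))
    where `hi` is the item with the larger label.
    """
    agg = [0.0] * len(scores)
    for i, j in combinations(range(len(scores)), 2):
        if labels[i] == labels[j]:
            continue  # only pairs with different labels are compared
        hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
        # Derivative of the assumed loss with respect to s_hi;
        # the derivative with respect to s_lo is its negative.
        g = -1.0 / (1.0 + math.exp(scores[hi] - scores[lo]))
        agg[hi] += g
        agg[lo] -= g
    return agg


def train_step(weights, features, labels, lr=0.1):
    """One update: score once, aggregate over pairs, then update weights."""
    scores = score_items(weights, features)
    agg = aggregate_gradients(scores, labels)
    new_weights = list(weights)
    # Backward pass: for a linear scorer, d(score_i)/d(w_k) = features[i][k],
    # so each item contributes once, weighted by its aggregate gradient.
    for g, row in zip(agg, features):
        for k, x in enumerate(row):
            new_weights[k] -= lr * g * x
    return new_weights


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    labels = [2, 0, 1]  # larger label means the item should rank higher
    weights = train_step([0.0, 0.0], features, labels)
    print(weights, score_items(weights, features))
```

In this sketch the sign of an item's aggregate gradient gives the direction in which its score should move, and the magnitude gives a local estimate of how far. With n items, the forward and backward passes each run n times rather than once per pair.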
121 Citations
17 Claims
1. One or more computer-readable media storing computer-executable instructions that, when executed on one or more processors, cause the one or more processors to perform acts comprising:
- generating a score for each of a plurality of data items during a forward propagation process, the score generated for each data item prior to a pairwise training process that compares a pair of data items having different labels, wherein the score for each of the data items is generated only once;
- comparing the data items as data item pairs for each unique combination of data items with different labels during the pairwise training process using the scores generated from the forward propagation process;
- generating an aggregate gradient for the data items from the scores of the pairs of data items based on the comparing the data items, the aggregate gradient being representative of gradients calculated for each data item that results from the unique combination of the data item pairs, wherein the aggregate gradient provides a direction and a local estimate of the amount that a document should move within a ranking of the data items during a training of a learning machine; and
- updating weights used by a learning machine after generating the aggregate gradient for each of the data items using a backward propagation process.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for facilitating training of a learning machine utilizing a pairwise algorithm, comprising:
- one or more processors; and
- memory to store computer readable instructions executable by the processor, the memory used to store;
  - a scorer component that generates a score for each of a plurality of data items during a forward propagation process, the score generated for each data item prior to a pairwise training process that compares a pair of data items having different labels, wherein the score for each of the data items is generated only once;
  - a comparing component that compares the data items as data item pairs for each unique combination of data items with different labels during the pairwise training process using the scores generated from the forward propagation process;
  - a pair processor component that generates an aggregate gradient for the data items from the scores of the pairs of data items based on the comparing the data items, the aggregate gradient being representative of gradients calculated for each data item that results from the unique combination of the data item pairs, wherein the aggregate gradient provides a direction and a local estimate of the amount that a document should move within a ranking of the data items during the training of the learning machine; and
  - an update component that updates weights used by a learning machine after generating the gradient for each of the data items using a backward propagation process.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
Specification