Method of generating a training object for training a machine learning algorithm
First Claim
1. A computer implemented method of generating a training object for training a machine learning algorithm, the training object including a digital training document and an assigned label, the method executable at a training server, the method comprising:
acquiring the digital training document to be used in the training;
transmitting, via a communication network, the digital training document to a plurality of assessors, transmitting further including indicating a range of possible labels for the assessors to assess from, the range of possible labels including at least a first possible label and a second possible label;
obtaining from each of the plurality of assessors a selected label to form a pool of selected labels;
generating a consensus label distribution based on the pool of selected labels, the consensus label distribution representing a range of perceived labels for the digital training document and an associated probability score for each of the perceived labels;
the consensus label distribution being generated by aggregating an assessor-specific perceived label distribution for each assessor of the plurality of assessors, wherein:
the assessor-specific perceived label distribution for a given assessor of the plurality of assessors is determined by:
determining, for each of the range of possible labels, an assessor-inherent probability score, the assessor-inherent probability score for a given one of the range of possible labels being indicative of the probability of the given one of the range of possible labels being selected by the given assessor;
determining, for each of the range of possible labels, a conditional probability score, the conditional probability score for a given one of the range of possible labels being indicative of the probability of the given one of the range of possible labels being perceived as a most relevant label to the digital training document by the given assessor despite the given assessor having selected a different one of the range of possible labels; and
obtaining the assessor-specific perceived label distribution by aggregating, for each of the range of possible labels, the assessor-inherent probability score and the conditional probability score for the given assessor;
training the machine learning algorithm using the digital training document and the consensus label distribution.
Abstract
There is disclosed a computer implemented method of generating a training object for training a machine learning algorithm (MLA). The method comprises: acquiring a digital training document to be used in the training; transmitting the digital training document to a plurality of assessors, transmitting further including indicating a range of possible labels for the assessors to assess from, the range of possible labels including at least a first possible label and a second possible label; obtaining from each of the plurality of assessors a selected label to form a pool of selected labels; generating a consensus label distribution based on the pool of selected labels, the consensus label distribution representing a range of perceived labels for the digital training document and an associated probability score for each of the perceived labels; and training the machine learning algorithm using the digital training document and the consensus label distribution.
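As a point of reference, the simplest way to turn a pool of selected labels into a distribution is to normalize the label counts. The sketch below is only that baseline, not the method of claim 1, which weights each assessor's vote by assessor-specific scores. The function name `empirical_label_distribution` and the label strings are hypothetical.

```python
from collections import Counter

def empirical_label_distribution(selected_labels, possible_labels):
    """Baseline only: share of assessors who selected each possible label."""
    if not selected_labels:
        raise ValueError("the pool of selected labels is empty")
    counts = Counter(selected_labels)
    total = len(selected_labels)
    # Labels nobody selected get probability 0.0 (Counter returns 0 for them).
    return {label: counts[label] / total for label in possible_labels}

# Hypothetical pool: four assessors, two possible labels.
pool = ["relevant", "relevant", "not_relevant", "relevant"]
dist = empirical_label_distribution(pool, ["relevant", "not_relevant"])
```

A distribution of this shape (one probability score per label, summing to 1) is what the training step would consume in place of a single hard label.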
17 Claims
1. A computer implemented method of generating a training object for training a machine learning algorithm, the training object including a digital training document and an assigned label, the method executable at a training server, the method comprising:
acquiring the digital training document to be used in the training;
transmitting, via a communication network, the digital training document to a plurality of assessors, transmitting further including indicating a range of possible labels for the assessors to assess from, the range of possible labels including at least a first possible label and a second possible label;
obtaining from each of the plurality of assessors a selected label to form a pool of selected labels;
generating a consensus label distribution based on the pool of selected labels, the consensus label distribution representing a range of perceived labels for the digital training document and an associated probability score for each of the perceived labels;
the consensus label distribution being generated by aggregating an assessor-specific perceived label distribution for each assessor of the plurality of assessors, wherein:
the assessor-specific perceived label distribution for a given assessor of the plurality of assessors is determined by:
determining, for each of the range of possible labels, an assessor-inherent probability score, the assessor-inherent probability score for a given one of the range of possible labels being indicative of the probability of the given one of the range of possible labels being selected by the given assessor;
determining, for each of the range of possible labels, a conditional probability score, the conditional probability score for a given one of the range of possible labels being indicative of the probability of the given one of the range of possible labels being perceived as a most relevant label to the digital training document by the given assessor despite the given assessor having selected a different one of the range of possible labels; and
obtaining the assessor-specific perceived label distribution by aggregating, for each of the range of possible labels, the assessor-inherent probability score and the conditional probability score for the given assessor;
training the machine learning algorithm using the digital training document and the consensus label distribution.
Dependent Claims (2, 3, 4, 5, 6, 7, 8)
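The per-assessor steps of claim 1 can be illustrated with a small sketch. The claim does not state how the two scores are aggregated, so this example assumes a Bayes-style product followed by normalization, and assumes the consensus is a plain average over assessors. The names `perceived_distribution`, `consensus_distribution`, `prior`, and `confusion`, and all numeric values, are hypothetical.

```python
def perceived_distribution(selected, prior, confusion):
    """Assessor-specific perceived label distribution (assumed Bayes-style).

    prior[z]        : assessor-inherent probability score for label z
    confusion[z][y] : assumed probability that the assessor selects y
                      when z is the label they perceive as most relevant
    """
    scores = {z: prior[z] * confusion[z][selected] for z in prior}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all aggregated scores are zero")
    return {z: s / total for z, s in scores.items()}

def consensus_distribution(per_assessor):
    """Aggregate assessor-specific distributions by simple averaging (assumption)."""
    n = len(per_assessor)
    labels = list(per_assessor[0])
    return {z: sum(d[z] for d in per_assessor) / n for z in labels}

# Hypothetical scores shared by two assessors.
prior = {"good": 0.5, "bad": 0.5}
confusion = {
    "good": {"good": 0.8, "bad": 0.2},
    "bad": {"good": 0.4, "bad": 0.6},
}
a1 = perceived_distribution("good", prior, confusion)  # assessor 1 selected "good"
a2 = perceived_distribution("bad", prior, confusion)   # assessor 2 selected "bad"
consensus = consensus_distribution([a1, a2])
```

Under these assumptions a selected label shifts probability toward itself without ruling out the other label, which is how the distribution can reflect what the assessor perceived despite having selected a different label.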
9. A training server for training a ranking application, the ranking application for ranking search results, the training server comprising:
a network interface for communicatively coupling to a communication network;
a processor coupled to the network interface, the processor configured to:
acquire the digital training document to be used in the training;
transmit, via the communication network, the digital training document to a plurality of assessors, transmitting further including indicating a range of possible labels for the assessors to assess from, the range of possible labels including at least a first possible label and a second possible label;
obtain from each of the plurality of assessors a selected label to form a pool of selected labels;
generate a consensus label distribution based on the pool of selected labels, the consensus label distribution representing a range of perceived labels for the digital training document and an associated probability score for each of the perceived labels;
the consensus label distribution being generated by aggregating an assessor-specific perceived label distribution for each assessor of the plurality of assessors, wherein:
the assessor-specific perceived label distribution for a given assessor of the plurality of assessors is determined by:
determining, for each of the range of possible labels, an assessor-inherent probability score, the assessor-inherent probability score for a given one of the range of possible labels being indicative of the probability of the given one of the range of possible labels being selected by the given assessor;
determining, for each of the range of possible labels, a conditional probability score, the conditional probability score for a given one of the range of possible labels being indicative of the probability of the given one of the range of possible labels being perceived as a most relevant label to the digital training document by the given assessor despite the given assessor having selected a different one of the range of possible labels; and
obtaining the assessor-specific perceived label distribution by aggregating, for each of the range of possible labels, the assessor-inherent probability score and the conditional probability score for the given assessor;
train the machine learning algorithm using the digital training document and the consensus label distribution.
Dependent Claims (10, 11, 12, 13, 14, 15, 16)
17. A computer implemented method of generating a training object for training a machine learning algorithm, the training object including a digital training document and an assigned label, the method executable at a training server, the method comprising:
acquiring the digital training document to be used in the training;
transmitting, via a communication network, the digital training document to a plurality of assessors, transmitting further including indicating a range of possible labels for the assessors to assess from, the range of possible labels including at least a first possible label and a second possible label;
obtaining from each of the plurality of assessors a selected label to form a pool of selected labels;
determining an expertise parameter for each of the plurality of assessors based on the pool of selected labels, wherein the expertise parameter is independent of the digital training document assessed;
determining a difficulty parameter of the digital training document based on the pool of selected labels, wherein the difficulty parameter is independent of any assessor assessing the digital training document;
generating a consensus label distribution based on the pool of selected labels, the consensus label distribution representing a range of perceived labels for the digital training document and an associated probability score for each of the perceived labels, wherein the consensus label distribution is determined by:
determining an assessor-specific perceived label distribution for each assessor of the plurality of assessors;
aggregating each of the assessor-specific perceived label distribution of the plurality of assessors, wherein the assessor-specific perceived label distribution for a given assessor is determined by:
for the first possible label:
determining an assessor-inherent probability score of the first possible label being selected by the given assessor;
determining a conditional probability score based at least on the expertise parameter and the difficulty parameter, the conditional probability score representing the probability of the selected label provided the given assessor perceived the first possible label as a most relevant label to the digital training document; and
aggregating the assessor-inherent probability score and the conditional probability score to obtain a first label specific perceived score;
for the second possible label:
determining the assessor-inherent probability score of the second possible label being selected by the given assessor;
determining the conditional probability score based at least on the expertise parameter and the difficulty parameter, the conditional probability score representing the probability of the selected label provided the given assessor perceived the second possible label as the most relevant label to the digital training document;
aggregating the assessor-inherent probability score and the conditional probability score to obtain a second label specific perceived score;
aggregating the first label specific perceived score and the second label specific perceived score;
training the machine learning algorithm using the digital training document and the consensus label distribution.
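Claim 17 ties the conditional probability score to an expertise parameter and a difficulty parameter but does not give a formula. The sketch below assumes one plausible form: the chance that the selected label matches the perceived label is a logistic function of expertise minus difficulty, and the remaining probability is spread evenly over the other labels. That functional form, the function names, and the numbers are all assumptions for illustration.

```python
import math

def conditional_score(selected, perceived, expertise, difficulty, n_labels):
    """Assumed P(selected | perceived): logistic in (expertise - difficulty)."""
    p_match = 1.0 / (1.0 + math.exp(-(expertise - difficulty)))
    if selected == perceived:
        return p_match
    # Spread the remaining mass evenly over the other labels (assumption).
    return (1.0 - p_match) / (n_labels - 1)

def label_specific_scores(selected, prior, expertise, difficulty):
    """Label-specific perceived scores for one assessor, normalized to sum to 1."""
    n = len(prior)  # the claim requires at least two possible labels
    raw = {
        z: prior[z] * conditional_score(selected, z, expertise, difficulty, n)
        for z in prior
    }
    total = sum(raw.values())
    return {z: v / total for z, v in raw.items()}

# Hypothetical assessor-inherent scores for the first and second possible labels.
prior = {"first": 0.5, "second": 0.5}
# A relatively expert assessor on an easy document selects the first label.
scores = label_specific_scores("first", prior, expertise=2.0, difficulty=0.0)
# When expertise equals difficulty, the selection is uninformative in this model.
even = label_specific_scores("first", prior, expertise=1.0, difficulty=1.0)
```

In this sketch, higher expertise or lower difficulty concentrates the perceived score on the selected label, while a hard document or a weak assessor leaves the score close to the assessor-inherent values.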
Specification