Method of generating a training object for training a machine learning algorithm

US 10,445,379 B2
Filed: 05/29/2017
Issued: 10/15/2019
Est. Priority Date: 06/20/2016
Status: Active Grant

Abstract

There is disclosed a computer implemented method of generating a training object for training a machine learning algorithm (MLA). The method comprises: acquiring a digital training document to be used in the training; transmitting the digital training document to a plurality of assessors, transmitting further including indicating a range of possible labels for the assessors to assess from, the range of possible labels including at least a first possible label and a second possible label; obtaining from each of the plurality of assessors a selected label to form a pool of selected labels; generating a consensus label distribution based on the pool of selected labels, the consensus label distribution representing a range of perceived labels for the digital training document and an associated probability score for each of the perceived labels; and training the machine learning algorithm using the digital training document and the consensus label distribution.
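As a rough illustration of the data involved, the sketch below pairs a document with a label distribution instead of a single label. The names `TrainingObject` and `vote_share_distribution` are hypothetical, and the plain vote-share aggregation is only a simplified stand-in: the claims describe a per-assessor aggregation that this baseline does not attempt.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class TrainingObject:
    """Hypothetical container: a document plus a distribution over labels."""
    document_id: str
    label_distribution: Mapping[str, float]


def vote_share_distribution(possible_labels: List[str],
                            selected_labels: List[str]) -> Mapping[str, float]:
    """Baseline only: the share of assessors who selected each possible label."""
    if not selected_labels:
        raise ValueError("the pool of selected labels is empty")
    counts = Counter(selected_labels)
    total = len(selected_labels)
    return {label: counts[label] / total for label in possible_labels}


# Pool of selected labels obtained from four assessors for one document.
pool = ["relevant", "relevant", "irrelevant", "relevant"]
training_object = TrainingObject(
    document_id="doc-1",
    label_distribution=vote_share_distribution(["relevant", "irrelevant"], pool),
)
print(training_object.label_distribution)  # {'relevant': 0.75, 'irrelevant': 0.25}
```

A label that no assessor selected still appears in the distribution with probability 0, so every training object covers the full range of possible labels.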

17 Claims

1. A computer implemented method of generating a training object for training a machine learning algorithm, the training object including a digital training document and an assigned label, the method executable at a training server, the method comprising:
- acquiring the digital training document to be used in the training;
  
  transmitting, via a communication network, the digital training document to a plurality of assessors, transmitting further including indicating a range of possible labels for the assessors to assess from, the range of possible labels including at least a first possible label and a second possible label;
  
  obtaining from each of the plurality of assessors a selected label to form a pool of selected labels;
  
  generating a consensus label distribution based on the pool of selected labels, the consensus label distribution representing a range of perceived labels for the digital training document and an associated probability score for each of the perceived labels;
  
  the consensus label distribution being generated by aggregating an assessor-specific perceived label distribution for each assessor of the plurality of assessors, wherein:
  
  the assessor-specific perceived label distribution for a given assessor of the plurality of assessors is determined by:
  
  determining, for each of the range of possible labels, an assessor-inherent probability score, the assessor-inherent probability score for a given one of the range of possible labels being indicative of the probability of the given one of the range of possible labels being selected by the given assessor;
  
  determining, for each of the range of possible labels, a conditional probability score, the conditional probability score for a given one of the range of possible labels being indicative of the probability of the given one of the range of possible labels being perceived as a most relevant label to the digital training document by the given assessor despite the given assessor having selected a different one of the range of possible labels; and
  
  obtaining the assessor-specific perceived label distribution by aggregating, for each of the range of possible labels, the assessor-inherent probability score and the conditional probability score for the given assessor;
  
  training the machine learning algorithm using the digital training document and the consensus label distribution.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, the method further comprising:
    - determining an expertise parameter for each of the plurality of assessors based on the pool of selected labels; and
      
      determining a difficulty parameter of the digital training document based on the pool of selected labels.
  - 3. The method of claim 2, wherein:
    - the expertise parameter is independent of the digital training document assessed; and
      
      the difficulty parameter is independent of any assessor assessing the digital training document.
  - 4. The method of claim 1, wherein the conditional probability score is determined, for each of the range of possible labels, based at least on the expertise parameter and the difficulty parameter.
  - 5. The method of claim 1, wherein the assessor-inherent probability score is determined based at least on a given assessor's assessor-specific tendency parameter.
  - 6. The method of claim 5, further comprising determining the given assessor's assessor-specific tendency parameter based at least on the given assessor's assessing history.
  - 7. The method of claim 1, wherein the machine learning algorithm is executed by a ranking application of a search ranker server, and wherein the training is based on a target of improving the accuracy of the machine learning algorithm.
  - 8. The method of claim 7, wherein improving the accuracy represents improving a relevancy of a search result in response to a search request.
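
The aggregation steps of claim 1 can be sketched as follows. The claims do not fix how the two scores are aggregated, so this sketch assumes a Bayes-style product followed by normalisation for each assessor, and an arithmetic mean across assessors for the consensus. The function names and the example scores are hypothetical.

```python
from typing import Dict, List

Distribution = Dict[str, float]


def perceived_distribution(inherent: Distribution,
                           conditional: Distribution) -> Distribution:
    """Combine the two per-label scores for one assessor (assumed: product, then normalise)."""
    unnormalized = {label: inherent[label] * conditional[label] for label in inherent}
    total = sum(unnormalized.values())
    if total <= 0.0:
        raise ValueError("scores leave no probability mass to normalise")
    return {label: score / total for label, score in unnormalized.items()}


def consensus_distribution(per_assessor: List[Distribution]) -> Distribution:
    """Aggregate assessor-specific distributions (assumed: arithmetic mean)."""
    if not per_assessor:
        raise ValueError("at least one assessor is required")
    labels = list(per_assessor[0])
    count = len(per_assessor)
    return {label: sum(d[label] for d in per_assessor) / count for label in labels}


# Hypothetical scores for two assessors over the same two possible labels.
assessor_a = perceived_distribution(
    inherent={"relevant": 0.5, "irrelevant": 0.5},
    conditional={"relevant": 0.8, "irrelevant": 0.2},
)
assessor_b = perceived_distribution(
    inherent={"relevant": 0.5, "irrelevant": 0.5},
    conditional={"relevant": 0.4, "irrelevant": 0.6},
)
consensus = consensus_distribution([assessor_a, assessor_b])
print(consensus)  # roughly 0.6 for "relevant" and 0.4 for "irrelevant"
```

Other aggregation rules, such as a normalised product across assessors, would fit the claim language equally well; the mean is used here only because it is the simplest to follow.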

9. A training server for training a ranking application, the ranking application for ranking search results, the training server comprising:
- a network interface for communicatively coupling to a communication network;
  
  a processor coupled to the network interface, the processor configured to:
  
  acquire the digital training document to be used in the training;
  
  transmit, via the communication network, the digital training document to a plurality of assessors, transmitting further including indicating a range of possible labels for the assessors to assess from, the range of possible labels including at least a first possible label and a second possible label;
  
  obtain from each of the plurality of assessors a selected label to form a pool of selected labels;
  
  generate a consensus label distribution based on the pool of selected labels, the consensus label distribution representing a range of perceived labels for the digital training document and an associated probability score for each of the perceived labels;
  
  the consensus label distribution being generated by aggregating an assessor-specific perceived label distribution for each assessor of the plurality of assessors, wherein:
  
  the assessor-specific perceived label distribution for a given assessor of the plurality of assessors is determined by:
  
  determining, for each of the range of possible labels, an assessor-inherent probability score, the assessor-inherent probability score for a given one of the range of possible labels being indicative of the probability of the given one of the range of possible labels being selected by the given assessor;
  
  determining, for each of the range of possible labels, a conditional probability score, the conditional probability score for a given one of the range of possible labels being indicative of the probability of the given one of the range of possible labels being perceived as a most relevant label to the digital training document by the given assessor despite the given assessor having selected a different one of the range of possible labels; and
  
  obtaining the assessor-specific perceived label distribution by aggregating, for each of the range of possible labels, the assessor-inherent probability score and the conditional probability score for the given assessor;
  
  train the machine learning algorithm using the digital training document and the consensus label distribution.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
  - 10. The training server of claim 9, the processor further configured to:
    - determine an expertise parameter for each of the plurality of assessors based on the pool of selected labels; and
      
      determine a difficulty parameter of the digital training document based on the pool of selected labels.
  - 11. The training server of claim 10, wherein:
    - the expertise parameter is independent of the digital training document assessed; and
      
      the difficulty parameter is independent of any assessor assessing the digital training document.
  - 12. The training server of claim 9, wherein the conditional probability score is determined, for each of the range of possible labels, based at least on the expertise parameter and the difficulty parameter.
  - 13. The training server of claim 9, wherein the assessor-inherent probability score is determined based at least on a given assessor's assessor-specific tendency parameter.
  - 14. The training server of claim 13, further comprising determining the given assessor's assessor-specific tendency parameter based at least on the given assessor's assessing history.
  - 15. The training server of claim 9, wherein the machine learning algorithm is executed by a ranking application of a search ranker server, and wherein the training is based on a target of improving the accuracy of the machine learning algorithm.
  - 16. The training server of claim 15, wherein improving the accuracy represents improving a relevancy of a search result in response to a search request.
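
The dependent claims name three parameters (tendency, expertise, difficulty) without giving formulas. The sketch below shows one way such parameters could feed the two scores, under loud assumptions: the tendency is estimated as a smoothed label frequency over the assessor's history, and the conditional score uses a logistic function of expertise divided by difficulty. Both functional forms, the function names, and the `smoothing` parameter are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter
from typing import Dict, List


def tendency_from_history(possible_labels: List[str],
                          history: List[str],
                          smoothing: float = 1.0) -> Dict[str, float]:
    """Assumed form: additive-smoothed frequency of each label in past assessments."""
    counts = Counter(history)
    total = len(history) + smoothing * len(possible_labels)
    return {label: (counts[label] + smoothing) / total for label in possible_labels}


def conditional_scores(possible_labels: List[str],
                       selected: str,
                       expertise: float,
                       difficulty: float) -> Dict[str, float]:
    """Assumed form: the selected label matches the perceived one with logistic probability."""
    if difficulty <= 0.0:
        raise ValueError("difficulty must be positive in this sketch")
    if len(possible_labels) < 2:
        raise ValueError("at least two possible labels are required")
    p_match = 1.0 / (1.0 + math.exp(-expertise / difficulty))
    # Remaining mass is spread evenly over the labels that were not selected.
    p_other = (1.0 - p_match) / (len(possible_labels) - 1)
    return {label: (p_match if label == selected else p_other)
            for label in possible_labels}


labels = ["good", "bad"]
tendency = tendency_from_history(labels, history=["good", "good", "bad"])
scores = conditional_scores(labels, selected="good", expertise=2.0, difficulty=1.0)
print(tendency)
print(scores)
```

In this sketch the expertise value depends only on the assessor and the difficulty value only on the document, mirroring the independence stated in claims 3 and 11. How those two parameters are estimated from the pool of selected labels is left open here.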

17. A computer implemented method of generating a training object for training a machine learning algorithm, the training object including a digital training document and an assigned label, the method executable at a training server, the method comprising:
- acquiring the digital training document to be used in the training;
  
  transmitting, via a communication network, the digital training document to a plurality of assessors, transmitting further including indicating a range of possible labels for the assessors to assess from, the range of possible labels including at least a first possible label and a second possible label;
  
  obtaining from each of the plurality of assessors a selected label to form a pool of selected labels;
  
  determining an expertise parameter for each of the plurality of assessors based on the pool of selected labels, wherein:
  
  the expertise parameter is independent of the digital training document assessed;
  
  determining a difficulty parameter of the digital training document based on the pool of selected labels, wherein:
  
  the difficulty parameter is independent of any assessor assessing the digital training document;
  
  generating a consensus label distribution based on the pool of selected labels, the consensus label distribution representing a range of perceived labels for the digital training document and an associated probability score for each of the perceived labels, wherein the consensus label distribution is determined by:
  
  determining an assessor-specific perceived label distribution for each assessor of the plurality of assessors;
  
  aggregating each of the assessor-specific perceived label distributions of the plurality of assessors, wherein the assessor-specific perceived label distribution for a given assessor is determined by:
  
  for the first possible label:
  
  determining an assessor-inherent probability score of the first possible label being selected by the given assessor;
  
  determining a conditional probability score based at least on the expertise parameter and the difficulty parameter, the conditional probability score representing the probability of the selected label provided the given assessor perceived the first possible label as a most relevant label to the digital training document; and
  
  aggregating the assessor-inherent probability score and the conditional probability score to obtain a first label specific perceived score;
  
  for the second possible label:
  
  determining the assessor-inherent probability score of the second possible label being selected by the given assessor;
  
  determining the conditional probability score based at least on the expertise parameter and the difficulty parameter, the conditional probability score representing the probability of the selected label provided the given assessor perceived the second possible label as the most relevant label to the digital training document;
  
  aggregating the assessor-inherent probability score and the conditional probability score to obtain a second label specific perceived score;
  
  aggregating the first label specific perceived score and the second label specific perceived score;
  
  training the machine learning algorithm using the digital training document and the consensus label distribution.
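
The final step trains on the consensus label distribution rather than on a single assigned label. The claims do not name a loss function; a minimal sketch, assuming a soft-label cross-entropy objective, shows how a distribution can serve as the training target. The function names and the example logits are hypothetical.

```python
import math
from typing import Dict

Distribution = Dict[str, float]


def softmax(logits: Dict[str, float]) -> Distribution:
    """Turn raw model scores into a probability distribution over labels."""
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def soft_cross_entropy(target: Distribution,
                       predicted: Distribution,
                       eps: float = 1e-12) -> float:
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(weight * math.log(max(predicted[label], eps))
                for label, weight in target.items())


# Hypothetical consensus distribution for one training document.
consensus = {"relevant": 0.6, "irrelevant": 0.4}

# Two hypothetical model outputs: one overconfident, one close to the consensus.
overconfident = softmax({"relevant": 4.0, "irrelevant": 0.0})
calibrated = softmax({"relevant": 0.4, "irrelevant": 0.0})

print(soft_cross_entropy(consensus, overconfident))
print(soft_cross_entropy(consensus, calibrated))
```

The prediction that stays close to the consensus receives the lower loss, so a model trained this way is penalised for being certain about documents on which assessors disagreed.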

Current Assignee: YE Hub Armenia LLC
Original Assignee: Yandex Europe AG (Yandex N.V.)
Inventors: Gusev, Gleb Gennadievich; Fedorova, Valentina Pavlovna; Mishchenko, Andrey Sergeevich
Primary Examiner(s): Chbouki, Tarek
Application Number: US15/607,603
Publication Number: US 20170364810A1
Time in Patent Office: 869 Days
CPC Class Codes

G06F 16/951   Indexing; Web crawling tech...

G06F 17/18   for evaluating statistical ...

G06N 20/00   Machine learning

G06N 7/01   Probabilistic graphical mod...
