Systems and Methods for the Determining Annotator Performance in the Distributed Annotation of Source Data

US 20160275418A1
Filed: 05/27/2016
Published: 09/22/2016
Est. Priority Date: 06/22/2012
Status: Active Grant

Abstract

Systems and methods for determining annotator performance in the distributed annotation of source data in accordance with embodiments of the invention are disclosed. In one embodiment of the invention, a method for clustering annotators includes: obtaining a set of source data; determining a training data set representative of the set of source data; obtaining sets of annotations from a set of annotators for a portion of the training data set; for each annotator, determining annotator recall metadata based on the set of annotations provided by the annotator for the training data set and determining annotator precision metadata based on the set of annotations provided by the annotator for the training data set; and grouping the annotators into annotator groups based on the annotator recall metadata and the annotator precision metadata.
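The scoring and grouping steps summarized in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the dictionary layout, the function names `score_annotator` and `group_annotators`, the normalization of the recall and precision measures into fractions, and the single 0.75 threshold used to form groups.

```python
from collections import defaultdict


def score_annotator(annotations, ground_truth):
    """Return (recall, precision) for one annotator.

    annotations:  {item_id: {feature_id: label}} as submitted by the annotator
    ground_truth: {item_id: {feature_id: correct_label}} for the training set
    Only items the annotator was shown (the keys of `annotations`) are scored.
    """
    total_features = found = annotated = correct = 0
    for item_id, given in annotations.items():
        truth = ground_truth[item_id]
        total_features += len(truth)
        annotated += len(given)
        for feature_id, label in given.items():
            if feature_id in truth:
                found += 1  # the feature was identified with some label
                if truth[feature_id] == label:
                    correct += 1  # the label also matches the ground truth
    # Assumption: express both measures as fractions in [0, 1].
    recall = found / total_features if total_features else 0.0
    precision = correct / annotated if annotated else 0.0
    return recall, precision


def group_annotators(scores, threshold=0.75):
    """Group annotators by whether each measure reaches a hypothetical threshold."""
    groups = defaultdict(list)
    for annotator_id, (recall, precision) in sorted(scores.items()):
        key = (
            ("high" if recall >= threshold else "low") + "_recall",
            ("high" if precision >= threshold else "low") + "_precision",
        )
        groups[key].append(annotator_id)
    return dict(groups)


# Hypothetical training data and submissions.
ground_truth = {"img1": {"f1": "cat", "f2": "dog"}, "img2": {"f3": "bird"}}
submissions = {
    "alice": {"img1": {"f1": "cat", "f2": "dog"}, "img2": {"f3": "bird"}},
    "bob": {"img1": {"f1": "cat"}, "img2": {"f3": "fish"}},
    "carol": {"img1": {"f1": "cat"}},
}
scores = {name: score_annotator(a, ground_truth) for name, a in submissions.items()}
groups = group_annotators(scores)
```

A quadrant-style threshold is only one way to form groups; any clustering over the two measures would fit the same description.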

20 Claims

1. A method for clustering annotators via a distributed data annotation process, comprising:
- obtaining a set of source data using a distributed data annotation server system, where a piece of source data in the set of source data comprises at least one identifying feature;
- determining a training data set representative of the set of source data using the distributed data annotation server system, where each piece of source data in the training data set comprises source data metadata describing the ground truth for the piece of source data, where the ground truth for a piece of source data describes the features contained in the piece of source data and a correct label associated with each feature;
- obtaining sets of annotations from a set of annotators for a portion of the training data set using the distributed data annotation server system, where an annotation identifies one or more features within a piece of source data in the training data set;
- for each annotator:
  - determining annotator recall metadata based on the set of annotations provided by the annotator for the training data set using the distributed data annotation server system, where the annotator recall metadata comprises a measure of the number of features within a piece of source data identified with a label in the set of annotations by the annotator; and
  - determining annotator precision metadata based on the set of annotations provided by the annotator for the training data set using the distributed data annotation server system, where the annotator precision metadata comprises a measure of the number of correct annotations associated with each piece of source data based on the ground truth for each piece of source data; and
- grouping the annotators into annotator groups based on the annotator recall metadata and the annotator precision metadata using the distributed data annotation server system.

2. The method of claim 1, further comprising generating an annotation task comprising a portion of the set of source data using the distributed data annotation server system, where the annotation task configures an annotator to annotate one or more features within the set of source data.
3. The method of claim 2, wherein the annotation tasks are targeted toward one or more annotator groups.
4. The method of claim 1, further comprising measuring the time taken by an annotator to provide an annotation within the sets of annotations using the distributed data annotation server system.
5. The method of claim 1, further comprising:
- calculating a reward based on the annotator recall metadata and the annotator precision metadata using the distributed data annotation server system; and
- providing the reward to an annotator for providing one or more annotations using the distributed data annotation server system.
6. The method of claim 5, wherein grouping annotators into annotator groups is further based on the calculated reward.
7. The method of claim 1, wherein the obtained sets of annotations are clustered into annotation clusters based on the features within the piece of source data identified by the annotations using the distributed data annotation server system.
8. The method of claim 7, wherein:
- the set of source data comprises image data; and
- the annotation clusters comprise annotations that are within a distance threshold from each other within the image data.
9. The method of claim 7, wherein the annotation clusters comprise annotations that are within a distance threshold from the ground truth for the feature identified by the annotations.
10. The method of claim 7, further comprising:
- determining an error rate for each annotator based on the annotation clusters using the distributed data annotation server system; and
- grouping the annotators into annotator groups based on the determined error rate for the annotators using the distributed data annotation server system.
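
Claims 7 through 10 describe clustering annotations by a distance threshold and deriving a per-annotator error rate from the result. The sketch below is one possible reading, not the claimed implementation. It assumes annotations are 2D points in image coordinates, uses a simple greedy centroid clustering, and counts an annotation as an error when it lies farther than the threshold from every ground-truth point. The function names and the sample coordinates are hypothetical.

```python
import math
from collections import defaultdict


def cluster_by_proximity(points, threshold):
    """Greedy clustering: a point joins the first cluster whose centroid
    is within `threshold`; otherwise it starts a new cluster."""
    clusters = []
    for point in points:
        for cluster in clusters:
            centroid = (
                sum(p[0] for p in cluster) / len(cluster),
                sum(p[1] for p in cluster) / len(cluster),
            )
            if math.dist(point, centroid) <= threshold:
                cluster.append(point)
                break
        else:
            clusters.append([point])
    return clusters


def error_rates(annotations, truth_points, threshold):
    """annotations: list of (annotator_id, (x, y)) pairs.

    Assumption: an annotation is an error when no ground-truth point
    lies within `threshold` of it.
    """
    totals = defaultdict(int)
    misses = defaultdict(int)
    for annotator_id, point in annotations:
        totals[annotator_id] += 1
        if not any(math.dist(point, t) <= threshold for t in truth_points):
            misses[annotator_id] += 1
    return {a: misses[a] / totals[a] for a in totals}


# Hypothetical click annotations on a single image.
clicks = [(10, 10), (11, 9), (50, 50), (52, 51), (90, 10)]
clusters = cluster_by_proximity(clicks, threshold=5)

labeled_clicks = [
    ("a", (10, 10)), ("a", (50, 50)),
    ("b", (11, 9)), ("b", (90, 10)),
]
rates = error_rates(labeled_clicks, truth_points=[(10, 10), (51, 50)], threshold=5)
```

The resulting error rates could then be fed into a grouping step in the same way as the recall and precision measures.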

11. A distributed data annotation server system, comprising:
- a processor; and
- a memory configured to store a data annotation application;
- wherein the data annotation application configures the processor to:
  - obtain a set of source data, where a piece of source data in the set of source data comprises at least one identifying feature;
  - determine a training data set representative of the set of source data, where each piece of source data in the training data set comprises source data metadata describing the ground truth for the piece of source data, where the ground truth for a piece of source data describes the features contained in the piece of source data and a correct label associated with each feature;
  - obtain sets of annotations from a set of annotators for a portion of the training data set, where an annotation identifies one or more features within a piece of source data in the training data set;
  - for each annotator:
    - determine annotator recall metadata based on the set of annotations provided by the annotator for the training data set, where the annotator recall metadata comprises a measure of the number of features within a piece of source data identified with a label in the set of annotations by the annotator; and
    - determine annotator precision metadata based on the set of annotations provided by the annotator for the training data set, where the annotator precision metadata comprises a measure of the number of correct annotations associated with each piece of source data based on the ground truth for each piece of source data; and
  - group the annotators into annotator groups based on the annotator recall metadata and the annotator precision metadata.

12. The system of claim 11, wherein the data annotation application further configures the processor to generate an annotation task comprising a portion of the set of source data, where the annotation task configures an annotator to annotate one or more features within the set of source data.
13. The system of claim 12, wherein the annotation tasks are targeted toward one or more annotator groups.
14. The system of claim 11, wherein the data annotation application further configures the processor to measure the time taken by an annotator to provide an annotation within the sets of annotations.
15. The system of claim 11, wherein the data annotation application further configures the processor to:
- calculate a reward based on the annotator recall metadata and the annotator precision metadata; and
- provide the reward to an annotator for providing one or more annotations.
16. The system of claim 15, wherein the processor is further configured to group annotators into annotator groups based on the calculated reward.
17. The system of claim 11, wherein the processor is configured to cluster the obtained sets of annotations into annotation clusters based on the features within the piece of source data identified by the annotations.
18. The system of claim 17, wherein:
- the set of source data comprises image data; and
- the annotation clusters comprise annotations that are within a distance threshold from each other within the image data.
19. The system of claim 17, wherein the annotation clusters comprise annotations that are within a distance threshold from the ground truth for the feature identified by the annotations.
20. The system of claim 17, wherein the data annotation application further configures the processor to:
- determine an error rate for each annotator based on the annotation clusters; and
- group the annotators into annotator groups based on the determined error rate for the annotators.
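
Claims 13, 15, and 16 mention a reward calculated from recall and precision, grouping by that reward, and targeting tasks at annotator groups, without fixing a formula. The sketch below fills those gaps with assumptions: the reward is a hypothetical base rate scaled by the harmonic mean of recall and precision, the tier names and cutoffs are invented for illustration, and `eligible_annotators` is a hypothetical helper for routing a task to chosen tiers.

```python
def reward(recall, precision, base_rate=0.10):
    """Assumed reward: base rate scaled by the harmonic mean (F1) of the
    two measures. The patent text does not specify a formula."""
    if recall + precision == 0:
        return 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return base_rate * f1


def tier_by_reward(rewards, cutoffs=(0.03, 0.07)):
    """Map each annotator to a hypothetical tier based on their reward."""
    low, high = cutoffs
    tiers = {}
    for annotator_id, amount in rewards.items():
        if amount >= high:
            tiers[annotator_id] = "gold"
        elif amount >= low:
            tiers[annotator_id] = "silver"
        else:
            tiers[annotator_id] = "bronze"
    return tiers


def eligible_annotators(tiers, target_tiers):
    """Return the annotators whose tier is targeted by a task."""
    return sorted(a for a, tier in tiers.items() if tier in target_tiers)


# Hypothetical (recall, precision) pairs per annotator.
scores = {
    "alice": (1.0, 1.0),
    "bob": (2 / 3, 0.5),
    "carol": (0.5, 1.0),
    "dave": (0.1, 0.2),
}
rewards = {name: reward(r, p) for name, (r, p) in scores.items()}
tiers = tier_by_reward(rewards)
hard_task_pool = eligible_annotators(tiers, target_tiers={"gold", "silver"})
```

Using a single combined score keeps the example short; a real system might weight recall and precision differently depending on the annotation task.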

Current Assignee: California Institute of Technology
Original Assignee: California Institute of Technology
Inventors: Welinder, Peter; Perona, Pietro
Granted Patent: US 9,898,701 B2
US Class Current: 1/1

CPC Class Codes
G06F 16/215   Improving data quality; Dat...
G06F 16/24573   using data annotations, e.g...
G06F 16/285   Clustering or classification
G06F 18/2185   the supervisor being an aut...
G06F 18/41   Interactive pattern learnin...
G06F 40/169   Annotation, e.g. comment da...
G06N 20/00   Machine learning
G06N 5/048   Fuzzy inferencing
