Systems and Methods for Determining Annotator Performance in the Distributed Annotation of Source Data
Abstract
Systems and methods for determining annotator performance in the distributed annotation of source data in accordance with embodiments of the invention are disclosed. In one embodiment of the invention, a method for clustering annotators includes obtaining a set of source data; determining a training data set representative of the set of source data; obtaining sets of annotations from a set of annotators for a portion of the training data set; for each annotator, determining annotator recall metadata and annotator precision metadata based on the set of annotations provided by the annotator for the training data set; and grouping the annotators into annotator groups based on the annotator recall metadata and the annotator precision metadata.
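The per-annotator recall and precision metadata can be illustrated with a minimal sketch. It assumes, hypothetically, that ground truth and annotations are both stored as dictionaries mapping each piece of source data to a `{feature_id: label}` mapping, and it normalizes both measures as ratios, whereas the claims only speak of "a measure of the number of" labeled features or correct annotations. The names `annotator_metadata` and `AnnotatorMetadata` are illustrative and are not taken from the disclosure.

```python
from dataclasses import dataclass

# Hypothetical representation: piece-of-source-data ID -> {feature ID: label}.
GroundTruth = dict[str, dict[str, str]]
Annotations = dict[str, dict[str, str]]


@dataclass
class AnnotatorMetadata:
    recall: float     # share of ground-truth features the annotator labeled at all
    precision: float  # share of the annotator's labels that match the ground truth


def annotator_metadata(truth: GroundTruth, annotations: Annotations) -> AnnotatorMetadata:
    """Score one annotator on the (possibly partial) portion of the training set they annotated."""
    features_total = 0  # ground-truth features in the pieces this annotator saw
    features_found = 0  # of those, features the annotator attached any label to
    labels_total = 0    # annotations the annotator produced
    labels_correct = 0  # annotations whose label equals the ground-truth label
    for piece_id, labels in annotations.items():
        expected = truth[piece_id]
        features_total += len(expected)
        features_found += sum(1 for feature in expected if feature in labels)
        labels_total += len(labels)
        labels_correct += sum(
            1 for feature, label in labels.items() if expected.get(feature) == label
        )
    recall = features_found / features_total if features_total else 0.0
    precision = labels_correct / labels_total if labels_total else 0.0
    return AnnotatorMetadata(recall, precision)


truth = {"img1": {"f1": "cat", "f2": "dog"}, "img2": {"f3": "bird"}}
ann = {"img1": {"f1": "cat", "f2": "cat", "f9": "cow"}, "img2": {"f3": "bird"}}
print(annotator_metadata(truth, ann))
# AnnotatorMetadata(recall=1.0, precision=0.5)
```

In this example the annotator labels every ground-truth feature (recall 1.0), but only two of the four labels match the ground truth (precision 0.5): one label is wrong, and one marks a feature that does not exist.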
20 Claims
1. A method for clustering annotators via a distributed data annotation process, comprising:
obtaining a set of source data using a distributed data annotation server system, where a piece of source data in the set of source data comprises at least one identifying feature;
determining a training data set representative of the set of source data using the distributed data annotation server system, where each piece of source data in the training data set comprises source data metadata describing the ground truth for the piece of source data, where the ground truth for a piece of source data describes the features contained in the piece of source data and a correct label associated with each feature;
obtaining sets of annotations from a set of annotators for a portion of the training data set using the distributed data annotation server system, where an annotation identifies one or more features within a piece of source data in the training data set;
for each annotator:
determining annotator recall metadata based on the set of annotations provided by the annotator for the training data set using the distributed data annotation server system, where the annotator recall metadata comprises a measure of the number of features within a piece of source data identified with a label in the set of annotations by the annotator; and
determining annotator precision metadata based on the set of annotations provided by the annotator for the training data set using the distributed data annotation server system, where the annotator precision metadata comprises a measure of the number of correct annotations associated with each piece of source data based on the ground truth for each piece of source data; and
grouping the annotators into annotator groups based on the annotator recall metadata and the annotator precision metadata using the distributed data annotation server system.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10)

11. A distributed data annotation server system, comprising:
a processor; and
a memory configured to store a data annotation application;
wherein the data annotation application configures the processor to:
obtain a set of source data, where a piece of source data in the set of source data comprises at least one identifying feature;
determine a training data set representative of the set of source data, where each piece of source data in the training data set comprises source data metadata describing the ground truth for the piece of source data, where the ground truth for a piece of source data describes the features contained in the piece of source data and a correct label associated with each feature;
obtain sets of annotations from a set of annotators for a portion of the training data set, where an annotation identifies one or more features within a piece of source data in the training data set;
for each annotator:
determine annotator recall metadata based on the set of annotations provided by the annotator for the training data set, where the annotator recall metadata comprises a measure of the number of features within a piece of source data identified with a label in the set of annotations by the annotator; and
determine annotator precision metadata based on the set of annotations provided by the annotator for the training data set, where the annotator precision metadata comprises a measure of the number of correct annotations associated with each piece of source data based on the ground truth for each piece of source data; and
group the annotators into annotator groups based on the annotator recall metadata and the annotator precision metadata.
(Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20)
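The claims group annotators by their recall and precision metadata, but this section does not name a grouping algorithm. The following sketch substitutes ordinary k-means clustering (Lloyd's algorithm) over each annotator's (recall, precision) pair as one plausible way to form annotator groups. The function name `group_annotators`, the input format (annotator name mapped to a `(recall, precision)` tuple), and the deterministic farthest-point seeding are all assumptions made for illustration.

```python
import math


def group_annotators(
    scores: dict[str, tuple[float, float]], k: int, iterations: int = 100
) -> list[list[str]]:
    """Cluster annotators by (recall, precision) with k-means (illustrative stand-in)."""
    names = sorted(scores)  # sort for reproducible output
    points = [scores[n] for n in names]
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of annotators")

    # Farthest-point seeding: start from the first point, then repeatedly add
    # the point farthest from its nearest existing centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    assignment = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: each annotator joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no annotator changed group
        assignment = new_assignment
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )

    groups: list[list[str]] = [[] for _ in range(k)]
    for name, a in zip(names, assignment):
        groups[a].append(name)
    return [g for g in groups if g]  # drop any cluster that ended up empty


scores = {
    "ann_a": (0.9, 0.95), "ann_b": (0.85, 0.9),  # high recall, high precision
    "ann_c": (0.3, 0.4), "ann_d": (0.35, 0.3),   # low recall, low precision
    "ann_e": (0.9, 0.2),                         # high recall, low precision
}
print(group_annotators(scores, k=3))
# [['ann_a', 'ann_b'], ['ann_c', 'ann_d'], ['ann_e']]
```

Clustering in the two-dimensional (recall, precision) space keeps annotators who label many features but often incorrectly (such as `ann_e`) apart from those who are both thorough and accurate. A fixed threshold on a single combined score would merge some of these profiles.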
Specification