Method for inferring attributes of a data set and recognizers used thereon
Abstract
A method for inferring, without supervision, information about a data set and/or the recognizers that operate on it. The recognizers are modules capable of analyzing, interpreting, and labeling raw data of the data set with a label, which is a cognitive or substance-based identifier of the data; for instance, a label may identify peaks, troughs, patterns, and trends of particular significance. The method infers the information about the data set and/or the recognizers from the observable outputs of each recognizer and a mathematical means of reconciling the agreement and disagreement among those outputs. The method requires no knowledge of the correct label to be applied to the data set by each recognizer, such as would come from a test set, and no prior knowledge of the accuracy of the recognizers.
17 Claims
1. A method of executing a computer program using a processor of a user terminal to infer attributes of a data set or attributes of a plurality of recognizers configured to label the data set, the method comprising the steps of:
receiving, by the processor, the data set as labeled data set having tallies of each of a plurality of label voting patterns, each of the label voting patterns representing a combination of labels, each of the labels within the combination resulting from analysis of the data set by a different recognizer of the plurality, each of the tallies representing the number of times that a particular label voting pattern resulted from analysis of the data set by the plurality of recognizers;
constructing, by the processor, an inference equation for each of the plurality of label voting patterns in terms of statistical parameters and the tallies, wherein the statistical parameters indicate a probability of an observable event in the labeled data set;
calculating, by the processor, values for the statistical parameters based on the inference equation for each of the plurality of label voting patterns; and
calculating, by the processor, the attributes of the data set or the attributes of the plurality of recognizers based on the values of the statistical parameters.
(Claims 2 to 16 depend on claim 1.)
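The receiving step of claim 1 works on tallies of label voting patterns. As a minimal sketch of what such tallies could look like, assuming each recognizer emits one label per data item and that `None` stands in for a Null (no label) output, the patterns can be counted as below. The function name `tally_voting_patterns` and the sample labels are hypothetical and do not come from the patent.

```python
from collections import Counter

# Assumption: None marks a Null output, i.e. the recognizer emitted no label.
NULL = None


def tally_voting_patterns(recognizer_outputs):
    """Count how often each combination of labels (one per recognizer) occurs.

    recognizer_outputs: list of equal-length label sequences, one per recognizer.
    Returns a Counter mapping each voting pattern (a tuple) to its tally.
    """
    lengths = {len(seq) for seq in recognizer_outputs}
    if len(lengths) != 1:
        raise ValueError("all recognizers must label the same number of items")
    # zip(*...) turns per-recognizer sequences into per-item label tuples.
    return Counter(zip(*recognizer_outputs))


# Hypothetical outputs of three recognizers over five data items.
outputs = [
    ["peak", "trough", "peak", NULL, "peak"],
    ["peak", "trough", "trough", NULL, "peak"],
    ["peak", "peak", "peak", NULL, "peak"],
]
tallies = tally_voting_patterns(outputs)
for pattern, count in tallies.items():
    print(pattern, count)
```

Only these tallies, and no ground-truth labels, are passed on to the later steps of the claim.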
17. A method of executing a computer program using a processor of a user terminal to infer attributes of a data set and attributes of a plurality of recognizers configured to label the data set, the method comprising the steps of:
receiving, by the processor, the data set as labeled data set having tallies of a plurality of label voting patterns, each of the label voting patterns representing a combination of labels, each of the labels within the combination resulting from analysis of the data set by a different recognizer of the plurality, each of the tallies representing the number of times that a particular label voting pattern resulted from analysis of the data set by the plurality of recognizers;
constructing, by the processor, an inference equation for each of the plurality of label voting patterns in terms of statistical parameters and the tallies, wherein the statistical parameters indicate a probability of an observable event in the labeled data set;
calculating, by the processor, values for the statistical parameters based on the inference equation for each of the plurality of label voting patterns;
calculating, by the processor, the attributes of the data set and the attributes of the plurality of recognizers based on the values of the statistical parameters; and
wherein the attributes of the data set include at least one of:
a prevalence of each label;
an inferred prevalence of each label, an inferred prevalence of an all-Null label voting pattern, a confidence measurement of each label applied by each of the plurality of recognizers;
an inferred length of the data set; and
the attributes of the plurality of recognizers include at least one of:
a substitution error rate of each recognizer;
an insertion error rate of each recognizer; and
a deletion error rate of each recognizer.
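The claims do not spell out the inference equations in this section, so the following is only an illustration of the general idea of recovering label prevalence and recognizer accuracy from voting-pattern tallies without ground truth. It swaps in a standard latent-class expectation-maximization (EM) fit, not the patent's own equations, and it assumes binary labels (0 and 1) and recognizers whose errors are independent given the true label. All names (`em_latent_class`, `expected_tallies`, `hit`, `rej`) and the example parameter values are hypothetical.

```python
from itertools import product


def pattern_weights(pattern, p, hit, rej):
    """Joint weights of a voting pattern with true label 1 and true label 0."""
    w1, w0 = p, 1.0 - p
    for i, vote in enumerate(pattern):
        w1 *= hit[i] if vote == 1 else 1.0 - hit[i]
        w0 *= rej[i] if vote == 0 else 1.0 - rej[i]
    return w1, w0


def expected_tallies(p, hit, rej, n_items):
    """Idealized (noise-free) tallies for every voting pattern."""
    return {
        v: n_items * sum(pattern_weights(v, p, hit, rej))
        for v in product((0, 1), repeat=len(hit))
    }


def em_latent_class(tallies, n_recognizers, iterations=5000):
    """Fit prevalence p = P(true=1), hit[i] = P(vote=1 | true=1) and
    rej[i] = P(vote=0 | true=0) to the tallies by EM.

    Starting every accuracy above 0.5 encodes the assumption that the
    recognizers are better than chance, which fixes the label orientation.
    """
    p = 0.5
    hit = [0.7] * n_recognizers
    rej = [0.7] * n_recognizers
    total = sum(tallies.values())
    for _ in range(iterations):
        # E-step: posterior probability that each pattern came from true label 1.
        resp = {}
        for v in tallies:
            w1, w0 = pattern_weights(v, p, hit, rej)
            resp[v] = w1 / (w1 + w0)
        # M-step: re-estimate the parameters from responsibility-weighted tallies.
        mass1 = sum(n * resp[v] for v, n in tallies.items())
        mass0 = total - mass1
        p = mass1 / total
        for i in range(n_recognizers):
            hit[i] = sum(n * resp[v] for v, n in tallies.items() if v[i] == 1) / mass1
            rej[i] = sum(n * (1.0 - resp[v]) for v, n in tallies.items() if v[i] == 0) / mass0
    return p, hit, rej


# Hypothetical "true" parameters, used only to synthesize tallies for the demo.
true_p, true_hit, true_rej = 0.3, [0.9, 0.8, 0.85], [0.95, 0.7, 0.8]
tallies = expected_tallies(true_p, true_hit, true_rej, n_items=1000)

p_hat, hit_hat, rej_hat = em_latent_class(tallies, n_recognizers=3)
print("inferred prevalence of label 1:", round(p_hat, 3))
print("inferred hit rates:", [round(x, 3) for x in hit_hat])
print("inferred rejection rates:", [round(x, 3) for x in rej_hat])
```

In this binary sketch, `1 - hit[i]` and `1 - rej[i]` play the role of per-recognizer error rates. The substitution, insertion, and deletion error rates named in claim 17 would need a richer label set that includes a Null label, which this example does not model.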
Specification