METHOD FOR INFERRING ATTRIBUTES OF A DATA SET AND RECOGNIZERS USED THEREON
Abstract
A method for inferring, without supervision, information about a data set and/or the recognizers that operate on it. The recognizers are modules capable of analyzing, interpreting, and labeling raw data of the data set. A label is a cognitive or substance-based identifier of the data, for instance one that identifies peaks, troughs, patterns, and trends of particular significance. The method infers the information about the data set and/or the recognizers from the observable outputs of each recognizer and a mathematical means of reconciling the agreement or disagreement among those outputs. The method does not require knowledge of the correct label to be applied to the data set by each of the recognizers, such as would come from a test set or from prior knowledge of the accuracy of the recognizer.
37 Claims
1. A method of executing a computer program using a processor of a user terminal to infer attributes of a data set or a plurality of recognizers configured to label the data set, the method comprising the steps of:
receiving a labeled data set having tallies of each of a plurality of label voting patterns;
constructing an inference equation for each of the plurality of label voting patterns in terms of statistical parameters and the tallies, wherein the statistical parameters indicate a probability of an observable event in the labeled data set;
calculating values for the statistical parameters based on the inference equation for each of the plurality of label voting patterns; and
calculating the attributes of the data set or the plurality of recognizers based on the values of the statistical parameters.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16.
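As an illustration of the "tallies of each of a plurality of label voting patterns" recited in claim 1, the following minimal Python sketch counts how often each combination of labels occurs across recognizers. It assumes the recognizer outputs are already aligned, one label per data point. The function name `tally_voting_patterns` and the toy labels are hypothetical and do not come from the patent. The sketch covers only the tallying input, not the inference equations.

```python
from collections import Counter


def tally_voting_patterns(recognizer_outputs):
    """Count each label voting pattern across aligned recognizer outputs.

    recognizer_outputs: list of label sequences, one per recognizer,
    all of the same length (assumed to be aligned already).
    Returns a Counter keyed by tuples (label_from_r1, label_from_r2, ...).
    """
    lengths = {len(seq) for seq in recognizer_outputs}
    if len(lengths) != 1:
        raise ValueError("all recognizers must label the same number of data points")
    # zip(*...) walks the data points; each tuple is one voting pattern.
    return Counter(zip(*recognizer_outputs))


# Toy example: three recognizers labeling four data points.
outputs = [
    ["A", "B", "A", "A"],  # recognizer 1
    ["A", "B", "B", "A"],  # recognizer 2
    ["A", "A", "B", "A"],  # recognizer 3
]
tallies = tally_voting_patterns(outputs)
for pattern, count in sorted(tallies.items()):
    print(pattern, count)
```

Here the pattern `("A", "A", "A")` is tallied twice, and the patterns `("A", "B", "B")` and `("B", "B", "A")` once each.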
receiving all the solutions of the optimization equations rather than a single solution for the statistical model selection of a mode of operation.
10. The method of claim 1, further comprising:
determining whether each possible label voting pattern is observable in the labeled data set; and
if not, gathering or prompting for an additional data set or a selection of a change to a model; and
receiving one or more of the additional data set or the selection of the change of the model.
11. The method of claim 10, further comprising:
separately labeling, using the plurality of recognizers, the additional data set; and
aligning the additional data set to the labeled data set;
inserting at least one Null label into the additional data set or the labeled data set where the plurality of recognizers have deletion or insertion errors; and
counting each instance of each of the plurality of label voting patterns to produce the tallies.
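The align, insert-Null, and count steps of claim 11 can be sketched for the simplest case of two label sequences. The claim does not specify an alignment algorithm, so this sketch substitutes Python's standard `difflib.SequenceMatcher` as a stand-in. The helper `align_pair` and the `"Null"` placeholder string are hypothetical names chosen for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher

NULL = "Null"  # placeholder for a missing label (assumed representation)


def align_pair(a, b):
    """Align two label sequences, padding gaps with NULL.

    Uses difflib as a stand-in aligner; the patent does not name one.
    Returns two equal-length lists.
    """
    out_a, out_b = [], []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        seg_a, seg_b = list(a[i1:i2]), list(b[j1:j2])
        # Pad the shorter segment so both sides stay the same length:
        # "delete" pads b, "insert" pads a, "replace" pads whichever is shorter.
        width = max(len(seg_a), len(seg_b))
        seg_a += [NULL] * (width - len(seg_a))
        seg_b += [NULL] * (width - len(seg_b))
        out_a.extend(seg_a)
        out_b.extend(seg_b)
    return out_a, out_b


# Recognizer 2 missed the middle label (a deletion error).
aligned_1, aligned_2 = align_pair(["x", "y", "z"], ["x", "z"])
pattern_tallies = Counter(zip(aligned_1, aligned_2))
print(aligned_1, aligned_2)
print(pattern_tallies)
```

After alignment the second sequence carries a `"Null"` where the label was missed, so the pair `("y", "Null")` appears as its own voting pattern in the tallies.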
12. The method of claim 1,
wherein the observable event indicated by the statistical parameter is an instance selected from the group consisting of:
one of the recognizers applying a label to the labeled data set when the true or correct label to be applied to the labeled data set is one of the labels, and
two or more of the recognizers both applying a same label to the labeled data set with each other when the true or correct label to be applied to the labeled data set is one of the labels.
13. The method of claim 1, wherein the step of constructing the inference equation further comprises:
substituting an expression based on the probability of the applied, non-Null labels for an expression of a probability of the Null-label in each of the inference equations.
14. The method of claim 1, wherein the step of constructing the inference equation further comprises:
substituting an expression based on the statistical parameters and the probability of the labels for an expression of a probability of non-Null labels in each of the inference equations.
15. The method of claim 1, wherein the attributes are selected from the group consisting of:
an actual prevalence of each label including the Null label, an inferred prevalence of each label including the Null label, an inferred prevalence of the all-Null label voting pattern, a confidence measure of each label applied by each of the plurality of recognizers, an inferred length of the data set, a substitution error rate of each recognizer, an insertion error rate of each recognizer, and a deletion error rate of each recognizer.
16. The method of claim 15, further comprising:
outputting the attributes that are calculated.
17. A method of executing a computer program using a processor of a user terminal to estimate a minimum number of recognizers required to infer attributes of a data set or the recognizers configured to label the data set, the method comprising the steps of:
receiving a number of labels that can be applied to the data set by the recognizers, wherein the number of labels includes a Null label when applicable;
receiving a number of recognizers that are to be correlated in groupings of the recognizers in order to infer the attributes of the data set or the recognizers, wherein the correlation of the groupings of the recognizers relates to a conditional probability of the recognizers in the grouping of recognizers agreeing with each other about the label to apply to a data point of the data set; and
determining the minimum number of recognizers based on the number of labels that can be applied to the data set by the recognizers and the number of recognizers that are to be correlated in the grouping of the recognizers.
Dependent claims: 18, 19, 20.
21. A method of executing a computer program using a processor of a user terminal to compensate for a probability of unobservable events in a labeled data set, the method comprising the steps of:
receiving a labeled data set having at least one instance of each label voting pattern except an all-Null label voting pattern, wherein the labeled data set was labeled by at least four recognizers;
constructing a probabilistic representation of the labeled data set;
projecting out a portion of the labeled data set associated with one or more of the at least four recognizers from the labeled data set to produce a reduced data set, wherein the reduced data set includes at least one instance of an all-Null label voting pattern of the reduced data set that is observable relative to at least one associated non-Null label voting pattern of the projected out portion of the labeled data set;
calculating a probability of the all-Null label voting pattern of the labeled data set based on the at least one instance of the all-Null label voting pattern of the reduced data set; and
modifying the probabilistic representation of the labeled data set to compensate for the probability of the all-Null label voting pattern of the labeled data set.
Dependent claims: 22, 23.
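The "projecting out" step of claim 21 can be pictured as marginalizing the tallies over one or more recognizers. The sketch below is an assumption about how that reduction might look in code, not the patent's stated procedure. The function `project_out`, the `"Null"` string, and the toy tallies are hypothetical. For brevity the toy data omits most voting patterns, whereas the claim requires every pattern except all-Null to be present.

```python
from collections import Counter

NULL = "Null"  # assumed representation of the Null label


def project_out(tallies, drop):
    """Sum tallies over the recognizers whose positions are listed in `drop`.

    tallies: Counter keyed by voting-pattern tuples.
    drop: iterable of recognizer positions (0-based) to remove.
    Returns a Counter over the shorter patterns of the reduced data set.
    """
    drop = set(drop)
    reduced = Counter()
    for pattern, count in tallies.items():
        kept = tuple(label for i, label in enumerate(pattern) if i not in drop)
        reduced[kept] += count
    return reduced


# Four recognizers; the all-Null pattern is never observed in the full set.
full = Counter({
    ("A", "A", "A", "A"): 5,
    (NULL, NULL, NULL, "A"): 2,   # only recognizer 4 fired
    ("A", NULL, NULL, NULL): 1,
})
reduced = project_out(full, drop=[3])  # project out recognizer 4
print(reduced[(NULL, NULL, NULL)])     # all-Null pattern of the reduced set
```

In the reduced three-recognizer data set the all-Null pattern has a nonzero tally, because the projected-out recognizer applied a non-Null label at those points. The claim then uses such instances to calculate the probability of the all-Null pattern of the full data set, a step this sketch does not attempt.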
24. A method of executing a computer program using a processor of a user terminal to compensate for a probability of unobservable events in a labeled data set, the method comprising the steps of:
receiving a labeled data set having at least one instance of each label voting pattern except an all-Null label voting pattern, wherein the labeled data set was labeled by at least three recognizers;
constructing a probabilistic representation of the labeled data set, wherein the probabilistic representation comprises a plurality of equations;
calculating a correction factor by summing the plurality of equations of the probabilistic representation; and
constructing a corrected probabilistic representation of the labeled data set based on the probabilistic representation and the correction factor.
Dependent claims: 25, 26.
27. A method of executing a computer program using a processor of a user terminal to infer attributes of a data set or a plurality of recognizers configured to label the data set, the method comprising the steps of:
a) receiving a labeled data set, said labeled data set associated with a plurality of recognizers;
b) selecting a data set assigned to a first recognizer (said first recognizer data set)
c) assigning an abstract symbol α to said first recognizer data;
d) selecting a data set assigned to a second recognizer (said second recognizer data set)
e) comparing said second recognizer data set to said first recognizer data set;
(1) assigning an abstract symbol α label to said second recognizer data if said second data set matches said first data set;
(2) assigning an abstract symbol β label to said second recognizer data if said second data set does not match said first data set;
f) selecting a data set assigned to a third recognizer (said third recognizer data set)
g) comparing said third recognizer data set to said first recognizer data set and to said second recognizer data set;
(1) assigning an abstract symbol α label to said third recognizer data if said third data set matches said first data set;
(2) assigning an abstract symbol β label to said third recognizer data if said third data set matches said second data set;
(3) assigning an abstract symbol γ label to said third recognizer data if said third data set does not match either of said first data set or said second data;
h) comparing a correct output to said first, second and third recognizer data set;
i) assigning one of each of the first three abstract symbol {α, β, γ} labels if the correct output equals one already outputted by any of the recognizers and assigning an abstract symbol δ, if the correct output is not present in the output of any of the three recognizers, specifically,
(1) assigning an abstract symbol α label if said correct output matches said first data set;
(2) assigning an abstract symbol β label if said correct output matches said second data set; and
(3) assigning an abstract symbol γ label if said correct output matches said third recognizer data set;
i) calculating the minimum number of independent recognizers from the values of the abstract symbols {α, β, γ} for the statistical parameters based on the inference equation for each of the plurality of recognizers; and
inferring attributes of the data set separately for said first, second and third recognizers based on the corrected probabilistic representation of the values of the abstract symbol {α, β, γ} labeled data set for the attributes of the data set or the plurality of recognizers based on the values of a conditional recognition probabilities for values of the abstract symbols {α, β, γ}.
Dependent claims: 28, 29, 30, 31.
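Steps c) through i)(3) of claim 27 describe a relabeling of three recognizer outputs, plus the correct output, with the abstract symbols α, β, γ and δ. A minimal Python sketch of that assignment for a single data point follows. It is a literal reading of the claim text, the function name `abstract_symbols` is hypothetical, and the checks are applied in the order the claim lists them, which is an assumption for the cases where outputs coincide. The later calculating and inferring steps are not sketched.

```python
def abstract_symbols(first, second, third, correct):
    """Assign abstract symbols to three recognizer outputs and the correct output.

    Follows the claim text literally: the first output is α; the second is α
    if it matches the first, else β; the third is α or β if it matches the
    first or second, else γ; the correct output takes the symbol of the first
    output it matches (checked first, second, third), else δ.
    """
    sym_first = "α"
    sym_second = "α" if second == first else "β"

    if third == first:
        sym_third = "α"
    elif third == second:
        sym_third = "β"
    else:
        sym_third = "γ"

    if correct == first:
        sym_correct = "α"
    elif correct == second:
        sym_correct = "β"
    elif correct == third:
        sym_correct = "γ"
    else:
        sym_correct = "δ"  # correct output not produced by any recognizer

    return sym_first, sym_second, sym_third, sym_correct


# The third recognizer agrees with the first; nobody produced the correct output.
print(abstract_symbols("cat", "dog", "cat", "bird"))
# All three disagree; the third recognizer is the correct one.
print(abstract_symbols("cat", "dog", "emu", "emu"))
```

The point of the relabeling is that the symbols depend only on which outputs agree with each other, not on the concrete label values.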
32. A method of executing a computer program using a processor of a user terminal to infer attributes of a data set or a plurality of recognizers configured to label the data set, the method comprising the steps of:
receiving a selection of a model, wherein the model includes a specification selected from the group consisting of:
using n-recognizer correlation factors to measure and account for dependencies between the recognizers,
using a selected approach to compensate for the effect of unobservable events on the statistical model; and
using a particular algebraic optimization equation to solve the statistical model to receive all the solutions of the optimization equations rather than a single solution for the statistical model selection of a mode of operation.
33. A method of executing a computer program using a processor of a user terminal to infer attributes of a data set or a plurality of recognizers specific to a field or context obtaining sequential statistics for sequential data when performing the labeling, the method comprising the steps of:
a) determining a prevalence of labels on a location basis,
b) deriving a statistical model for the locations to infer the prevalence of correct labels by the steps of;
1) determining patterns for the location using an inference equation;
2) inferring the qualities of the labeling for each recognizer for each location;
3) forming a conditional recognition probability table for each location; and
4) determining how often labels follow each other from conditional recognition probability table for the inferring the prevalence of a correct labels, p(l)
c) determining the prevalence of the correct labels, p(l) from the gaps of correct labels between the locations from conditional recognition probability table.
Dependent claims: 34, 35, 36, 37.
Specification