Computerized matrix factorization and completion to infer median/mean confidential values
First Claim
1. A system comprising:
one or more hardware processors;
a computer-readable medium having instructions stored thereon, which, when executed by a processor, cause the system to:
obtain, using the one or more hardware processors, an anonymized set of confidential data values for a plurality of combinations of cohorts having a first attribute type and a second attribute type, the confidential data values received via a computerized user interface implemented as a screen of a graphical user interface, the confidential data values entered into a field of the screen of the graphical user interface;
construct, using the one or more hardware processors, a matrix of the confidential data values having the first attribute type as a first axis and the second attribute type as a second axis, with each cell in the matrix corresponding to corresponding different combinations of attributes of the first attribute type and the second attribute type;
compute, using the one or more hardware processors, a set of candidate low rank approximations of the matrix using an objective function evaluated using a set of candidate data transformation functions, the objective function having one or more parameters and an error function, wherein computing a set of candidate low rank approximations includes, for each candidate data transformation function from the set:
applying, using the one or more hardware processors, the candidate data transformation function to the matrix;
obtaining, using the one or more hardware processors, a training matrix by hiding a preset fraction of entries of the transformed matrix;
for each of one or more candidate parameter values for one of the one or more parameters:
computing, using the one or more hardware processors, the objective function using the candidate parameter value; and
calculating, using the one or more hardware processors, the error function using the candidate parameter value;
optimize, using the one or more hardware processors, the one or more parameters that minimizes the error function of the objective function to select one of the candidate low rank approximations of the matrix; and
infer, using the one or more hardware processors, one or more cells that are missing data, of the selected one of the candidate low rank approximations of the matrix.
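The claim does not fix a particular objective or error function. As an illustration only, one common choice for low rank matrix completion is a regularized squared-error objective with a held-out error measure. Every symbol below is an assumption introduced for this sketch and does not come from the claim: f is a candidate data transformation function, X the matrix of confidential data values, the training set is the set of observed cells that were not hidden, the hidden set is the preset fraction of cells that were hidden, u_i and v_j are rank-r factor vectors, and lambda is the tunable parameter.

```latex
% Assumed objective for one candidate transformation f and one candidate parameter value \lambda
\min_{U,V}\;
  \sum_{(i,j)\in\Omega_{\mathrm{train}}}
    \bigl(f(X_{ij}) - u_i^{\top} v_j\bigr)^{2}
  + \lambda\bigl(\lVert U\rVert_F^{2} + \lVert V\rVert_F^{2}\bigr)

% Assumed error function: root mean squared error on the hidden cells, in original units
E(f,\lambda) =
  \sqrt{\frac{1}{\lvert\Omega_{\mathrm{hid}}\rvert}
        \sum_{(i,j)\in\Omega_{\mathrm{hid}}}
          \bigl(f^{-1}(u_i^{\top} v_j) - X_{ij}\bigr)^{2}}
```

Under these assumptions, the selected low rank approximation is the one whose pair (f, lambda) gives the smallest E, and a missing cell (i, j) would be inferred as the inverse transformation applied to the product of u_i and v_j.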
Abstract
In an example embodiment, an anonymized set of confidential data values is obtained for a plurality of combinations of cohorts having a first attribute type and a second attribute type. A matrix of the confidential data values having the first attribute type as a first axis and the second attribute type as a second axis is constructed. A set of candidate low rank approximations of the matrix is calculated using an objective function evaluated using a set of candidate data transformation functions, the objective function having one or more parameters and an error function. The one or more parameters are optimized to minimize the error function of the objective function, thereby selecting one of the candidate low rank approximations of the matrix. Then one or more cells that are missing data, of the selected one of the candidate low rank approximations of the matrix, are inferred.
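The steps summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the transformation set (`identity`, `log`), the L2-regularized factorization fitted by stochastic gradient descent, the held-out RMSE error measured in original units, and all function and parameter names (`factorize`, `select_and_infer`, `lambdas`, `hide_fraction`) are hypothetical choices made for the example.

```python
import math
import random

# Hypothetical candidate transformations: name -> (forward, inverse).
TRANSFORMS = {
    "identity": (lambda x: x, lambda y: y),
    "log": (math.log, math.exp),  # assumes strictly positive values
}

def predict(U, V, i, j):
    """Entry (i, j) of the low rank approximation U V^T."""
    return sum(u * v for u, v in zip(U[i], V[j]))

def factorize(cells, n_rows, n_cols, rank, lam, epochs=300, lr=0.02, seed=0):
    """Fit rank-`rank` factors to the given cells by SGD with L2 penalty `lam`."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    items = sorted(cells.items())
    for _ in range(epochs):
        rng.shuffle(items)
        for (i, j), x in items:
            err = x - predict(U, V, i, j)
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * (err * v - lam * u)
                V[j][k] += lr * (err * u - lam * v)
    return U, V

def held_out_rmse(U, V, hidden, inverse):
    """RMSE on the hidden cells, mapped back to original units."""
    sq = [(inverse(predict(U, V, i, j)) - x) ** 2 for (i, j), x in hidden.items()]
    return math.sqrt(sum(sq) / len(sq))

def select_and_infer(observed, n_rows, n_cols, lambdas=(0.01, 0.1),
                     rank=2, hide_fraction=0.2, seed=0):
    """Pick the (transformation, lambda) pair with the lowest held-out error,
    then infer the cells that are missing from `observed`."""
    keys = sorted(observed)
    random.Random(seed).shuffle(keys)
    n_hidden = max(1, int(hide_fraction * len(keys)))
    hidden = {k: observed[k] for k in keys[:n_hidden]}
    best = None
    for name, (forward, inverse) in TRANSFORMS.items():
        transformed = {k: forward(x) for k, x in observed.items()}
        # Training matrix: the transformed matrix with a preset fraction hidden.
        train = {k: transformed[k] for k in keys[n_hidden:]}
        for lam in lambdas:
            U, V = factorize(train, n_rows, n_cols, rank, lam)
            err = held_out_rmse(U, V, hidden, inverse)
            if best is None or err < best[0]:
                best = (err, name, lam)
    _, name, lam = best
    forward, inverse = TRANSFORMS[name]
    # Refit on every observed cell with the selected settings.
    full = {k: forward(x) for k, x in observed.items()}
    U, V = factorize(full, n_rows, n_cols, rank, lam)
    filled = {(i, j): inverse(predict(U, V, i, j))
              for i in range(n_rows) for j in range(n_cols)
              if (i, j) not in observed}
    return name, lam, filled

# Toy 4 x 3 cohort matrix (made-up values) with two missing cells.
row_f, col_f = [1.0, 1.5, 2.0, 3.0], [2.0, 3.0, 4.0]
observed = {(i, j): row_f[i] * col_f[j] for i in range(4) for j in range(3)}
observed.pop((0, 2))
observed.pop((3, 0))
name, lam, filled = select_and_infer(observed, 4, 3)
```

Here `filled` maps each missing (row, column) cell to its inferred value, and `name` and `lam` report which candidate transformation and parameter value were selected. Measuring the held-out error after applying the inverse transformation is a design choice of this sketch; it keeps errors from different transformations comparable.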
17 Claims
1. A system comprising the elements recited under First Claim above.
Dependent claims: 2, 3, 4, 5, 6
7. A computerized method comprising:
obtaining, using a hardware processor, an anonymized set of confidential data values for a plurality of combinations of cohorts having a first attribute type and a second attribute type, the confidential data values received via a computerized user interface implemented as a screen of a graphical user interface, the confidential data values entered into a field of the screen of the graphical user interface;
constructing, using the hardware processor, a matrix of the confidential data values having the first attribute type as a first axis and the second attribute type as a second axis, with each cell in the matrix corresponding to corresponding different combinations of attributes of the first attribute type and the second attribute type;
computing, using the hardware processor, a set of candidate low rank approximations of the matrix using an objective function evaluated using a set of candidate data transformation functions, the objective function having one or more parameters and an error function, wherein computing a set of candidate low rank approximations includes, for each candidate data transformation function:
applying the candidate data transformation function to the matrix;
obtaining a training matrix by hiding a preset fraction of entries of the transformed matrix;
for each of one or more candidate parameter values for one of the one or more parameters:
computing the objective function using the candidate parameter value; and
calculating the error function using the candidate parameter value;
optimizing the one or more parameters that minimizes the error function of the objective function to select one of the candidate low rank approximations of the matrix; and
inferring one or more cells that are missing data, of the selected one of the candidate low rank approximations of the matrix.
Dependent claims: 8, 9, 10, 11, 12
13. A non-transitory machine-readable storage medium comprising instructions, which when implemented by one or more machines, cause the one or more machines to perform operations comprising:
obtaining, using a hardware processor, an anonymized set of confidential data values for a plurality of combinations of cohorts having a first attribute type and a second attribute type, the confidential data values received via a computerized user interface implemented as a screen of a graphical user interface, the confidential data values entered into a field of the screen of the graphical user interface;
constructing, using the hardware processor, a matrix of the confidential data values having the first attribute type as a first axis and the second attribute type as a second axis, with each cell in the matrix corresponding to corresponding different combinations of attributes of the first attribute type and the second attribute type;
computing, using the hardware processor, a set of candidate low rank approximations of the matrix using an objective function evaluated using a set of candidate data transformation functions, the objective function having one or more parameters and an error function, wherein computing a set of candidate low rank approximations includes, for each candidate data transformation function:
applying the candidate data transformation function to the matrix;
obtaining a training matrix by hiding a preset fraction of entries of the transformed matrix;
for each of one or more candidate parameter values for one of the one or more parameters:
computing the objective function using the candidate parameter value; and
calculating the error function using the candidate parameter value;
optimizing the one or more parameters that minimizes the error function of the objective function to select one of the candidate low rank approximations of the matrix; and
inferring one or more cells that are missing data, of the selected one of the candidate low rank approximations of the matrix.
Dependent claims: 14, 15, 16, 17
Specification