Identifying cohorts with anomalous confidential data submissions using matrix factorization and completion techniques
First Claim
1. A system comprising:
one or more hardware processors;
a non-transitory computer-readable medium having instructions stored thereon, which, when executed by the one or more hardware processors, cause the system to:
for each value of a plurality of values of a first attribute of members of a social networking service who have submitted confidential data:
calculate, using the one or more hardware processors, an allowed range for confidential data values submitted by members having the value for the first attribute, across all values of a second attribute, the confidential data values having been entered on screens of graphical user interfaces and encrypted on an external data source;
calculate, using the one or more hardware processors, a median for the confidential data values by selecting a midpoint in a distribution of the confidential data values;
identify a set of candidate data transformation functions, each candidate data transformation function applying a different function on the confidential data values;
shift, using the one or more hardware processors, the allowed range based on an inferred median confidential data value relative to the median of the confidential data values, the inferred median confidential data value being inferred by optimizing parameters of an objective function that minimize an error function for the objective function to select one candidate data transformation functions from the set of candidate data transformation functions, applying the selected one candidate data transformation function to the confidential data values to transform the confidential data values, and calculating a median for the transformed confidential data values;
for each value of the plurality of values of the first attribute of members of the social networking service who have submitted confidential data and each value of a plurality of values of the second attribute of members of the social networking service who have submitted confidential data:
determine, using the one or more hardware processors, whether the submitted confidential data from members having the value of the first attribute and the value of the second attribute is outside the shifted allowed range for the value of the first attribute; and
in response to a determination that the submitted confidential data from members having the value of the first attribute and the value of the second attribute is outside the shifted allowed range for the value of the first attribute, mark, using the one or more hardware processors, a combination of the value of the first attribute and the value of the second attribute as anomalous.
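The selection step recited in the claim (identify candidate transformation functions, pick one by minimizing an error function, apply it, and take the median of the transformed values) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not stated in the claim: the candidate set (`identity`, `sqrt`, `log`), the error function `skew_error` (the gap between mean and median in units of standard deviation), and the helper name `infer_median`. The claim speaks of optimizing parameters of an objective function; this sketch substitutes a plain arg-min over a small discrete set.

```python
import math
from statistics import mean, median, pstdev

# Hypothetical candidate transformations; the claim only requires that each
# candidate applies a different function to the confidential data values.
# sqrt and log assume strictly positive inputs (e.g. salaries).
CANDIDATES = {
    "identity": lambda x: x,
    "sqrt": math.sqrt,
    "log": math.log,
}


def skew_error(values):
    """Assumed error function: |mean - median| divided by the std deviation.

    A value near zero means the transformed data look roughly symmetric.
    """
    spread = pstdev(values)
    return abs(mean(values) - median(values)) / spread if spread else 0.0


def infer_median(values, candidates=CANDIDATES, error=skew_error):
    """Pick the candidate with the smallest error and return
    (candidate name, median of the values transformed by that candidate)."""
    transformed = {
        name: [fn(v) for v in values] for name, fn in candidates.items()
    }
    best = min(transformed, key=lambda name: error(transformed[name]))
    return best, median(transformed[best])


# Usage: values spread evenly on a log scale are symmetric after "log".
name, inferred = infer_median([1, 10, 100, 1000, 10000])
print(name, inferred)
```

Note that the median returned here lives in the transformed space. How that value is related back to the observed median when shifting the allowed range is not spelled out in the claim text above, so the sketch stops at this point.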
Abstract
In an example, for each value of a plurality of values of a first attribute of members of a social networking service who have submitted confidential data, an allowed range for normalized confidential data values submitted by members having the value for the first attribute, across all values of a second attribute, is calculated, and then shifted based on an inferred median confidential data value relative to a median of confidential data values. Then, anomalous confidential data values can be detected using this information.
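The flow summarized in the abstract (compute an allowed range per first-attribute value, shift it, then flag attribute combinations that fall outside it) might be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the allowed range is taken to be Tukey fences, the shift is taken to be an additive offset equal to the inferred median minus the observed median, each (first attribute, second attribute) cohort is summarized by its median, and the inferred medians are simply passed in. The function names and sample data are hypothetical.

```python
from collections import defaultdict
from statistics import median, quantiles


def allowed_range(values, k=1.5):
    """Assumed range rule: Tukey fences around the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def shift_range(rng, observed_median, inferred_median):
    """Assumed shift rule: translate the range by (inferred - observed)."""
    delta = inferred_median - observed_median
    return rng[0] + delta, rng[1] + delta


def mark_anomalous(submissions, inferred_medians):
    """submissions: iterable of (first_attr, second_attr, value) tuples.
    inferred_medians: dict mapping first_attr -> inferred median (given).
    Returns the set of (first_attr, second_attr) pairs marked anomalous."""
    by_first = defaultdict(list)
    by_pair = defaultdict(list)
    for first, second, value in submissions:
        by_first[first].append(value)
        by_pair[(first, second)].append(value)

    # One shifted range per first-attribute value, across all second values.
    shifted = {}
    for first, values in by_first.items():
        observed = median(values)
        inferred = inferred_medians.get(first, observed)
        shifted[first] = shift_range(allowed_range(values), observed, inferred)

    # A cohort is flagged when its median lies outside the shifted range.
    anomalous = set()
    for (first, second), values in by_pair.items():
        low, high = shifted[first]
        if not low <= median(values) <= high:
            anomalous.add((first, second))
    return anomalous


# Hypothetical data: first attribute = title, second attribute = region.
data = [
    ("engineer", "A", 100), ("engineer", "A", 105), ("engineer", "A", 110),
    ("engineer", "B", 95), ("engineer", "B", 102), ("engineer", "B", 108),
    ("engineer", "C", 400), ("engineer", "C", 410),
]
print(mark_anomalous(data, {"engineer": 120}))  # {('engineer', 'C')}
```

With this sample, the unshifted range for "engineer" is (-20.0, 304.0) and the observed median is 106.5; an inferred median of 120 moves the range to (-6.5, 317.5), so only the region "C" cohort (median 405) is marked.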
20 Claims
1. A system comprising:
one or more hardware processors;
a non-transitory computer-readable medium having instructions stored thereon, which, when executed by the one or more hardware processors, cause the system to:
for each value of a plurality of values of a first attribute of members of a social networking service who have submitted confidential data:
calculate, using the one or more hardware processors, an allowed range for confidential data values submitted by members having the value for the first attribute, across all values of a second attribute, the confidential data values having been entered on screens of graphical user interfaces and encrypted on an external data source;
calculate, using the one or more hardware processors, a median for the confidential data values by selecting a midpoint in a distribution of the confidential data values;
identify a set of candidate data transformation functions, each candidate data transformation function applying a different function on the confidential data values;
shift, using the one or more hardware processors, the allowed range based on an inferred median confidential data value relative to the median of the confidential data values, the inferred median confidential data value being inferred by optimizing parameters of an objective function that minimize an error function for the objective function to select one candidate data transformation functions from the set of candidate data transformation functions, applying the selected one candidate data transformation function to the confidential data values to transform the confidential data values, and calculating a median for the transformed confidential data values;
for each value of the plurality of values of the first attribute of members of the social networking service who have submitted confidential data and each value of a plurality of values of the second attribute of members of the social networking service who have submitted confidential data:
determine, using the one or more hardware processors, whether the submitted confidential data from members having the value of the first attribute and the value of the second attribute is outside the shifted allowed range for the value of the first attribute; and
in response to a determination that the submitted confidential data from members having the value of the first attribute and the value of the second attribute is outside the shifted allowed range for the value of the first attribute, mark, using the one or more hardware processors, a combination of the value of the first attribute and the value of the second attribute as anomalous.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method comprising:
for each value of a plurality of values of a first attribute of members of a social networking service who have submitted confidential data:
calculating, using one or more hardware processors, an allowed range for confidential data values submitted by members having the value for the first attribute, across all values of a second attribute, the confidential data values having been entered on screens of graphical user interfaces and encrypted on an external data source;
calculating, using the one or more hardware processors, a median for the confidential data values by selecting a midpoint in a distribution of the confidential data values;
identifying a set of candidate data transformation functions, each candidate data transformation function applying a different function on the confidential data values;
shifting, using the one or more hardware processors, the allowed range based on an inferred median confidential data value relative to the median of the confidential data values, the inferred median confidential data value being inferred by optimizing parameters of an objective function that minimize an error function for the objective function to select one candidate data transformation functions from the set of candidate data transformation functions, applying the selected one candidate data transformation function to the confidential data values to transform the confidential data values, and calculating a median for the transformed confidential data values;
for each value of the plurality of values of the first attribute of members of the social networking service who have submitted confidential data and each value of a plurality of values of the second attribute of members of the social networking service who have submitted confidential data:
determining, using the one or more hardware processors, whether the submitted confidential data from members having the value of the first attribute and the value of the second attribute is outside the shifted allowed range for the value of the first attribute; and
in response to a determination that the submitted confidential data from members having the value of the first attribute and the value of the second attribute is outside the shifted allowed range for the value of the first attribute, marking, using the one or more hardware processors, a combination of the value of the first attribute and the value of the second attribute as anomalous.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory machine-readable storage medium comprising instructions, which when implemented by one or more machines, cause the one or more machines to perform operations comprising:
for each value of a plurality of values of a first attribute of members of a social networking service who have submitted confidential data:
calculating, using one or more hardware processors, an allowed range for confidential data values submitted by members having the value for the first attribute, across all values of a second attribute, the confidential data values having been entered on screens of graphical user interfaces and encrypted on an external data source;
calculating, using the one or more hardware processors, a median for the confidential data values by selecting a midpoint in a distribution of the confidential data values;
identifying a set of candidate data transformation functions, each candidate data transformation function applying a different function on the confidential data values;
shifting, using the one or more hardware processors, the allowed range based on an inferred median confidential data value relative to the median of the confidential data values, the inferred median confidential data value being inferred by optimizing parameters of an objective function that minimize an error function for the objective function to select one candidate data transformation functions from the set of candidate data transformation functions, applying the selected one candidate data transformation function to the confidential data values to transform the confidential data values, and calculating a median for the transformed confidential data values;
for each value of the plurality of values of the first attribute of members of the social networking service who have submitted confidential data and each value of a plurality of values of the second attribute of members of the social networking service who have submitted confidential data:
determining, using the one or more hardware processors, whether the submitted confidential data from members having the value of the first attribute and the value of the second attribute is outside the shifted allowed range for the value of the first attribute; and
in response to a determination that the submitted confidential data from members having the value of the first attribute and the value of the second attribute is outside the shifted allowed range for the value of the first attribute, marking, using the one or more hardware processors, a combination of the value of the first attribute and the value of the second attribute as anomalous.
Dependent claims: 16, 17, 18, 19, 20
Specification