Internal dataset-based outlier detection for confidential data in a computer system
First Claim
1. A system comprising:
one or more hardware processors;
a non-transitory computer-readable medium having instructions stored thereon, which, when executed by the one or more hardware processors, cause the system to:
receive, via a first computerized user interface implemented as a screen of a graphical user interface, a submission of a confidential data value of a first confidential data type from a first user, entered into a field of the screen of the graphical user interface;
identify, using the one or more hardware processors, one or more attributes of the first user;
retrieve a plurality of previously submitted confidential data values of a first confidential data type for a cohort matching the one or more attributes of the first user, the previously submitted confidential data values having been encrypted on an external data source, the cohort being a grouping of data pertaining to a combination of user attributes for users who submitted the confidential data values;
calculate, using the one or more hardware processors, a plurality of percentiles for the plurality of previously submitted confidential data values;
calculate, using the one or more hardware processors, an interquartile range for a first and a second of the plurality of percentiles, wherein the value for the first of the plurality of percentiles is lower than the value for the second of the plurality of percentiles;
compute, using the one or more hardware processors, a lower limit for the first confidential data type and the cohort by taking a maximum of zero or a difference between the value for the first of the plurality of percentiles and a product of a preset alpha parameter and the interquartile range;
determine, using the one or more hardware processors, whether the confidential data value submitted by the first user is an outlier by determining if the confidential data value submitted by the user is lower than the lower limit; and
in response to a determination that the confidential data value submitted by the first user is not an outlier, permit, using the one or more hardware processors, the confidential data value submitted by the first user to be used for insights provided to other users.
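The percentile, interquartile-range, and lower-limit steps recited above can be sketched in a few lines of Python. This is only an illustrative sketch, not the claimed implementation: the claim does not say which two percentiles are used or what value alpha takes, so the 25th and 75th percentiles and alpha = 1.5 are assumptions made here, as are the function names and the choice of the "inclusive" interpolation method.

```python
from statistics import quantiles


def lower_limit(values, alpha=1.5, low_pct=25, high_pct=75):
    """Return max(0, P_low - alpha * (P_high - P_low)) for a cohort's values.

    Assumptions (not stated in the claim): the 25th/75th percentiles,
    alpha = 1.5, and linear interpolation between data points.
    """
    # quantiles(..., n=100) returns the 99 cut points P1..P99.
    cuts = quantiles(values, n=100, method="inclusive")
    p_low, p_high = cuts[low_pct - 1], cuts[high_pct - 1]
    iqr = p_high - p_low
    # Clamp at zero so the limit never goes negative.
    return max(0.0, p_low - alpha * iqr)


def is_low_outlier(submitted, values, alpha=1.5):
    """A submission is flagged only if it is strictly below the lower limit."""
    return submitted < lower_limit(values, alpha)
```

For a hypothetical cohort of `[50, 60, 70, 80, 90]`, the 25th and 75th percentiles are 60 and 80, so the limit is 60 - 1.5 × 20 = 30; a submission of 25 would be flagged, while 30 or 45 would not.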
Abstract
In an example, a submission of a confidential data value of a first confidential data type is received from a first user with one or more attributes. A plurality of previously submitted confidential data values of a first confidential data type for a cohort matching the one or more attributes of the first user are retrieved. A plurality of percentiles for the confidential data values are calculated. Then, an interquartile range is calculated for a first and a second of the plurality of percentiles. A lower limit for the first confidential data type and the cohort is computed by taking a maximum of zero or the difference between the value for the first of the plurality of percentiles and a product of a preset alpha parameter and the interquartile range. Then it is determined if the confidential data value submitted by the user is lower than the lower limit.
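Written as a formula, the computation described in the abstract looks roughly as follows. The symbols are notation introduced here for illustration only: P_a and P_b stand for the values of the first (lower) and second (higher) percentiles, alpha for the preset parameter, x for the submitted value, and L for the lower limit.

```latex
\mathrm{IQR} = P_b - P_a, \qquad
L = \max\bigl(0,\; P_a - \alpha \cdot \mathrm{IQR}\bigr), \qquad
x \text{ is treated as an outlier if } x < L .
```

With hypothetical numbers P_a = 20, P_b = 40, and alpha = 1.5, the difference is 20 - 1.5 × 20 = -10, so the maximum with zero gives L = 0. The clamp keeps the limit from going negative, which presumably matters when the confidential data type cannot take negative values.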
20 Claims
8. A computerized method comprising:
receiving, via a first computerized user interface implemented as a screen of a graphical user interface, a submission of a confidential data value of a first confidential data type from a first user, entered into a field of the screen of the graphical user interface;
identifying, using one or more hardware processors, one or more attributes of the first user;
retrieving, using the one or more hardware processors, a plurality of previously submitted confidential data values of a first confidential data type for a cohort matching the one or more attributes of the first user, the previously submitted confidential data values having been encrypted on an external data source, the cohort being a grouping of data pertaining to a combination of user attributes for users who submitted the confidential data values;
calculating, using the one or more hardware processors, a plurality of percentiles for the plurality of previously submitted confidential data values;
calculating, using the one or more hardware processors, an interquartile range for a first and a second of the plurality of percentiles, wherein the value for the first of the plurality of percentiles is lower than the value for the second of the plurality of percentiles;
computing, using the one or more hardware processors, a lower limit for the first confidential data type and the cohort by taking a maximum of zero or a difference between the value for the first of the plurality of percentiles and a product of a preset alpha parameter and the interquartile range;
determining, using the one or more hardware processors, whether the confidential data value submitted by the first user is an outlier by determining if the confidential data value submitted by the user is lower than the lower limit; and
in response to a determination that the confidential data value submitted by the first user is not an outlier, permitting, using the one or more hardware processors, the confidential data value submitted by the first user to be used for insights provided to other users.
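The retrieving step depends on matching the submitting user to a cohort, that is, a grouping keyed by a combination of user attributes. One way this lookup might be sketched in Python is shown below. The attribute names ("title", "region"), the function names, and the in-memory dictionary are all hypothetical; the decryption of values held on the external data source is omitted entirely.

```python
from collections import defaultdict

# Hypothetical attribute combination that defines a cohort.
COHORT_KEYS = ("title", "region")


def cohort_key(attributes, keys=COHORT_KEYS):
    """Reduce a user's attributes to the tuple that identifies the cohort."""
    return tuple(attributes.get(k) for k in keys)


def build_cohorts(submissions, keys=COHORT_KEYS):
    """Group previously submitted (attributes, value) pairs by cohort."""
    cohorts = defaultdict(list)
    for attributes, value in submissions:
        cohorts[cohort_key(attributes, keys)].append(value)
    return cohorts


def values_for_user(cohorts, user_attributes, keys=COHORT_KEYS):
    """Return the prior values for the cohort matching this user, if any."""
    return cohorts.get(cohort_key(user_attributes, keys), [])
```

Attributes outside the chosen combination are ignored when matching, so two users who differ only in an unrelated attribute fall into the same cohort. A user whose combination has no prior submissions gets an empty list, and a real system would need to decide how to handle that case.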
15. A non-transitory machine-readable storage medium comprising instructions, which when implemented by one or more machines, cause the one or more machines to perform operations comprising:
receiving, via a first computerized user interface implemented as a screen of a graphical user interface, a submission of a confidential data value of a first confidential data type from a first user, entered into a field of the screen of the graphical user interface;
identifying, using one or more hardware processors, one or more attributes of the first user;
retrieving, using the one or more hardware processors, a plurality of previously submitted confidential data values of a first confidential data type for a cohort matching the one or more attributes of the first user, the previously submitted confidential data values having been encrypted on an external data source, the cohort being a grouping of data pertaining to a combination of user attributes for users who submitted the confidential data values;
calculating, using the one or more hardware processors, a plurality of percentiles for the plurality of previously submitted confidential data values;
calculating, using the one or more hardware processors, an interquartile range for a first and a second of the plurality of percentiles, wherein the value for the first of the plurality of percentiles is lower than the value for the second of the plurality of percentiles;
computing, using the one or more hardware processors, a lower limit for the first confidential data type and the cohort by taking a maximum of zero or a difference between the value for the first of the plurality of percentiles and a product of a preset alpha parameter and the interquartile range;
determining, using the one or more hardware processors, whether the confidential data value submitted by the first user is an outlier by determining if the confidential data value submitted by the user is lower than the lower limit; and
in response to a determination that the confidential data value submitted by the first user is not an outlier, permitting, using the one or more hardware processors, the confidential data value submitted by the first user to be used for insights provided to other users.
Specification