Differential privacy and outlier detection within a non-interactive model
First Claim
1. A system comprising:
at least one processor; and
at least one memory storing instructions which, when executed by the at least one processor, cause operations comprising:
receiving a plurality of indices for a plurality of perturbed data points from a perturbed data set generated by one or more sensors, wherein the plurality of perturbed data points are anonymized versions of a plurality of unperturbed data points with the same plurality of indices, wherein receiving the plurality of indices indicates that the plurality of unperturbed data points are identified as presumed outliers, wherein the plurality of perturbed data points lie around a first center point, and wherein the plurality of unperturbed data points lie around a second center point;
classifying, based upon distance differences, a first portion of the presumed outliers as true positives, wherein the distance differences include, for each of the plurality of perturbed data points, a difference between a distance of the perturbed data point from the first center point and a distance of a corresponding unperturbed data point from the second center point;
classifying, based upon the distance differences, a second portion of the presumed outliers as false positives, wherein each presumed outlier is classified as a false positive when a corresponding distance difference for the presumed outlier is less than a threshold distance away from the first center point, and wherein each presumed outlier is classified as a true positive when a corresponding distance difference for the presumed outlier is greater than the threshold distance away from the first center point; and
providing, based on the classifying, a list of confirmed outliers.
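The classification rule recited above can be written compactly. The following is one possible reading, not language from the claim: the symbols p'_i, p_i, c', c, I, and tau are introduced here for illustration, the distance is assumed to be Euclidean, the difference is assumed to be taken in absolute value, and the threshold is read as a scalar bound on the distance difference. The claim does not say how a difference exactly equal to the threshold is handled, so that case is left open.

```latex
% Assumed notation (not from the claim):
%   I     set of received indices (presumed outliers)
%   p'_i  perturbed data point with index i,   c'  first center point
%   p_i   unperturbed data point with index i, c   second center point
%   \tau  threshold distance
\[
  \Delta_i = \bigl|\, \lVert p'_i - c' \rVert - \lVert p_i - c \rVert \,\bigr|,
  \qquad i \in I
\]
\[
  \operatorname{class}(i) =
  \begin{cases}
    \text{false positive}, & \Delta_i < \tau \\
    \text{true positive},  & \Delta_i > \tau
  \end{cases}
\]
```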
Abstract
A system for differential privacy is provided. In some implementations, the system performs operations comprising receiving a plurality of indices for a plurality of perturbed data points, which are anonymized versions of a plurality of unperturbed data points, wherein the plurality of indices indicate that the plurality of unperturbed data points are identified as presumed outliers. The plurality of perturbed data points can lie around a first center point and the plurality of unperturbed data points can lie around a second center point. The operations can further comprise classifying a portion of the presumed outliers as true positives and another portion of the presumed outliers as false positives, based upon differences in distances to the respective first and second center points for the perturbed and corresponding (e.g., same index) unperturbed data points. Related systems, methods, and articles of manufacture are also described.
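A minimal Python sketch of the classification step described in the abstract may make the data flow easier to follow. It is an illustration under stated assumptions, not the patented implementation: the function name `classify_presumed_outliers`, the dictionary-based storage keyed by index, the use of Euclidean distance, the absolute value of the difference, and the choice to treat a difference exactly equal to the threshold as a false positive are all hypothetical and are not specified by the source.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def classify_presumed_outliers(
    indices: Sequence[int],
    perturbed: Dict[int, Point],
    unperturbed: Dict[int, Point],
    first_center: Point,
    second_center: Point,
    threshold: float,
) -> Tuple[List[int], List[int]]:
    """Split presumed outliers into (true_positives, false_positives).

    Assumptions (not from the source): Euclidean distance, absolute
    distance difference, and a tie (difference == threshold) counted
    as a false positive.
    """
    true_positives: List[int] = []
    false_positives: List[int] = []
    for i in indices:
        # Distance of the perturbed point from the first center point.
        d_perturbed = math.dist(perturbed[i], first_center)
        # Distance of the same-index unperturbed point from the second center point.
        d_unperturbed = math.dist(unperturbed[i], second_center)
        difference = abs(d_perturbed - d_unperturbed)
        if difference > threshold:
            true_positives.append(i)
        else:
            false_positives.append(i)
    return true_positives, false_positives


# Hypothetical example data: two presumed outliers, both centers at the origin.
perturbed_points = {0: (5.0, 0.0), 1: (1.5, 0.0)}
unperturbed_points = {0: (1.0, 0.0), 1: (1.0, 0.0)}
confirmed, rejected = classify_presumed_outliers(
    [0, 1], perturbed_points, unperturbed_points, (0.0, 0.0), (0.0, 0.0), 1.0
)
print(confirmed, rejected)  # prints: [0] [1]
```

In this sketch the first returned list plays the role of the list of confirmed outliers; whether the confirmed list consists of exactly the true positives is an assumption of the example.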
18 Claims
1. A system comprising:
at least one processor; and
at least one memory storing instructions which, when executed by the at least one processor, cause operations comprising:
receiving a plurality of indices for a plurality of perturbed data points from a perturbed data set generated by one or more sensors, wherein the plurality of perturbed data points are anonymized versions of a plurality of unperturbed data points with the same plurality of indices, wherein receiving the plurality of indices indicates that the plurality of unperturbed data points are identified as presumed outliers, wherein the plurality of perturbed data points lie around a first center point, and wherein the plurality of unperturbed data points lie around a second center point;
classifying, based upon distance differences, a first portion of the presumed outliers as true positives, wherein the distance differences include, for each of the plurality of perturbed data points, a difference between a distance of the perturbed data point from the first center point and a distance of a corresponding unperturbed data point from the second center point;
classifying, based upon the distance differences, a second portion of the presumed outliers as false positives, wherein each presumed outlier is classified as a false positive when a corresponding distance difference for the presumed outlier is less than a threshold distance away from the first center point, and wherein each presumed outlier is classified as a true positive when a corresponding distance difference for the presumed outlier is greater than the threshold distance away from the first center point; and
providing, based on the classifying, a list of confirmed outliers.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method for protecting data associated with one or more sensors, the method comprising:
receiving, at a processor, a plurality of indices for a plurality of perturbed data points from a perturbed data set generated by one or more sensors, wherein the plurality of perturbed data points are anonymized versions of a plurality of unperturbed data points with the same plurality of indices, wherein receiving the plurality of indices indicates that the plurality of unperturbed data points are identified as presumed outliers, wherein the plurality of perturbed data points lie around a first center point, and wherein the plurality of unperturbed data points lie around a second center point;
classifying, by the processor and based upon distance differences, a first portion of the presumed outliers as true positives, wherein the distance differences include, for each of the plurality of perturbed data points, a difference between a distance of the perturbed data point from the first center point and a distance of a corresponding unperturbed data point from the second center point;
classifying, by the processor and based upon the distance differences, a second portion of the presumed outliers as false positives, wherein each presumed outlier is classified as a false positive when a corresponding distance difference for the presumed outlier is less than a threshold distance away from the first center point, and wherein each presumed outlier is classified as a true positive when a corresponding distance difference for the presumed outlier is greater than the threshold distance away from the first center point; and
providing, by the processor and based on the classifying, a list of confirmed outliers.
Dependent claims: 10, 11, 12, 13
14. A non-transitory computer program product storing instructions which, when executed by at least one data processor, cause operations comprising:
receiving a plurality of indices for a plurality of perturbed data points from a perturbed data set generated by one or more sensors, wherein the plurality of perturbed data points are anonymized versions of a plurality of unperturbed data points with the same plurality of indices, wherein receiving the plurality of indices indicates that the plurality of unperturbed data points are identified as presumed outliers, wherein the plurality of perturbed data points lie around a first center point, and wherein the plurality of unperturbed data points lie around a second center point;
classifying, based upon distance differences, a first portion of the presumed outliers as true positives, wherein the distance differences include, for each of the plurality of perturbed data points, a difference between a distance of the perturbed data point from the first center point and a distance of a corresponding unperturbed data point from the second center point;
classifying, based upon the distance differences, a second portion of the presumed outliers as false positives, wherein each presumed outlier is classified as a false positive when a corresponding distance difference for the presumed outlier is less than a threshold distance away from the first center point, and wherein each presumed outlier is classified as a true positive when a corresponding distance difference for the presumed outlier is greater than the threshold distance away from the first center point; and
providing, based on the classifying, a list of confirmed outliers.
Dependent claims: 15, 16, 17, 18
Specification