Restricting sensitive query results in information management platforms
First Claim
1. A method for controlling access to a dataset, the method comprising:
receiving, by one or more computer hardware processors, a request for a first dataset from a user, wherein the first dataset includes one or more first dataset values;
creating, by one or more computer hardware processors, a forecasting model using regression analysis from the first dataset and a second dataset, wherein the second dataset includes one or more second dataset values, and wherein each second dataset value of the one or more second dataset values corresponds to each first dataset value of the one or more first dataset values;
generating, by one or more computer hardware processors, a forecasted dataset using the forecasting model, wherein the forecasted dataset includes one or more forecasted dataset values, and wherein each forecasted dataset value of the one or more forecasted dataset values is forecasted using each corresponding first dataset value and second dataset value;
determining, by one or more computer hardware processors, a difference between values in a third dataset and values in the forecasted dataset for each corresponding third dataset value and forecasted dataset value, wherein the third dataset includes one or more third dataset values, and wherein each third dataset value of the one or more third dataset values corresponds to a forecasted dataset value of the one or more forecasted dataset values; and
comparing, by one or more computer hardware processors, the average of the absolute value of all of the differences to a pre-determined inference condition;
wherein:
the first dataset contains a user-requested dataset; and
the second dataset contains a user-known dataset.
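The steps of the claim can be sketched in code. The following is a minimal illustration under stated assumptions, not the patented implementation: all function names are hypothetical, ordinary least-squares linear regression stands in for the unspecified "regression analysis", the model is assumed to be fitted so that the first (user-requested) and second (user-known) datasets predict the third dataset, and a small average absolute difference is assumed to mean that the third dataset's values could be inferred.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear or there are too few rows")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_forecasting_model(first: Sequence[float], second: Sequence[float],
                          third: Sequence[float]) -> List[float]:
    """Least-squares fit of third ~ b0 + b1*first + b2*second (assumed model form)."""
    if not (len(first) == len(second) == len(third)):
        raise ValueError("datasets must have corresponding values (equal lengths)")
    rows = [[1.0, x, y] for x, y in zip(first, second)]
    # Normal equations: (X^T X) beta = X^T z.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * z for r, z in zip(rows, third)) for i in range(3)]
    return solve_linear_system(xtx, xtz)


def forecast(coeffs: Sequence[float], first: Sequence[float],
             second: Sequence[float]) -> List[float]:
    """Forecast one value per corresponding pair of first and second dataset values."""
    b0, b1, b2 = coeffs
    return [b0 + b1 * x + b2 * y for x, y in zip(first, second)]


def mean_absolute_difference(third: Sequence[float], forecasted: Sequence[float]) -> float:
    """Average of the absolute differences between third and forecasted values."""
    return sum(abs(t - f) for t, f in zip(third, forecasted)) / len(third)


def sensitive_values_inferable(first: Sequence[float], second: Sequence[float],
                               third: Sequence[float], inference_condition: float) -> bool:
    """Return True when the forecast tracks the third dataset closely.

    Assumption: an average absolute difference at or below the threshold
    is treated as evidence that the third dataset could be inferred.
    """
    coeffs = fit_forecasting_model(first, second, third)
    forecasted = forecast(coeffs, first, second)
    return mean_absolute_difference(third, forecasted) <= inference_condition
```

A caller would pass the requested values, the values the user is assumed to know already, the protected values, and a threshold; a `True` result would suggest restricting the query. The claim itself only recites the comparison, so what happens after the comparison is left open here.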
Abstract
As more information becomes publicly accessible, it becomes easier to predict and estimate sensitive data from data already available to the general public. Existing privacy-preserving data mining approaches consider only the information the user is querying. They do not consider the information the user already has, or how the user could combine that information with the query results to derive sensitive data that the user should not have access to. Some embodiments of the present invention provide a query analysis (QA) program that addresses this problem by taking into account data that a user may already have, whether private or publicly available, and then using that data, together with the data that the query would return, to determine whether sensitive data could be recreated.
5 Claims
Specification