Database anonymization for use in testing database-centric applications
Abstract
At least one quasi-identifier attribute of a plurality of ranked attributes is selected for use in anonymizing a database. Each of the ranked attributes is ranked according to that attribute's effect on a database-centric application (DCA) being tested. In an embodiment, the selected quasi-identifier attribute(s) has the least effect on the DCA. The database is anonymized based on the selected quasi-identifier attribute(s) to provide a partially-anonymized database, which may then be provided to a testing entity for use in testing the DCA. In an embodiment, during execution of the DCA, instances of database queries are captured and analyzed to identify a plurality of attributes from the database and, for each such attribute identified, the effect of the attribute on the DCA is quantified. In this manner, databases can be selectively anonymized in order to balance the requirements of data privacy against the utility of the data for testing purposes.
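The selection step summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the attribute names, the statement counts, the helper names (`rank_attributes`, `select_quasi_identifiers`, `anonymize`), and the use of simple value suppression as the anonymization technique are all assumptions chosen for illustration.

```python
from typing import Dict, Iterable, List, Set


def rank_attributes(affected_statements: Dict[str, int]) -> List[str]:
    """Order attributes from least to most effect on the application.

    The effect is taken to be the number of code statements affected by
    the attribute; ties are broken alphabetically so the order is stable.
    """
    return sorted(affected_statements, key=lambda a: (affected_statements[a], a))


def select_quasi_identifiers(ranked: Iterable[str],
                             quasi_identifiers: Set[str],
                             count: int = 1) -> List[str]:
    """Pick the `count` quasi-identifiers with the least effect."""
    return [a for a in ranked if a in quasi_identifiers][:count]


def anonymize(rows: List[dict], selected: Iterable[str], mask: str = "*") -> List[dict]:
    """Suppress the selected attributes (assumed technique: masking)."""
    selected = set(selected)
    return [{k: (mask if k in selected else v) for k, v in row.items()}
            for row in rows]


# Hypothetical statement counts per attribute (illustrative values only).
affected = {"zip": 3, "age": 12, "gender": 7, "diagnosis": 20}
ranked = rank_attributes(affected)
selected = select_quasi_identifiers(ranked, {"zip", "age", "gender"})

rows = [
    {"zip": "60601", "age": 34, "gender": "F", "diagnosis": "flu"},
    {"zip": "60602", "age": 51, "gender": "M", "diagnosis": "asthma"},
]
partially_anonymized = anonymize(rows, selected)
```

In this sketch only `zip` is masked, because it is the quasi-identifier that affects the fewest statements; the attributes the application depends on most keep their original values for testing.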
19 Claims
1. A method for optimizing anonymization of a database comprising attributes to be used in testing a database-centric application, the method comprising:
selecting a quasi-identifier attribute from ranked attributes to provide a selected quasi-identifier attribute, wherein each of the ranked attributes is ranked according to a number of statements in code implementing the database-centric application affected by the ranked attribute, wherein for each of the ranked attributes the number of statements in code implementing the database-centric application affected by the ranked attribute is determined by quantifying the number of statements in code implementing the database-centric application affected by the ranked attribute, and wherein quantifying the effect of each of the ranked attributes includes:
for each of the ranked attributes, tainting variables used during execution of the database-centric application and affected by the ranked attribute to provide attribute-specific tainted variables, and
for each of the ranked attributes, determining a number of statements in the database-centric application affected by the ranked attribute based on the attribute-specific tainted variables; and
anonymizing, by a processor, the database based on the selected quasi-identifier attribute to provide a partially anonymized database.
6. An apparatus for optimizing anonymization of a database to be used in testing a database-centric application, the apparatus comprising:
means for receiving information regarding a selected quasi-identifier attribute from ranked attributes, each of the ranked attributes being ranked according to a number of statements in code implementing the database-centric application affected by the ranked attribute, and for each of the ranked attributes the number of statements in code implementing the database-centric application affected by the ranked attribute being determined by quantifying the number of statements in code implementing the database-centric application affected by the ranked attribute, and quantifying the effect of each of the ranked attributes includes:
for each of the ranked attributes, tainting variables used during execution of the database-centric application and affected by the ranked attribute to provide attribute-specific tainted variables, and
for each of the ranked attributes, determining a number of statements in the database-centric application affected by the ranked attribute based on the attribute-specific tainted variables; and
means for anonymizing the database based on the selected quasi-identifier attribute to provide a partially anonymized database.
8. A method in a processing device for determining effect of attributes within a database on a database-centric application, the method comprising:
analyzing, by the processing device, the database-centric application to identify a plurality of attributes used by the database-centric application by capturing instances of database queries during execution of the database-centric application;
for each attribute of the plurality of attributes, quantifying, by the processing device, a number of statements in code implementing the database-centric application affected by the attribute, wherein quantifying the effect of each attribute includes:
for each attribute of the plurality of attributes, tainting variables used during execution of the database-centric application and affected by the attribute to provide attribute-specific tainted variables, and
for each attribute of the plurality of attributes, determining a number of statements in the database-centric application affected by the attribute based on the attribute-specific tainted variables; and
ranking, by the processing device, the plurality of attributes according to the number of statements in the code affected by each attribute of the plurality of attributes.
13. An apparatus for determining effect of attributes within a database on a database-centric application, comprising:
a processor; and
a storage device, operatively connected to the processor, having stored thereon instructions that, when executed by the processor, cause the processor to:
analyze the database-centric application to identify a plurality of attributes used by the database-centric application by capturing instances of database queries during execution of the database-centric application;
for each attribute of the plurality of attributes, quantify a number of statements in code implementing the database-centric application affected by the attribute, wherein those instructions that cause the processor to quantify the effect of each attribute are operative to:
for each attribute of the plurality of attributes, taint variables used during execution of the database-centric application and affected by the attribute to provide attribute-specific tainted variables, and
for each attribute of the plurality of attributes, determine a number of statements in the database-centric application affected by the attribute based on the attribute-specific tainted variables; and
rank the plurality of attributes according to the number of statements in the code affected by each attribute of the plurality of attributes.
18. A non-transitory computer readable medium having stored thereon machine readable instructions to optimize anonymization of a database comprising attributes to be used in testing a database-centric application, the machine readable instructions, when executed, cause a processor to:
select a quasi-identifier attribute from ranked attributes to provide a selected quasi-identifier attribute, wherein each of the ranked attributes is ranked according to a number of statements in code implementing the database-centric application affected by the ranked attribute, wherein for each of the ranked attributes the number of statements in code implementing the database-centric application affected by the ranked attribute is determined by quantifying the number of statements in code implementing the database-centric application affected by the ranked attribute, and wherein quantifying the effect of each of the ranked attributes includes:
for each of the ranked attributes, tainting variables used during execution of the database-centric application and affected by the ranked attribute to provide attribute-specific tainted variables, and
for each of the ranked attributes, determining a number of statements in the database-centric application affected by the ranked attribute based on the attribute-specific tainted variables; and
anonymize the database based on the selected quasi-identifier attribute to provide a partially anonymized database.
19. A non-transitory computer readable medium having stored thereon machine readable instructions to determine effect of attributes within a database on a database-centric application, the machine readable instructions, when executed, cause a processor to:
analyze the database-centric application to identify a plurality of attributes used by the database-centric application by capturing instances of database queries during execution of the database-centric application;
for each attribute of the plurality of attributes, quantify a number of statements in code implementing the database-centric application affected by the attribute, wherein quantifying the effect of each attribute includes:
for each attribute of the plurality of attributes, tainting variables used during execution of the database-centric application and affected by the attribute to provide attribute-specific tainted variables, and
for each attribute of the plurality of attributes, determining a number of statements in the database-centric application affected by the attribute based on the attribute-specific tainted variables; and
rank the plurality of attributes according to the number of statements in the code affected by each attribute of the plurality of attributes.
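The tainting and counting steps recited in the claims above can be sketched as follows. This is a deliberately simplified model and not the claimed method itself: the program is assumed to be a straight-line list of statements, each described only by the variable it writes and the variables it reads, and the names `Statement`, `count_affected_statements`, `row_age`, and `row_zip` are hypothetical. A real analysis would also have to handle branches, loops, method calls, and variables that are overwritten with untainted values.

```python
from collections import namedtuple

# Hypothetical statement model: the variable written (or None) and the
# variables read by the statement.
Statement = namedtuple("Statement", ["target", "sources"])


def count_affected_statements(statements, seed_variables):
    """Propagate taint forward and count statements that read tainted data.

    `seed_variables` are the variables that receive the attribute's value
    from a captured database query (assumed to be known in advance).
    """
    tainted = set(seed_variables)
    affected = 0
    for stmt in statements:
        if tainted.intersection(stmt.sources):
            affected += 1
            if stmt.target is not None:
                tainted.add(stmt.target)  # taint flows to the written variable
    return affected, tainted


# Toy program (comments show the statement each entry stands for).
program = [
    Statement("a", ["row_age"]),      # a = row_age
    Statement("b", ["a", "limit"]),   # b = a > limit
    Statement("c", ["row_zip"]),      # c = row_zip
    Statement(None, ["b"]),           # if b: ...
    Statement("d", ["limit"]),        # d = limit + 1
]

# Attribute-specific seeds: which variables hold each attribute's value.
seeds = {"age": {"row_age"}, "zip": {"row_zip"}}

counts = {attr: count_affected_statements(program, variables)[0]
          for attr, variables in seeds.items()}

# Rank attributes from least to most affected statements.
ranking = sorted(counts, key=lambda attr: (counts[attr], attr))
```

Keeping a separate tainted set per attribute is what makes the counts attribute-specific: each attribute's taint is propagated independently, so a statement is attributed only to the attributes whose values actually reach it.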
Specification