System for and method of controllably disclosing sensitive data
First Claim
1. A computer-implemented method of ensuring selective disclosure of sensitive data, comprising the steps performed by one or more processors of:
- receiving at least one policy comprised of policy variables indicating what data items are sensitive, what data items are disclosable, validity conditions for a candidate disclosure dataset to be believable by a recipient, and sufficiency conditions specifying an extent of variability necessary among data objects in a candidate disclosure dataset to protect the sensitive data, and optionally one or more sets of truth data items;
if one or more sets of truth data items were received, auditing the one or more sets of truth data items for compliance with the at least one policy, and if the one or more sets of truth data items fails to comply with the at least one policy, or if no sets of truth data items were received, producing a collection of synthetic dataset disclosure possibilities meeting the validity conditions;
if any synthetic dataset disclosure possibilities are produced, producing one or more associations among the policy variables and each of the one or more sets of truth data items, if any received, and each of the synthetic dataset disclosure possibilities meeting the validity conditions;
if one or more sets of truth data items were received and any synthetic dataset disclosure possibilities produced, generating at least one candidate disclosure dataset comprising at least one of the sets of truth data items and at least one of the synthetic datasets disclosure possibilities; and
repeating the producing steps and the generating step until the at least one candidate disclosure dataset whose associations meet the validity conditions, meets the sufficiency conditions or until a determination is made that the sufficiency conditions cannot be met;
storing or transmitting on a tangible medium data associated with the at least one candidate disclosure dataset resulting from the repeated producing and generating steps; and
at least one of generating an output indicating a compliance status with respect to the at least one policy, generating a certificate indicating that the at least one candidate disclosure dataset complies with the at least one policy, or providing the at least one compliant candidate disclosure dataset to a recipient, or requesting approval from a holder of the sensitive data to disclose the at least one compliant candidate disclosure dataset, or if the at least one candidate disclosure dataset is determined not to comply with the at least one policy, attempting to modify the at least one candidate disclosure dataset to be compliant with the at least one policy.
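The produce, generate, and repeat steps of claim 1 can be pictured as a small generate-and-test loop. The sketch below is only an illustration under stated assumptions: the `Policy` class, the `min_variants` threshold used as the sufficiency condition, the per-field `domains` used to enumerate synthetic possibilities, and all function names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Iterator, List, Optional, Sequence

Record = Dict[str, str]


@dataclass
class Policy:
    """Hypothetical policy container (names are assumptions, not the patent's)."""
    sensitive: List[str]                 # data items to protect
    disclosable: List[str]               # data items every possibility must share
    is_valid: Callable[[Record], bool]   # validity condition: believable to a recipient
    min_variants: int                    # assumed sufficiency condition


def is_sufficient(policy: Policy, candidate: List[Record]) -> bool:
    """Count distinct combinations of sensitive values in the candidate."""
    variants = {tuple(r[f] for f in policy.sensitive) for r in candidate}
    return len(variants) >= policy.min_variants


def synthesize(policy: Policy, truth: Record,
               domains: Dict[str, Sequence[str]]) -> Iterator[Record]:
    """Yield valid synthetic records that differ from the truth only on sensitive items."""
    for combo in product(*(domains[f] for f in policy.sensitive)):
        record = dict(truth)                       # disclosable items stay identical
        record.update(zip(policy.sensitive, combo))
        if record != truth and policy.is_valid(record):
            yield record


def build_candidate(policy: Policy, truth: Record,
                    domains: Dict[str, Sequence[str]]) -> Optional[List[Record]]:
    """Add synthetic possibilities until the sufficiency condition is met."""
    candidate = [truth]
    if is_sufficient(policy, candidate):           # audit of the truth data alone
        return candidate
    for record in synthesize(policy, truth, domains):
        candidate.append(record)
        if is_sufficient(policy, candidate):
            return candidate
    return None                                    # sufficiency cannot be met
```

Returning `None` stands in for the claim's "determination that the sufficiency conditions cannot be met"; a fuller implementation would presumably report why.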
Abstract
A system and method for producing a collection of possibilities that agree on the information that must be disclosed (disclosable information) and that disagree, with a degree of diversity sufficient under a policy, so as to protect the sensitive information. A policy defines what information is possible, what information the recipient would believe, what information is sensitive (to protect), what information is disclosable (to share), and sufficiency conditions that specify the degree of ambiguity required to consider the sensitive information protected. A formalism is used that provably achieves these goals for a variety of structured datasets, including tabular data such as spreadsheets or databases, as well as annotated graphs. The formalism includes the ability to generate a certificate that proves a disclosure adheres to a policy. This certificate is produced either as part of the protection process or separately using an altered process.
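One way to read the abstract's requirements is as three conditions on a collection of possibilities. The notation below is an assumption introduced for illustration and is not the patent's formalism: P is the collection of possibilities, V the validity predicate, D the disclosable items, S the sensitive items, and the threshold k is just one possible sufficiency condition.

```latex
\[
\begin{aligned}
&\forall p \in P:\ V(p)
  && \text{(each possibility is valid, i.e. believable)}\\
&\forall p, q \in P:\ p|_{D} = q|_{D}
  && \text{(all possibilities agree on the disclosable items)}\\
&\bigl|\{\, p|_{S} : p \in P \,\}\bigr| \ge k
  && \text{(at least } k \text{ distinct values for the sensitive items)}
\end{aligned}
\]
```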
18 Claims
8. A computer-implemented method of ensuring selective disclosure of sensitive data, comprising the steps performed by one or more processors of:
receiving at least one policy comprised of policy variables indicating what data items are sensitive, what data items are disclosable, validity conditions for a candidate disclosure dataset to be believable by a recipient, and sufficiency conditions specifying an extent of variability necessary among data objects in a candidate disclosure dataset to protect the sensitive data, and one or more sets of truth data items;
auditing the one or more sets of truth data items for compliance with the at least one policy, and if the one or more sets of truth data items fails to comply with the at least one policy, producing a collection of synthetic dataset disclosure possibilities meeting the validity conditions;
if any synthetic dataset disclosure possibilities are produced, producing one or more associations between the policy variables and each of the synthetic dataset disclosure possibilities meeting the validity conditions;
if any synthetic dataset disclosure possibilities are produced, generating at least one candidate disclosure dataset comprising at least one of the synthetic datasets disclosure possibilities, wherein the at least one candidate disclosure dataset is constrained so as to not include the one or more sets of truth data items; and
repeating the producing steps and the generating step until the at least one candidate disclosure dataset whose associations meet the validity conditions, meets the sufficiency conditions or until a determination is made that the sufficiency conditions cannot be met;
storing or transmitting on a tangible medium data associated with the at least one candidate disclosure dataset resulting from the repeated producing and generating steps; and
if the at least one candidate disclosure dataset is determined to comply with the at least one policy, performing at least one of generating an output indicating a compliance status with respect to the at least one policy, or generating a certificate indicating that the at least one candidate disclosure dataset complies with the at least one policy, or providing the at least one compliant candidate disclosure dataset to a recipient, or requesting approval from a holder of the sensitive data to disclose the at least one compliant candidate disclosure dataset, or if the at least one candidate disclosure dataset is determined not to comply with the at least one policy, attempting to modify the at least one candidate disclosure dataset to be compliant with the at least one policy.
9. A computer-implemented method of auditing for compliance to at least one policy a set of previously instantiated datasets of selectively disclosable sensitive data, comprising the steps performed by one or more processors of:
receiving at least one policy comprised of policy variables indicating what data items are sensitive, what data items are disclosable, validity conditions for a candidate disclosure dataset to be believable by a recipient, and sufficiency conditions specifying an extent of variability necessary among data objects in a candidate disclosure dataset to protect the sensitive data, and at least one candidate disclosure dataset;
determining whether the at least one candidate disclosure dataset complies with the validity conditions, and generating one or more associations between the policy variables and the one or more truth data sets and the synthetic dataset disclosure set possibilities meeting the validity conditions;
determining whether the at least one candidate disclosure dataset whose associations meet the validity conditions, meets the sufficiency conditions; and
storing or transmitting on a tangible medium data associated with the at least one candidate disclosure dataset resulting from the generating step(s); and
if the at least one candidate disclosure dataset is determined to comply with the at least one policy, performing at least one of generating an output indicating a compliance status with respect to the at least one policy, or generating a certificate indicating that the at least one candidate disclosure dataset complies with the at least one policy, or providing the at least one compliant candidate disclosure dataset to a recipient, or requesting approval from a holder of the sensitive data to disclose the at least one compliant candidate disclosure dataset, or if the at least one candidate disclosure dataset is determined not to comply with the at least one policy, attempting to modify the at least one candidate disclosure dataset to be compliant with the at least one policy.
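The audit of claim 9 (check validity, check sufficiency, then emit a compliance status and optionally a certificate) might look roughly like the sketch below. Everything here is assumed for illustration: the function name `audit`, the `min_variants` threshold standing in for the sufficiency conditions, and in particular the "certificate", which is reduced to a SHA-256 digest binding a policy identifier to the dataset. The patent describes a certificate that proves adherence to a policy; a bare digest does not do that.

```python
import hashlib
import json
from typing import Callable, Dict, List

Record = Dict[str, str]


def audit(candidate: List[Record],
          sensitive: List[str],
          is_valid: Callable[[Record], bool],
          min_variants: int,
          policy_id: str) -> Dict[str, object]:
    """Audit an existing candidate disclosure dataset against a policy.

    Assumptions: validity is a per-record predicate and sufficiency is a
    minimum number of distinct sensitive-value combinations.
    """
    invalid_rows = [i for i, r in enumerate(candidate) if not is_valid(r)]
    variants = {tuple(r[f] for f in sensitive) for r in candidate}
    compliant = not invalid_rows and len(variants) >= min_variants

    report: Dict[str, object] = {
        "policy_id": policy_id,
        "invalid_rows": invalid_rows,
        "variants": len(variants),
        "compliant": compliant,
    }
    if compliant:
        # Stand-in certificate: a digest over the policy id and the dataset.
        payload = json.dumps({"policy_id": policy_id, "dataset": candidate},
                             sort_keys=True).encode("utf-8")
        report["certificate"] = hashlib.sha256(payload).hexdigest()
    return report
```

A non-compliant report carries the offending row indices, which is the information a follow-up "attempt to modify" step would need.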
10. A computer-implemented system for matching sensitive truth datasets, comprising:
at least one processor configured to:
store in memory a corresponding first and second dataset each including one or more truth datasets comprised of truth dataset elements for one or more data fields;
expand the first and second datasets to include one or more fictitious dataset elements for the one or more data fields;
generate for and associate to each truth dataset element and fictitious dataset element an associated authenticity code using corresponding first and second authenticity functions, wherein each authenticity function produces for each truth dataset element a consistent authenticity code for a given dataset element input to the authenticity function that is distinct from authenticity codes produced for fictitious dataset elements;
share with a matching unit the expanded first and second datasets with the generated associated authenticity codes, a matching function, and the first and second authenticity functions;
cause the matching unit to apply to the shared extended first and second datasets the matching function in order to determine at least one indication of the likelihood of a match occurrence between elements of the shared expanded first and second datasets, and verify using the shared first and second authenticity functions the shared associated authenticity codes to generate at least one authenticity determination related to the shared expanded first and second datasets;
determine whether one or more authentic truth dataset matches have occurred based on correspondence between the at least one indication of the likelihood of a match and the at least one authenticity determination; and
store or transmit on a tangible medium data associated with the truth dataset match determination, and
if no authentic truth dataset matches are determined to have occurred, generating an output including at least one of an indication that no truth dataset match was found, an indication of a proximity to a truth dataset match, or characteristic information regarding the first and second datasets, and
if at least one authentic truth dataset match is determined to have occurred, generating an output including at least one of a report of the existence of an authentic truth dataset match, a count of the number of authentic truth dataset matches, an approximate number of authentic truth dataset matches, an indication of duplicate dataset entries, an aggregation of the matching criteria and truth dataset entries, a computed value for a data field entry associated with the authentic truth dataset match, an indication of which truth datasets did not match, an indication of a proximity to additional authentic truth dataset matches, and/or at least one portion of a matching authentic truth dataset.
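The idea of an authenticity function in claim 10 can be illustrated with a keyed hash. The sketch below is an assumption-laden toy, not the patented construction: it uses HMAC-SHA256 as the authenticity function, gives fictitious elements random codes of the same length, uses exact equality as the matching function, and assumes each element appears once per dataset. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Callable, List, Tuple

AuthFn = Callable[[str], str]
Tagged = List[Tuple[str, str]]   # (dataset element, authenticity code)


def make_authenticity_function(key: bytes) -> AuthFn:
    """Return a keyed function giving a consistent code for a given element."""
    def code(element: str) -> str:
        return hmac.new(key, element.encode("utf-8"), hashlib.sha256).hexdigest()
    return code


def expand(truth: List[str], fictitious: List[str], auth: AuthFn) -> Tagged:
    """Tag truth elements with real codes and fictitious ones with random codes."""
    tagged = [(e, auth(e)) for e in truth]
    tagged += [(e, secrets.token_hex(32)) for e in fictitious]
    return tagged   # a real system would also shuffle the order


def is_authentic(element: str, code: str, auth: AuthFn) -> bool:
    """Verify a shared code against the authenticity function."""
    return hmac.compare_digest(auth(element), code)


def authentic_matches(expanded_a: Tagged, auth_a: AuthFn,
                      expanded_b: Tagged, auth_b: AuthFn) -> List[str]:
    """Matching-unit view: find likely matches, then keep only authentic ones."""
    codes_b = dict(expanded_b)
    likely = [(e, c) for e, c in expanded_a if e in codes_b]   # exact-equality match
    return sorted(e for e, c in likely
                  if is_authentic(e, c, auth_a)
                  and is_authentic(e, codes_b[e], auth_b))
```

In this toy, a match between a truth element and a fictitious element on the other side shows up as a likely match but fails verification, so only matches that are truth on both sides are reported.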
12. A computer-implemented system for defining and implementing selecting criteria for sensitive data disclosure, comprising:
at least one processor configured to:
store in memory a dataset including one or more truth dataset elements for corresponding one or more data fields, and an authenticity function;
expand the dataset to include one or more fictitious dataset elements for the one or more data fields;
generate for and associate to each truth dataset element and fictitious dataset element an authenticity code using the authenticity function, wherein the authenticity function produces for a particular truth dataset element a consistent authenticity code for a given dataset element input to the authenticity function that is distinct from authenticity codes produced for the fictitious dataset elements;
share with a matching unit the expanded dataset with or without the associated authenticity codes;
if the associated authenticity codes are shared with the matching unit, share the authenticity function with the matching unit, otherwise share with an authenticity unit the authenticity function;
receive at the matching unit selecting criteria specifying values to evaluate a predicate of the data fields of the shared expanded dataset;
cause the matching unit to apply the selecting criteria to the one or more shared expanded dataset elements, and generate an indication of a likelihood of a match occurrence between the selecting criteria and data field entries in the shared expanded dataset;
cause the matching unit or the authenticity unit, depending on which has been shared the authenticity function, to verify using the authenticity function at least the authentication codes associated with the shared expanded dataset elements for which the matching unit has generated a likelihood of a match occurrence, in order to identify truth dataset element authentication codes;
determine whether one or more authentic truth dataset matches have occurred by identifying the indications of likelihood of a match between the selecting criteria and data field entries in the shared expanded dataset that correspond to truth dataset element authentication codes; and
store or transmit on a tangible medium data associated with the authentic truth dataset match determination, further comprising
if no authentic truth dataset matches are determined to have occurred, generating at least one output including an indication that no truth dataset match was found, an indication of a proximity to a truth dataset match, or characteristic information regarding the truth datasets, and
if at least one authentic truth dataset match is determined to have occurred, generating an output including at least one of a report of the existence of a truth dataset match, a count of the number of truth dataset matches, an approximate number of truth dataset matches, an indication of which truth datasets did not match, an indication of a proximity to additional truth dataset matches, and/or at least one portion of a matching truth dataset.
15. A computer-implemented method for recombining datasets to re-identify unprotected truth data, comprising the steps performed by at least one processor of:
receiving a plurality of disclosed datasets, at least an indeterminate one of the disclosed datasets including protected truth data formed from the unprotected truth data by transformation;
using at least one of the disclosed datasets, constructing at least one description of a set of unprotected truth data possibilities for unprotected truth data items in said plurality of disclosed datasets;
determining whether one or more datasets of the plurality that was not used in constructing the at least one description of the set of unprotected truth data possibilities satisfies the at least one description; and
forming an inference regarding the unprotected truth data possibilities based on the one or more satisfaction determinations; and
storing or transmitting on a tangible medium data associated with the unprotected truth data possibilities inference, further comprising
if one or more datasets not used in constructing the at least one description fails to satisfy the description, generating an output including at least one of an indication of an inability to re-identify the unprotected truth data, and an indication of a proximity to satisfying the description, and
if the one or more datasets not used in constructing the at least one description satisfies the description, generating an output reporting at least one of the ability to re-identify the unprotected truth data, at least a portion of the unprotected truth data items re-identified by inference to comprise the initially disclosed truth data based on satisfying the at least one description, a measure of an extent of refinement of unprotected truth data possible from the protected truth data, an indication of a proximity to additional dataset matches satisfying the description, and an indication of refinements possible of unprotected truth data possible from the protected truth data.
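The recombination attack of claim 15 can be illustrated with plain set intersection. This is a deliberately simplified sketch under assumptions not found in the patent: each disclosed dataset is modeled as a set of possible values for one sensitive item, the "description" is the running set of possibilities built from the first dataset, and a later dataset "satisfies" the description when it shares at least one possibility with it. The function name `recombine` and the result keys are hypothetical.

```python
from typing import Dict, List, Set


def recombine(disclosed: List[Set[str]]) -> Dict[str, object]:
    """Narrow the truth possibilities by combining several disclosures."""
    description = set(disclosed[0])          # built from the first disclosure
    initial = len(description)

    for other in disclosed[1:]:
        overlap = description & other
        if not overlap:
            # A dataset fails to satisfy the description: no inference possible.
            return {
                "satisfied": False,
                "reidentified": None,
                "remaining": sorted(description),
                "eliminated": initial - len(description),
            }
        description = overlap                # refine the set of possibilities

    reidentified = next(iter(description)) if len(description) == 1 else None
    return {
        "satisfied": True,
        "reidentified": reidentified,
        "remaining": sorted(description),
        "eliminated": initial - len(description),   # crude measure of refinement
    }
```

The `eliminated` count plays the role of the claim's "measure of an extent of refinement": it reports how many possibilities were ruled out even when the truth is not pinned down to a single value.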
17. A computer-implemented system for matching sensitive truth datasets, comprising:
at least one processor configured to:
store in memory a corresponding first and second dataset each including one or more truth datasets comprised of truth dataset elements for one or more data fields;
expand the first and second datasets to include one or more fictitious dataset elements for the one or more data fields;
generate for and associate to each truth dataset element and fictitious dataset element an associated authenticity code using at least one authenticity function, wherein each authenticity function produces for each truth dataset element a consistent authenticity code for a given dataset element input to the authenticity function that is distinct from authenticity codes produced for fictitious dataset elements, wherein the authenticity function producing authenticity codes for the first dataset may be distinct from the authenticity function producing authenticity codes for the second dataset;
share with a matching unit the expanded first and second datasets and a matching function;
share with at least one authenticity unit the expanded first and second datasets with associated authenticity codes, and the corresponding at least one authenticity function;
cause the matching unit to apply to the shared expanded first and second datasets a matching function and output as a result of the matching function application at least one indication of the likelihood of a match occurrence between elements of the shared expanded first and second datasets;
cause the at least one authenticity unit to apply the corresponding first at least one authenticity function to the shared expanded first and second datasets with associated authenticity codes to output at least one authenticity determination;
determine whether one or more authentic truth dataset matches have occurred based on correspondence between the at least one indication of the likelihood of a match and the at least one authenticity determination; and
store or transmit on a tangible medium data associated with the truth dataset match determination, further comprising
if no authentic truth dataset matches are determined to have occurred, generating an output including at least one of an indication that no authentic truth dataset match was found, an indication of a proximity to an authentic truth dataset match, or characteristic information regarding the first and second datasets, and
if at least one authentic truth dataset match is determined to have occurred, generating an output including at least one of a report of the existence of an authentic truth dataset match, a count of the number of authentic truth dataset matches, an approximate number of authentic truth dataset matches, an indication of duplicate dataset entries, an aggregation of the matching criteria and truth dataset entries, a computed value for a data field entry associated with the authentic truth dataset match, an indication of which truth datasets did not match, an indication of a proximity to additional authentic truth dataset matches, and at least one portion of a matching authentic truth dataset.
Specification