System for and method of controllably disclosing sensitive data

US 10,210,346 B2
Filed: 07/14/2017
Issued: 02/19/2019
Est. Priority Date: 09/08/2014
Status: Active Grant

1 Assignment

0 Petitions

Abstract

System and method of producing a collection of possibilities that agree on information that must be disclosed (disclosable information) and disagree with a sufficient degree of diversity, as defined by a policy, to protect the sensitive information. A policy defines what information is possible, what information the recipient would believe, what information is sensitive (to protect), what information is disclosable (to share), and sufficiency conditions that specify the degree of ambiguity required to consider the sensitive information protected. A formalism is utilized that provably achieves these goals for a variety of structured datasets, including tabular data such as spreadsheets or databases as well as annotated graphs. The formalism includes the ability to generate a certificate that proves a disclosure adheres to a policy. This certificate is produced either as part of the protection process or separately using an altered process.
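The central idea of the abstract, a collection of possibilities that agree on the disclosable fields while differing enough on the sensitive ones, can be illustrated with a minimal Python sketch. The record layout, the field names, the `Policy` class, and the reading of the sufficiency condition as "at least k distinct values per sensitive field" are all assumptions made for illustration; they are not the patent's formalism.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    """Hypothetical, much-simplified policy: field names plus one diversity threshold."""
    disclosable: tuple[str, ...]   # fields every possibility must agree on
    sensitive: tuple[str, ...]     # fields the possibilities must disagree on
    min_distinct: int              # assumed sufficiency: distinct values per sensitive field

def agrees_on_disclosable(possibilities: list[dict], policy: Policy) -> bool:
    """True if every possibility shows the same value for each disclosable field."""
    first = possibilities[0]
    return all(p[f] == first[f] for p in possibilities for f in policy.disclosable)

def is_sufficiently_diverse(possibilities: list[dict], policy: Policy) -> bool:
    """True if each sensitive field takes at least `min_distinct` different values."""
    return all(len({p[f] for p in possibilities}) >= policy.min_distinct
               for f in policy.sensitive)

policy = Policy(disclosable=("zip",), sensitive=("diagnosis",), min_distinct=3)
possibilities = [
    {"zip": "02139", "diagnosis": "flu"},       # suppose this one is the truth
    {"zip": "02139", "diagnosis": "asthma"},    # synthetic alternatives
    {"zip": "02139", "diagnosis": "migraine"},
]
print(agrees_on_disclosable(possibilities, policy),
      is_sufficiently_diverse(possibilities, policy))  # prints: True True
```

A recipient of these three records learns the disclosable ZIP code exactly but cannot tell which diagnosis is real; dropping any one record breaks the assumed three-value sufficiency condition.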

18 Claims

1. A computer-implemented method of ensuring selective disclosure of sensitive data, comprising the steps performed by one or more processors of:
- receiving at least one policy comprised of policy variables indicating what data items are sensitive, what data items are disclosable, validity conditions for a candidate disclosure dataset to be believable by a recipient, and sufficiency conditions specifying an extent of variability necessary among data objects in a candidate disclosure dataset to protect the sensitive data, and optionally one or more sets of truth data items;
  
  if one or more sets of truth data items were received, auditing the one or more sets of truth data items for compliance with the at least one policy, and if the one or more sets of truth data items fails to comply with the at least one policy, or if no sets of truth data items were received, producing a collection of synthetic dataset disclosure possibilities meeting the validity conditions;
  
  if any synthetic dataset disclosure possibilities are produced, producing one or more associations among the policy variables and each of the one or more sets of truth data items, if any received, and each of the synthetic dataset disclosure possibilities meeting the validity conditions;
  
  if one or more sets of truth data items were received and any synthetic dataset disclosure possibilities produced, generating at least one candidate disclosure dataset comprising at least one of the sets of truth data items and at least one of the synthetic datasets disclosure possibilities; and
  
  repeating the producing steps and the generating step until the at least one candidate disclosure dataset whose associations meet the validity conditions, meets the sufficiency conditions or until a determination is made that the sufficiency conditions cannot be met;
  
  storing or transmitting on a tangible medium data associated with the at least one candidate disclosure dataset resulting from the repeated producing and generating steps; and
  
  at least one of generating an output indicating a compliance status with respect to the at least one policy, generating a certificate indicating that the at least one candidate disclosure dataset complies with the at least one policy, or providing the at least one compliant candidate disclosure dataset to a recipient, or requesting approval from a holder of the sensitive data to disclose the at least one compliant candidate disclosure dataset, or if the at least one candidate disclosure dataset is determined not to comply with the at least one policy, attempting to modify the at least one candidate disclosure dataset to be compliant with the at least one policy.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein if the sufficiency conditions or validity conditions are not met, the optionally producing step comprises iteratively adding a synthetic dataset disclosure possibility to the collection.
  - 3. The method of claim 1, wherein the at least one processor produces the one or more associations in a random manner or in parallel.
  - 4. The method of claim 1, wherein the one or more processors optimize at least one of the policy or the one or more truth data sets.
  - 5. The method of claim 1, wherein the one or more processors produce the collection of synthetic dataset disclosure possibilities by assigning to the policy variables values that adhere to at least one selection of a distribution function and a cardinality.
  - 6. The method of claim 1, wherein the respective values of the collection of synthetic dataset disclosure possibilities disagree by at least the extent specified in the sufficiency conditions.
  - 7. The method of claim 1, wherein:
    - the at least one policy comprises a plurality of policies representing dissimilar requirements with respect to providing and protecting sensitive data; and
      
      the one or more processors reconcile the dissimilar requirements.
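The generate-and-check loop of claim 1, together with claim 2 (add one synthetic possibility at a time) and claim 5 (assign values according to a chosen distribution), might be sketched roughly as follows. The uniform draw over a fixed value domain, the `max_rounds` cutoff standing in for "a determination is made that the sufficiency conditions cannot be met", and every function and parameter name are hypothetical.

```python
import random
from typing import Callable

def generate_candidate(truth: dict, disclosable: list[str],
                       sensitive_domain: dict[str, list],
                       min_distinct: int,
                       is_valid: Callable[[dict], bool],
                       max_rounds: int = 100,
                       seed: int = 0) -> tuple[list[dict], bool]:
    """Grow a candidate disclosure (truth record plus synthetic possibilities)
    until every sensitive field shows `min_distinct` values, or give up."""
    rng = random.Random(seed)   # seeded only so the sketch is reproducible
    candidate = [truth]         # claim 1: the candidate includes the truth items

    def sufficient() -> bool:
        return all(len({r[f] for r in candidate}) >= min_distinct
                   for f in sensitive_domain)

    for _ in range(max_rounds):
        if sufficient():
            return candidate, True
        # Synthetic possibility: keep disclosable fields, draw sensitive ones
        # uniformly from a fixed domain (one possible reading of claim 5).
        synthetic = {f: truth[f] for f in disclosable}
        synthetic.update({f: rng.choice(v) for f, v in sensitive_domain.items()})
        if is_valid(synthetic) and synthetic not in candidate:
            candidate.append(synthetic)   # claim 2: add one possibility at a time
    # Stand-in for "the sufficiency conditions cannot be met".
    return candidate, sufficient()


def believable(record: dict) -> bool:
    """Toy validity condition: pretend 'diabetes' is implausible for this record."""
    return record["diagnosis"] != "diabetes"

truth = {"zip": "02139", "age_band": "30-39", "diagnosis": "flu"}
domain = {"diagnosis": ["flu", "asthma", "migraine", "diabetes"]}

candidate, ok = generate_candidate(truth, ["zip", "age_band"], domain, 3, believable)
_, impossible = generate_candidate(truth, ["zip", "age_band"], domain, 4, believable)
```

With only three believable diagnoses, asking for four distinct values can never succeed, so the second call reports failure instead of looping forever. Because the truth record always sits first in this sketch, a real release would also shuffle the candidate before disclosure.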

8. A computer-implemented method of ensuring selective disclosure of sensitive data, comprising the steps performed by one or more processors of:
- receiving at least one policy comprised of policy variables indicating what data items are sensitive, what data items are disclosable, validity conditions for a candidate disclosure dataset to be believable by a recipient, and sufficiency conditions specifying an extent of variability necessary among data objects in a candidate disclosure dataset to protect the sensitive data, and one or more sets of truth data items;
  
  auditing the one or more sets of truth data items for compliance with the at least one policy, and if the one or more sets of truth data items fails to comply with the at least one policy, producing a collection of synthetic dataset disclosure possibilities meeting the validity conditions;
  
  if any synthetic dataset disclosure possibilities are produced, producing one or more associations between the policy variables and each of the synthetic dataset disclosure possibilities meeting the validity conditions;
  
  if any synthetic dataset disclosure possibilities are produced, generating at least one candidate disclosure dataset comprising at least one of the synthetic datasets disclosure possibilities, wherein the at least one candidate disclosure dataset is constrained so as to not include the one or more sets of truth data items; and
  
  repeating the producing steps and the generating step until the at least one candidate disclosure dataset whose associations meet the validity conditions, meets the sufficiency conditions or until a determination is made that the sufficiency conditions cannot be met;
  
  storing or transmitting on a tangible medium data associated with the at least one candidate disclosure dataset resulting from the repeated producing and generating steps; and
  
  if the at least one candidate disclosure dataset is determined to comply with the at least one policy, performing at least one of generating an output indicating a compliance status with respect to the at least one policy, or generating a certificate indicating that the at least one candidate disclosure dataset complies with the at least one policy, or providing the at least one compliant candidate disclosure dataset to a recipient, or requesting approval from a holder of the sensitive data to disclose the at least one compliant candidate disclosure dataset, or if the at least one candidate disclosure dataset is determined not to comply with the at least one policy, attempting to modify the at least one candidate disclosure dataset to be compliant with the at least one policy.
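Claim 8 differs from claim 1 mainly in that the candidate disclosure dataset must not contain the truth data items. A hedged Python sketch of just that constraint follows; the deterministic enumeration order and the name `synthetic_only_candidate` are illustrative assumptions, not taken from the patent.

```python
import itertools

def synthetic_only_candidate(truth: dict, disclosable: list[str],
                             sensitive_domain: dict[str, list],
                             size: int) -> list[dict]:
    """Return up to `size` synthetic records that share the truth's disclosable
    fields but never reproduce the truth record (claim 8's constraint)."""
    fields = list(sensitive_domain)
    candidate: list[dict] = []
    # Deterministic enumeration of sensitive-value combinations; the claim does
    # not prescribe a generation order, so this is an illustrative choice.
    for combo in itertools.product(*(sensitive_domain[f] for f in fields)):
        if len(candidate) >= size:
            break
        record = {f: truth[f] for f in disclosable} | dict(zip(fields, combo))
        if all(record[f] == truth[f] for f in fields):
            continue                      # this combination is the truth: skip it
        candidate.append(record)
    return candidate


truth = {"zip": "02139", "diagnosis": "flu"}
print(synthetic_only_candidate(truth, ["zip"], {"diagnosis": ["flu", "asthma", "migraine"]}, size=2))
# prints: [{'zip': '02139', 'diagnosis': 'asthma'}, {'zip': '02139', 'diagnosis': 'migraine'}]
```

With a domain this small, a recipient who knows the truth is always excluded could recover it by elimination, so the value domain and candidate size would have to be chosen with that in mind.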

9. A computer-implemented method of auditing for compliance to at least one policy a set of previously instantiated datasets of selectively disclosable sensitive data, comprising the steps performed by one or more processors of:
- receiving at least one policy comprised of policy variables indicating what data items are sensitive, what data items are disclosable, validity conditions for a candidate disclosure dataset to be believable by a recipient, and sufficiency conditions specifying an extent of variability necessary among data objects in a candidate disclosure dataset to protect the sensitive data, and at least one candidate disclosure dataset;
  
  determining whether the at least one candidate disclosure dataset complies with the validity conditions, and generating one or more associations between the policy variables and the one or more truth data sets and the synthetic dataset disclosure set possibilities meeting the validity conditions;
  
  determining whether the at least one candidate disclosure dataset whose associations meet the validity conditions, meets the sufficiency conditions; and
  
  storing or transmitting on a tangible medium data associated with the at least one candidate disclosure dataset resulting from the generating step(s); and
  
  if the at least one candidate disclosure dataset is determined to comply with the at least one policy, performing at least one of generating an output indicating a compliance status with respect to the at least one policy, or generating a certificate indicating that the at least one candidate disclosure dataset complies with the at least one policy, or providing the at least one compliant candidate disclosure dataset to a recipient, or requesting approval from a holder of the sensitive data to disclose the at least one compliant candidate disclosure dataset, or if the at least one candidate disclosure dataset is determined not to comply with the at least one policy, attempting to modify the at least one candidate disclosure dataset to be compliant with the at least one policy.
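Claim 9 audits a candidate dataset that already exists rather than building one. A minimal Python sketch, assuming the same simplified policy as the earlier illustrations: it checks validity, agreement on disclosable fields, and sufficiency, then emits a compliance status. The `certificate` function is a hypothetical stand-in that binds the audited data to its status with an HMAC tag; it attests to the auditor's check and is not the formal proof the abstract describes.

```python
import hashlib
import hmac
import json
from typing import Callable

def audit(candidate: list[dict], disclosable: list[str], sensitive: list[str],
          min_distinct: int, is_valid: Callable[[dict], bool]) -> dict:
    """Audit a previously built candidate against a simplified policy."""
    valid = all(is_valid(r) for r in candidate)                      # validity conditions
    agrees = all(r[f] == candidate[0][f] for r in candidate for f in disclosable)
    sufficient = all(len({r[f] for r in candidate}) >= min_distinct  # sufficiency conditions
                     for f in sensitive)
    return {"valid": valid, "agrees": agrees, "sufficient": sufficient,
            "compliant": valid and agrees and sufficient}

def certificate(candidate: list[dict], status: dict, auditor_key: bytes) -> str:
    """Hypothetical certificate: an HMAC tag over the audited data and its status.
    It attests to the auditor's check; it is not a proof in the formal sense."""
    payload = json.dumps({"candidate": candidate, "status": status}, sort_keys=True)
    return hmac.new(auditor_key, payload.encode(), hashlib.sha256).hexdigest()


candidate = [{"zip": "02139", "diagnosis": d} for d in ("flu", "asthma", "migraine")]
status = audit(candidate, ["zip"], ["diagnosis"], 3, lambda r: True)
tag = certificate(candidate, status, b"auditor-secret") if status["compliant"] else None
```

Serializing with `sort_keys=True` keeps the tag stable for equal inputs, so anyone holding the auditor key can recompute and compare it.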

10. A computer-implemented system for matching sensitive truth datasets, comprising:
- at least one processor configured to:
  
  store in memory a corresponding first and second dataset each including one or more truth datasets comprised of truth dataset elements for one or more data fields;
  
  expand the first and second datasets to include one or more fictitious dataset elements for the one or more data fields;
  
  generate for and associate to each truth dataset element and fictitious dataset element an associated authenticity code using corresponding first and second authenticity functions, wherein each authenticity function produces for each truth dataset element a consistent authenticity code for a given dataset element input to the authenticity function that is distinct from authenticity codes produced for fictitious dataset elements;
  
  share with a matching unit the expanded first and second datasets with the generated associated authenticity codes, a matching function, and the first and second authenticity functions;
  
  cause the matching unit to apply to the shared extended first and second datasets the matching function in order to determine at least one indication of the likelihood of a match occurrence between elements of the shared expanded first and second datasets, and verify using the shared first and second authenticity functions the shared associated authenticity codes to generate at least one authenticity determination related to the shared expanded first and second datasets;
  
  determine whether one or more authentic truth dataset matches have occurred based on correspondence between the at least one indication of the likelihood of a match and the at least one authenticity determination; and
  
  store or transmit on a tangible medium data associated with the truth dataset match determination, and if no authentic truth dataset matches are determined to have occurred, generating an output including at least one of an indication that no truth dataset match was found, an indication of a proximity to a truth dataset match, or characteristic information regarding the first and second datasets, and if at least one authentic truth dataset match is determined to have occurred, generating an output including at least one of a report of the existence of an authentic truth dataset match, a count of the number of authentic truth dataset matches, an approximate number of authentic truth dataset matches, an indication of duplicate dataset entries, an aggregation of the matching criteria and truth dataset entries, a computed value for a data field entry associated with the authentic truth dataset match, an indication of which truth datasets did not match, an indication of a proximity to additional authentic truth dataset matches, and/or at least one portion of a matching authentic truth dataset.
- Dependent Claims (11)
  - 11. The computer-implemented system of claim 10, wherein the likelihood of a match output by the matching function comprises a true indication, a false indication, or a score.
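Claims 10 and 11 hide each party's real records among fictitious ones and tag every element with an authenticity code, so that only truth-to-truth matches count. The Python sketch below assumes HMAC-SHA256 under a per-party key as the authenticity function and gives fictitious elements random codes; the patent does not name a specific function, and all names here are hypothetical. The matching function is a simple equality test returning a true or false indication, as claim 11 permits.

```python
import hashlib
import hmac
import secrets
from typing import Callable

Auth = Callable[[str], str]

def authenticity_function(key: bytes) -> Auth:
    """Hypothetical authenticity function: HMAC-SHA256 of an element under one
    party's key. It is deterministic, so a truth element always gets the same code."""
    def code(element: str) -> str:
        return hmac.new(key, element.encode(), hashlib.sha256).hexdigest()
    return code

def expand(truth: list[str], fictitious: list[str], auth: Auth) -> list[tuple[str, str]]:
    """Mix truth and fictitious elements. Fictitious rows get random codes, which
    equal the HMAC of their element only with negligible probability."""
    rows = [(e, auth(e)) for e in truth] + [(e, secrets.token_hex(32)) for e in fictitious]
    secrets.SystemRandom().shuffle(rows)  # position must not reveal which rows are real
    return rows

def matching_unit(rows_a: list[tuple[str, str]], rows_b: list[tuple[str, str]],
                  match: Callable[[str, str], bool], auth_a: Auth, auth_b: Auth) -> list[str]:
    """Apply the matching function to every pair, then keep only pairs whose
    codes both verify, i.e. authentic truth-to-truth matches."""
    authentic = []
    for ea, ca in rows_a:
        for eb, cb in rows_b:
            if (match(ea, eb)
                    and hmac.compare_digest(ca, auth_a(ea))
                    and hmac.compare_digest(cb, auth_b(eb))):
                authentic.append(ea)
    return authentic


auth_a = authenticity_function(b"key-A")
auth_b = authenticity_function(b"key-B")
rows_a = expand(["alice@example.com", "bob@example.com"], ["carol@example.com"], auth_a)
rows_b = expand(["bob@example.com"], ["alice@example.com", "dave@example.com"], auth_b)
matches = matching_unit(rows_a, rows_b, lambda x, y: x == y, auth_a, auth_b)
print(matches)  # prints: ['bob@example.com']
```

The two "alice" rows satisfy the matching function, but party B's copy is fictitious, so its code fails verification and the pair is not reported. Because claim 10 shares the authenticity functions with the matching unit, that unit can also tell real rows from fictitious ones; claim 17 separates those roles.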

12. A computer-implemented system for defining and implementing selecting criteria for sensitive data disclosure, comprising:
- at least one processor configured to:
  
  store in memory a dataset including one or more truth dataset elements for corresponding one or more data fields, and an authenticity function;
  
  expand the dataset to include one or more fictitious dataset elements for the one or more data fields;
  
  generate for and associate to each truth dataset element and fictitious dataset element an authenticity code using the authenticity function, wherein the authenticity function produces for a particular truth dataset element a consistent authenticity code for a given dataset element input to the authenticity function that is distinct from authenticity codes produced for the fictitious dataset elements;
  
  share with a matching unit the expanded dataset with or without the associated authenticity codes;
  
  if the associated authenticity codes are shared with the matching unit, share the authenticity function with the matching unit, otherwise share with an authenticity unit the authenticity function;
  
  receive at the matching unit selecting criteria specifying values to evaluate a predicate of the data fields of the shared expanded dataset;
  
  cause the matching unit to apply the selecting criteria to the one or more shared expanded dataset elements, and generate an indication of a likelihood of a match occurrence between the selecting criteria and data field entries in the shared expanded dataset;
  
  cause the matching unit or the authenticity unit, depending on which has been shared the authenticity function, to verify using the authenticity function at least the authentication codes associated with the shared expanded dataset elements for which the matching unit has generated a likelihood of a match occurrence, in order to identify truth dataset element authentication codes;
  
  determine whether one or more authentic truth dataset match has occurred by identifying the indications of likelihood of a match between the selecting criteria and data field entries in the shared expanded dataset that correspond to truth dataset element authentication codes; and
  
  store or transmit on a tangible medium data associated with the authentic truth dataset match determination, further comprising if no authentic truth dataset matches are determined to have occurred, generating at least one output including an indication that no truth dataset match was found, an indication of a proximity to a truth dataset match, or characteristic information regarding the truth datasets, and if at least one authentic truth dataset match is determined to have occurred, generating an output including at least one of a report of the existence of a truth dataset match, a count of the number of truth dataset matches, an approximate number of truth dataset matches, an indication of which truth datasets did not match, an indication of a proximity to additional truth dataset matches, and/or at least one portion of a matching truth dataset.
- Dependent Claims (13, 14)
  - 13. The computer-implemented system of claim 12, wherein the at least one processor is further configured to receive the selecting criteria from a querying entity.
  - 14. The computer-implemented system of claim 12, wherein the at least one processor is further configured to output at least one of an aggregation of the matching criteria and entries, a computed value for a data field entry associated with the authentic truth dataset match, and an indication of duplicate entries.
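Claim 12 covers the case where the matching unit evaluates selecting criteria without holding the authenticity codes, and a separate authenticity unit verifies only the rows that matched. A hedged Python sketch of that split follows; the HMAC-based authenticity function over each record's `repr`, the age predicate, and all names are illustrative assumptions.

```python
import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)  # hypothetical secret, never given to the matching unit

def auth(record: tuple) -> str:
    """Hypothetical authenticity function over a record's repr."""
    return hmac.new(KEY, repr(record).encode(), hashlib.sha256).hexdigest()

truth = [("alice", 34), ("bob", 51)]
fictitious = [("carol", 47), ("dave", 29)]
expanded = [(r, auth(r)) for r in truth] + [(r, secrets.token_hex(32)) for r in fictitious]
secrets.SystemRandom().shuffle(expanded)  # hide which positions hold truth rows

public_rows = [r for r, _ in expanded]                # to the matching unit, no codes
codes = {i: c for i, (_, c) in enumerate(expanded)}   # to the authenticity unit

def matching_unit(rows: list[tuple], predicate) -> list[int]:
    """Apply the selecting criteria; return indices of rows that satisfy them."""
    return [i for i, row in enumerate(rows) if predicate(row)]

def authenticity_unit(indices: list[int], rows: list[tuple], codes: dict[int, str]) -> list[int]:
    """Keep only the indices whose stored code verifies, i.e. the truth rows."""
    return [i for i in indices if hmac.compare_digest(codes[i], auth(rows[i]))]

hits = matching_unit(public_rows, lambda row: row[1] > 40)   # criteria: age over 40
authentic = authenticity_unit(hits, public_rows, codes)
print(len(hits), len(authentic))  # prints: 2 1
```

The querying side sees two likely matches, but only one survives verification; reporting `len(authentic)` corresponds to the "count of the number of truth dataset matches" output option in claim 12.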

15. A computer-implemented method for recombining datasets to re-identify unprotected truth data, comprising the steps performed by at least one processor of:
- receiving a plurality of disclosed datasets, at least an indeterminate one of the disclosed datasets including protected truth data formed from the unprotected truth data by transformation;
  
  using at least one of the disclosed datasets, constructing at least one description of a set of unprotected truth data possibilities for unprotected truth data items in said plurality of disclosed datasets;
  
  determining whether one or more datasets of the plurality that was not used in constructing the at least one description of the set of unprotected truth data possibilities satisfies the at least one description; and
  
  forming an inference regarding the unprotected truth data possibilities based on the one or more satisfaction determination; and
  
  storing or transmitting on a tangible medium data associated with the unprotected truth data possibilities inference, further comprising if one or more datasets not used in constructing the at least one description fails to satisfy the description, generating an output including at least one of an indication of an inability to re-identify the unprotected truth data, and an indication of a proximity to satisfying the description, and if the one or more datasets not used in constructing the at least one description satisfies the description, generating an output reporting at least one of the ability to re-identify the unprotected truth data, at least a portion of the unprotected truth data items re-identified by inference to comprise the initially disclosed truth data based on satisfying the at least one description, a measure of an extent of refinement of unprotected truth data possible from the protected truth data, an indication of a proximity to additional dataset matches satisfying the description, and an indication of refinements possible of unprotected truth data possible from the protected truth data.
- Dependent Claims (16)
  - 16. The method of claim 15, further comprising interpreting for output the at least one description that is satisfied by each of the disclosed datasets in the plurality of disclosed datasets.
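Claim 15 describes the attacker's side: recombining several protected disclosures of the same truth to narrow it down. One simple way to picture this, under the strong simplifying assumption that each disclosed dataset can be reduced to the set of truth values it leaves open, is set intersection. The function name `recombine` and the example releases are hypothetical.

```python
def recombine(disclosures: list[set]) -> dict:
    """Treat each disclosed dataset as the set of truth values it leaves open
    (a simplifying assumption) and intersect them."""
    description = set(disclosures[0])   # description built from the first dataset
    initial = len(description)
    satisfied = []
    for d in disclosures[1:]:           # datasets not used to build the description
        overlap = description & d
        satisfied.append(bool(overlap))
        if overlap:
            description = overlap       # refine the possibilities
    return {
        "possibilities": description,
        "reidentified": len(description) == 1,
        "satisfied": satisfied,
        "refinement": initial / len(description),  # how much the set shrank
    }


releases = [
    {"flu", "asthma", "migraine"},    # protected release under one policy
    {"flu", "asthma", "diabetes"},    # another protected release of the same truth
    {"flu", "migraine", "diabetes"},
]
result = recombine(releases)
print(result["possibilities"], result["reidentified"])  # prints: {'flu'} True
```

Each release alone leaves three possibilities, yet together they pin down the truth, which is the risk that recombination analysis is meant to expose.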

17. A computer-implemented system for matching sensitive truth datasets, comprising:
- at least one processor configured to:
  
  store in memory a corresponding first and second dataset each including one or more truth datasets comprised of truth dataset elements for one or more data fields;
  
  expand the first and second datasets to include one or more fictitious dataset elements for the one or more data fields;
  
  generate for and associate to each truth dataset element and fictitious dataset element an associated authenticity code using at least one authenticity function, wherein each authenticity function produces for each truth dataset element a consistent authenticity code for a given dataset element input to the authenticity function that is distinct from authenticity codes produced for fictitious dataset elements, wherein the authenticity function producing authenticity codes for the first dataset may be distinct from the authenticity function producing authenticity codes for the second dataset;
  
  share with a matching unit the expanded first and second datasets and a matching function;
  
  share with at least one authenticity unit the expanded first and second datasets with associated authenticity codes, and the corresponding at least one authenticity function;
  
  cause the matching unit to apply to the shared expanded first and second datasets a matching function and output as a result of the matching function application at least one indication of the likelihood of a match occurrence between elements of the shared expanded first and second datasets;
  
  cause the at least one authenticity unit to apply the corresponding first at least one authenticity function to the shared expanded first and second datasets with associated authenticity codes to output at least one authenticity determination;
  
  determine whether one or more authentic truth dataset matches have occurred based on correspondence between the at least one indication of the likelihood of a match and the at least one authenticity determination; and
  
  store or transmit on a tangible medium data associated with the truth dataset match determination, further comprising if no authentic truth dataset matches are determined to have occurred, generating an output including at least one of an indication that no authentic truth dataset match was found, an indication of a proximity to an authentic truth dataset match, or characteristic information regarding the first and second datasets, and if at least one authentic truth dataset match is determined to have occurred, generating an output including at least one of a report of the existence of an authentic truth dataset match, a count of the number of authentic truth dataset matches, an approximate number of authentic truth dataset matches, an indication of duplicate dataset entries, an aggregation of the matching criteria and truth dataset entries, a computed value for a data field entry associated with the authentic truth dataset match, an indication of which truth datasets did not match, an indication of a proximity to additional authentic truth dataset matches, and at least one portion of a matching authentic truth dataset.
- Dependent Claims (18)
  - 18. The computer-implemented system of claim 17, wherein the likelihood of a match output by the matching function comprises a true indication, a false indication, or a score.
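Claims 17 and 18 separate the roles: the matching unit sees only values and a matching function that may return a score, while authenticity units, possibly using a distinct function per dataset, verify the codes. The Python sketch below assumes `difflib.SequenceMatcher.ratio()` as the score, an arbitrary 0.85 threshold, and HMAC-SHA256 under per-dataset keys; none of these choices comes from the patent.

```python
import difflib
import hashlib
import hmac
import secrets

def make_auth(key: bytes):
    """Hypothetical per-dataset authenticity function (claim 17 allows distinct ones)."""
    return lambda e: hmac.new(key, e.encode(), hashlib.sha256).hexdigest()

def score(a: str, b: str) -> float:
    """Score-valued matching function (claim 18); difflib's ratio is an assumed choice."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

auth_a, auth_b = make_auth(b"key-A"), make_auth(b"key-B")
# Expanded datasets: "Ann Lee" is fictitious in A, so its code is random.
rows_a = [("Jon Smith", auth_a("Jon Smith")), ("Ann Lee", secrets.token_hex(32))]
rows_b = [("John Smith", auth_b("John Smith")), ("Anne Lee", auth_b("Anne Lee"))]

# Matching unit: receives values and the matching function, but no codes or keys.
THRESHOLD = 0.85  # hypothetical cut-off for a "likely" match
likely = [(i, j)
          for i, (a, _) in enumerate(rows_a)
          for j, (b, _) in enumerate(rows_b)
          if score(a, b) >= THRESHOLD]

# Authenticity units: each checks codes with its own dataset's function.
def verifies(rows, auth, i) -> bool:
    value, code = rows[i]
    return hmac.compare_digest(code, auth(value))

authentic = [(i, j) for i, j in likely
             if verifies(rows_a, auth_a, i) and verifies(rows_b, auth_b, j)]
print(likely, authentic)  # prints: [(0, 0), (1, 1)] [(0, 0)]
```

Both name pairs clear the similarity threshold, but only "Jon Smith" and "John Smith" are authentic on both sides. The matching unit never learns which of its likely matches were real, which is the practical difference from the arrangement in claim 10.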

Current Assignee: SybilSecurity IP, LLC
Original Assignee: SybilSecurity IP, LLC
Inventors: Braun, Uri Jacob
Primary Examiner(s): Hirl, Joseph P
Assistant Examiner(s): Gundry, Stephen T
Application Number: US15/650,500
Publication Number: US 20170337398A1
Time in Patent Office: 585 Days
CPC Class Codes:
G06F 21/6254 by anonymising data, e.g. d...
G06F 2221/2101 Auditing as a secondary aspect
