Private information storage system
3 Assignments
Abstract
This invention relates to a scheme for storing private information on a cloud computing platform without contravening territorial privacy laws. A method of anonymising a database of personal data is described in which data identifiers are assigned to data items, and deviation identifiers are assigned to the deviations of selected data items from reference records. The anonymised information can then be uploaded to a cloud-based storage platform. A translation table maps the data items, data identifiers and deviation identifiers to the original data entries. This translation table is stored locally, separately from the anonymised information uploaded to the cloud. The invention further describes a method of decoding a database anonymised according to the above method.
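As a rough illustration of the data-identifier half of this scheme, the Python sketch below replaces selected field values with random tokens and keeps the token-to-value mapping in a separate translation table. The function names `pseudonymise` and `restore`, the use of `secrets.token_hex` to generate identifiers, and the choice to reuse one identifier per distinct (field, value) pair are assumptions made for illustration, not details taken from the patent. Under the abstract's arrangement, only the returned rows would be uploaded, while the translation table would remain in local storage.

```python
import secrets
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
# Hypothetical local translation table: data identifier -> (field, original value).
TranslationTable = Dict[str, Tuple[str, Any]]


def pseudonymise(records: List[Row], fields: List[str]) -> Tuple[List[Row], TranslationTable]:
    """Replace each value in `fields` with a random data identifier.

    Returns the rows that could be uploaded and the translation table,
    which in this sketch is assumed to stay in local storage.
    """
    translation: TranslationTable = {}
    id_for: Dict[Tuple[str, Any], str] = {}  # reuse one identifier per (field, value)
    cloud_rows: List[Row] = []
    for rec in records:
        row = dict(rec)  # copy so the caller's records are not modified
        for f in fields:
            key = (f, rec[f])
            if key not in id_for:
                id_for[key] = "D" + secrets.token_hex(8)
                translation[id_for[key]] = key
            row[f] = id_for[key]
        cloud_rows.append(row)
    return cloud_rows, translation


def restore(cloud_rows: List[Row], translation: TranslationTable) -> List[Row]:
    """Reverse the mapping using the locally held translation table."""
    restored: List[Row] = []
    for row in cloud_rows:
        rec = dict(row)
        for f, value in row.items():
            # Only string values can be identifiers; look them up in the local table.
            entry = translation.get(value) if isinstance(value, str) else None
            if entry is not None and entry[0] == f:
                rec[f] = entry[1]
        restored.append(rec)
    return restored
```

Reusing one identifier per distinct value keeps equality comparisons possible on the uploaded rows, at the cost of revealing which records share a value; issuing a fresh identifier per occurrence would hide that, at the cost of a larger translation table.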
301 Citations
25 Claims
1. A computer-implemented method of anonymising a database of personal data, the database comprising a plurality of data records, each data record comprising a plurality of data items, the method comprising:
for a subset of data items in said data records, determining a deviation of each of said data items in said data records relative to reference data items in a plurality of reference records, wherein one of said plurality of reference records is selected for each one of said data items or subset of data items dependent on a similarity of a said data record to said reference records, wherein determining said similarity comprises:
categorizing said data items in said data records into a plurality of pools based on classification profiles defined by said reference records, wherein a data item similarity of data items in a said pool is above a threshold; and
comparing calculated perturbation profiles of one or more of said data items in a said pool with one or more of said reference data items of said reference records, wherein each of said data items in said data records has a corresponding said reference data item in a said selected reference record according to a said classification profile to determine a said deviation of a said data item relative to a said reference data item in a said selected reference record;
assigning deviation identifiers to each of said determined deviations in said data records to identify a said data item being recorded as a said determined deviation to a said reference data item and to anonymise said data items in said subset of data items in said data records;
generating a translation table mapping said data items in said subset and said determined deviations to said deviation identifiers;
storing said translation table; and
storing said deviation identifiers defining said anonymised data items for said data records remotely from said translation table.
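The claim leaves the similarity measure, the threshold and the form of the "perturbation profiles" open. The Python sketch below is one possible reading under stated assumptions: records and reference records are dictionaries of numeric fields, similarity is 1 / (1 + Euclidean distance), each record is pooled with its most similar reference record provided that similarity exceeds the threshold, and a deviation is the plain difference between a data item and the corresponding reference data item. The names `similarity`, `build_pools` and `anonymise` are hypothetical, and recording the selected reference inside the translation table (so the uploaded rows carry only opaque identifiers) is a design choice of this sketch rather than something the claim specifies.

```python
import math
import secrets
from typing import Dict, List, Tuple

Record = Dict[str, float]
# Hypothetical translation-table entry: (selected reference, field, deviation).
Entry = Tuple[str, str, float]


def similarity(record: Record, reference: Record, fields: List[str]) -> float:
    """Illustrative similarity: 1 / (1 + Euclidean distance) over `fields`."""
    dist = math.sqrt(sum((record[f] - reference[f]) ** 2 for f in fields))
    return 1.0 / (1.0 + dist)


def build_pools(records: List[Record], references: Dict[str, Record],
                fields: List[str], threshold: float) -> Dict[str, List[int]]:
    """Pool each record index with its most similar reference record."""
    pools: Dict[str, List[int]] = {name: [] for name in references}
    for idx, rec in enumerate(records):
        best = max(references, key=lambda name: similarity(rec, references[name], fields))
        if similarity(rec, references[best], fields) <= threshold:
            raise ValueError(f"record {idx} is not similar enough to any reference record")
        pools[best].append(idx)
    return pools


def anonymise(records: List[Record], references: Dict[str, Record],
              fields: List[str], threshold: float) -> Tuple[List[Dict[str, str]], Dict[str, Entry]]:
    """Return (rows of deviation identifiers for remote storage, local translation table)."""
    pools = build_pools(records, references, fields, threshold)
    remote_rows: List[Dict[str, str]] = [{} for _ in records]
    translation: Dict[str, Entry] = {}
    id_for: Dict[Entry, str] = {}  # equal deviations from the same reference share one identifier
    for ref_name, members in pools.items():
        ref = references[ref_name]
        for idx in members:
            for f in fields:
                # Deviation of this data item from the corresponding reference data item.
                entry = (ref_name, f, records[idx][f] - ref[f])
                if entry not in id_for:
                    id_for[entry] = secrets.token_hex(8)
                    translation[id_for[entry]] = entry
                remote_rows[idx][f] = id_for[entry]
    return remote_rows, translation
```

Sharing identifiers between equal deviations keeps the translation table small when many records cluster closely around a reference record, which is the situation the pooling step is meant to produce.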
13. A computer-implemented method of decoding a database of personal data, wherein said database of personal data is anonymised, the database comprising a plurality of data records, each data record comprising a plurality of data items, the method comprising:
retrieving deviation identifiers for a subset of said data items in remote data records from remote storage, wherein said deviation identifiers define anonymised deviations for each of said data items in said data records relative to reference data items in a plurality of reference records, wherein one of said plurality of reference records is selected for each one of said data items or subset of data items dependent on a similarity of a said data record to said reference records, wherein determining said similarity comprises:
categorizing said data items in said data records into a plurality of pools based on classification profiles defined by said reference records, wherein a data item similarity of data items in a said pool is above a threshold; and
comparing calculated perturbation profiles of one or more of said data items in a said pool with one or more of said reference data items of said reference records;
wherein each of said data items in said data records has a corresponding said reference data item in a said selected reference record according to a said classification profile to determine a said deviation of a said data item relative to a said reference data item in a said selected reference record;
retrieving a translation table from storage, wherein said storage is distinct from that of said deviation identifiers, and wherein said translation table defines a mapping of said data items in said subset and said determined deviations with said deviation identifiers;
processing said deviation identifiers using said selected reference record and said translation table to decode said database of personal data;
wherein said processing comprises performing a reverse mapping of said deviation identifiers to deviations of each of said data items in said subset of data items and, using said selected reference record and said deviations, determining said data items in said subset of data items.
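The reverse mapping in claim 13 can be illustrated with a short Python sketch. It assumes, hypothetically, that each remote row maps a field name to a deviation identifier and that the locally held translation table maps each identifier to the selected reference record, the field and the numeric deviation; a data item is then recovered as the reference data item plus its deviation. The function name `decode` and this table layout are illustrative assumptions, not the patent's data format.

```python
from typing import Dict, List, Tuple

# Hypothetical layouts: remote row = {field: deviation identifier};
# translation table = {identifier: (reference name, field, deviation)}.
RemoteRow = Dict[str, str]
Translation = Dict[str, Tuple[str, str, float]]
References = Dict[str, Dict[str, float]]


def decode(remote_rows: List[RemoteRow], translation: Translation,
           references: References) -> List[Dict[str, float]]:
    """Reconstitute the anonymised data items from their deviation identifiers."""
    decoded: List[Dict[str, float]] = []
    for row in remote_rows:
        record: Dict[str, float] = {}
        for field, dev_id in row.items():
            # Reverse mapping: identifier -> (selected reference, field, deviation).
            ref_name, mapped_field, deviation = translation[dev_id]
            if mapped_field != field:
                raise ValueError(f"identifier {dev_id!r} belongs to field {mapped_field!r}")
            # Data item = corresponding reference data item + its deviation.
            record[field] = references[ref_name][field] + deviation
        decoded.append(record)
    return decoded
```

In this sketch an identifier that is missing from the translation table raises `KeyError`, which is the intended outcome for anyone who holds the remote rows but not the locally stored table.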
22. A database management system for decoding a subset of data items in data records from a distributed database, said database management system comprising a distributed storage system, wherein said distributed storage system comprises a memory which stores said distributed database, the distributed database comprising:
a plurality of deviation identifiers storing anonymised references to deviations in a subset of data items in data records, wherein said deviation identifiers define anonymised deviations of said subset of data items relative to reference data items in a plurality of reference records;
the database management system further comprising a processor configured to select one of said plurality of reference records for each one of said data items or subset of data items dependent on a similarity of a said data record to said reference records, wherein said processor is further configured to determine said similarity, wherein said determination of said similarity comprises:
categorizing said data items in said data records into a plurality of pools based on classification profiles defined by said reference records, wherein a data item similarity of data items in a said pool is above a threshold; and
comparing calculated perturbation profiles of one or more of said data items in a said pool with one or more of said reference data items of said reference records, and wherein each of said data items in said data records has a corresponding said reference data item in a said selected reference record according to a said classification profile to determine a said deviation of a said data item relative to a said reference data item in a said selected reference record;
the database management system further comprising:
a data dictionary comprising a translation table storing mappings from said subset of data items and said determined deviations to said deviation identifiers;
a transaction engine to retrieve one or more of said plurality of deviation identifiers from remote storage, to retrieve said translation table, and to retrieve said selected reference record, wherein said storage of said deviation identifiers is distinct from that of said translation table; and
a query engine to process said deviation identifiers using said selected reference record and said translation table to decode said distributed database and reconstitute said subset of data items in said data records, wherein said decoding comprises performing a reverse mapping of said deviation identifiers to deviations of each of said data items in said subset of data items and, using said selected reference record and said deviations, determining said data items in said subset of data items.
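The component split in claim 22 can be sketched in Python as below. The class names mirror the claim's elements (remote storage of deviation identifiers, a data dictionary holding the translation table, a transaction engine and a query engine), but their interfaces, the in-memory stand-in for remote storage, and the table layout (identifier mapped to selected reference, field and deviation) are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RemoteStore:
    """In-memory stand-in for remote storage: holds only deviation identifiers."""
    rows: List[Dict[str, str]] = field(default_factory=list)


@dataclass
class DataDictionary:
    """Locally held translation table: identifier -> (reference, field, deviation)."""
    translation: Dict[str, Tuple[str, str, float]] = field(default_factory=dict)


class TransactionEngine:
    """Retrieves identifiers, the translation table and reference records."""

    def __init__(self, remote: RemoteStore, dictionary: DataDictionary,
                 references: Dict[str, Dict[str, float]]) -> None:
        self._remote = remote
        self._dictionary = dictionary
        self._references = references

    def fetch_identifiers(self, row_ids: List[int]) -> List[Dict[str, str]]:
        return [self._remote.rows[i] for i in row_ids]

    def fetch_translation(self) -> Dict[str, Tuple[str, str, float]]:
        return self._dictionary.translation

    def fetch_reference(self, name: str) -> Dict[str, float]:
        return self._references[name]


class QueryEngine:
    """Decodes identifiers back into data items via reverse mapping."""

    def __init__(self, tx: TransactionEngine) -> None:
        self._tx = tx

    def reconstitute(self, row_ids: List[int]) -> List[Dict[str, float]]:
        translation = self._tx.fetch_translation()
        out: List[Dict[str, float]] = []
        for row in self._tx.fetch_identifiers(row_ids):
            rec: Dict[str, float] = {}
            for fld, dev_id in row.items():
                ref_name, _, deviation = translation[dev_id]  # reverse mapping
                rec[fld] = self._tx.fetch_reference(ref_name)[fld] + deviation
            out.append(rec)
        return out
```

Keeping `RemoteStore` and `DataDictionary` as separate objects reflects the claim's requirement that the storage of the deviation identifiers be distinct from the translation table; in a real deployment they would sit on different systems.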
Specification