Private information storage system

US 9,202,085 B2
Filed: 11/22/2011
Issued: 12/01/2015
Est. Priority Date: 11/23/2010
Status: Active Grant

Abstract

This invention relates to a scheme for storage of private information on a cloud computing platform without contravention of territorial privacy laws. A method of anonymising a database of personal data is described whereby data identifiers are assigned to data items and deviation identifiers are assigned to deviations for selected data items derived from reference records. Such information can then be uploaded to a cloud based storage platform. A translation table maps the data items, data identifiers and deviation identifiers to the original data entries. This translation table is stored locally and separate to the anonymised information uploaded to the cloud. The invention further describes a method of decoding the database anonymised according to the above method.
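
The data-identifier part of the scheme described in the abstract can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the patented implementation: the function name `tokenise`, the field names, and the use of random hexadecimal tokens as data identifiers are all hypothetical choices made for the example.

```python
import secrets


def tokenise(records, fields):
    """Replace the values of the chosen fields with random data identifiers.

    Returns (translation_table, remote_rows). In this sketch the translation
    table is meant to stay on the local machine, and only remote_rows would
    be uploaded to remote storage.
    """
    translation_table = {}  # data identifier -> (field, original value)
    ids_by_value = {}       # (field, original value) -> data identifier
    remote_rows = []
    for record in records:
        row = dict(record)  # copy so the caller's record is left untouched
        for field in fields:
            key = (field, record[field])
            if key not in ids_by_value:
                # Assumption: a 16-character random hex token as identifier.
                identifier = secrets.token_hex(8)
                ids_by_value[key] = identifier
                translation_table[identifier] = key
            row[field] = ids_by_value[key]
        remote_rows.append(row)
    return translation_table, remote_rows
```

In this sketch equal values in the same field share one identifier, which keeps the translation table small but lets an observer of the remote rows see which records share a value. Issuing a fresh identifier per occurrence would avoid that at the cost of a larger table.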

25 Claims

1. A computer-implemented method of anonymising a database of personal data, the database comprising a plurality of data records, each data record comprising a plurality of data items, the method comprising;
- for a subset of data items in said data records, determining a deviation of each of said data items in said data records relative to reference data items in a plurality of reference records, wherein one of said plurality of reference records is selected for each one of said data items or subset of data items dependent on a similarity of a said data record to said reference records, wherein determining said similarity comprises;
  
  categorizing said data items in said data records into a plurality of pools based on classification profiles defined by said reference records, wherein a data item similarity of data items in a said pool is above a threshold; and
  
  comparing calculated perturbation profiles of one or more of said data items in a said pool with one or more of said reference data items of said reference records, wherein each of said data items in said data records has a corresponding said reference data item in a said selected reference record according to a said classification profile to determine a said deviation of a said data item relative to a said reference data item in a said selected reference record, assigning deviation identifiers to each of said determined deviations in said data records to identify a said data item being recorded as a said determined deviation to a said reference data item and to anonymise said data items in said subset of data items in said data records;
  
  generating a translation table mapping said data items in said subset and said determined deviations to said deviation identifiers;
  
  storing said translation table; and
  
  storing said deviation identifiers defining said anonymised data items for said data records remotely to said translation table.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. The method as claimed in claim 1 further comprising:
    - assigning data identifiers to a second subset of said data items in said data records to anonymise said data items in said second subset of said data items in said data records;
      
      wherein said generating said translation table further comprises mapping said data items in said second subset to said data identifiers; and
      
      further comprising storing said data identifiers remotely to said translation table.
  - 3. The method as claimed in claim 1 further comprising generating said reference record, wherein said generating comprises determining one or more reference data items defining a characteristic profile of data suitable for storing in said database.
  - 4. The method as claimed in claim 3, wherein said characteristic profile is dependent on one or more of said plurality of data records and wherein said characteristic profile is dynamically updated dependent on a further one or more of said plurality of data records.
  - 5. The method as claimed in claim 1, further comprising:
    - assigning reference data identifiers to said reference data items; and
      
      storing said reference data identifiers remotely to said translation table;
      
      wherein said translation table further stores a mapping of said reference data items to said reference data identifiers.
  - 6. The method as claimed in claim 1 wherein one of said data items in said data record comprises a marker, wherein said marker defines said reference record used to determine said deviation.
  - 7. The method as claimed in claim 1 wherein said translation table is stored on a client machine, and said deviation identifiers are stored on a remote data server;
    - and wherein said data identifiers are stored on said remote data server and/or wherein said remote data server is located within a remote cloud computing platform.
  - 8. The method as claimed in claim 1 further comprising encrypting one or more of said translation table and said deviation identifiers;
    - and wherein said storing comprises storing said encrypted one or more of said translation table and said deviation identifiers.
  - 9. The method as claimed in claim 2 further comprising encrypting said data identifiers and wherein said storing comprises storing said encrypted data identifiers.
  - 10. The method as claimed in claim 8 further comprising applying one or more further layers of encryption, in particular wherein one of said one or more further layers of encryption is a proprietary format.
  - 11. The method as claimed in claim 1, wherein said data items in said data records are arranged into fields;
    - and said determining a deviation comprises determining a deviation of each of said data items relative to a data item in a corresponding field in said reference record.
  - 12. The method as claimed in claim 2, wherein said second subset of data comprises data defining personal data and said subset of data comprises financial data linked to said personal data, and wherein a said reference record comprises a common financial profile comprising pre-characterized financial data.
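
Read together, claims 1, 6 and 11 describe selecting a reference record for each data record, computing field-by-field deviations from it, and storing only deviation identifiers remotely. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the reference records, the field names, the sum-of-absolute-differences similarity measure and the function names are all hypothetical, and the pooling, threshold and perturbation-profile steps of claim 1 are collapsed into a simple nearest-reference lookup.

```python
import secrets

# Hypothetical reference records (pre-characterised profiles), keyed by name.
REFERENCE_RECORDS = {
    "ref_low": {"income": 20000, "balance": 1000},
    "ref_high": {"income": 90000, "balance": 40000},
}
DEVIATION_FIELDS = ("income", "balance")


def select_reference(record):
    """Return the name of the reference record closest to the data record.

    Assumption: similarity is the sum of absolute field differences.
    """
    def distance(name):
        ref = REFERENCE_RECORDS[name]
        return sum(abs(record[f] - ref[f]) for f in DEVIATION_FIELDS)

    return min(REFERENCE_RECORDS, key=distance)


def anonymise(records):
    """Return (translation_table, remote_rows) for the given data records."""
    translation_table = {}  # deviation identifier -> (field, deviation)
    ids_by_deviation = {}   # (field, deviation) -> deviation identifier
    remote_rows = []        # rows intended for remote storage
    for record in records:
        ref_name = select_reference(record)
        ref = REFERENCE_RECORDS[ref_name]
        # The marker names the reference record used (compare claim 6).
        row = {"marker": ref_name}
        for field in DEVIATION_FIELDS:
            # Deviation relative to the corresponding field (compare claim 11).
            deviation = record[field] - ref[field]
            key = (field, deviation)
            if key not in ids_by_deviation:
                dev_id = secrets.token_hex(8)
                ids_by_deviation[key] = dev_id
                translation_table[dev_id] = key
            row[field] = ids_by_deviation[key]
        remote_rows.append(row)
    return translation_table, remote_rows
```

The remote rows contain only a marker and opaque identifiers. Recovering a value requires both the translation table and the reference records, which is why the claims keep the translation table apart from the remotely stored identifiers.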

13. A computer-implemented method of decoding a database of personal data, wherein said database of personal data is anonymised, the database comprising a plurality of data records, each data record comprising a plurality of data items, the method comprising:
- retrieving deviation identifiers for a subset of said data items in remote data record from remote storage, wherein said deviation identifiers define anonymised deviations for each of said data items in said data records relative to reference data items in a plurality of reference records, wherein one of said plurality of reference records is selected for each one of said data items or subset of data items dependent on a similarity of a said record to said data record to said reference records, wherein determining said similarity comprises;
  
  categorizing said data items in said data records into a plurality of pools based on classification profiles defined by said reference records, wherein a data item similarity of data items in a said pool is above a threshold; and
  
  comparing calculated perturbation profiles of one or more of said data items in a said pool with one or more of said reference data items of said reference records;
  
  wherein each of said data items in said data records has a corresponding, said reference data item in a said selected reference record according to a said classification profile to determine a said deviation of a said data item relative to a said reference data item in a said selected reference record;
  
  retrieving a translation table from storage, wherein said storage is distinct to said deviation identifiers, and wherein said translation table defines a mapping of said data items in said subset and said determined deviations with said deviation identifiers;
  
  processing said deviation identifiers using said selected reference record and said translation table to decode said database of personal data;
  
  wherein said processing comprises performing a reverse mapping of said deviation identifiers to deviations of each of said data items in said subset of data items and, using said selected reference record and said deviations, determining said data items in said subset of data items.
- Dependent claims (14, 15, 16, 17, 18, 19, 20, 21):
  - 14. The method as claimed in claim 13 further comprising:
    - retrieving data identifiers for a second subset of said data items in said remote data record from said remote storage, said data identifiers defining anonymised data items in said remote data record;
      
      wherein said translation table further defines a mapping of said data items in said second subset with said data identifiers;
      
      wherein said storage of said translation table is distinct to said data identifiers;
      
      wherein said processing further comprises processing said data identifiers and said translation table to decode said database of personal data; and
      
      wherein said processing further comprises performing a reverse mapping of said data identifiers to said data items in said second subset of data items.
  - 15. The method as claimed in claim 13, wherein said translation table further maps said reference data items to reference data identifiers;
    - the method further comprising;
      
      retrieving said reference data identifiers from remote storage; and
      
      processing said reference data identifiers using said translation table to perform a reverse mapping of said reference data identifiers to said reference data items for said reference record.
  - 16. The method as claimed in claim 13 wherein one of said data items in said data record comprises a marker defining said selected reference record used to determine said anonymised deviation, and wherein said processing further comprises determining said selected reference record from said plurality of reference records by reading said marker.
  - 17. The method as claimed in claim 13 wherein said processing comprises determining said selected reference record from said plurality of reference records based on a difference determined between said deviations in at least one of said data items in said subset and at least one of said reference data items in said plurality of reference records.
  - 18. The method as claimed in claim 13, wherein said translation table is stored on a client machine and said deviation identifiers are stored on a remote data server.
  - 19. The method as claimed in claim 14, wherein said data identifiers are stored on a remote data server;
    - and wherein said remote data server is located within a remote cloud computing platform and/or wherein said translation table is stored remotely to said remote data server and downloaded to said client machine during said retrieving of said translation table from said storage.
  - 20. The method as claimed in claim 13 wherein one or more of said translation table and said deviation identifiers are encrypted, and wherein said processing further comprises decrypting said one or more of said translation table and said deviation identifiers prior to decoding said database of personal data.
  - 21. The method as claimed in claim 14 wherein said data identifiers are encrypted, and wherein said processing further comprises decrypting said data identifiers prior to decoding said database of personal data.
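
The decoding path of claims 13 and 16 can be sketched in the same spirit: look up each deviation identifier in the translation table, then add the recovered deviation to the matching field of the selected reference record. This is a minimal sketch under assumptions, not the patented implementation: the row layout with a `marker` key, the table layout mapping an identifier to a `(field, deviation)` pair, and the function name `decode` are hypothetical.

```python
def decode(remote_rows, translation_table, reference_records):
    """Reconstitute data records from deviation identifiers.

    remote_rows: rows fetched from remote storage; each holds a "marker"
        naming its reference record plus one deviation identifier per field.
    translation_table: deviation identifier -> (field, deviation), retrieved
        from storage that is separate from the remote rows.
    reference_records: reference record name -> {field: reference value}.
    """
    decoded = []
    for row in remote_rows:
        # Reading the marker selects the reference record (compare claim 16).
        ref = reference_records[row["marker"]]
        record = {}
        for field, dev_id in row.items():
            if field == "marker":
                continue
            # Reverse mapping: identifier -> deviation.
            mapped_field, deviation = translation_table[dev_id]
            if mapped_field != field:
                raise ValueError(f"identifier {dev_id!r} does not belong to {field!r}")
            # Reference value plus deviation gives back the original item.
            record[field] = ref[field] + deviation
        decoded.append(record)
    return decoded
```

A short usage example with hand-written identifiers:

```python
references = {"ref_low": {"income": 20000, "balance": 1000}}
table = {"a1": ("income", 1000), "b2": ("balance", -100)}
rows = [{"marker": "ref_low", "income": "a1", "balance": "b2"}]
print(decode(rows, table, references))
```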

22. A database management system for decoding a subset of data items in data records from a distributed database, said database management system comprising a distributed storage system, wherein said distributed storage system comprises a memory which stores said distributed database, the distributed database comprising:
- a plurality of deviation identifiers storing anonymised references to deviations in a subset of data items in data records, wherein said deviation identifiers define anonymised deviations of said subset of data items relative to reference data items in a plurality of reference records;
  
  the database management system further comprising a processor configured to select one of said plurality of reference records for each one of said data items or subset of data items dependent on a similarity of a said data record to said reference records, wherein said processor is further configured to determine said similarity, wherein said determination of said similarity comprises;
  
  categorizing said data items in said data records into a plurality of pools based on classification profiles defined by said reference records, wherein a data item similarity of data items in a said pool is above a threshold;
  
  and comparing calculated perturbation profiles of one or more of said data items in a said pool with one or more of said reference data items of said reference records, and wherein each of said data items in said data records has a corresponding said reference data item in a said selected reference according to a said classification profile to determine a said deviation of a said data item relative to a said reference data item in a said selected reference record;
  
  the database management system further comprising;
  
  a data dictionary comprising a translation table storing mappings from said subset of data items and said determined deviations to said deviation identifiers;
  
  a transaction engine to retrieve one or more of said plurality of deviation identifiers from remote storage, to retrieve said translation table, and to retrieve said selected reference record, wherein said storage of said deviation identifiers is distinct to said translation table;
  
  a query engine to process said deviation identifiers using said selected reference record and said translation table to decode said distributed database and reconstitute said subset of data items in said data records, wherein said decoding comprises performing a reverse mapping of said deviation identifiers to deviations of each of said data items in said subset of data items and, using said selected reference record and said deviations, determining said data items in said subset of data items.
- Dependent claims (23, 24, 25):
  - 23. The database management system as claimed in claim 22, further comprising a data marshaller to generate said reference records, wherein said generating comprises determining one or more reference data items defining a characteristic profile of data suitable for storing in said database;
    - and wherein said characteristic profile is determined from data items stored in said distributed database.
  - 24. The database management system as claimed in claim 23, wherein said data marshaller is further operable to update said deviation identifiers in said distributed database and said data dictionary responsive to said generating of said reference records.
  - 25. The database management system as claimed in claim 22, wherein one or more of said translation table and said deviation identifiers are encrypted, and wherein said database management system further comprises an authorization engine to decrypt said one or more of said translation table and said deviation identifiers.
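
Claims 23 and 24 describe a data marshaller that derives reference records from stored data and updates the stored deviations when a reference record is regenerated. One way this could work is sketched below in Python. The sketch rests on assumptions that the claims do not state: the characteristic profile is taken to be a rounded per-field mean, and the function names `build_reference_record` and `rebase_deviations` are hypothetical.

```python
from statistics import mean


def build_reference_record(records, fields):
    """Derive a reference record from data records.

    Assumption: the characteristic profile is the per-field mean, rounded
    to an integer.
    """
    return {f: round(mean(r[f] for r in records)) for f in fields}


def rebase_deviations(deviations, old_ref, new_ref):
    """Re-express deviations against a new reference record.

    The original value is old_ref[f] + deviation, so the deviation relative
    to the new reference is that value minus new_ref[f]. Decoded data items
    are therefore unchanged by the update.
    """
    return {f: old_ref[f] + d - new_ref[f] for f, d in deviations.items()}
```

After rebasing, the deviations differ from those recorded in the translation table, so in a scheme like this the data dictionary and the remotely stored deviation identifiers would both need refreshing, which matches the update that claim 24 attributes to the data marshaller.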

Current Assignee: Anzen Technology Systems Limited
Original Assignee: Kube Partners Limited
Inventors: Mawdsley, Gary; Meyfroidt, Steven
Primary Examiner(s): BROWN, SHEREE N
Application Number: US13/302,561
Publication Number: US 20120131075A1
Time in Patent Office: 1,470 Days
Field of Search: 707/825
US Class Current: 1/1
CPC Class Codes:
G06F 21/6254 by anonymising data, e.g. d...
H04L 63/0407 wherein the identity of one...
