Strategies for sanitizing data items

  • US 7,509,684 B2
  • Filed: 10/09/2004
  • Issued: 03/24/2009
  • Est. Priority Date: 10/09/2004
  • Status: Active Grant
First Claim

1. A method for sanitizing restricted data items in a data set to prevent the revelation of the restricted data items, comprising:

  • transferring an original data set from a production environment to a sanitizer, the original data set being characterized by a state and including a plurality of data items stored in a plurality of different locations corresponding to separate data stores;

    generating a data directory table, which is separate from the original data set, by:

    identifying a plurality of different data items within the original data set;

    classifying the data items as having a restricted or non-restricted status; and

    mapping the restricted data items to their respective locations within the original data set;

    sanitizing at least a portion of the original data set using the sanitizer, while preserving the state of the original data set, the sanitizing comprising:

    identifying the locations of the restricted data items in the original data set, wherein identifying the locations of the restricted data items comprises using the data directory table to identify the locations of the restricted data items in the original data set;

    identifying at least one sanitizing tool, from a plurality of sanitizing tools, to apply to the restricted data items which have been located in the original data set, wherein identifying the at least one sanitizing tool utilizes a stored reference in the data directory table which links the restricted data items to the at least one sanitizing tool, wherein each of the plurality of sanitizing tools:

    modify the restricted data items by transforming the restricted data items so that at least one statistical feature of the restricted data items is preserved;

    apply different randomizing algorithms transforming different types of restricted data items, wherein the different algorithms assign characters to text strings as a substitution to the restricted data items; and

    produce sanitized data items that remain functional such that one or more applications in a testing environment can interact with the sanitized data items such that analysis and testing can be realized;

    applying said at least one sanitizing tool to the restricted data items which have been located in the original data set to provide a sanitized data set, wherein applying said at least one sanitizing tool to the restricted data items comprises a bulk sanitization operation;

    forwarding the sanitized data set to a target environment; and

    in an event that the original data set has changed subsequent to the bulk sanitization operation, performing a delta sanitization operation to sanitize the restricted data items in the original data set which have changed subsequent to the bulk sanitization operation.
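
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`DirectoryEntry`, `TOOLS`, `scramble_text`, `scramble_digits`, `sanitize`, `changed_ids`) is hypothetical, as are the store and field names in the sample data. The sketch assumes that the "statistical feature" preserved is the length and character class of each value, that "preserving the state" means the original data set is not modified in place, and that seeding the random generator from the value (so equal inputs give equal outputs across stores) is acceptable; the claim itself does not specify any of these.

```python
import copy
import hashlib
import random
import string
from dataclasses import dataclass
from typing import Optional


def _rng(value: str, seed: str) -> random.Random:
    # Assumption: seed from the value so the same input always maps to the same output.
    return random.Random(hashlib.sha256((seed + value).encode()).hexdigest())


def scramble_text(value: str, seed: str) -> str:
    """Substitute random letters and digits, keeping length, case and punctuation."""
    rng = _rng(value, seed)
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # separators stay so the value keeps its format
    return "".join(out)


def scramble_digits(value: str, seed: str) -> str:
    """Substitute random digits only, leaving every other character untouched."""
    rng = _rng(value, seed)
    return "".join(rng.choice(string.digits) if ch.isdigit() else ch for ch in value)


# Hypothetical set of sanitizing tools, keyed by the reference stored in the directory.
TOOLS = {"scramble_text": scramble_text, "scramble_digits": scramble_digits}


@dataclass(frozen=True)
class DirectoryEntry:
    """One row of the data directory table, kept separate from the data set."""
    store: str                  # location: which data store holds the item
    field: str                  # location: which field within that store
    restricted: bool            # classification
    tool: Optional[str] = None  # stored reference linking the item to a tool


def sanitize(data_set, directory, seed, only_ids=None):
    """Return a sanitized copy; `only_ids` limits the run to (store, record id) pairs."""
    sanitized = {}
    for store, records in data_set.items():
        out = {}
        for rec_id, record in records.items():
            if only_ids is not None and (store, rec_id) not in only_ids:
                continue
            clean = dict(record)  # copy, so the original record is left as it was
            for entry in directory:
                if entry.restricted and entry.store == store and entry.field in clean:
                    clean[entry.field] = TOOLS[entry.tool](clean[entry.field], seed)
            out[rec_id] = clean
        sanitized[store] = out
    return sanitized


def changed_ids(old, new):
    """Find (store, record id) pairs that are new or modified since the snapshot."""
    return {(store, rec_id)
            for store, records in new.items()
            for rec_id, record in records.items()
            if old.get(store, {}).get(rec_id) != record}


directory = [
    DirectoryEntry("customers", "name", True, "scramble_text"),
    DirectoryEntry("customers", "plan", False),
    DirectoryEntry("billing", "card", True, "scramble_digits"),
]
production = {
    "customers": {1: {"name": "Ada Lovelace", "plan": "gold"}},
    "billing": {1: {"card": "4111-1111-1111-1111"}},
}

snapshot = copy.deepcopy(production)
target = sanitize(production, directory, seed="demo")  # bulk sanitization

# The original data set changes after the bulk run, so only the delta is sanitized.
production["customers"][2] = {"name": "Alan Turing", "plan": "free"}
delta = sanitize(production, directory, seed="demo",
                 only_ids=changed_ids(snapshot, production))
for store, records in delta.items():
    target[store].update(records)  # forward only the changed records to the target
```

Because each tool keeps the length and separators of its input, an application in a test environment that validates, say, the `dddd-dddd-dddd-dddd` card format can still work with the sanitized records. The delta run visits every record to check membership in `only_ids` but applies the tools only to records that changed after the snapshot.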
