Storing tokenized information in untrusted environments
First Claim
1. A computer-implemented method, comprising:
- in a trusted computing environment, parsing a file to determine a plurality of words included in the file, based on whitespace characters that separate the words in the file, the file comprising one or more sensitive words corresponding to financial account data;
- for individual words that are unique in the plurality of words, determining a corresponding token that corresponds to the word, such that the word is not derivable from the token;
- generating a tokenized file that includes corresponding tokens in place of the plurality of words;
- storing the tokenized file in an untrusted computing environment;
- in the trusted computing environment, storing a mapping of the plurality of words to the corresponding tokens; and
- in the untrusted computing environment:
  - storing a whitelist mapping of a subset of the plurality of words to the corresponding tokens, the subset including non-sensitive words other than the one or more sensitive words;
  - receiving a search request including one or more search terms;
  - for the one or more search terms that are included in the whitelist, retrieving the corresponding token;
  - for the one or more search terms that are not included in the whitelist, sending a request that the trusted computing environment retrieve the corresponding token;
  - based at least in part on one or more tokens corresponding to the one or more search terms, performing a search of the tokenized file stored in the untrusted computing environment;
  - identifying one or more tokens in the tokenized file that are included in the whitelist;
  - replacing the identified one or more tokens with one or more corresponding words from the whitelist, to generate partly detokenized information; and
  - providing the partly detokenized information in response to the search request.
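The tokenization steps of the claim (whitespace parsing, one token per unique word, a full mapping for the trusted environment, and a whitelist subset for the untrusted environment) can be illustrated with a minimal Python sketch. The function name `tokenize_file`, the use of random hexadecimal tokens, and the idea of passing the sensitive words in as a set are assumptions made for illustration; the claim does not say how sensitive words are identified or how tokens are formatted.

```python
import secrets


def tokenize_file(text, sensitive_words):
    """Hypothetical helper: tokenize `text` word by word.

    Returns (tokenized_text, word_to_token, whitelist), where `whitelist`
    holds only the words that are not in `sensitive_words`.
    """
    # str.split() with no arguments splits on runs of whitespace.
    words = text.split()

    word_to_token = {}
    used_tokens = set()
    for word in words:
        if word in word_to_token:
            continue  # each unique word gets exactly one token
        # Random token: the word cannot be derived from it.
        token = secrets.token_hex(8)
        while token in used_tokens:  # guard against a (very unlikely) collision
            token = secrets.token_hex(8)
        used_tokens.add(token)
        word_to_token[word] = token

    # Assumption: words are re-joined with single spaces, so the original
    # whitespace layout is not preserved in this sketch.
    tokenized_text = " ".join(word_to_token[w] for w in words)

    # The whitelist is the subset of the mapping covering non-sensitive words.
    whitelist = {
        word: token
        for word, token in word_to_token.items()
        if word not in sensitive_words
    }
    return tokenized_text, word_to_token, whitelist
```

In this sketch the full `word_to_token` mapping would stay in the trusted environment, while `tokenized_text` and `whitelist` are the only pieces handed to the untrusted environment.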
Abstract
Techniques are described for tokenizing information to be stored in an untrusted environment. During tokenization, one or more strings in a file or data stream are replaced with a token. The token may be generated as a random number or a counter, such that the replaced string cannot be derived from the token. Token-to-string mapping data may be stored in a trusted environment, and the tokenized information may be stored in the untrusted environment. Users may search the tokenized information based on non-sensitive search terms present in a whitelist that is accessible from the untrusted environment, the whitelist providing a token-to-string mapping for the non-sensitive terms. The search results may be provided as redacted information, in which the non-sensitive strings have been detokenized based on the whitelist while the sensitive strings remain tokenized.
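The abstract names two ways of generating tokens: a random number or a counter. The Python sketch below shows one possible shape for each. The function names, the `T` prefix on counter tokens, and the 8-byte random token length are all illustrative assumptions, not details from the patent.

```python
import itertools
import secrets


def make_random_token_source(nbytes=8):
    """Return a function that produces a random hex token on each call."""
    def next_token():
        return secrets.token_hex(nbytes)
    return next_token


def make_counter_token_source(start=1):
    """Return a function that produces sequential tokens: T1, T2, ..."""
    counter = itertools.count(start)

    def next_token():
        return f"T{next(counter)}"
    return next_token


def assign_tokens(strings, next_token):
    """Assign one token per distinct string, in first-seen order."""
    mapping = {}
    used = set()
    for s in strings:
        if s in mapping:
            continue
        token = next_token()
        while token in used:  # retry if a random source repeats a token
            token = next_token()
        used.add(token)
        mapping[s] = token
    return mapping
```

In both cases the token carries no information about the string it replaces, so the string can only be recovered through the stored mapping.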
20 Claims
1. A computer-implemented method, as recited in full under First Claim above.
Dependent claims: 2, 3, 4
5. A system, comprising:
- a token mapping datastore storing token mapping data that associates a plurality of strings with corresponding tokens, the token mapping datastore included in a first computing environment associated with a first trust level;
- a whitelist token mapping datastore storing whitelist token mapping data that associates a subset of the plurality of strings with the corresponding tokens, the whitelist token mapping datastore included in a second computing environment associated with a second trust level;
- a first computing device in communication with the token mapping datastore, the first computing device configured to execute a first set of computer-readable instructions that cause the first computing device to:
  - generate tokenized information that includes one or more tokens that correspond to one or more strings of the plurality of strings;
  - send the tokenized information to be stored in the second computing environment; and
- a second computing device in communication with the first computing device and the whitelist token mapping datastore, the second computing device configured to execute a second set of computer-readable instructions that cause the second computing device to:
  - receive a search request including one or more search terms;
  - for the one or more search terms that are included in the whitelist token mapping data, retrieve the corresponding token from the whitelist token mapping datastore;
  - for the one or more search terms that are not included in the whitelist token mapping data, send a request that the first computing device retrieve the corresponding token from the token mapping datastore;
  - based at least in part on one or more tokens corresponding to the one or more search terms, perform a search for the tokenized information stored in the second computing environment;
  - identify one or more tokens in the tokenized information that are included in the whitelist token mapping data;
  - replace the identified one or more tokens with one or more corresponding strings from the whitelist token mapping data, to generate partly detokenized information; and
  - provide the partly detokenized information in response to the search request.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14
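The way the second computing device resolves search terms (whitelist first, then a request to the first computing device for anything not whitelisted) could look roughly like the Python sketch below. `TrustedLookup` and `resolve_search_tokens` are hypothetical names. A real deployment would presumably send the fallback request over a network with some form of authorization, which this in-process sketch leaves out.

```python
class TrustedLookup:
    """Stand-in for the first computing device and its token mapping datastore."""

    def __init__(self, token_mapping):
        self._token_mapping = dict(token_mapping)
        self.requests = []  # record of terms the untrusted side asked about

    def token_for(self, term):
        """Return the token for `term`, or None if the term was never tokenized."""
        self.requests.append(term)
        return self._token_mapping.get(term)


def resolve_search_tokens(terms, whitelist, trusted):
    """Map each search term to its token.

    Terms found in the whitelist are resolved locally; all other terms
    are forwarded to the trusted lookup.
    """
    tokens = {}
    for term in terms:
        if term in whitelist:
            tokens[term] = whitelist[term]
        else:
            tokens[term] = trusted.token_for(term)
    return tokens
```

Only the non-whitelisted terms reach the trusted side, so the full mapping is never copied into the second computing environment.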
15. A system, comprising:
- a token mapping datastore storing token mapping data that associates a plurality of strings with corresponding tokens, wherein the token mapping datastore is included in a first computing environment associated with a first trust level;
- a whitelist token mapping datastore storing whitelist token mapping data that associates a subset of the plurality of strings with the corresponding tokens, wherein the whitelist token mapping datastore is included in a second computing environment associated with a second trust level;
- a first computing device in communication with the token mapping datastore, the first computing device configured to execute computer-readable instructions that cause the first computing device to:
  - generate tokenized information that includes one or more tokens that correspond to one or more strings of the plurality of strings; and
  - send the tokenized information to be stored in the second computing environment; and
- a second computing device in communication with the first computing device and the whitelist token mapping datastore, the second computing device configured to execute computer-readable instructions that cause the second computing device to:
  - receive a search request to search for a file that includes one or more search terms;
  - for the one or more search terms that are included in the whitelist token mapping data, retrieve the corresponding token from the whitelist token mapping datastore;
  - for the one or more search terms that are not included in the whitelist token mapping data, send a request that the first computing device retrieve the corresponding token from the token mapping datastore;
  - search for a tokenized version of the file that includes the one or more tokens corresponding to the one or more search terms;
  - identify, in the tokenized version of the file, one or more tokens that are included in the whitelist token mapping data;
  - replace the identified one or more tokens with the corresponding string from the whitelist token mapping data, to generate an at least partly detokenized version of the file; and
  - provide the at least partly detokenized version of the file in response to the search request.
Dependent claims: 16, 17, 18, 19, 20
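The file-level search and partial detokenization recited here can be sketched in Python as follows. The sketch assumes tokenized files are held as whitespace-separated token strings in a dictionary keyed by file name, and that a file matches only when it contains every search token. Both are illustrative choices, and the function names are hypothetical.

```python
def search_tokenized_files(files, search_tokens):
    """Return the names of tokenized files that contain every search token."""
    wanted = set(search_tokens)
    return [
        name
        for name, tokenized_text in files.items()
        if wanted <= set(tokenized_text.split())
    ]


def partly_detokenize(tokenized_text, whitelist):
    """Replace whitelisted tokens with their strings; leave other tokens as-is."""
    # The whitelist maps string -> token, so invert it for detokenization.
    token_to_string = {token: string for string, token in whitelist.items()}
    return " ".join(
        token_to_string.get(token, token) for token in tokenized_text.split()
    )
```

Tokens that are absent from the whitelist pass through unchanged, so sensitive strings stay tokenized in whatever is returned to the requester.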
Specification