Storing tokenized information in untrusted environments

US 9,081,978 B1
Filed: 05/30/2013
Issued: 07/14/2015
Est. Priority Date: 05/30/2013
Status: Active Grant

Abstract

Techniques are described for tokenizing information to be stored in an untrusted environment. During tokenization, one or more strings in a file or data stream are replaced with a token. The token may be generated as a random number or a counter, such that the replaced string may not be derived based on the token. Token-to-string mapping data may be stored in a trusted environment, and the tokenized information may be stored in the untrusted environment. Users may search the tokenized information based on non-sensitive search terms present in a whitelist that is accessible from the untrusted environment, the whitelist providing a token-to-string mapping for the non-sensitive terms. The search results may be provided as redacted information, in which the non-sensitive strings have been detokenized based on the whitelist while the sensitive strings remain tokenized.
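
The flow in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `tokenize` and `redact`, the 16-hex-character token length, and the sample log line are all assumptions made for the example.

```python
import secrets

def tokenize(text, mapping):
    """Replace each whitespace-separated word with a random token.

    `mapping` (word -> token) is assumed to live in the trusted
    environment and is updated in place; only the returned token
    string is meant to leave it.
    """
    tokens = []
    for word in text.split():
        if word not in mapping:
            # Random token: carries no information about the word itself.
            # A production system would also check for token collisions.
            mapping[word] = secrets.token_hex(8)
        tokens.append(mapping[word])
    return " ".join(tokens)

def redact(tokenized, whitelist):
    """Detokenize only whitelisted tokens; `whitelist` maps word -> token."""
    reverse = {token: word for word, token in whitelist.items()}
    return " ".join(reverse.get(token, token) for token in tokenized.split())

# Trusted side: tokenize, then export only the non-sensitive subset.
trusted_mapping = {}
tokenized = tokenize("payment failed for card 4111111111111111", trusted_mapping)
non_sensitive = {"payment", "failed", "for", "card"}
whitelist = {w: t for w, t in trusted_mapping.items() if w in non_sensitive}

# Untrusted side: the card number stays tokenized in the redacted view.
redacted = redact(tokenized, whitelist)
```

Here `redacted` reads "payment failed for card" followed by the opaque token for the card number, which only the trusted mapping can reverse.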

20 Claims

1. A computer-implemented method, comprising:
- in a trusted computing environment, parsing a file to determine a plurality of words included in the file, based on whitespace characters that separate the words in the file, the file comprising one or more sensitive words corresponding to financial account data;
  
  for individual words that are unique in the plurality of words, determining a corresponding token that corresponds to the word, such that the word is not derivable from the token;
  
  generating a tokenized file that includes corresponding tokens in place of the plurality of words;
  
  storing the tokenized file in an untrusted computing environment;
  
  in the trusted computing environment, storing a mapping of the plurality of words to the corresponding tokens; and
  
  in the untrusted computing environment:
  
  storing a whitelist mapping of a subset of the plurality of words to the corresponding tokens, the subset including non-sensitive words other than the one or more sensitive words;
  
  receiving a search request including one or more search terms;
  
  for the one or more search terms that are included in the whitelist, retrieving the corresponding token;
  
  for the one or more search terms that are not included in the whitelist, sending a request that the trusted computing environment retrieve the corresponding token;
  
  based at least in part on one or more tokens corresponding to the one or more search terms, performing a search of the tokenized file stored in the untrusted computing environment;
  
  identifying one or more tokens in the tokenized file that are included in the whitelist;
  
  replacing the identified one or more tokens with one or more corresponding words from the whitelist, to generate partly detokenized information; and
  
  providing the partly detokenized information in response to the search request.
- - 2. The method of claim 1, further comprising:
    - receiving identification of a user associated with the search request; and
      
      determining, based on access control data, whether the user is permitted access to the one or more search terms included in the search request.
  - 3. The method of claim 1, wherein the determining of the token is based on a random or pseudo-random number generation algorithm.
  - 4. The method of claim 1, wherein the determining of the token includes generating the token as an ordinal value that is incremented or decremented for each unique word in the plurality of words.
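
The search path of claim 1, combined with the ordinal tokens of claim 4, might look like the sketch below. The class `TrustedEnvironment`, the function `search`, the `T<n>` token format, and the rule that a line matches only when it contains every search term are assumptions chosen for illustration; the claims do not prescribe them.

```python
from itertools import count

class TrustedEnvironment:
    """Holds the full word -> token mapping (hypothetical class)."""

    def __init__(self):
        self._mapping = {}
        self._next = count(1)  # ordinal tokens, one per unique word

    def tokenize(self, text):
        out = []
        for word in text.split():
            if word not in self._mapping:
                self._mapping[word] = f"T{next(self._next)}"
            out.append(self._mapping[word])
        return " ".join(out)

    def lookup(self, word):
        # Called by the untrusted side for terms missing from the whitelist.
        return self._mapping.get(word)

    def export_whitelist(self, words):
        return {w: self._mapping[w] for w in words if w in self._mapping}


def search(terms, tokenized_lines, whitelist, trusted):
    """Runs in the untrusted environment over tokenized lines."""
    wanted = set()
    for term in terms:
        # Resolve locally when whitelisted, otherwise ask the trusted side.
        token = whitelist.get(term) or trusted.lookup(term)
        if token is None:
            return []  # the term never appeared in the tokenized data
        wanted.add(token)
    reverse = {token: word for word, token in whitelist.items()}
    hits = []
    for line in tokenized_lines:
        tokens = line.split()
        if wanted <= set(tokens):
            # Partly detokenize: only whitelisted tokens become words again.
            hits.append(" ".join(reverse.get(t, t) for t in tokens))
    return hits


trusted = TrustedEnvironment()
lines = [trusted.tokenize("card 4111 declined"),
         trusted.tokenize("card 5500 approved")]
whitelist = trusted.export_whitelist(["card", "declined", "approved"])
results = search(["declined"], lines, whitelist, trusted)
print(results)  # ['card T2 declined']
```

A search for the sensitive term `4111` returns the same line, still showing `T2` in place of the number, because the untrusted side learns the token for that term but never gains the ability to reverse it.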

5. A system, comprising:
- a token mapping datastore storing token mapping data that associates a plurality of strings with corresponding tokens, the token mapping datastore included in a first computing environment associated with a first trust level;
  
  a whitelist token mapping datastore storing whitelist token mapping data that associates a subset of the plurality of strings with the corresponding tokens, the whitelist token mapping datastore included in a second computing environment associated with a second trust level;
  
  a first computing device in communication with the token mapping datastore, the first computing device configured to execute a first set of computer-readable instructions that cause the first computing device to:
  
  generate tokenized information that includes one or more tokens that correspond to one or more strings of the plurality of strings;
  
  send the tokenized information to be stored in the second computing environment; and
  
  a second computing device in communication with the first computing device and the whitelist token mapping datastore, the second computing device configured to execute a second set of computer-readable instructions that cause the second computing device to:
  
  receive a search request including one or more search terms;
  
  for the one or more search terms that are included in the whitelist token mapping data, retrieve the corresponding token from the whitelist token mapping datastore;
  
  for the one or more search terms that are not included in the whitelist token mapping data, send a request that the first computing device retrieve the corresponding token from the token mapping datastore;
  
  based at least in part on one or more tokens corresponding to the one or more search terms, perform a search for the tokenized information stored in the second computing environment;
  
  identify one or more tokens in the tokenized information that are included in the whitelist token mapping data;
  
  replace the identified one or more tokens with one or more corresponding strings from the whitelist token mapping data, to generate partly detokenized information; and
  
  provide the partly detokenized information in response to the search request.
- - 6. The system of claim 5, the first set of computer-readable instructions further comprising instructions that cause the first computing device to:
    - parse information in the first computing environment to determine the one or more of the plurality of strings based on separation by one or more whitespace characters; and
      
      for individual strings that are unique in the one or more strings, determine a token that corresponds to the string, based at least partly on the token mapping data in the token mapping datastore.
  - 7. The system of claim 5, wherein:
    - the token mapping datastore includes the token mapping data for one or more of the plurality of strings that include sensitive data; and
      
      the whitelist token mapping datastore includes the whitelist token mapping data for one or more of the subset of the plurality of strings that do not include the sensitive data.
  - 8. The system of claim 5, wherein:
    - the search request further comprises a search range in dates, times, or both dates and times;
      
      the tokenized information includes timestamp data describing when the tokenized information was created; and
      
      the search includes a search for a version of the tokenized information having a timestamp within the search range.
  - 9. The system of claim 5, the second computing device further configured to:
    - for the one or more search terms that are not included in the whitelist token mapping data, modify the whitelist token mapping data to include an association of the search term with the corresponding token retrieved from the token mapping datastore.
  - 10. The system of claim 5, the first computing device further configured to:
    - receive, from the second computing device, the request that the first computing device retrieve the corresponding token from the token mapping datastore;
      
      analyze the one or more search terms included in the request, to determine a probability that the one or more search terms include sensitive information; and
      
      deny the request, based on the probability exceeding a predetermined threshold probability.
  - 11. The system of claim 5, the first computing device further configured to:
    - receive, from the second computing device, the request that the first computing device retrieve the corresponding token from the token mapping datastore;
      
      determine a frequency of a plurality of requests that include the request; and
      
      deny the request, based on the frequency exceeding a predetermined threshold frequency.
  - 12. The system of claim 5, the first computing device further configured to:
    - receive, from the second computing device, the request that the first computing device retrieve the corresponding token from the token mapping datastore, the request including an identification of a user associated with the search request, the request including the one or more search terms that are not included in the whitelist token mapping data;
      
      determine whether the user is permitted access to the one or more search terms included in the request, based on access control data; and
      
      deny the request, based on a determination that the user is not permitted access to the one or more search terms.
  - 13. The system of claim 5, the first computing device further configured to:
    - based on a determination that the token mapping data does not include the string, generate a new token that is non-derivable based on the string, the new token being generated based on:
      
      a random number generation algorithm;
      
      a pseudo-random number generation algorithm;
      
      or an ordinal value that is incremented or decremented for each string included in the token mapping data; and
      
      modify the token mapping data in the token mapping datastore to include an association of the string with the new token.
  - 14. The system of claim 5, further comprising:
    - the second computing device in communication with the first computing device and the whitelist token mapping datastore, the second computing device configured to execute a second set of computer-readable instructions that cause the second computing device to:
      
      send, to the first computing device, a request for one or more tokens corresponding to one or more strings, the one or more strings including one or more of:
      
      one or more hostnames of one or more computing devices in the first computing environment;
      
      one or more usernames of one or more users;
      
      or one or more common portions of message strings generated by one or more processes that execute in the first computing environment;
      
      receive, from the first computing device, the one or more tokens corresponding to the one or more strings; and
      
      populate the whitelist token mapping data with the one or more strings mapped to the one or more corresponding tokens.
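
Claims 10 through 12 describe three grounds on which the first (trusted) computing device may deny a token request: probable sensitivity, request frequency, and access control. A rough sketch of such a gate follows. The class `TokenRequestGate`, its default thresholds, the digit-ratio sensitivity heuristic, and the per-user set of permitted terms are all hypothetical stand-ins; the claims leave the probability model and the form of the access control data open.

```python
import time
from collections import deque

class TokenRequestGate:
    """Trusted-side check applied before a token is released (hypothetical)."""

    def __init__(self, mapping, acl, max_requests=3, window=60.0,
                 max_sensitivity=0.5):
        self.mapping = mapping          # full string -> token mapping
        self.acl = acl                  # user -> set of terms the user may search
        self.max_requests = max_requests
        self.window = window            # seconds covered by the frequency check
        self.max_sensitivity = max_sensitivity
        self._recent = deque()          # timestamps of recent requests

    @staticmethod
    def sensitivity(term):
        # Crude stand-in for a probability estimate: the share of digits,
        # on the assumption that account numbers are mostly numeric.
        return sum(ch.isdigit() for ch in term) / len(term) if term else 0.0

    def request_token(self, user, term, now=None):
        now = time.monotonic() if now is None else now
        # Frequency threshold (claim 11): sliding window over all requests.
        while self._recent and now - self._recent[0] > self.window:
            self._recent.popleft()
        self._recent.append(now)
        if len(self._recent) > self.max_requests:
            raise PermissionError("request frequency above threshold")
        # Access control (claim 12).
        if term not in self.acl.get(user, set()):
            raise PermissionError("user not permitted to search this term")
        # Sensitivity threshold (claim 10).
        if self.sensitivity(term) > self.max_sensitivity:
            raise PermissionError("term is probably sensitive")
        return self.mapping.get(term)


gate = TokenRequestGate(
    mapping={"db-host-17": "T9", "4111111111111111": "T2"},
    acl={"alice": {"db-host-17", "4111111111111111"}},
)
token = gate.request_token("alice", "db-host-17", now=0.0)
```

In this sketch a denied request raises `PermissionError`, so the untrusted side receives no token and cannot search on the term.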

15. A system, comprising:
- a token mapping datastore storing token mapping data that associates a plurality of strings with corresponding tokens, wherein the token mapping datastore is included in a first computing environment associated with a first trust level;
  
  a whitelist token mapping datastore storing whitelist token mapping data that associates a subset of the plurality of strings with the corresponding tokens, wherein the whitelist token mapping datastore is included in a second computing environment associated with a second trust level;
  
  a first computing device in communication with the token mapping datastore, the first computing device configured to execute computer-readable instructions that cause the first computing device to:
  
  generate tokenized information that includes one or more tokens that correspond to one or more strings of the plurality of strings; and
  
  send the tokenized information to be stored in the second computing environment; and
  
  a second computing device in communication with the first computing device and the whitelist token mapping datastore, the second computing device configured to execute computer-readable instructions that cause the second computing device to:
  
  receive a search request to search for a file that includes one or more search terms;
  
  for the one or more search terms that are included in the whitelist token mapping data, retrieve the corresponding token from the whitelist token mapping datastore;
  
  for the one or more search terms that are not included in the whitelist token mapping data, send a request that the first computing device retrieve the corresponding token from the token mapping datastore;
  
  search for a tokenized version of the file that includes the one or more tokens corresponding to the one or more search terms;
  
  identify, in the tokenized version of the file, one or more tokens that are included in the whitelist token mapping data;
  
  replace the identified one or more tokens with the corresponding string from the whitelist token mapping data, to generate an at least partly detokenized version of the file; and
  
  provide the at least partly detokenized version of the file in response to the search request.
- - 16. The system of claim 15, wherein:
    - the token mapping datastore includes the token mapping data for one or more of the plurality of strings that include sensitive data; and
      
      the whitelist token mapping datastore includes the whitelist token mapping data for one or more of the plurality of strings that do not include the sensitive data.
  - 17. The system of claim 15, wherein:
    - the search request further comprises a search range in dates, times, or both dates and times;
      
      the tokenized file includes timestamp data describing when the tokenized file was created; and
      
      the searching for the tokenized version of the file includes searching for the tokenized version of the file having a timestamp within the search range.
  - 18. The system of claim 15, the first computing device further configured to execute computer-readable instructions that cause the first computing device to:
    - generate a token that is non-derivable from the one or more strings of the plurality of strings based on one or more of:
      
      a random number generation algorithm;
      
      a pseudo-random number generation algorithm;
      
      or an ordinal value that is incremented or decremented for each string included in the token mapping data.
  - 19. The system of claim 15, the first computing device further configured to:
    - receive, from the second computing device, the request to retrieve the corresponding token from the token mapping datastore, the request including an identification of a user associated with the search request, the request including the one or more search terms that are not included in the whitelist token mapping data;
      
      determine whether the user is permitted access to the one or more search terms included in the request, based on access control data; and
      
      deny the request, based on a determination that the user is not permitted access to the one or more search terms.
  - 20. The system of claim 15, wherein the tokens included in the token mapping data and in the whitelist token mapping data have a same size.
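
Claims 17 and 20 add a timestamp-bounded search and same-size tokens. The sketch below shows one way these could fit together. The record type `TokenizedFile`, the helper names, the zero-padded 8-digit hexadecimal token format, and the inclusive date bounds are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import datetime

def fixed_size_token(ordinal, width=8):
    """Render an ordinal as a zero-padded hex string so all tokens share a size."""
    return format(ordinal, f"0{width}x")

@dataclass
class TokenizedFile:
    created: datetime   # when the tokenized version was created
    tokens: list        # the file's tokens, in order

def search_files(files, wanted_tokens, start, end):
    """Return files created within [start, end] that contain every wanted token."""
    wanted = set(wanted_tokens)
    return [f for f in files
            if start <= f.created <= end and wanted <= set(f.tokens)]

t1, t2, t3 = (fixed_size_token(n) for n in (1, 2, 255))
files = [
    TokenizedFile(datetime(2013, 5, 1), [t1, t2]),
    TokenizedFile(datetime(2013, 6, 1), [t1, t3]),
]
hits = search_files(files, [t1], datetime(2013, 5, 15), datetime(2013, 6, 15))
print(t3)         # 000000ff
print(len(hits))  # 1
```

Equal-length tokens avoid revealing the length of the original string, which is one plausible reason for the constraint in claim 20.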

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Connolly, Jeremiah John; Marinus, Dennis
Primary Examiner(s): Armouche, Hadi
Assistant Examiner(s): Zarrineh, Shahriar
Application Number: US13/905,815
Time in Patent Office: 775 Days
Field of Search: 726/30
US Class Current: 1/1

CPC Class Codes
G06F 21/10   Protecting distributed prog...
G06F 21/62   Protecting access to data v...
G06F 21/6227   where protection concerns t...
G06F 21/6254   by anonymising data, e.g. d...
H04L 63/0428   wherein the data content is...
