Protecting confidential information
First Claim
1. A method comprising:
receiving, by one or more computer processors, from a first computer, text generated by a user;
identifying, by one or more computer processors, in the text generated by the user, one or more confidential information registered in a dictionary, wherein the dictionary contains a plurality of registered confidential information and a plurality of substitute words corresponding to the plurality of registered confidential information;
retrieving, by one or more computer processors, from the dictionary, one or more substitute words corresponding to each identified registered confidential information of the one or more confidential information registered in the dictionary;
identifying, by one or more computer processors, in the text generated by the user, whether one or more words are potentially confidential based, at least in part, on a text analysis of the text generated by the user;
generating, by one or more computer processors, one or more words for each of the one or more potentially confidential words, wherein the generating comprises:
determining, by one or more computer processors, for each of the one or more potentially confidential words, the registered confidential information associated with a shortest edit distance;
retrieving, by one or more computer processors, from the dictionary, the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance;
determining, by one or more computer processors, a category of the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance;
retrieving, by one or more computer processors, a list of unused words in the category of the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance;
selecting, by one or more computer processors, one or more words from the list of unused words in the category of the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance based, at least in part, on the text analysis identifying a highest topic index of the selected one or more words from the list of unused words; and
sending, by one or more computer processors, to the first computer, a proposed protected text, wherein the proposed protected text includes the text generated by the user with each of the identified registered confidential information included with each of the one or more retrieved substitute words to replace the identified confidential information and each of the one or more potentially confidential words included with each of the one or more generated words to replace the one or more potentially confidential words.
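The generating steps of the claim (nearest registered term by edit distance, category lookup, and selection from unused words by topic index) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `Entry` record, the `generate_substitute` helper, the sample dictionary, and the treatment of "topic index" as a plain numeric score per candidate word are all assumptions made for illustration. Levenshtein distance is used as one common definition of edit distance; the claim does not say which variant is intended.

```python
from dataclasses import dataclass


def edit_distance(a, b):
    """Levenshtein distance computed with the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


@dataclass
class Entry:
    """Hypothetical dictionary record: a substitute word and its category."""
    substitute: str
    category: str


def generate_substitute(word, dictionary, unused_by_category, topic_index):
    """Pick a replacement for a potentially confidential word.

    Assumed flow: find the registered term closest to `word`, look up the
    category of that term's substitute, then choose the unused word in that
    category with the highest (assumed numeric) topic index.
    """
    nearest = min(dictionary,
                  key=lambda term: edit_distance(word.lower(), term.lower()))
    category = dictionary[nearest].category
    candidates = unused_by_category[category]
    return max(candidates, key=lambda w: topic_index.get(w, 0.0))


# Illustrative data only; none of these values come from the patent.
dictionary = {
    "Falcon": Entry(substitute="Sparrow", category="bird"),
    "Tokyo": Entry(substitute="Paris", category="city"),
}
unused_by_category = {"bird": ["Heron", "Finch"], "city": ["Oslo", "Lima"]}
topic_index = {"Heron": 0.2, "Finch": 0.7, "Oslo": 0.4, "Lima": 0.1}

suggestion = generate_substitute("Falcom", dictionary,
                                 unused_by_category, topic_index)
```

With this sample data, the misspelled word "Falcom" is one edit away from the registered term "Falcon", whose substitute belongs to the "bird" category, so the sketch selects "Finch", the unused bird with the highest assumed topic score.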
Abstract
An approach, using a computer, receives from a first computer text generated by a user and identifies, in that text, confidential information registered in a dictionary that contains registered confidential information and substitute words corresponding to the registered confidential information. The approach includes retrieving, from the dictionary, substitute words corresponding to each identified registered confidential information and identifying, in the text generated by the user, potentially confidential words based on a text analysis of that text. The approach includes sending to the first computer a proposed protected text that includes the text generated by the user, with each identified registered confidential information marked along with the retrieved substitute words to replace it, and each potentially confidential word marked along with one or more generated words to replace it.
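As a rough illustration of the marking step described in the abstract, the following Python sketch annotates each registered term in a text with its substitute. The function name `propose_protected_text`, the `[term -> substitute]` marking format, and the sample data are assumptions made for this example; the abstract does not specify how the marking is rendered. Matching here is case-sensitive and limited to whole words.

```python
import re


def propose_protected_text(text, substitutes):
    """Mark each registered term in `text` alongside its substitute word.

    `substitutes` maps a registered confidential term to its replacement.
    The bracketed output format is a hypothetical choice for this sketch.
    """
    if not substitutes:
        return text
    # Try longer terms first so a long registered term is not split
    # by a shorter term that happens to be its prefix.
    terms = sorted(substitutes, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(
        lambda m: "[{} -> {}]".format(m.group(1), substitutes[m.group(1)]),
        text)


# Illustrative data only.
sample_substitutes = {"Falcon": "Sparrow", "Tokyo": "Paris"}
marked = propose_protected_text("Falcon ships from Tokyo in May.",
                                sample_substitutes)
```

For the sample sentence, the sketch yields `[Falcon -> Sparrow] ships from [Tokyo -> Paris] in May.`, leaving unregistered words untouched so the user can review each proposed replacement.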
18 Claims
1. A method comprising:
receiving, by one or more computer processors, from a first computer, text generated by a user;
identifying, by one or more computer processors, in the text generated by the user, one or more confidential information registered in a dictionary, wherein the dictionary contains a plurality of registered confidential information and a plurality of substitute words corresponding to the plurality of registered confidential information;
retrieving, by one or more computer processors, from the dictionary, one or more substitute words corresponding to each identified registered confidential information of the one or more confidential information registered in the dictionary;
identifying, by one or more computer processors, in the text generated by the user, whether one or more words are potentially confidential based, at least in part, on a text analysis of the text generated by the user;
generating, by one or more computer processors, one or more words for each of the one or more potentially confidential words, wherein the generating comprises:
determining, by one or more computer processors, for each of the one or more potentially confidential words, the registered confidential information associated with a shortest edit distance;
retrieving, by one or more computer processors, from the dictionary, the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance;
determining, by one or more computer processors, a category of the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance;
retrieving, by one or more computer processors, a list of unused words in the category of the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance;
selecting, by one or more computer processors, one or more words from the list of unused words in the category of the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance based, at least in part, on the text analysis identifying a highest topic index of the selected one or more words from the list of unused words; and
sending, by one or more computer processors, to the first computer, a proposed protected text, wherein the proposed protected text includes the text generated by the user with each of the identified registered confidential information included with each of the one or more retrieved substitute words to replace the identified confidential information and each of the one or more potentially confidential words included with each of the one or more generated words to replace the one or more potentially confidential words.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer program product comprising:
one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions executable by a processor, the program instructions comprising instructions to:
receive from a first computer, text generated by a user;
identify in the text generated by the user, one or more confidential information registered in a dictionary, wherein the dictionary contains a plurality of registered confidential information and a plurality of substitute words corresponding to the plurality of registered confidential information;
retrieve from the dictionary, one or more substitute words corresponding to each identified registered confidential information of the one or more confidential information registered in the dictionary;
identify in the text generated by the user, whether one or more words are potentially confidential based, at least in part, on a text analysis of the text generated by the user;
generate one or more words for each of the one or more potentially confidential words, comprising:
determine for each of the one or more potentially confidential words, the registered confidential information associated with a shortest edit distance;
retrieve from the dictionary, the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance;
determine a category of the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance;
retrieve a list of unused words in the category of the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance;
select one or more words from the list of unused words in the category of the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance based, at least in part, on the text analysis identifying a highest topic index of the selected one or more words from the list of unused words; and
send to the first computer, a proposed protected text, wherein the proposed protected text includes the text generated by the user with each of the identified registered confidential information included with each of the one or more retrieved substitute words to replace the identified confidential information and each of the one or more potentially confidential words included with each of the one or more generated words to replace the one or more potentially confidential words.
Dependent claims: 9, 10, 11, 12, 13
14. A computer system comprising:
one or more computer processors;
one or more computer readable storage media; and
program instructions stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising instructions to:
receive from a first computer, text generated by a user;
identify in the text generated by the user, one or more confidential information registered in a dictionary, wherein the dictionary contains a plurality of registered confidential information and a plurality of substitute words corresponding to the plurality of registered confidential information;
retrieve from the dictionary, one or more substitute words corresponding to each identified registered confidential information of the one or more confidential information registered in the dictionary;
identify in the text generated by the user, whether one or more words are potentially confidential based, at least in part, on a text analysis of the text generated by the user;
generate one or more words for each of the one or more potentially confidential words, wherein the generating comprises:
determine for each of the one or more potentially confidential words, the registered confidential information associated with a shortest edit distance;
retrieve from the dictionary, the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance;
determine a category of the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance;
retrieve a list of unused words in the category of the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance;
select one or more words from the list of unused words in the category of the one or more retrieved substitute words to replace the registered confidential information associated with the shortest edit distance based, at least in part, on the text analysis identifying a highest topic index of the selected one or more words from the list of unused words; and
send to the first computer, a proposed protected text, wherein the proposed protected text includes the text generated by the user with each of the identified registered confidential information included with each of the one or more retrieved substitute words to replace the identified confidential information and each of the one or more potentially confidential words included with each of the one or more generated words to replace the one or more potentially confidential words.
Dependent claims: 15, 16, 17, 18
Specification