Removing personal information from text using a neural network
First Claim
1. A computer-implemented method for removing personal information from text using a neural network, the method comprising:
obtaining the neural network, wherein the neural network is configured to process the text and select a label from a plurality of possible labels for each word of the text, wherein each label corresponds to a class of words, and wherein at least one label corresponds to a class of words to be removed from the text;
receiving the text;
obtaining a word embedding for each word of the text, where a word embedding represents a word in a vector space;
computing a context vector for each word of the text by processing the word embeddings with a first layer of the neural network, where a context vector for a given word includes information about words before or after the given word;
computing label scores for each word of the text by processing each of the context vectors with a second layer of the neural network, wherein each label score indicates a match between a word and a class of words;
selecting a label for each word of the text by processing the label scores with a third layer of the neural network; and
generating redacted text by replacing a first word of the text with a first label corresponding to the first word.
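The claimed steps can be sketched end to end as a toy example. This is only an illustration under stated assumptions, not the patented implementation: the "embedding" is a hand-made two-dimensional feature, the first layer is a simple concatenation of neighboring embeddings, the second layer uses hand-set weights, and the third layer is a plain argmax. The names `KNOWN_NAMES`, `WEIGHTS`, `embed`, `context_vectors`, `label_scores`, `select_label`, and `redact` are all hypothetical.

```python
# Labels the toy network can assign; "O" means "keep the word as is".
LABELS = ["O", "<name>", "<cc_number>"]
KNOWN_NAMES = {"alice", "bob"}  # hypothetical lexicon standing in for learned embeddings


def embed(word):
    """Hand-made 2-d 'embedding': [name-likeness, digit-likeness]."""
    return [1.0 if word.lower() in KNOWN_NAMES else 0.0,
            1.0 if word.isdigit() else 0.0]


def context_vectors(embeddings):
    """First layer (stand-in): concatenate previous, current, and next embedding."""
    pad = [0.0, 0.0]
    padded = [pad] + embeddings + [pad]
    return [padded[i - 1] + padded[i] + padded[i + 1]
            for i in range(1, len(padded) - 1)]


# Second layer (stand-in): one weight row and bias per label, hand-set, not trained.
# Context layout: [prev_name, prev_digit, cur_name, cur_digit, next_name, next_digit].
WEIGHTS = {
    "O":           ([0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 0.5),
    "<name>":      ([0.0, 0.0, 1.0, 0.0, 0.0, 0.0], 0.0),
    "<cc_number>": ([0.0, 0.2, 0.0, 1.0, 0.0, 0.2], 0.0),
}


def label_scores(context):
    """Score how well a word (seen in context) matches each class of words."""
    return {label: sum(w * x for w, x in zip(weights, context)) + bias
            for label, (weights, bias) in WEIGHTS.items()}


def select_label(scores):
    """Third layer (stand-in): pick the highest-scoring label."""
    return max(scores, key=scores.get)


def redact(text):
    """Replace every word whose selected label is not "O" with that label."""
    words = text.split()
    contexts = context_vectors([embed(w) for w in words])
    labels = [select_label(label_scores(c)) for c in contexts]
    return " ".join(w if lab == "O" else lab for w, lab in zip(words, labels))


print(redact("my name is Alice and my card is 4111 1111 1111 1111"))
# my name is <name> and my card is <cc_number> <cc_number> <cc_number> <cc_number>
```

In a trained system each stand-in would be replaced by learned parameters; the sketch only shows how data flows from embeddings to context vectors, label scores, selected labels, and redacted text.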
Abstract
A neural network may be used to remove personal information from text (such as names, addresses, credit card numbers, or social security numbers), and replace the personal information with a label indicating the type or class of the removed information. The neural network may comprise multiple layers that compute a context vector for words of the text, compute label scores for words of the text using the context vectors, and select a label for each word using the label scores. Words corresponding to certain labels may be replaced with a label, such as replacing the digits of a credit card number with a label <cc_number>. The redacted text may then be presented to a person or stored for later processing.
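The replacement step described in the abstract can be illustrated with a short sketch that takes words and their already-selected labels as input. It assumes, beyond what the abstract states, that a run of consecutive words sharing the same removable label (such as the four digit groups of a card number) is collapsed into a single label token. The function name `redact_with_labels` and the keep label `"O"` are hypothetical.

```python
from itertools import groupby


def redact_with_labels(words, labels, keep_label="O"):
    """Replace each run of consecutive words sharing a removable label
    with a single copy of that label; keep all other words unchanged."""
    out = []
    for label, group in groupby(zip(words, labels), key=lambda pair: pair[1]):
        if label == keep_label:
            out.extend(word for word, _ in group)
        else:
            out.append(label)  # assumption: one token per run of the same label
    return " ".join(out)


words = "please charge 4111 1111 1111 1111 for Alice Smith".split()
labels = ["O", "O", "<cc_number>", "<cc_number>", "<cc_number>",
          "<cc_number>", "O", "<name>", "<name>"]
print(redact_with_labels(words, labels))
# please charge <cc_number> for <name>
```

Collapsing runs keeps the redacted text readable, at the cost of merging two adjacent items of the same class (for example, two names in a row) into one label.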
56 Citations
20 Claims
1. A computer-implemented method for removing personal information from text using a neural network, recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for removing personal information from text using a neural network, the system comprising at least one computer configured to:
obtain the neural network, wherein the neural network is configured to process the text and select a label from a plurality of possible labels for each word of the text, wherein each label corresponds to a class of words, and wherein at least one label corresponds to a class of words to be removed from the text;
receive the text;
obtain a word embedding for each word of the text;
compute a context vector for each word of the text by processing the word embeddings with a first layer of the neural network;
compute label scores for each word of the text by processing each of the context vectors with a second layer of the neural network, wherein each label score indicates a match between a word and a class of words;
select a label for each word of the text by processing the label scores with a third layer of the neural network; and
generate redacted text by replacing a first word of the text with a first label corresponding to the first word.
Dependent claims: 10, 11, 12, 13, 14, 15
16. One or more non-transitory computer-readable media comprising computer executable instructions that, when executed, cause at least one processor to perform actions comprising:
obtaining a neural network, wherein the neural network is configured to process text and select a label from a plurality of possible labels for each word of the text, wherein each label corresponds to a class of words, and wherein at least one label corresponds to a class of words to be removed from the text;
receiving the text;
obtaining a word embedding for each word of the text;
computing a context vector for each word of the text by processing the word embeddings with a first layer of the neural network;
computing label scores for each word of the text by processing each of the context vectors with a second layer of the neural network, wherein each label score indicates a match between a word and a class of words;
selecting a label for each word of the text by processing the label scores with a third layer of the neural network; and
generating redacted text by replacing a first word of the text with a first label corresponding to the first word.
Dependent claims: 17, 18, 19, 20
Specification