Data processing systems for automated classification of personal information from documents and related methods
First Claim
1. A computer-implemented data processing method for automatically classifying personal information in an electronic document and generating a sensitivity score for the electronic document based on the classification, the method comprising:
receiving, by one or more processors, the electronic document for analysis;
using one or more natural language processing techniques, by the one or more processors, to decompose data from the electronic document into:
one or more structured objects; and
one or more values for each of the one or more structured objects;
classifying, by the one or more processors, the each of the one or more structured objects in the electronic document based on one or more attributes of the one or more structured objects;
categorizing, by the one or more processors, the each of the one or more structured objects based on a sensitivity of the one or more structured objects;
rating, by the one or more processors, an accuracy of the categorization, wherein rating the accuracy of the categorization comprises:
receiving a second electronic document that is related to the electronic document;
using the one or more natural language processing techniques, by one or more processors, to decompose data from the second electronic document into:
one or more second structured objects; and
one or more second values for each of the one or more structured objects;
classifying, by one or more processors, each of the one or more second structured objects in the second electronic document based on one or more second attributes of the one or more second structured objects;
categorizing, by one or more processors, each of the one or more second structured objects based on a sensitivity of the one or more second structured objects; and
comparing the categorization of the one or more structured objects with the categorization of the one or more second structured objects; and
rating the accuracy based on the comparison; and
generating, by the one or more processors, a sensitivity score for the electronic document based at least in part on the categorized one or more structured objects and the associated one or more values.
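The steps recited in claim 1 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: regular expressions stand in for the unspecified natural language processing techniques, the `PATTERNS` and `SENSITIVITY` tables are hypothetical, and accuracy is rated as a simple set overlap between the two categorizations, which is only one possible reading of "comparing".

```python
import re
from dataclasses import dataclass

# Hypothetical patterns standing in for the claim's "natural language
# processing techniques"; the source does not specify any of these.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Assumed sensitivity weight for each classification (illustrative only).
SENSITIVITY = {"email": 1, "phone": 2, "ssn": 5}


@dataclass(frozen=True)
class StructuredObject:
    kind: str   # classification of the object, e.g. "ssn"
    value: str  # the value found for that object in the document


def decompose(text):
    """Decompose document text into structured objects and their values."""
    objects = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            objects.append(StructuredObject(kind, match.group()))
    return objects


def categorize(objects):
    """Map each classification present to its assumed sensitivity weight."""
    return {obj.kind: SENSITIVITY[obj.kind] for obj in objects}


def rate_accuracy(first, second):
    """Rate agreement of two categorizations as a Jaccard overlap in [0, 1]."""
    a, b = set(first.items()), set(second.items())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sensitivity_score(objects):
    """Sum the assumed weights over every object and value that was found."""
    return sum(SENSITIVITY[obj.kind] for obj in objects)
```

In this sketch a document is scored by calling `sensitivity_score(decompose(text))`, and a related second document is run through the same `decompose` and `categorize` steps before `rate_accuracy` compares the two results.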
Abstract
An automated classification system may be configured to substantially automatically classify one or more pieces of personal information in one or more documents (e.g., one or more text-based documents, one or more spreadsheets, one or more PDFs, one or more webpages, etc.). The system may be implemented in the context of any suitable privacy compliance system, which may, for example, be configured to calculate and assign a sensitivity score to a particular document based at least in part on one or more determined categories of personal information identified in the one or more documents. The storage of particular types of personal information may be governed by one or more government or industry regulations, which may require particular security measures, storage techniques, handling, etc. for documents based on one or more categories of information contained therein.
16 Claims
7. A computer-implemented data processing method for automatically classifying personal information in an electronic document and generating a sensitivity score for the electronic document based on the classification, the method comprising:
receiving, by one or more processors, the electronic document for analysis;
sorting, using one or more natural language processing techniques, data from the electronic document into:
one or more structured objects; and
one or more values for each of the one or more structured objects;
classifying, by the one or more processors, the each of the one or more structured objects in the electronic document based on one or more attributes of the one or more structured objects;
categorizing, by the one or more processors, the each of the one or more structured objects based on a sensitivity of the one or more structured objects;
generating, by the one or more processors, a sensitivity score for the electronic document based at least in part on the categorized one or more structured objects and the associated one or more values;
parsing the classification of one or more structured objects;
identifying the each of the one or more structured objects having an empty associated value;
modifying the classification of one or more structured objects to remove the identified one or more structured objects from the classification;
rating, by the one or more processors, an accuracy of the categorization by receiving a second electronic document that is related to the electronic document;
sorting, using the one or more natural language processing techniques, the second electronic document into:
one or more second structured objects; and
one or more second values for each of the one or more structured objects;
classifying, by the one or more processors, each of the one or more second structured objects in the second electronic document based on one or more second attributes of the one or more second structured objects;
categorizing, by the one or more processors, each of the one or more second structured objects based on a sensitivity of the one or more second structured objects; and
generating, by the one or more processors, a second sensitivity score for the second electronic document based at least in part on the categorized one or more second structured objects and the associated one or more second values;
parsing the classification of one or more second structured objects;
identifying each of the one or more second structured objects having an empty associated value;
modifying the classification of one or more second structured objects to remove the identified one or more second structured objects from the classification;
comparing the categorization of the one or more structured objects with the categorization of the one or more second structured objects; and
rating the accuracy based on the comparison.
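The parsing, identifying, and modifying steps that distinguish claim 7 amount to dropping structured objects whose associated value is empty. A minimal sketch follows, assuming (hypothetically) that the classification is held as a mapping from object name to value and that "empty" means `None` or a blank string; the patent does not define either.

```python
def prune_empty(classification):
    """Return a copy of the classification without empty-valued objects.

    Assumption: a value is "empty" when it is None or contains only
    whitespace once converted to a string.
    """
    return {
        name: value
        for name, value in classification.items()
        if value is not None and str(value).strip() != ""
    }
```

Returning a new mapping rather than mutating the input keeps the original classification available for the later comparison step.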
12. A computer-implemented data processing method for automatically classifying personal information in an electronic document and generating a sensitivity score for the electronic document based on the classification, the method comprising:
receiving, by one or more processors, the electronic document for analysis;
using one or more natural language processing techniques, by the one or more processors, to decompose data from the electronic document into:
one or more structured objects; and
one or more values for each of the one or more structured objects;
classifying, by the one or more processors, the each of the one or more structured objects in the electronic document based on one or more attributes of the one or more structured objects;
categorizing, by the one or more processors, the each of the one or more structured objects based on a sensitivity of the one or more structured objects;
generating, by the one or more processors, a sensitivity score for the electronic document based at least in part on the categorized one or more structured objects and the associated one or more values;
rating an accuracy of the categorization by receiving a second electronic document that is related to the electronic document;
using the one or more natural language processing techniques, by the one or more processors, to decompose data from the second electronic document into:
one or more second structured objects; and
one or more second values for each of the one or more structured objects;
classifying, by the one or more processors, each of the one or more second structured objects in the second electronic document based on one or more second attributes of the one or more second structured objects;
categorizing, by the one or more processors, the each of the one or more second structured objects based on a sensitivity of the one or more second structured objects; and
comparing the categorization of the one or more structured objects with the categorization of the one or more second structured objects; and
rating the accuracy based on the comparison.
Specification