Method and apparatus for detecting policy violations in a data repository having an arbitrary data schema
First Claim
1. A computer-implemented method, comprising:
identifying, by a scan agent hosted by a computer system, a structure of data stored in a data repository having an arbitrary data schema when the arbitrary data schema of the data repository is unknown, wherein the arbitrary data schema of the data repository is derived from the identified structure;
scanning the structured data of the data repository according to the derived data schema;
converting the structured data into text data; and
applying a schema-independent policy to the converted text data to detect confidential information stored in the data repository regardless of the arbitrary data schema of the structured data, wherein the schema-independent policy is a data loss prevention policy, and wherein a violation of the data loss prevention policy is triggered when the converted text data contains the confidential information.
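The four steps of the claim (identify the structure, scan according to the derived schema, convert to text, apply the policy) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the repository is a SQLite database, that the "confidential information" is anything shaped like a US Social Security number, and that row position is enough to report a violation. The names `derive_schema`, `scan_to_text`, `find_violations`, `demo`, and `SSN_PATTERN` are hypothetical.

```python
import re
import sqlite3

# Assumed policy for illustration: flag text shaped like a US Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def derive_schema(conn):
    """Identify the repository's structure: map each table name to its column names."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        schema[table] = [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
    return schema


def scan_to_text(conn, schema):
    """Scan each table per the derived schema and yield (table, row index, text)."""
    for table, columns in schema.items():
        col_list = ", ".join(f'"{c}"' for c in columns)
        for index, row in enumerate(conn.execute(f'SELECT {col_list} FROM "{table}"')):
            # Conversion step: the row becomes plain text with no column semantics.
            text = " ".join("" if value is None else str(value) for value in row)
            yield table, index, text


def find_violations(conn):
    """Apply the schema-independent policy to every converted row."""
    schema = derive_schema(conn)
    return [(table, index) for table, index, text in scan_to_text(conn, schema)
            if SSN_PATTERN.search(text)]


def demo():
    """Build a small in-memory repository whose schema the scanner is never told."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, note TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER, memo TEXT)")
    conn.executemany("INSERT INTO staff VALUES (?, ?)",
                     [("Ann", "SSN 123-45-6789"), ("Bob", "no data")])
    conn.execute("INSERT INTO orders VALUES (1, 'ship to warehouse')")
    return find_violations(conn)
```

In this sketch the policy never refers to a table or column name, so the same `SSN_PATTERN` check would apply unchanged to a database with entirely different tables.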
2 Assignments
0 Petitions
Abstract
A method and apparatus for scanning structured data from a data repository having an arbitrary data schema and for applying a policy to the data of the data repository are described. In one embodiment, the structured data is converted to unstructured text data to allow a schema-independent policy to be applied to the text data in order to detect a policy violation in the data repository regardless of the data schema used by the data repository.
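The schema independence described in the abstract can be sketched as follows: once records are flattened to unstructured text, a single policy check applies to a relational-style row and to a nested document alike. This is an illustrative sketch under stated assumptions, not the patent's method. The card-number pattern stands in for a data loss prevention policy, and `to_text`, `violates_policy`, and `CARD_PATTERN` are hypothetical names.

```python
import re

# Assumed policy for illustration: a 16-digit card-like number,
# optionally grouped in fours by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def to_text(record):
    """Recursively flatten dicts, lists, and tuples into one space-joined string."""
    if isinstance(record, dict):
        parts = [f"{key} {to_text(value)}" for key, value in record.items()]
    elif isinstance(record, (list, tuple)):
        parts = [to_text(item) for item in record]
    else:
        parts = ["" if record is None else str(record)]
    return " ".join(part for part in parts if part)


def violates_policy(record):
    """Apply the same text-level policy regardless of the record's original shape."""
    return CARD_PATTERN.search(to_text(record)) is not None


# Two records with unrelated structures, checked by the same policy.
row = (7, "Ann", "4111 1111 1111 1111")
document = {"customer": {"name": "Bob", "payment": ["4111-1111-1111-1111"]}}
clean = {"id": 3, "memo": "call back"}
results = [violates_policy(r) for r in (row, document, clean)]
```

One design consequence of flattening is that field boundaries disappear, so a real policy would need to tolerate matches that span adjacent values.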
242 Citations
18 Claims
1. A computer-implemented method, comprising:
identifying, by a scan agent hosted by a computer system, a structure of data stored in a data repository having an arbitrary data schema when the arbitrary data schema of the data repository is unknown, wherein the arbitrary data schema of the data repository is derived from the identified structure;
scanning the structured data of the data repository according to the derived data schema;
converting the structured data into text data; and
applying a schema-independent policy to the converted text data to detect confidential information stored in the data repository regardless of the arbitrary data schema of the structured data, wherein the schema-independent policy is a data loss prevention policy, and wherein a violation of the data loss prevention policy is triggered when the converted text data contains the confidential information.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. An apparatus, comprising:
a data repository having an arbitrary data schema; and
a scan agent, coupled to the data repository, to:
identify a structure of data stored in the data repository when the arbitrary data schema of the data repository is unknown, wherein the arbitrary data schema of the data repository is derived from the identified structure;
scan the structured data of the data repository according to the derived data schema;
convert the structured data into unstructured text data; and
apply a schema-independent policy to the converted unstructured text data to detect confidential information stored in the data repository regardless of the arbitrary data schema of the structured data, wherein the schema-independent policy is a data loss prevention policy, and wherein a violation of the data loss prevention policy is triggered when the converted text data contains the confidential information.
Dependent claims: 15, 16
17. A non-transitory computer-readable storage medium having instructions stored thereon that when executed by a computer cause the computer to perform a method, comprising:
identifying, by a scan agent hosted by a computer system, a structure of data stored in a data repository having an arbitrary data schema when the arbitrary data schema of the data repository is unknown, wherein the arbitrary data schema of the data repository is derived from the identified structure;
scanning the structured data of the data repository according to the derived data schema;
converting the structured data into text data; and
applying a schema-independent policy to the converted text data to detect confidential information stored in the data repository regardless of the arbitrary data schema of the structured data, wherein the schema-independent policy is a data loss prevention policy, and wherein a violation of the data loss prevention policy is triggered when the converted text data contains the confidential information.
Dependent claims: 18
Specification