Systems and methods for probabilistic data classification
Abstract
A system for performing data classification operations. In one embodiment, the system comprises a filesystem configured to store a plurality of computer files and a scanning agent configured to traverse the filesystem and compile data regarding the attributes and content of the plurality of computer files. The system also comprises an index configured to store the data regarding attributes and content of the plurality of computer files and a file classifier configured to analyze the data regarding the attributes and content of the plurality of computer files and to classify the plurality of computer files into one or more categories based on the data regarding the attributes and content of the plurality of computer files. Results of the file classification operations can be used to set appropriate security permissions on files which include sensitive information or to control the way that a file is backed up or the schedule according to which it is archived.
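The flow described in the abstract (scan, store attributes and content data in an index, classify from that index) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the record fields (`ext`, `owner`, `terms`), the file paths, the category names, and the choice of a naive Bayes model with add-one smoothing as the probability calculation. The text above does not name a specific probability model, so this is one plausible reading, not the patented method. Note that the sketch only ever reads index records and never opens a file.

```python
import math
from collections import Counter, defaultdict

# Hypothetical index records, standing in for what a scanning agent might
# store in a database kept separately from the filesystem.
INDEX = {
    "/hr/salaries.xls": {"ext": "xls", "owner": "hr", "terms": ["salary", "ssn", "payroll"]},
    "/pub/menu.txt": {"ext": "txt", "owner": "facilities", "terms": ["lunch", "menu"]},
    "/hr/bonus.xls": {"ext": "xls", "owner": "hr", "terms": ["payroll", "bonus"]},
    "/pub/events.txt": {"ext": "txt", "owner": "facilities", "terms": ["menu", "picnic"]},
}

# Hypothetical user input: a first set of indexed files with chosen categories.
LABELED = {"/hr/salaries.xls": "sensitive", "/pub/menu.txt": "public"}


def features(record):
    """Flatten one index record into a list of feature tokens."""
    tokens = [f"ext={record['ext']}", f"owner={record['owner']}"]
    return tokens + [f"term={t}" for t in record["terms"]]


def derive_rules(labeled, index):
    """Build per-category feature counts from the user-selected index records."""
    feature_counts = defaultdict(Counter)
    category_counts = Counter()
    for path, category in labeled.items():
        category_counts[category] += 1
        feature_counts[category].update(features(index[path]))
    vocabulary = {f for counts in feature_counts.values() for f in counts}
    return feature_counts, category_counts, vocabulary


def classify(path, index, rules):
    """Return (category, probability) for a file using only its index record."""
    feature_counts, category_counts, vocabulary = rules
    total = sum(category_counts.values())
    log_scores = {}
    for category, n in category_counts.items():
        denom = sum(feature_counts[category].values()) + len(vocabulary)
        score = math.log(n / total)  # category prior
        for f in features(index[path]):
            if f in vocabulary:  # skip features absent from the labeled set
                score += math.log((feature_counts[category][f] + 1) / denom)
        log_scores[category] = score
    # Convert log scores into normalized probabilities.
    peak = max(log_scores.values())
    weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


if __name__ == "__main__":
    rules = derive_rules(LABELED, INDEX)
    for path in ("/hr/bonus.xls", "/pub/events.txt"):
        category, probability = classify(path, INDEX, rules)
        print(f"{path}: {category} (p={probability:.2f})")
```

A result such as a "sensitive" label could then drive the follow-on actions the abstract mentions, for example tightening permissions or changing a backup schedule; those steps are outside this sketch.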
23 Claims
1. A computer system comprising:
- a filesystem configured to store a plurality of computer files in a computer memory;
- a plurality of scanning agents implemented on one or more computer processors, wherein the plurality of scanning agents are configured to traverse the filesystem and compile attributes and content indexes about the plurality of computer files wherein the attributes and content indexes are stored in one or more databases that are stored separately from the filesystem; and
- a file classifier comprising one or more computer processors, wherein the file classifier is configured to receive user input wherein the user selects a first set of attributes and content indexes from the one or more databases stored separately from a corresponding first set of computer files in the filesystem, wherein the file classifier is configured to analyze the user input to determine a set of classification rules such that the classification rules are derived from accessing the first set of the attributes and content indexes in the one or more databases stored separately from the corresponding first set of computer files stored in the filesystem, wherein the set of classification rules are derived without directly accessing the first set of computer files stored in the filesystem, wherein the file classifier is further configured to classify a second set of computer files stored in the filesystem without accessing the filesystem based on a calculated probability derived from a corresponding second set of attributes and context indexes in the one or more databases stored separately from the filesystem.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A method comprising:
- traversing a filesystem and compiling data regarding attributes and content indexes about a plurality of computer files stored in the filesystem, wherein the attributes and content indexes are stored in one or more databases that are stored separately from the filesystem;
- receiving user input wherein the user selects a first set of attributes and content indexes from the one or more databases stored separately from a corresponding first set of computer files in the filesystem;
- analyzing the user input and data regarding the first set of attributes and content indexes about the corresponding first set of computer files stored in the filesystem to derive a set of classification rules from the first set of attributes and content indexes stored separately from the corresponding first set of the corresponding first set of computer files without directly accessing the first set of computer files stored in the filesystem; and
- classifying a second set of computer files stored in the filesystem without accessing the filesystem into one or more categories based on a calculated probability derived from a corresponding second set of attributes and context indexes in the one or more databases stored separately from the filesystem.
Dependent claims: 17, 18, 19, 20, 21
22. A computer system comprising:
- means for traversing a filesystem and compiling data regarding attributes and content indexes about a plurality of computer files stored in the filesystem in computer memory, wherein the attributes and content indexes are stored in one or more databases that are stored separately from the filesystem, and wherein the means for traversing the filesystem comprises one or more computer processors;
- means for receiving user input wherein the user selects a first set of attributes and content indexes from the one or more databases stored separately from a corresponding first set of computer files in the filesystem;
- means for analyzing the user input and data regarding the first set of attributes and content indexes about the corresponding first set of computer files stored in the filesystem to derive a set of classification rules from the first set of attributes and content indexes stored separately from the corresponding first set of computer files without directly accessing the first set of computer files stored in the filesystem; and
- means for classifying a second set of computer files stored in the filesystem without accessing the filesystem into one or more categories based on a calculated probability derived from a corresponding second set of attributes and context indexes in the one or more databases stored separately from the filesystem.
Dependent claims: 23
Specification