Data leak prevention enforcement based on learned document classification

US 9,626,528 B2
Filed: 03/07/2014
Issued: 04/18/2017
Est. Priority Date: 03/07/2014
Status: Active Grant

Abstract

The present disclosure relates generally to the field of automatically learning and automatically adapting to perform classification of protected data. In various examples, learning and adapting to perform classification of protected data may be implemented in the form of systems, methods and/or algorithms.

30 Citations

19 Claims

1. An automated method for data leak prevention, the method comprising:
- obtaining, by a processor, a plurality of training documents and corresponding metadata associated with each training document from a document management system associated with a party, each of the training documents comprising at least one respective content, the corresponding metadata associated with each training document comprising a security classification set by the party in the document management system, the security classification classifying the training document associated with the corresponding metadata into one of at least two security categories;
  
  in response to obtaining the plurality of training documents from the document management system, converting each training document into a feature set comprising at least one pairing of a feature of the respective content of the respective training document with the security classification of the respective training document found in the corresponding metadata associated with the respective training document;
  
  generating, by the processor, a classification model based at least in part upon the pairings found in the feature sets of each of the training documents, wherein the generated classification model comprises at least one correlation between the features found in the respective content of each training document and the security classification found in the corresponding metadata associated with each training document;
  
  obtaining, by the processor, at least one non-training document, wherein the at least one non-training document comprises at least one respective content;
  
  in response to obtaining the at least one non-training document, applying, by the processor, the generated classification model to the at least one non-training document, the application of the classification model to the at least one non-training document comprising:
  
  correlating the at least one respective content of the at least one non-training document to a security classification of the at least one non-training document based on the at least one correlation in the generated classification model; and
  
  classifying the at least one non-training document into one of the at least two security categories based on the correlation of the at least one respective content of the at least one non-training document to the security classification;
  
  monitoring the at least one non-training document, by the processor, for attempted access to the at least one non-training document;
  
  detecting, by the processor, based on the monitoring, an attempted access to the at least one non-training document;
  
  in response to detecting an attempted access to the at least one non-training document, taking, by the processor, a predetermined action;
  
  wherein the predetermined action that is taken is based upon the one of the at least two categories into which the at least one non-training document has been classified by the application of the generated classification model; and
  
  wherein the predetermined action that is taken comprises one of:
  
  (a) denying access to the at least one non-training document to which access is attempted;
  
  (b) logging the attempted access to the at least one non-training document to which access is attempted; and
  
  (c) a combination thereof.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the non-training document is obtained from a document management system.
  - 3. The method of claim 1, further comprising generating, by the processor, an enforcement policy, wherein the enforcement policy specifies the at least one action to be taken when the attempt is made to access a document having a predetermined category.
  - 4. The method of claim 1, wherein the action that is taken comprises permitting access to the non-training document to which access is attempted.
  - 5. The method of claim 4, wherein the action that is taken is permitting and logging access to the non-training document to which access is attempted.
  - 6. The method of claim 3, wherein the enforcement policy is enforced on at least one of:
    - (a) an email component;
      
      (b) an end user device component;
      
      (c) a web component;
      
      (d) a network component; and
      
      (e) a combination thereof.
  - 7. The method of claim 6, wherein:
    - the end user device component comprises at least one of:
      
      (a) a desktop computer;
      
      (b) a laptop computer;
      
      (c) a tablet;
      
      (d) a smartphone; and
      
      (e) a combination thereof.
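
The steps of claim 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the word-level features, the smoothed naive-Bayes-style scoring, the category names `confidential` and `public`, the `ACTIONS` table, and every function name below are hypothetical choices made for the example.

```python
from collections import Counter, defaultdict
import math
import re


def features(text):
    """Split content into lowercase word features (an assumed feature choice)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_feature_set(content, classification):
    """Pair each content feature with the document's security classification."""
    return [(f, classification) for f in features(content)]


def train(training_docs):
    """Build per-category feature counts from (content, classification) records."""
    counts = defaultdict(Counter)  # category -> feature -> count
    docs = Counter()               # category -> number of training documents
    for content, classification in training_docs:
        docs[classification] += 1
        for feature, label in to_feature_set(content, classification):
            counts[label][feature] += 1
    return {"counts": counts, "docs": docs}


def classify(model, content):
    """Return the category with the highest add-one-smoothed log score."""
    vocab = {f for c in model["counts"].values() for f in c}
    total_docs = sum(model["docs"].values())
    best, best_score = None, -math.inf
    for category, n_docs in model["docs"].items():
        feature_counts = model["counts"][category]
        denom = sum(feature_counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for f in features(content):
            score += math.log((feature_counts[f] + 1) / denom)
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical mapping from category to the predetermined action(s).
ACTIONS = {"confidential": ("deny", "log"), "public": ("log",)}


def on_access_attempt(model, doc_id, content, user, audit_log):
    """Hook called when an access attempt is detected; returns True if access is allowed."""
    category = classify(model, content)
    actions = ACTIONS[category]
    if "log" in actions:
        audit_log.append((user, doc_id, category))
    return "deny" not in actions


training = [
    ("Quarterly merger plan and acquisition pricing", "confidential"),
    ("Internal salary data and merger terms", "confidential"),
    ("Cafeteria menu and holiday schedule", "public"),
    ("Press release for product launch event", "public"),
]
model = train(training)
```

In this sketch the "correlation" between content features and security classification is simply a table of per-category word counts; any classifier trained on the same pairings could stand in its place.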

8. A computer readable storage medium, tangibly embodying a program of instructions executable by the computer for automated data leak prevention, the program of instructions, when executing, performing the following steps:
- obtaining a plurality of training documents and corresponding metadata associated with each training document from a document management system associated with a party, each of the training documents comprising at least one respective content, the corresponding metadata associated with each training document comprising a security classification set by the party in the document management system, the security classification classifying the training document associated with the corresponding metadata into one of at least two security categories;
  
  in response to obtaining the plurality of training documents from the document management system, converting each training document into a feature set comprising at least one pairing of a feature of the respective content of the respective training document with the security classification of the respective training document found in the corresponding metadata associated with the respective training document;
  
  generating a classification model based at least in part upon the pairings found in the feature sets of each of the training documents, wherein the generated classification model comprises at least one correlation between the features found in the respective content of each training document and the security classification found in the corresponding metadata associated with each training document;
  
  obtaining at least one non-training document, wherein the at least one non-training document comprises at least one respective content;
  
  in response to obtaining the at least one non-training document, applying the generated classification model to the at least one non-training document, the application of the classification model to the at least one non-training document comprising:
  
  correlating the at least one respective content of the at least one non-training document to a security classification of the at least one non-training document based on the at least one correlation in the generated classification model;
  
  classifying the at least one non-training document into one of the at least two categories based on the correlation of the at least one respective content of the at least one non-training document to the security classification;
  
  monitoring the at least one non-training document for attempted access to the at least one non-training document;
  
  detecting, based on the monitoring, an attempted access to the at least one non-training document;
  
  in response to detecting an attempted access to the at least one non-training document, taking a predetermined action;
  
  wherein the predetermined action that is taken is based upon the one of the at least two categories into which the at least one non-training document has been classified by the application of the generated classification model; and
  
  wherein the predetermined action that is taken comprises one of:
  
  (a) denying access to the at least one non-training document to which access is attempted;
  
  (b) logging the attempted access to the at least one non-training document to which access is attempted; and
  
  (c) a combination thereof.
- Dependent claims (9, 10, 11, 12, 13):
  - 9. The computer readable storage medium of claim 8, wherein the program of instructions, when executing, further performs:
    - generating an enforcement policy, wherein the enforcement policy specifies the at least one action to be taken when the attempt is made to access a document having a predetermined category.
  - 10. The computer readable storage medium of claim 8, wherein the action that is taken comprises permitting access to the non-training document to which access is attempted.
  - 11. The computer readable storage medium of claim 8, wherein the action that is taken is permitting and logging access to the non-training document to which access is attempted.
  - 12. The computer readable storage medium of claim 9, wherein the enforcement policy is enforced on at least one of:
    - (a) an email component;
      
      (b) an end user device component;
      
      (c) a web component;
      
      (d) a network component; and
      
      (e) a combination thereof.
  - 13. The computer readable storage medium of claim 12, wherein:
    - the end user device component comprises at least one of:
      
      (a) a desktop computer;
      
      (b) a laptop computer;
      
      (c) a tablet;
      
      (d) a smartphone; and
      
      (e) a combination thereof.
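
The enforcement policy of claims 9 through 12 can be pictured as a lookup table keyed by document category and enforcement component. The sketch below is illustrative only: `Component`, `Rule`, `generate_policy`, `enforce`, and the idea of marking some categories as "restricted" are all assumptions introduced for the example, not terms defined by the patent.

```python
from dataclasses import dataclass
from enum import Enum


class Component(Enum):
    """Enforcement points named in the claims."""
    EMAIL = "email"
    END_USER_DEVICE = "end_user_device"
    WEB = "web"
    NETWORK = "network"


@dataclass(frozen=True)
class Rule:
    permit: bool  # False means access is denied
    log: bool     # True means the attempt is recorded


def generate_policy(categories, restricted):
    """Build a (category, component) -> Rule table.

    Assumption for this sketch: restricted categories are denied and logged
    everywhere, all other categories are permitted and logged.
    """
    policy = {}
    for category in categories:
        for component in Component:
            permit = category not in restricted
            policy[(category, component)] = Rule(permit=permit, log=True)
    return policy


def enforce(policy, category, component, event, audit_log):
    """Apply the rule for this category at this component; return True if permitted."""
    rule = policy[(category, component)]
    if rule.log:
        outcome = "permitted" if rule.permit else "denied"
        audit_log.append((component.value, category, event, outcome))
    return rule.permit


policy = generate_policy(["public", "confidential"], restricted={"confidential"})
```

Keeping the policy as plain data, separate from the classifier, means the same learned categories can drive different actions at the email, device, web, and network layers.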

14. A computer-implemented system for automatic data leak prevention, the system comprising:
- a first obtaining element configured to obtain a plurality of training documents and corresponding metadata associated with each training document from a document management system associated with a party, each of the training documents comprising at least one respective content, the corresponding metadata associated with each training document comprising a security classification set by the party in the document management system, the security classification classifying the training document associated with the corresponding metadata into one of at least two security categories;
  
  a converting element configured to, in response to obtaining the plurality of training documents from the document management system, convert each training document into a feature set comprising at least one pairing of a feature of the respective content of the respective training document with the security classification of the respective training document found in the corresponding metadata associated with the respective training document;
  
  a first generating element configured to generate a classification model based at least in part upon the pairings found in the feature sets of each of the training documents, wherein the generated classification model comprises at least one correlation between the features found in the respective content of each training document and the security classification found in the corresponding metadata associated with each training document;
  
  a second obtaining element configured to obtain at least one non-training document, wherein the at least one non-training document comprises at least one respective content;
  
  an applying element configured to apply, in response to the second obtaining element obtaining the at least one non-training document, the generated classification model to the at least one non-training document, the application of the classification model to the at least one non-training document comprising:
  
  correlating the at least one respective content of the at least one non-training document to a security classification of the at least one non-training document based on the at least one correlation in the generated classification model;
  
  classifying the at least one non-training document into one of the at least two categories based on the correlation of the at least one respective content of the at least one non-training document to the security classification;
  
  a monitoring element configured to monitor the at least one non-training document for attempted access to the at least one non-training document and detect based on the monitoring an attempted access to the at least one non-training document;
  
  a taking action element configured to, in response to detecting an attempted access to the at least one non-training document, take a predetermined action;
  
  wherein the predetermined action that is taken is based upon the one of the at least two categories into which the at least one non-training document has been classified by the applying element; and
  
  wherein the predetermined action that is taken comprises one of:
  
  (a) denying access to the at least one non-training document to which access is attempted;
  
  (b) logging the attempted access to the at least one non-training document to which access is attempted; and
  
  (c) a combination thereof.
- Dependent claims (15, 16, 17, 18, 19):
  - 15. The system of claim 14, further comprising:
    - a second generating element configured to generate an enforcement policy, wherein the enforcement policy specifies the at least one action to be taken when the attempt is made to access a document having a predetermined category.
  - 16. The system of claim 14, wherein the action that is taken comprises permitting access to the non-training document to which access is attempted.
  - 17. The system of claim 14, wherein the action that is taken is permitting and logging access to the non-training document to which access is attempted.
  - 18. The system of claim 15, wherein the enforcement policy is enforced on at least one of:
    - (a) an email component;
      
      (b) an end user device component;
      
      (c) a web component;
      
      (d) a network component; and
      
      (e) a combination thereof.
  - 19. The system of claim 14, further comprising an output element configured to output the category into which the non-training document is classified.
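
One way to read the system claims is as a set of pluggable elements wired together. The following sketch assumes each element is a plain Python callable; the class `DataLeakPreventionSystem`, its method names, and the toy `keyword_model` stand-in for the generating element are hypothetical and exist only to show how the elements of claims 14 and 19 might fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Tuple

LabeledDoc = Tuple[str, str]          # (content, security classification)
Classifier = Callable[[str], str]     # content -> category


@dataclass
class DataLeakPreventionSystem:
    obtain_training: Callable[[], Iterable[LabeledDoc]]           # first obtaining element
    generate_model: Callable[[Iterable[LabeledDoc]], Classifier]  # converting + generating elements
    take_action: Callable[[str, str], bool]                       # taking action element
    output: Callable[[str, str], None]                            # output element (claim 19)
    _classifier: Optional[Classifier] = field(default=None, init=False)
    _categories: Dict[str, str] = field(default_factory=dict, init=False)

    def learn(self):
        """Train the classification model from the training documents."""
        self._classifier = self.generate_model(self.obtain_training())

    def ingest(self, doc_id, content):
        """Second obtaining + applying elements: classify a non-training document."""
        category = self._classifier(content)
        self._categories[doc_id] = category
        self.output(doc_id, category)
        return category

    def access(self, doc_id, user):
        """Monitoring element: route a detected access attempt to the action element."""
        return self.take_action(self._categories[doc_id], user)


def keyword_model(training):
    """Toy generating element: flags words seen only in 'confidential' documents."""
    confidential, other = set(), set()
    for content, label in training:
        target = confidential if label == "confidential" else other
        target.update(content.lower().split())
    markers = confidential - other
    return lambda content: (
        "confidential" if markers & set(content.lower().split()) else "public"
    )


outputs = []
system = DataLeakPreventionSystem(
    obtain_training=lambda: [("merger plan", "confidential"), ("lunch plan", "public")],
    generate_model=keyword_model,
    take_action=lambda category, user: category != "confidential",
    output=lambda doc_id, category: outputs.append((doc_id, category)),
)
system.learn()
```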

Current Assignee: Kyndryl Incorporated
Original Assignee: International Business Machines Corporation
Inventors: Butler, Anthony M.
Primary Examiner(s): Vincent, David
Application Number: US14/201,107
Publication Number: US 20150254469A1
Time in Patent Office: 1,138 Days
Field of Search: 706/12, 706/45

CPC Class Codes

G06F 21/6218   to a system of files or obj...

G06N 20/00   Machine learning

G06N 5/025   Extracting rules from data
