Method and system for tracing information leaks in organizations through syntactic and linguistic signatures
First Claim
1. A computer-executable method for tracing information leaks, comprising:
obtaining, by a computing device, a disseminated document to analyze;
determining, from a collection of original documents, an original document that is most similar to the disseminated document;
comparing the disseminated document to the most similar original document to determine differences between the disseminated document and the most similar original document;
querying a database containing changes to documents, using the determined differences, to determine a most similar changed document;
determining a distance value by comparing changes from the most similar changed document with the determined differences from the disseminated document; and
responsive to determining that the distance value is less than a threshold value, determining a user identifier for a user associated with the most similar changed document.
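The steps recited in claim 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the claim does not name a similarity measure, a change representation, or a distance function, so word-level `difflib` matching, `(old, new)` replacement pairs, Jaccard distance, the `change_db` layout, and the default threshold of 0.5 are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Word-level similarity ratio in [0, 1] (assumed measure, not from the claim)."""
    return SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


def word_changes(original, disseminated):
    """Differences as a set of (original_phrase, new_phrase) pairs."""
    a, b = original.split(), disseminated.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return {
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    }


def jaccard_distance(x, y):
    """0.0 for identical change sets, 1.0 for disjoint ones (assumed distance)."""
    if not x and not y:
        return 0.0
    return 1.0 - len(x & y) / len(x | y)


def trace_leak(disseminated, originals, change_db, threshold=0.5):
    """Return the user id tied to the closest recorded change set, or None.

    originals: dict mapping doc_id -> original text (hypothetical layout)
    change_db: list of (user_id, doc_id, change_set) records (hypothetical layout)
    """
    # Determine the original document most similar to the disseminated one.
    doc_id = max(originals, key=lambda d: similarity(originals[d], disseminated))
    # Compare the two documents to obtain the differences.
    diffs = word_changes(originals[doc_id], disseminated)
    # Query the change records for that document; the record with the
    # smallest distance plays the role of the most similar changed document.
    candidates = [
        (jaccard_distance(changes, diffs), user)
        for user, d, changes in change_db
        if d == doc_id
    ]
    if not candidates:
        return None
    distance, user = min(candidates)
    # Report a user only when the distance is below the threshold.
    return user if distance < threshold else None


originals = {
    "memo": "The quarterly results were good and the team is happy",
    "plan": "We will launch the new product in the spring",
}
change_db = [
    ("alice", "memo", {("good", "strong"), ("happy", "pleased")}),
    ("bob", "memo", {("good", "solid"), ("is", "remains")}),
]
leaked = "The quarterly results were strong and the team is pleased"
suspect = trace_leak(leaked, originals, change_db)
```

In this toy data the leaked text carries exactly the two substitutions recorded for the hypothetical user "alice", so her record has distance 0.0 and falls below the threshold. A text whose differences match no recorded change set yields `None`, which corresponds to the threshold test in the final step of the claim.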
Abstract
One embodiment of the present invention provides a system for tracing information leaks. The system introduces linguistic and syntactic changes to a document and associates these changes with a user identifier, which facilitates identifying a user who may have leaked the document. During operation, the system receives a document. The system then determines a most similar original document based on the received document. The system determines the differences between the most similar original document and the received document, and determines a user identifier based on the determined differences.
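The embedding side described in the abstract, where changes are introduced into a document and associated with a user identifier, might look roughly like the following Python sketch. Everything specific here is an assumption made for illustration: the abstract does not say which linguistic or syntactic changes are used, so the tiny `SYNONYMS` table, the hash-based choice of which words to replace, and the `fingerprint` function name are hypothetical.

```python
import hashlib

# Hypothetical substitution table; a real system would need a far richer
# set of meaning-preserving linguistic and syntactic changes.
SYNONYMS = {"good": "strong", "happy": "pleased", "big": "large", "begin": "start"}


def fingerprint(text, user_id, synonyms=SYNONYMS):
    """Return (per-user copy, set of (old, new) changes) for one recipient.

    Each eligible word is replaced or kept depending on a hash of the user
    id, the word position, and the word, so the same user always receives
    the same copy (assumed selection scheme, not from the source).
    """
    words = text.split()
    changes = set()
    for i, word in enumerate(words):
        if word in synonyms:
            digest = hashlib.sha256(f"{user_id}:{i}:{word}".encode()).digest()
            if digest[0] % 2 == 0:  # replace roughly half of the eligible words
                words[i] = synonyms[word]
                changes.add((word, synonyms[word]))
    return " ".join(words), changes


text = "The results were good and the team is happy to begin the big project"
copy_for_alice, alice_changes = fingerprint(text, "alice")
# The recorded change set would be stored alongside the user identifier,
# e.g. as a ("alice", doc_id, alice_changes) record in a change database.
```

With only four eligible words, two users can easily receive identical copies; a usable scheme would need enough substitution sites that each user's change set is distinguishable from the others.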
16 Claims
Claim 1: dependent claims 2, 3, 4, 5, 16.
6. A computing system for tracing information leaks, the system comprising:
one or more processors;
a computer-readable medium coupled to the one or more processors having instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
obtaining a disseminated document to analyze;
determining, from a collection of original documents, an original document that is most similar to the disseminated document;
comparing the disseminated document to the most similar original document to determine differences between the disseminated document and the most similar original document;
querying a database containing changes to documents, using the determined differences, to determine a most similar changed document;
determining a distance value by comparing changes from the most similar changed document with the determined differences from the disseminated document; and
responsive to determining that the distance value is less than a threshold value, determining a user identifier for a user associated with the most similar changed document.
Dependent claims: 7, 8, 9, 10.
11. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for tracing information leaks, the method comprising:
obtaining a disseminated document to analyze;
determining, from a collection of original documents, an original document that is most similar to the disseminated document;
comparing the disseminated document to the most similar original document to determine differences between the disseminated document and the most similar original document;
querying a database containing changes to documents, using the determined differences, to determine a most similar changed document;
determining a distance value by comparing changes from the most similar changed document with the determined differences from the disseminated document; and
responsive to determining that the distance value is less than a threshold value, determining a user identifier for a user associated with the most similar changed document.
Dependent claims: 12, 13, 14, 15.
Specification