Automatic generation of embedded signatures for duplicate detection on a public network
Abstract
In accordance with an aspect of the invention, a method and system are disclosed for constructing an embedded signature in order to facilitate post-facto detection of leakage of sensitive data. The leakage detection mechanism involves: 1) identifying at least one set of words in an electronic document containing sensitive data, the set of words having a low frequency of occurrence in a first collection of electronic documents; and 2) transmitting a query to search a second collection of electronic documents for any electronic document that contains the set of words having a low frequency of occurrence. This leakage detection mechanism has at least the following advantages: a) it is tamper-resistant; b) it avoids the need to add a watermark to the sensitive data; c) it can be used to locate the sensitive data even if the leakage occurred before the embedded signature was ever identified; and d) it can be used to detect an embedded signature regardless of whether the data is being presented statically or dynamically.
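The two steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes that a "set of words" is a contiguous run of n words, that "frequency of occurrence" means the number of reference documents containing that run, and that the target search engine accepts quoted phrase queries. All function names, parameters, and sample texts are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_sets(tokens, n):
    """Contiguous n-word runs (an assumed reading of 'set of words')."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_frequency(corpus, n):
    """Count how many reference documents contain each n-word run."""
    df = Counter()
    for doc in corpus:
        df.update(set(word_sets(tokenize(doc), n)))
    return df


def rare_word_sets(document, corpus, n=3, threshold=1):
    """Step 1: runs in `document` whose frequency in `corpus` is below `threshold`."""
    df = document_frequency(corpus, n)
    seen, rare = set(), []
    for ws in word_sets(tokenize(document), n):
        if ws not in seen and df[ws] < threshold:
            seen.add(ws)
            rare.append(ws)
    return rare


def build_query(word_set):
    """Step 2 (query text only): an exact-phrase query; quoting syntax is assumed."""
    return '"' + " ".join(word_set) + '"'


# Hypothetical first collection (reference corpus) and sensitive document.
reference_corpus = [
    "The quarterly report is due on Friday",
    "Please review the quarterly report",
]
sensitive_doc = "The quarterly report names project bluefinch"

signature = rare_word_sets(sensitive_doc, reference_corpus, n=3, threshold=1)
queries = [build_query(ws) for ws in signature]
```

Each string in `queries` would then be submitted to whatever search interface covers the second collection. That submission step is left out here because it depends on the search service being used.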
33 Citations
19 Claims
1. A method comprising:
identifying at least one set of words in a first electronic document, said set of words having a frequency of occurrence in a first collection of electronic documents that is below a predetermined threshold; and
transmitting a query to search a second collection of electronic documents for any electronic documents that contain the said set of words.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A method comprising:
transmitting a query to search a second collection of electronic documents for any electronic documents that contain a set of words having a frequency of occurrence in a first collection of electronic documents that is below a predetermined threshold.
Dependent claims: 15, 16, 17
18. An apparatus comprising:
means for identifying at least one set of words in a first electronic document, said set of words having a frequency of occurrence in a first collection of electronic documents that is below a predetermined threshold; and
means for transmitting a query to search a second collection of electronic documents for any electronic documents that contain the said set of words.
19. A computer readable medium encoded with computer executable instructions defining steps comprising:
identifying at least one set of words in a first electronic document, said set of words having a frequency of occurrence in a first collection of electronic documents that is below a predetermined threshold; and
transmitting a query to search a second collection of electronic documents for any electronic documents that contain the said set of words.
Specification