Obfuscating document stylometry
Abstract
A new system has been invented that can obfuscate the stylometry of a document. This may be used to anonymize a document and make it resistant to forensic stylometry analysis, or to mimic the style of an existing set of documents, for example. A system may compare indicators of distinctive stylometry in a document with corresponding indicators of distinctive stylometry in a stylometric reference, and provide one or more alterations to the document that alter the indicators of distinctive stylometry compared to the stylometric reference, according to one illustrative embodiment.
16 Claims
1. A method, implementable at least in part by a computing device, the method comprising:
measuring a value of one or more indicators of stylometry in a stylometric reference;
measuring a value of one or more indicators of distinctive stylometry in a document that correspond to the indicators of stylometry in the stylometric reference;
comparing, using one or more processors of the computing device, the indicators of distinctive stylometry in the document with the corresponding indicators of stylometry in the stylometric reference, in which comparing comprises, for each of the indicators of stylometry in the stylometric reference and the corresponding indicators of distinctive stylometry in the document, ranking the indicators of distinctive stylometry in an order from highest to lowest difference in a value of distinctive stylometry between the stylometric reference and the document;
providing, using one or more of the processors of the computing device, one or more alterations to the document to replace one or more of the indicators of distinctive stylometry in the document with one or more of the corresponding indicators of stylometry in the stylometric reference, following the ranking, beginning with the indicators of distinctive stylometry with the highest values of distinctive stylometry between the stylometric reference and the document, thereby reducing the distinctiveness of the indicators of distinctive stylometry in the document compared to the stylometric reference and increasing the anonymity of the document; and
measuring a total remaining stylometric distinctiveness between the document and the stylometric reference after each alteration of one of the indicators of distinctive stylometry in the document with one of the indicators of stylometry in the stylometric reference.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
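The measuring and ranking steps of claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the choice of indicator (relative frequency of a few function words), the names `INDICATORS`, `measure`, and `rank_by_difference`, and the sample texts are all assumptions made for illustration.

```python
from collections import Counter
import re

# Hypothetical indicator set: relative frequency of a few function words.
# The claim does not say which indicators are used; this is an assumption.
INDICATORS = ("the", "of", "however", "which", "that")


def measure(text, indicators=INDICATORS):
    """Return the relative frequency of each indicator word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)  # avoid division by zero on empty text
    return {word: counts[word] / total for word in indicators}


def rank_by_difference(reference, document):
    """Rank indicators from highest to lowest absolute difference in value."""
    diffs = {key: abs(document[key] - reference[key]) for key in reference}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# Illustrative texts only, standing in for a reference and a document.
reference_text = "the cat sat on the mat that the dog liked"
document_text = "however the cat which sat however on the mat however"

ranking = rank_by_difference(measure(reference_text), measure(document_text))
for indicator, difference in ranking:
    print(f"{indicator}: {difference:.2f}")
```

In this toy input the word "however" differs most between the two texts, so it heads the ranking and would be the first candidate for alteration.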
15. A computing system comprising one or more processors and one or more data storage components, the computing system being configured for:
measuring a value of one or more stylometrically distinctive linguistic features in a target corpus;
measuring a value of one or more corresponding stylometrically distinctive linguistic features in an input document;
comparing, using one or more of the processors, the value of the stylometrically distinctive linguistic features in the target corpus with the value of the corresponding stylometrically distinctive linguistic features in the input document, in which comparing comprises, for each of the stylometrically distinctive linguistic features in the target corpus and the corresponding stylometrically distinctive linguistic features in the input document, ranking the stylometrically distinctive linguistic features in order of greater to lesser difference in the value between the target corpus and the input document;
replacing, using one or more of the processors, one or more of the stylometrically distinctive linguistic features in the input document with one or more of the stylometrically distinctive linguistic features in the target corpus, following the ranked order, beginning with the stylometrically distinctive linguistic features ranked with greater difference in the value, thereby reducing the stylometrically distinctive linguistic features in the input document and increasing the anonymity of the input document; and
gauging a total remaining stylometric distinctiveness between the input document and the target corpus after each replacing of one of the stylometrically distinctive linguistic features in the input document with one of the stylometrically distinctive linguistic features in the target corpus.
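The replacing and gauging steps of claim 15 might look like the following Python sketch. It rests on assumptions not found in the claim: features are individual word frequencies, the target corpus is summarized as a dictionary of target frequencies, and a hypothetical `substitutes` table supplies the replacement wording. The gauge used here (a sum of absolute frequency differences) is likewise only one possible choice.

```python
import re


def frequency(text, word):
    """Relative frequency of a word in the text (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    return words.count(word) / max(len(words), 1)


def total_distinctiveness(text, target):
    """Assumed gauge: sum of absolute frequency differences from the target."""
    return sum(abs(frequency(text, word) - value) for word, value in target.items())


def replace_in_ranked_order(text, target, substitutes):
    """Replace features from greatest to least difference, gauging after each."""
    ranked = sorted(
        target,
        key=lambda word: abs(frequency(text, word) - target[word]),
        reverse=True,
    )
    trace = [total_distinctiveness(text, target)]  # gauge before any change
    for word in ranked:
        pattern = rf"\b{re.escape(word)}\b"
        text = re.sub(pattern, substitutes[word], text, flags=re.IGNORECASE)
        trace.append(total_distinctiveness(text, target))  # gauge after each
    return text, trace


# Hypothetical data: the target corpus never uses "whilst" or "upon".
target = {"whilst": 0.0, "upon": 0.0}
substitutes = {"whilst": "while", "upon": "on"}
final_text, trace = replace_in_ranked_order(
    "whilst we waited upon the hill whilst it rained", target, substitutes
)
print(final_text)
print([round(value, 3) for value in trace])
```

Recording the gauge after every replacement makes it possible to see how much each individual change contributes, which is the purpose of the final gauging step in the claim.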
16. A computer readable storage medium storing executable instructions that, when executed by the computer, cause the computer to perform a method comprising:
evaluating, using a processor of the computer, linguistic features indicative of distinctive stylometry in a document compared with corresponding linguistic features indicative of stylometry in a reference corpus, in which the reference corpus comprises an anonymized corpus of stylometric references;
determining one or more of the linguistic features in the document that are stylometrically distinctive relative to the reference corpus, and a value of stylometric distinctiveness of the features relative to the reference corpus;
ranking the stylometrically distinctive features in an order of their values of stylometric distinctiveness;
modifying one or more of the linguistic features in the document, in an order of the ranking beginning with the features having the highest value of stylometric distinctiveness, to make the linguistic features in the document less stylometrically distinctive relative to the reference corpus and increase the anonymity of the document, in which modifying comprises altering one or more linguistic features in the document to mimic stylometric indicators in the stylometric references of the anonymized corpus; and
after modifying, determining an updated gauge of total stylometric distinctiveness of the document and modifying one or more additional linguistic features in the document if the gauge of total stylometric distinctiveness of the document is above a pre-selected threshold.
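The threshold-controlled loop in claim 16 can be sketched in Python as follows. This is a simplification under stated assumptions: the document and the reference corpus are each reduced to a dictionary of numeric feature values, "modifying" a feature is modeled as setting it to the reference value, and the gauge is an unnormalized sum of absolute differences. The feature names, the values, and the function `anonymize` are hypothetical.

```python
def total_distinctiveness(doc, ref):
    """Assumed gauge: sum of absolute differences over all features."""
    return sum(abs(doc[key] - ref[key]) for key in ref)


def anonymize(doc, ref, threshold):
    """Modify features in ranked order until the gauge is at or below threshold."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    doc = dict(doc)  # work on a copy so the caller's data is untouched
    ranked = sorted(ref, key=lambda key: abs(doc[key] - ref[key]), reverse=True)
    modified = []
    for feature in ranked:
        # Re-check the updated gauge before modifying any further feature.
        if total_distinctiveness(doc, ref) <= threshold:
            break
        doc[feature] = ref[feature]  # stand-in for mimicking the reference
        modified.append(feature)
    return doc, modified


# Hypothetical feature values for a document and a reference corpus.
document = {
    "avg_sentence_length": 31.0,
    "semicolons_per_1000_words": 9.0,
    "passive_ratio": 0.5,
}
reference = {
    "avg_sentence_length": 20.0,
    "semicolons_per_1000_words": 2.0,
    "passive_ratio": 0.25,
}

result, modified = anonymize(document, reference, threshold=1.0)
print(modified)
print(total_distinctiveness(result, reference))
```

With these values the two most distinctive features are modified and the loop then stops, leaving the third feature unchanged because the gauge has already fallen to the threshold or below. A practical gauge would probably normalize features measured on different scales, which this sketch does not attempt.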
Specification