
Some thoughts on anti-stylometry research

Greetings

I am a researcher interested in developing practical tools for improving operational security, as opposed to proposals that get lost in academic papers or theory and never see the light of day.

One path I am interested in is stylometry and its countermeasures. I have developed a prototype of a tool that automatically passes paragraphs of text through an online translation interface, chained through multiple languages, in an attempt to destroy any linguistic clues. The idea relies on machine translation being rather ‘lossy’ in what it preserves of the original wording.

That is not a best practice, because it involves disclosing plain text to an untrusted party. This is the motivation for a client-side solution.
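To make the chaining idea concrete, here is a minimal sketch of the loop, not the prototype itself. The names `round_trip`, `obfuscate_paragraphs` and the `translate` callback are hypothetical; the callback stands in for whatever backend is used, and for the reason given above it would ideally be a locally running model rather than an online service.

```python
from typing import Callable, Sequence

# Hypothetical backend signature: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def round_trip(text: str, chain: Sequence[str], translate: Translator) -> str:
    """Pass text through each consecutive language pair in the chain."""
    langs = list(chain)
    current = text
    for src, dst in zip(langs, langs[1:]):
        current = translate(current, src, dst)
    return current


def obfuscate_paragraphs(document: str, chain: Sequence[str],
                         translate: Translator) -> str:
    """Apply the chain to each non-empty paragraph (split on blank lines)."""
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    return "\n\n".join(round_trip(p, chain, translate) for p in paragraphs)
```

A chain such as `["en", "de", "ja", "en"]` would start and end in English. The sketch only shows the plumbing; it says nothing about how well any particular backend hides style.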

I have seen the A4NT project (A4NT: Author Attribute Anonymity by Adversarial Training of Neural Machine Translation | USENIX) and some of its code (GitHub - rakshithShetty/A4NT-author-masking: Repository for author masking), but an issue with it is that there is no documentation on actually using it. There is an old saying about academic research projects (paraphrasing): ‘You have the code and a paper, go figure it out’, and that appears to be the situation here.

Maybe someone with enough time can map the steps described in the paper to the code and get it working that way.

In any case, I would be interested in hearing about your experience.

Regards


From what I’ve seen in research findings, this technique is not adequate and also has the downside of producing unintelligible text. A better way is to have a user compare their new text to samples of older text they shared before and have the tool clear it as being different.
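As a rough illustration of the compare-and-clear idea, here is a minimal sketch. It swaps in a deliberately simple technique, cosine similarity over character trigram counts, which is far weaker than what a real attribution attack would use. The function names and the 0.5 threshold are assumptions made up for the example, not part of any existing tool.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_cleared(new_text: str, old_samples: list[str],
               threshold: float = 0.5) -> tuple[bool, list[float]]:
    """Clear the new text only if it scores below the (arbitrary) threshold
    against every older sample; also return the individual scores."""
    new_profile = char_ngrams(new_text)
    scores = [cosine_similarity(new_profile, char_ngrams(s)) for s in old_samples]
    return all(score < threshold for score in scores), scores
```

A text that passes this check is only different under this one crude measure; a usable tool would need stronger features and a threshold tuned against real attribution classifiers.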

If you have time to help us integrate an existing tool that does what I mentioned let us know.


Yes, I agree that attempting to mutate text with existing machine-translation methods is not a good solution.

‘A better way is to have a user compare their new text to samples of older text they shared before and have the tool clear it as being different.’

The A4NT paper, with its attribute classifier, appears to demonstrate this kind of differentiation and ‘clearance’.

‘If you have time to help us integrate an existing tool that does what I mentioned let us know.’

The A4NT paper seems to be the most complete solution in terms of analyzing trade-offs. The code is there, but with minimal documentation. Unfortunately, I also have minimal experience with PyTorch, so either I will have to learn it or, hopefully, others with experience can contribute as well.

Thank you
