I am a researcher interested in developing practical tools for improving operational security (rather than proposals that get lost in academic papers or theory and never see the light of day).
One area I am interested in is stylometry and its countermeasures. I have developed a prototype tool that automatically runs paragraphs of text through an online translation interface, chained through multiple languages, in an attempt to destroy linguistic clues, since machine translation is rather 'lossy' with respect to style and sentiment.
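The chaining described above can be sketched as follows. This is a minimal illustration of the iteration logic only; `translate` here is a hypothetical stand-in for whatever translation backend is used (a real tool would call an actual service, which is exactly the plaintext-disclosure problem discussed below).

```python
from typing import Callable, List

def chain_translate(text: str, chain: List[str],
                    translate: Callable[[str, str, str], str]) -> str:
    """Pass text through a chain of languages, e.g. ["en", "de", "ja", "en"].

    `translate(text, src, dst)` is a placeholder for any translation backend.
    """
    src = chain[0]
    for dst in chain[1:]:
        text = translate(text, src, dst)  # each hop degrades stylistic signal
        src = dst
    return text

# Stub backend for demonstration only: tags the text with each hop.
def fake_translate(text: str, src: str, dst: str) -> str:
    return f"{text}->{dst}"

result = chain_translate("hello", ["en", "de", "ja", "en"], fake_translate)
# result == "hello->de->ja->en"
```

The point of the sketch is that the chain is trivial to implement; the hard part is the backend, and that is where the trust problem lies.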
That is not a best practice, because it involves disclosing plaintext to an untrusted party, which is the motivation for a client-side solution.
From what I have seen in the research, this technique is not adequate, and it also has the downside of producing unintelligible text. A better approach is to have the user compare their new text against samples of text they have shared before and have the tool clear it as sufficiently different.
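A minimal sketch of that compare-and-clear idea, using function-word frequencies (a classic stylometric signal) and cosine distance. The feature set and threshold here are illustrative placeholders, not a real stylometry engine; a serious tool would use a trained attribution classifier, as discussed below.

```python
import math
import re
from collections import Counter
from typing import List

# Small set of common English function words; real systems use hundreds
# of features. This list is an assumption for illustration.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "a", "is", "that", "it", "i"]

def features(text: str) -> List[float]:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)
    counts = Counter(words)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_distance(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 1.0  # no overlap in measurable signal: maximally distant
    return 1.0 - dot / (na * nb)

def cleared(new_text: str, old_samples: List[str],
            threshold: float = 0.2) -> bool:
    """Clear new text only if it is dissimilar to every prior sample."""
    new_vec = features(new_text)
    return all(cosine_distance(new_vec, features(s)) > threshold
               for s in old_samples)
```

The user would run `cleared(draft, my_old_posts)` locally before publishing; nothing leaves the machine, which addresses the disclosure problem above.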
If you have time to help us integrate an existing tool that does what I described, let us know.
Yes, I also agree that mutating text via existing machine-translation methods is not a good solution.
> A better approach is to have the user compare their new text against samples of text they have shared before and have the tool clear it as sufficiently different.
The A4NT paper, with its classifier, appears to demonstrate the semantics of such differentiation and 'clearance'.
> If you have time to help us integrate an existing tool that does what I described, let us know.
The A4NT paper seems to be the most complete solution in terms of analyzing the trade-offs. The code is available but minimally documented. Unfortunately, I also have little experience with PyTorch, so either I will have to learn it or, hopefully, others with experience can contribute as well.
Yes. I have not researched this in some time, but I was never able to use the pre-trained A4NT models to reproduce the results in their paper, due to issues with the implementation. I was considering filing an issue report and contacting the original authors for comment, but that idea was on hold for some time. Given the rapid advancement in text processing with AI language models, investigating the throughput and results of a newer CPU-bound language model would be a better task.
A new fad in the AI ecosystem is to describe models as open-source when the only open-source component is the runtime, plus perhaps a pre-trained model blob that can be run locally. I do not like it either.