Open Source AI and Anti-Stylometry

victor · March 20, 2026, 1:24pm

with a fast cpu, 7b models can be run in vm. the vm could have gpu acceleration enabled to make it faster. the model could be changed to a faster smaller model if open-weight models are fine to use. docker might be an option instead of vm.

i think local ai makes privacy easier. you don’t have to search the internet as much. just ask it to translate text, help draft messages, or summarize anything you give it. keeps everything on your machine.

open-source ai

model

OLMo-3 7B is a model that complies with the OSAID 1.0

inference

llama.cpp is available in sid

github.com/ggml-org/llama.cpp

docs/build.md

master

# Build llama.cpp locally

The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](../include/llama.h).

The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server.

**To get the Code:**

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
```

The following sections describe how to build with different backends and options.

* [CPU Build](#cpu-build)
* [BLAS Build](#blas-build)
* [Metal Build](#metal-build)
* [SYCL](#sycl)
* [CUDA](#cuda)

This file has been truncated. show original

gguf

various gguf exist

anti-stylometry

sadly i don’t have any proven working prompts right now, just ideas:

translate to another language then back to english
make it act like a persona
give the model examples to follow
ask it to rewrite the text with different words and structure

Patrick · March 20, 2026, 4:17pm

OSAID unfortunately isn’t Open Source in its original meaning as per OSD.

Open Source AI Definition - Lack of Consensus in the Definition by the Open Source Initiative

In my testing, the quality and performance were not sufficient for practical use. Elaborated here:
AI Based Stylometry Defense

Installing OSAID AI by default would prevent calling the project Open Source / Free Software. It would need to be downgraded to OSAID. That would be a pretty big change.

But an optional wiki chapter explaining how to set this up would be good to have.

Whonix Policy On Non-Freedom Software (for context)
Avoid Non-Freedom Software (for context)

victor · March 21, 2026, 5:35am

OLMo-3 seems to comply with OSD

Patrick:

I have tested local freeware [1] models such as DeepSeek R1, Facebook’s LLaMA, and various distilled models on fast consumer hardware with the latest gaming GPU. In my testing, the quality and performance were not sufficient for practical use. The better models, for which the quality might be usable, were too slow. A reply to “hi” took 30 minutes. Distilled (“simplified, faster”) models did not produce acceptable output quality for my use cases, including typo / grammar review and code review.

It might be feasible with an NVIDIA H100 for an approximately $38K USD purchase, or at $2.40/hr when rented in a data center. Renting in a data center raises privacy and security concerns. The data center could log / tamper with all inputs and outputs. I have not tried that yet.

what were the model sizes? was this actually gpu-accelerated? for consumer stuff, you want something under 8b params and like 2-4gb in size; i even used one to rewrite that sentence right now. tps could get slow on old hardware or when it’s cpu-bound, so a non-reasoning model is probably better.

Patrick · March 21, 2026, 10:46am

Misuse of the Term Open Source in the Context of Some AI Projects may be applicable?

Most AI projects are referring to themselves with the label “Open Source” even if it’s not true. The term “Open Source” has been watered down, redefined. The Open Source Initiative unfortunately has acted as an accomplice with their OSAID.
(Open Source AI Definition - Lack of Consensus in the Definition by the Open Source Initiative)

So at this time, I fall back to my default assumption that AI is freeware, “fake Open Source” (using the label without its prior decades old definition) or OSAID.

At this point, an AI projects need to mention “Open Source as per OSAID”, show awareness of these issues in other ways, better yet use “Free Software” (as defined by FSF / GNU) or “Freedom Software”.

I am not saying that is the case of the specific AI mentioned but since in the blog post I haven’t seen awareness of these issues, I fall back to my default assumption.

victor · March 21, 2026, 12:04pm

see the difference between “open-weight” gemma and “open-source” OLMo

arraybolt3 · March 22, 2026, 2:48am

The license of OLMo 3 appears to be Apache 2, which is pretty widely recognized as a truly FOSS license. The training dataset is open-access, but is scraped, so its contents can’t be construed as free. The database containing the data used to train the model is under a fairly open-source-looking license (ODC-By 1.0, I haven’t read the whole thing but it looks promising at first glance, it does seem to be considered not-allowed license by Fedora but that may just be because of its explicit lack of a patent license).

arraybolt3 · March 22, 2026, 5:14am

Alright, I tried it (in free time, not during my normal work under contract). It seems to border on usable even in a virtual machine if you have a really fast CPU. My original message:

I went ahead and tried OLMo 3 to see if it could work as a stylometry cloak mechanism. Giving a Kicksecure 18 VM all 32 cores of an i9-14900HX system, the model runs pretty slowly under Ollama, seconds per token rather than tokens per second. This is similar to the results Patrick found, though I’m using CPU-only rendering and assume my NVIDIA GPU would make things faster. Is it practically usable? Kinda, I guess. If you’re patient, have fairly fast hardware, and are hyper-paranoid, it might be worth it. Of course, this raises the question of how much of one’s style can leak through AI “sanitization”.

(Note that I actually got better than seconds-per-token performance when using all 32 cores. 16 cores was awful, 32 cores was somewhat livable, especially after making sure my CPU was in a very high performance mode. This prompt is what I used to find out if 32 cores was faster than 16 cores or not though, so I wrote this as if I had already tested it. Apologies for the inaccuracy. Also, I’m referencing Patrick’s results here. Lastly, yes, my wording when talking about CPU vs GPU based operation is horrible. The AI made it more horrible. :P)

The prompt I used to “sanitize” the text was:

Please rewrite the following text, keeping all meaningful comment but obscuring the writing style: …

(Yes, I wrote “comment” rather than “content”. Whoops.)

After about… oh, maybe 10-15 minutes of processing and a bunch of “thinking” output, this is what it gave me:

I assessed OLMo 3’s suitability as a stylometry obfuscation tool by assigning all 32 cores of an i9-14900HX processor to a Kicksecure 18-vm environment. The model demonstrated suboptimal performance, producing outputs at a tokens-per-second rate inversely proportional to desired efficiency (i.e. seconds per token), consistent with Patrick’s findings. His results may reflect GPU-accelerated implementations, which my CPU-only setup did not employ. While not entirely impractical, its utility depends on factors like patience, robust hardware availability, and a security-first mindset. This experiment highlights unresolved questions about the efficacy of AI-driven content filtering in obscuring authorial stylistic patterns.

That’s reasonably close to what I said, so I’d say that’s a minor success.

(Edit: For clarification, I’m using the 7B parameter variant of OLMo 3 here.)

victor · March 22, 2026, 2:51pm

i use quantization q4_k_m, olmo-3-7b-instruct (non-thinking). i can get 2tps easily on cpu, which is pretty fast. for kvm, you can enable 3d acceleration, and with llama.cpp use vulkan for gpu performance.
are you using kvm or qubes? if you’re using kvm, it’s a good idea to give each vcpu its own cpu core.

prompt:

Please rewrite the following text, keeping all meaningful content but obscuring the writing style:
“”"
…
“”"

output:

I decided to test OLMo 3 by using it as a possible method for hiding stylistic fingerprints—essentially attempting to use it as a cloak for stylometry. Running a fully equipped Kicksecure 18 virtual machine on all 32 physical cores of an i9-14900HX system, I observed that Ollama struggled to process the model; each token took several seconds, rather than delivering tokens quickly. This performance aligns somewhat with what Patrick experienced, though I conducted the test on CPU alone. It’s likely that running the same setup on a GPU would speed things up significantly.
So, is this approach actually workable in real-world scenarios? It has its merits, but only in very specific situations. Someone willing to wait, who possesses powerful hardware, and who operates with a high level of paranoia might find it justifiable to employ such a solution. Still, this raises the important point about how much of a person’s writing style can still escape even when processed through AI-based filtration systems.

arraybolt3 · March 22, 2026, 3:27pm

I used the thinking variant of OLMo. A non-thinking variant would be faster indeed. As far as content goes I think your experiment gave better results than mine, but it feels like more of the original style is bleeding through (similar sentence structure and order is preserved for instance). The thinking variant seems to have done a better job at masking style.

Is there any research into the use of AI for style cloaking purposes? In particular, I can think of two kinds of attacks that could be used against anti-stylometry like this:

Compromise of identity; knowing user A’s writing style, is it possible to tell if AI-sanitized text was originally written by user A or not?
Degredation of anonymity to pseudonymity: if users A and B use the same AI with the same prompts to sanitize their text, does their style influence the AI’s output such that messages posted by A appear different in style from messages posted by B, even if those messages cannot be traced back to A and B in particular?

Lots of other things would need to be investigated, like what the best models and model variants are, what prompts to use for them, how to safely fine-tune output so that if the AI says something you didn’t mean to say, you can fix it without leaving a fingerprint, etc.

It looks like there’s a study like this already, although it only studies one closed-source model and one source-available model:

Patrick · March 23, 2026, 10:13am

Quote from comment section:

LWN.net comment:

This may sound like a trivial “is a hotdog a sandwich?” type of question, but it’s really not. Most distros distribute images and other media, so if you take the position that all redistributed materials must be accompanied by the “preferred form for modification of the work” (or words to that effect), that would mean that e.g. every image must be accompanied by an OpenRaster version that has everything in separate layers, every sound must be accompanied by an Audacity project or the like with separate audio tracks for each voice (instrument), and so on for all media that the distro makes available, because in each case, that is the preferred form for modification of those respective media formats. Is the average disto really going to do all of that? I suspect not.

victor · March 24, 2026, 5:48am

sampler settings, the model, and even the prompts themselves can all be tweaked to make the output better. asking the llm to obscure someone’s writing style is probably too vague for it to really change much.

from my point of view, debian tries to replicate all packages as it can. but replicating an ai model just isn’t cheap enough for them to do, so they’re kind of stuck deciding what to do. the next best thing probably is to ask the model makers to share their datasets, code, and all the different parts of how the model works so everyone can look at it and even tweak stuff. but usually the high-quality models don’t share their data, probably because they were trained on stuff that’s copyrighted and they never got proper permission for it. so the data stays hidden. i think the people behind OSAID saw this and decided you don’t need to share the datasets just the license and the code (whether that’s right or not is another story).

to dodge getting caught up in this debate, just don’t share the model files at all. we can share the download and the hashes.

llama.cpp with the cpu or vulkan backend is probably the easiest. debian package: Debian -- Details of package llama.cpp in sid. the vulkan backend is also available via libggml0, see here: Debian -- Details of package libggml0-backend-vulkan in sid

running on qemu/kvm with virtio-gpu gives me like 80% of the host speed for llama.cpp.

i’m looking at a smaller 3b model (smollm) that should be a bit faster than the 7b version: SmolLM by HuggingFace | European Open Source AI Index

victor · March 24, 2026, 1:17pm

sadly, the sampler settings aren’t shared. and from what i see, they didn’t share different prompt versions.
prompt examples

Masks and Mimicry: Strategic Obfuscation and Impersonation Attacks on
Authorship Verification https://arxiv.org/pdf/2503.19099v1
Mistral-7B v0.1

You are an expert hacker and linguist. The purpose of paraphrasing writing style of a text while maintaining its semantics is to make language models more robust to adversarial attacks. Your task is to generate a new sentence which must satisfy the following conditions:

The new sentence should paraphrase 30% of the original sentence

Keeping the semantic meaning of the new sentence unchanged;

You can finish the task by modifying the sentence using the following guidance:

Paraphrase at most 30% of the original sentence

Only output the new sentence without anything else.

Personalized Author Obfuscation with Large Language Models https://aclanthology.org/2025.ranlp-1.133.pdf
LLaMA-3.1 and GPT-4

Paraphrase the following text to obfuscate the author’s identity while
maintaining the meaning. Ensure the paraphrased version has more double quotation marks than the input.
Only return the paraphrased text.
Input text: {}
Output:

Unraveling Interwoven Roles of Large Language Models in Authorship
Privacy: Obfuscation, Mimicking, and Verification https://arxiv.org/pdf/2505.14195
GPT-4o-mini, GPT-o3-mini, Gemini-2.0, and Deepseek-v3

System Prompt: You are an emulator designed to hide the writing style of a human author.
• Instruction: You are given 5 sample writings from an author. The goal of this task is to conceal the author’s writing style by carefully modifying lexical richness and diversity, sentence structure, punctuation patterns, special character usage, expressions and idioms, overall tone, emotion, mood, and any other distinguishing stylistic elements. Your task is to generate {avg}-word continuation that has writing style significantly different from the provided input text. Strive to make the rewritten text distinguishable from both the input text and the 5 sample writings by the author. As output, exclusively return the text completion without any accompanying explanations or comments.
• Context: Here is some information about the author: {author identification}. The 5 sample writings from an author: {sample text}.
• Task: The input text is: {input text}.

arraybolt3 · March 24, 2026, 11:17pm

Last time I tried to get this working on a device with both NVIDIA and Intel GPUs, it was an exercise in futility due to I think AppArmor restrictions, among other things. I was able to kind of get it to work when using raw QEMU, but didn’t see any graphics performance advantage. If you have advice for how to get this sort of thing working reliably, that would be quite useful.

victor · March 25, 2026, 8:36am

gist.github.com

https://gist.github.com/peppergrayxyz/fdc9042760273d137dddd3e97034385f

qemu-vulkan-virtio.md

# QEMU with VirtIO GPU Vulkan Support

With its latest reales qemu added the Venus patches so that virtio-gpu now support venus encapsulation for vulkan. This is one more piece to the puzzle towards full Vulkan support. 

An outdated blog post on [clollabora](https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/) described in 2021 how to enable 3D acceleration of Vulkan applications in QEMU through the Venus experimental Vulkan driver for VirtIO-GPU with a local development environment. Following up on the outdated write up, this is how its done today.

## Definitions

Let's start with the brief description of the projects mentioned in the post & extend them:

This file has been truncated. show original

qemu-system-x86_64                                               \
    -enable-kvm                                                  \
    -M q35                                                       \
    -smp 4                                                       \
    -m 16G                                                       \
    -cpu host                                                    \
    -net nic,model=virtio                                        \
    -net user                                                    \
    -device virtio-vga-gl,hostmem=8G,blob=true,venus=true        \
    -vga none                                                    \
    -display gtk,gl=on,show-cursor=on                            \
    -usb -device usb-tablet                                      \
    -hda $IMG                                                    \

sudo apt install llama.cpp libggml0-backend-vulkan

llama-cli -m SmolLM3-3B-Q5_K_M.gguf -c 8192 --jinja

gpu acc for llama.cpp needs vulkan, which relies on venus. i got it working with that command above. if the app isn’t using vulkan, using virgl probably won’t make you notice anything. virgl supports opengl and it’s doable if you check out that guide: Whonix for KVM

Patrick · March 28, 2026, 8:27am