What If Your Writing Style Is a Fingerprint?
LLMs can now identify anonymous internet users by their writing style alone, for about four dollars per person. Every pseudonymous post ever written is retroactively exposable.
You posted from a Reddit throwaway three years ago. Said something about your job, mentioned your dog. That post is still on a server, written in your voice — and someone can now figure out it was you for about four dollars.
In February 2026, researchers at ETH Zurich and Anthropic published a paper demonstrating that large language models can identify anonymous internet users from their posts alone. Not by finding leaked metadata or tracing IP addresses. By reading what they wrote and reasoning about who writes that way. Demographic signals, professional vocabulary, interests mentioned in passing, sentence rhythm, regional spelling. The system identified 68% of targets at 90% precision. The cost: between one and four dollars per person.
The Archive Problem
The internet before 2024 is an open vault.
Not because anyone planned it that way. Because everyone who posted, commented, blogged, confessed, argued, reviewed, vented, and overshared did it under a simple assumption: pseudonymity equals practical safety. The cost of connecting a throwaway account to a real person was high enough that, for most people, nobody would bother. A private investigator might spend hours on a single target. A doxxer with a grudge might get lucky with a specific tell. But wholesale identification of anonymous users was out of reach for anyone without intelligence-agency resources.
That assumption was correct, right up until it stopped being correct.
The Lermen et al. paper didn't invent a new capability so much as demonstrate that an existing one had crossed a threshold. LLMs can extract identity signals from unstructured text, search the open web for candidate matches, and reason about whether the signals converge on a single person. The pipeline is API calls, not tradecraft. And it works retroactively. Every anonymous post ever made is sitting in a database somewhere, written in a voice that belongs to exactly one person, waiting for the cost of identification to drop below the threshold of someone's curiosity.
The room was never dark. The lights were just expensive to turn on.
The Paradox Coming Into View
Here is where it gets strange.
As more people use AI to write their emails, their social media posts, their work documents, and their comments, something happens to stylometric analysis: it breaks. Not because the technology fails, but because the fingerprints converge. When a thousand people run their thoughts through the same model before posting, the output carries the model's voice, not theirs. The sentence rhythms belong to Claude or GPT, not to a nurse in Phoenix or an engineer in Berlin. Stylometry needs variation to work. Homogeneity kills it.
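The mechanism is easy to see in miniature. Classic stylometry builds a frequency profile over function words (small words like "the" and "of" that carry little meaning but strong authorial habit) and compares profiles with a similarity measure. A minimal sketch, with invented sample texts and a deliberately tiny feature set; real systems use far richer features:

```python
from collections import Counter
from math import sqrt

# Function words carry little meaning but strong authorial signal.
FUNCTION_WORDS = ["the", "of", "and", "a", "to", "in", "that", "is",
                  "was", "it", "for", "on", "with", "as", "but"]

def fingerprint(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two frequency profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

# Two posts in one (invented) voice vs. a post in a different voice:
alice_1 = "the dog ran to the park and it was a good day for the dog"
alice_2 = "the cat sat on the mat and it was a long day for the cat"
bob_1 = "running fast, dogs love parks, always have, always will"

same = cosine(fingerprint(alice_1), fingerprint(alice_2))
diff = cosine(fingerprint(alice_1), fingerprint(bob_1))
```

The same-author pair scores high because the function-word habits match even though the topics differ; the cross-author pair scores low. Run everyone's text through one model and the profiles collapse toward the model's own habits, and this distance measure stops separating people.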
AI writing might be accidentally building a kind of anonymity shield. Not by design. As a side effect of everyone outsourcing their prose to the same handful of language models.
But flip that around.
If AI-written text becomes the baseline, then people who don't use AI become outliers. Their authentic voice, their actual sentence structure, their specific vocabulary, the commas they favor, the metaphors they reach for: all of it becomes more distinctive, not less. Stylometric identification works best when targets stand out from the population. When most text sounds like an LLM, writing like yourself is the equivalent of wearing a neon sign.
The people most at risk are the ones who most value their own voice.
Everything below this line is speculation.
The Four-Dollar Question
Imagine a service. Upload a Reddit username, pay four dollars, get a confidence-scored match to a real identity. The technology exists, the research is published, and the API costs are documented in a preprint. The only thing preventing this from existing as a product is the decision to build it.
And if that decision hasn't already been made somewhere, it will be. The economics are too clean. A doxxing service with a 90% accuracy guarantee at the cost of a latte. An HR department running Glassdoor reviews through the pipeline before a retaliation meeting. An insurance company cross-referencing anonymous addiction recovery posts with applicant databases. An abusive ex paying four dollars to find the support forum their partner fled to.
The paper's authors know this. Simon Lermen, the lead researcher, wrote on his Substack that "the combination is often a unique fingerprint. Ask yourself: could a team of smart investigators figure out who you are from your posts? If yes, LLM agents can likely do the same, and the cost of doing so is only going down."
He's being precise. The capability isn't new. The cost is.
What Defense Looks Like (and Why It Probably Doesn't Work)
Research from Brennan and Greenstadt (2012) showed that deliberate style obfuscation could defeat stylometric analysis. That was a decade before LLMs. The new models may be harder to fool with simple obfuscation, and maintaining a fake writing style across dozens of posts is cognitively exhausting. One lapse, one late-night comment where you forget to stay in character, and the mask slips.
Some Hacker News commenters suggested running posts through an LLM anonymizer before publishing. But if everyone uses the same anonymizer, the anonymizer's output has its own detectable style. A defender strips personal voice with Model A. An attacker trains Model B to detect and remove Model A's anonymization layer, recovering traces of the original. The arms race never hits bedrock.
The Whonix project, which maintains guides for operational security, tested running text through multiple translation services to scramble style. English to French to Japanese, then back to English. It doesn't work. Semantic patterns survive translation.
Compartmentalization is the only approach that holds up: completely separate identities with separate devices, separate interests, separate writing patterns, and zero cross-referencing of topics between accounts. You're not maintaining one identity and one pseudonym. You're maintaining two full-time characters who share nothing. Most people can't sustain this for a week.
The Threat Model We Built Our Lives On
Practical obscurity was always a polite fiction. The information was there, the posts were public, and the writing patterns were unique. What protected anonymous users wasn't cryptography or law or technology. It was economics. Identification was too expensive to be worth the effort for most targets.
The paper didn't change what's possible. It changed what's affordable.
And affordability rewrites the threat model, because it changes who can do it. When deanonymization costs tens of thousands of dollars, the threat actors are nation-states and corporations. When it costs four dollars, the threat actors are everyone. The jilted ex with a credit card. The middle manager who suspects a subordinate wrote that Glassdoor review. The landlord who wants to know what their tenant said in a housing complaint forum. The parent who wants to find their estranged adult child's anonymous social media.
Millions of people disclosed sensitive information under the protection of a threat model that no longer functions. They can't un-write those posts. The words are there, in their voice, on servers they don't control.
What You Can Do About It
There's no browser plugin that fixes the retroactive problem. But "the lights are on and nothing can be done" isn't true either. Some of this is damage control, some is forward planning, and none of it is perfect.
Burn the archive. Most platforms allow bulk deletion. Reddit, old forums, abandoned blogs. Tools like Redact and Shreddit can wipe years of post history in minutes. You can't control what's been scraped and cached, but you can remove the primary source. Every deleted post is one fewer data point in a stylometric profile.
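The selection logic those tools implement is simple: walk your post history, keep anything recent, flag everything older for deletion. A minimal sketch over a local data export, with invented post IDs and dates; actual deletion goes through the platform's API (on Reddit, for example, PRAW's `comment.delete()`):

```python
from datetime import datetime, timedelta, timezone

# Posts from a local data export: (post_id, created_at, body).
posts = [
    ("t1_abc", datetime(2019, 5, 1, tzinfo=timezone.utc), "old confession"),
    ("t1_def", datetime(2025, 6, 10, tzinfo=timezone.utc), "recent comment"),
]

def stale(posts, now, max_age_days=365):
    """IDs of posts older than max_age_days: candidates for bulk deletion."""
    cutoff = now - timedelta(days=max_age_days)
    return [pid for pid, created, _ in posts if created < cutoff]

now = datetime(2026, 3, 1, tzinfo=timezone.utc)
to_delete = stale(posts, now)  # ['t1_abc']
# Deletion itself would loop over to_delete calling the platform API;
# tools like Shreddit wrap exactly this loop for Reddit.
```

The age threshold is a judgment call: shorter windows shrink the stylometric corpus an attacker can work from, at the cost of erasing more of your own history.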
Use AI writing as a privacy layer. The paradox from earlier cuts both ways. If AI-generated text defeats stylometry by homogenizing voice, then running sensitive posts through an LLM before publishing is a deliberate anonymity tool, not just a convenience. You lose your voice. You gain cover. Whether that tradeoff is worth it depends on what you're posting and who might be looking.
Compartmentalize going forward. The defense section above was pessimistic about maintaining separate identities, and it is genuinely hard. But "hard" isn't "impossible." Separate accounts for separate topics, separate devices where it matters. The goal isn't perfection — it's raising the cost of correlation above four dollars.
Push for the law to catch up. No court has ruled on whether stylometric profiling constitutes a search under the Fourth Amendment, though the framework from Carpenter v. United States (2018) could extend there. The EU AI Act's biometric data provisions already restrict some forms of remote identification. Writing style hasn't been tested against those provisions yet, but the argument that a behavioral pattern unique enough to identify someone is biometric data isn't a stretch. That classification determines whether someone needs your consent or a warrant before running the analysis.
The lights came on. You can't make the room dark again. But you can stop standing in the middle of it.