- "No X, No Y, No Z." pattern
- "Here is X - it makes Y"
The worst and most obvious one is the constant overuse of emoji ticks and crosses.
*actually a hyphen but it's functioning as an em dash.
This reminds me of another topic related to em dashes and AI: I've noticed LLMs have an extreme bias towards spaces around the dash, while people can go either way with it.
And you can pry my em dashes from my cold, dead hands.
For better or worse (and pretty much for worse), these usages have become AI idioms. Language evolves over time: things that used to be harmless become offensive, certain terms end up meaning the complete opposite of what they originally meant, and we are now watching certain language patterns and idioms become watermarks for AI. It sucks, but that doesn't make it false.
It's interesting that LLMs generate constructions like this more frequently than they presumably occur in the training set. I wonder if this is some sort of mode collapse caused by post-training, and/or maybe because they are training on synthetic data so these things become self-perpetuating and self-amplifying (a feedback loop)?
The lesson for humans worried about being falsely identified as AI is just to learn to write better! It doesn't matter where your repertoire of phrasing comes from (copying AI or not), but one of the basic rules of writing is not to repeat yourself unless you are doing so deliberately for a purpose. Go ahead and use "It's not just X. It's Y" if you want to, but if you use it multiple times in the same short piece of writing, then you may deserve to be called out for poor style, if not for being an AI.
If LLMs generated text based on training data frequency, they'd likely be some of the most vulgar and hostile things ever created. The internet is full of insults, profanity, and low-effort content. The repeated phrases are a side effect of reward optimization rather than some kind of model collapse.
Sometimes it’s not just about the Ys but also the Qs.
It is bad writing.
This is honestly both terrifying and well articulated.
High praise to the blog author.
How is this different from humans? When I went to high school, my teachers extorted me too. Especially in subjects like English where, unlike Math, evaluation is 100% subjective.
I put it to you that if you see a trope in AI writing, it's because that trope appeared in the training corpus. Therefore, sure, being prejudiced against it lets you catch some AI, but you'll also flag human output. I think that may not be worth it in the end.
The "AI Detection" tools employed by schools also regularly flag writing from those with Autism, ADHD, and non-native English speakers as being AI-generated.
So, naturally, I can't stand the phrase "write like AI" when these things come up, because no, there are no humans that "write like AI"; it's the models that have stolen the literary devices from us and have now poisoned them.
That's really unfortunate though. It's like Michael Bolton from Office Space: "No way! Why should I change? He's the one who sucks."
Retr0id•55m ago
This feels like an easy enough hypothesis to verify, for anyone in the business of training LLMs - does the not-X-but-Y rate increase after RLVR?
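For anyone who wants to try it, here is a rough sketch of how the rate could be measured on a pile of generated text. The two regexes and the function names `count_not_x_but_y` and `rate_per_1000_words` are my own illustrative assumptions, not a validated detector; they will miss many variants and can occasionally double count.

```python
import re

# Illustrative patterns only (assumed, not exhaustive); a real study would
# need a broader, hand-validated set.
PATTERNS = [
    # "not just/only/merely X but Y" within a single sentence
    re.compile(r"\bnot (?:just|only|merely) [^.!?;]{1,80}?,? but\b", re.IGNORECASE),
    # "It's not X. It's Y" split across two sentences
    re.compile(
        r"\b(?:it|this|that)(?:'s| is) not [^.!?]{1,80}[.!?]\s+(?:it|this|that)(?:'s| is)\b",
        re.IGNORECASE,
    ),
]


def count_not_x_but_y(text: str) -> int:
    """Count rough matches of the not-X-but-Y family in text."""
    # Normalize curly apostrophes so "It’s" and "It's" are treated alike.
    normalized = text.replace("\u2019", "'")
    return sum(len(p.findall(normalized)) for p in PATTERNS)


def rate_per_1000_words(text: str) -> float:
    """Matches per 1000 whitespace-separated words (0.0 for empty text)."""
    words = len(text.split())
    if words == 0:
        return 0.0
    return 1000.0 * count_not_x_but_y(text) / words
```

The comparison would then be `rate_per_1000_words` over samples from the same prompts before and after the RLVR stage, ideally alongside the rate in a slice of the pretraining corpus as a baseline.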
andy99•40m ago