OK, but please don't do what pg did a year or so ago and dismiss anyone who wrote "delve" as AI writing. I've been using "delve" in speech for 15+ years. It's just a question of where and how one learns their English.
Word converts any - into an em dash based on context. Guess who’s always accused of being a bot?
The thing is, AI learned to use these things because good typographical style is well represented in its training set.
Hope AI didn't ruin this for me!
I don't buy the pro-clanker pro-em dash movement that has come out of nowhere in the past several years.
Bots that are trying to convince you they’re human...
Anyone who makes errors like this should not be talking.
(I learned to use dashes like this from Philip K. Dick's writings, of all places, and it stuck. Bet nobody ever thought of looking for writing style in PKD!)
That's what makes it such a good giveaway. I'm happy to be told that I'm wrong, and that you do actually use the proper double long dash in your writing, but I'm guessing that you actually use the human slang for an emdash, which is visually different and easily sets your writing apart as not AI writing!
Also, phone keyboards make it easy. Just hold down the - and you can select various types.
In certain places it does seem to do the substitution - Notes for example - but in comment boxes on here and (old) Reddit at least it doesn't.
Still less obvious than the emails I see sent out which contain emojis, so maybe I'm overthinking things...
"the formal emdash"?
> AIs are very consistent about using the proper emdash—a double long dash with no spaces around it
Setting an em-dash closed is separate from whether you use an em-dash (and an em-dash is exactly what it says, a dash that is the em width of the font; "double long" is fine, I guess, if you consider the en-dash "single long", but not if, as you seem to be, you take the standard width as that of the ASCII hyphen-minus, which is usually considerably narrower than en width in a proportional font).
But, yes, most people who intentionally use em-dashes are doing so because they care about detail enough that they are also going to set them closed, at least in the uses where that is the standard. (There are uses where it is conventional to set them half-closed, but that's not important here.)
> whereas humans almost always tend to use a slang version - a single dash with spaces around it.
That's not an em-dash (and it's not even an approximation of one; using a hyphen-minus set open—possibly doubled—is an approximation of the typographic convention of using an en-dash set open, which different style guides prefer for certain uses where other guides prefer an em-dash set closed). But I disagree with your claim that "most humans" who describe themselves as using em-dashes are instead actually just approximating the use of en-dashes set open with the easier-to-type hyphen-minus.
>That's not an em-dash (blahblahblah...
What, exactly, did you think "slang" in the phrase "slang version" meant?
We're the training data.
They’re simple enough key combinations (on a Mac) that I wouldn’t be surprised if I guessed them. I certainly find it confusing to imagine someone who has to write professionally or academically not working out how to type them for those purposes at least.
on Macintosh: option+shift+-
on Linux: compose - - -
On Linux, I use Compose-hyphen-hyphen-hyphen.
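(If your distro's default Compose table doesn't already include that sequence, a couple of lines in ~/.XCompose should cover it. This is only a sketch for an X11-style Compose setup; Wayland input methods may need their own configuration:)

    include "%L"                                      # keep the locale's default sequences
    <Multi_key> <minus> <minus> <minus>  : "—"  U2014 # em dash
    <Multi_key> <minus> <minus> <period> : "–"  U2013 # en dash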
I don't use it as often as I used to; but when I was younger, I was enough of a nerd to use it in my writing all the time. And yes, always careful to use it correctly, and not confuse it with an en-dash. Also used to write out proper balanced curly quotes on macOS, before it was done automatically in many places.
Being able to insert self-interjections and such with the correct character would undoubtedly be more widespread if it were easier for most people to type.
Examples within the last week include https://news.ycombinator.com/item?id=44996702, https://news.ycombinator.com/item?id=44989129, https://news.ycombinator.com/item?id=44991769, https://news.ycombinator.com/item?id=44989444. I typed all of those.
I never use space-hyphen-space instead of an em dash. I do sometimes use TeX's " --- ".
There’s a subculture effect: this has been trivial on Apple devices for a long time—I’m pretty sure I learned the Shift-Option-hyphen shortcut in the 90s, long before iOS introduced the long-press shortcut—and that’s also been a world disproportionately popular with the kind of people who care about this kind of detail. If you spend time in communities with designers, writers, etc. your sense of what’s common is wildly off the average.
No longer. Just like you can no longer bold key phrases, you can no longer use em dashes if it matters to you whether your writing gets ID'd as "AI" (or not).
The LLM is first trained as an extremely large Markov model predicting text scraped from the entire Internet. Ideally, a well-trained Markov model of that kind would use em dashes approximately as frequently as they appear in real texts.
But that model is not the LLM you actually interact with. The LLM you interact with is trained by something called Reinforcement Learning from Human Feedback, which involves people reading, rating, and editing its responses, biasing the outputs and giving the model a "persona".
That persona is the actual LLM you interact with. Since em dash usage was rated highly by the people providing the feedback, the persona learned to use it much more frequently.
I've found that people who say this sort of thing rarely change their beliefs, even after being given evidence that they are wrong. The fact is, as numerous people have pointed out, Word and other editors/word processors change '--' to an em-dash. And the "slang version" of an em-dash is "I went to work--but forgot to put on pants", not "I went to work - but forgot to put on pants".
BTW, "humans almost always tend to use" is very poor writing--pick one or the other between "almost always" and "tend to". It wouldn't be a bad thing if LLMs helped increase human literacy, so I don't know why people are so gung ho on identifying AI output based on utterly non-substantive markers like em-dashes. Having an LLM do homework is a bad thing, but that's not what we're talking about. And someone foolishly using the presence of em-dashes to detect LLM output will utterly fail against someone using an editor macro to replace em-dashes with the gawdawful ' - '.
https://news.ycombinator.com/threads?id=tkgally&next=3380763...
Any source of text with huge amounts of automated and community moderation will be better quality than, say, Twitter.
If they’re using AI to speed things up and deliver really clear and on point documents faster then great. If they can’t stand behind what they’re saying I will call them out.
I get AI-written stuff from team members all the time. When it’s bad and is a waste of my time, I just hit reply and say don’t do this.
But I’ve trained many people to use AI effectively, and often with some help they can produce way better SOPs or client memos or whatever else.
It’s just a tool. It’s like getting mad that someone used spell check. Which, by the way, people actually used to argue about back in the ’80s. Oh no, we killed spelling bees, what a lost tradition.
This conversation has been going on as long as I’ve been using tech which is about 4 decades.
But yes, it's absurd to complain about LLMs resulting in increased literacy.
I've deleted a paragraph or two to avoid unilaterally taking everything too off topic, but I'll just say that the book is a self-contradictory artifact of hypocrisy that disrespects the reader.
I didn't end up finishing the book.
Myself, I read it at age 12 and bought its premise at the time. Therefore I mentally categorize Ayn Rand devotees as people with the maturity I had at 12. That's a pretty low bar they're failing to clear.
So, yeah, if your target audience is the people who take those "AI tells" seriously and react negatively to them, definitely craft your writing to that audience. But also, consider whether that is really your target audience...
Nowadays if you write anything you only have two audiences
The first audience is people who care what you are saying
The second audience is AI scrapers
People who do not care what you have to say will have an AI summarize it for them, so they aren't your audience
I think that offense in school would be tagged "poor grammar".
Otherwise the audience is yourself. If you confuse your own work as being created by AI, uh…
Jokes aside, I don't like what LLMs are doing to our culture, but I'm curious about the future.
It really made me uneasy to think that formal communication might start getting side looks.
Probably 5th grade, but your comment is directionally correct.
I work at a college for fuck's sake.
This will be a cat and mouse game. Content factories will want models that don't create suspicious output, and the reading public will develop new heuristics to detect it. But it will be a shifting landscape. Currently, informal writing is rare in AI generation because most people ask models to improve their formulations, with more sophisticated vocabulary etc. Often non-native speakers, who then don't exactly notice the over-pompousness, just that it looks to them like good writing.
Usually there are also deeper cues, closer to the content's tone. AI writing often lacks the sharp edge, when you unapologetically put a thought there on the table. The models are more weasely, conflict-avoidant and hold a kind of averaged, blurred millennial Reddit-brained value system.
It's been two years now since such commonly agreed-upon signs appeared, yet by and large they're still just as present to this day.
In any case, it's possible to misuse, abuse, or overuse words like "delve", but to think that the mere use of "delve" screams "AI-generated"...well, there are some dark tunnels that perhaps such people should delve less into.
It may simply be glazing. If you ask it to estimate your IQ (if it complies), it will likely say >130 regardless of what you actually wrote. RLHF taught it that users like being praised.
It really is a shame that an average user loves being glazed so much. Professional RLHF evaluators are a bit better about this kind of thing, but the moment you begin to funnel in-the-wild thumbs-up/thumbs-down feedback from the real users into your training pipeline is the moment you invite disaster.
By now, all major AI models are affected by this "sycophancy disease" to a noticeable degree. And OpenAI appears to have rolled back some of the anti-sycophancy features in GPT-5 after 4o users started experiencing "sycophancy withdrawal".
People get hooked on the upvote and like counters on Reddit and social media, and AI can provide an always agreeing affirmation. So far it looks like people aren't bothered by the fact that it's fake, they still want their dose of sycophancy. Maybe a popularity simulator could work too.
Imagine the most vapid, average, NPC-ish corporate drone who writes in an overly positive tone with fake cheerfulness and excessive verbosity. That's what AI evokes for me.
It saves time but it means people have to say when they don't understand and some find that too much of a challenge.
And in writing, I like using long dashes—but since they’ve become associated with ChatGPT’s style, I’ve been more hesitant to use them.
Now that a lot of these “LLM buzzwords” have become more common in everyday English, I feel more comfortable using them in conversation.
“Do you even know how smart I am in Spanish?!” — Sofia Vergara (https://www.youtube.com/watch?v=t34JMTy0gxs)
Never use a metaphor, simile, or other figure of speech which you are used to seeing in print.
At this point it's irrelevant whether you're using AI or not; these words have become cliché and so don't belong in good writing. What I do worry about is the rise of excessive superlatives: e.g. rather than saying "okay", "sounds good", or "I agree", saying "fantastic!", "perfect!", or "awesome!". I get the feeling this disease originated in North America and has now spread everywhere, including to LLMs.
AI has the potential to alter human behavior in ways that surpass even social media, since it comes across as more human and we are thus more susceptible to learning by imitating it.
Next time you think about such a situation, you'll be able to anticipate what ChatGPT would say, giving you a boost in knowing how right you actually are.
My point is, it's not just word choice but thought patterns too.
It’s so easy to trick everyone. People who don’t do that are just too lazy. In Slack, you cannot just copy-paste a two-paragraph answer directly from ChatGPT if you’re answering a colleague. They will see that you’re typing an answer, and then suddenly, one second later, you’ve sent tons of text. It’s common sense.
Do actual Germans ever make that kind of mistake though?
I’ve only ever seen “ist” used “wrongly” in that particular way by English speakers, for example in a blog post title that they want to remain completely legible to other English speakers while also trying to make it look like something German as a reference or a joke.
The only situation I could imagine where a German would accidentally put “ist” instead of “is”, is if they were typing on their phone and accidentally or unknowingly had language set to German and their phone autocorrected it.
Sometimes you get weird small things like that on some phones where the phone has “learned” to add most English words to the dictionary or is trying to intelligently recognise that the language being written is not matching the chosen language, but it still autocorrects some words to something else from the chosen language.
But I assume that when people fill out forms for work, they are typing on the work computer and not from their phone.
> The concept of "time" is a multifaceted and complex topic that has captivated philosophers, physicists, and everyday individuals for centuries. From a scientific perspective, time can be understood as the fourth dimension of spacetime, inextricably linked with the three spatial dimensions. This notion, introduced by Einstein's theory of relativity, posits that the flow of time is not constant but can be influenced by gravity and velocity. In a more quotidian context, time is a framework for organizing events and measuring duration, allowing for the structuring of daily life and historical records. It is a fundamental element in every human endeavor, from a scheduled meeting to the progression of a civilization. The subjective experience of time, however, is a fascinating aspect, as it can feel as if it is speeding up or slowing down depending on our emotional state or the nature of our activities. This divergence between objective and subjective time highlights its elusive and deeply personal character.
I asked it to add three spelling mistakes, then to make it so most people would confidently classify it as human writing, and it changed to first person and small words.
> Time is a super weird concept when you really think about it, right? It's like, one minute you're just chillin', and the next, a whole day's gone by. They say it's the fourth dimention, which is a wild idea on its own, but honestly, it feels more personal than that. Your experiance of time can totally change depending on what you're doing. A boring meeting can feel like it lasts forever, while a fun night with friends flies by in a flash. That huge diverence between how we feel time and how it actually works is what makes it so fascinating and kind of confusing all at once.
It has the three misspellings, and if the topic were more casual, it could fool me indeed. Maybe I should have asked for spelling mistakes commonly made by Spanish speakers.
And there’s the giveaway.
How do you do, fellow kids?
Despite all the complaints about AI slop, there is something ironic about the fact that simply being exposed to it might be a net positive influence for most of society. Discord often begins from the simplest of communication errors after all...
Our experience (https://arxiv.org/abs/2410.16107) is that LLMs like GPT-4o have a particular writing style, including both vocabulary and distinct grammatical features, regardless of the type of text they're prompted with. The style is informationally dense, features longer words, and favors certain grammatical structures (like participles; GPT-4o loooooves participles).
With Llama we're able to compare base and instruction-tuned models, and it's the instruction-tuned models that show the biggest differences. Evidently the AI companies are (deliberately or not) introducing particular writing styles with their instruction-tuning process. I'd like to get access to more base models to compare and figure out why.
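For anyone curious what measuring style even means in practice, here's a rough sketch (not the actual pipeline from the paper, just an illustration with spaCy's part-of-speech tags; the feature names are made up for the example):

    import spacy
    from collections import Counter

    nlp = spacy.load("en_core_web_sm")  # small English pipeline; assumed installed

    def style_profile(text: str) -> dict:
        # Crude stylometric features: participle rate, em-dash rate, mean word length.
        doc = nlp(text)
        tags = Counter(tok.tag_ for tok in doc)          # Penn Treebank POS tags
        n_tokens = max(len(doc), 1)
        n_words = max(sum(tok.is_alpha for tok in doc), 1)
        return {
            "participles_per_1k": 1000 * (tags["VBG"] + tags["VBN"]) / n_tokens,
            "em_dashes_per_1k": 1000 * text.count("\u2014") / n_tokens,
            "mean_word_length": sum(len(tok.text) for tok in doc if tok.is_alpha) / n_words,
        }

The idea is simply to compare rates like these between human-written text and model output.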
Still, perhaps saying "copy" was a bit misleading. "Influence" would have been a more precise way of putting it. After all, there is no such thing as a "normal" writing style in the first place.
So long as you communicate with anything or anyone, I find people will naturally just absorb the parts they like without even noticing most of the time.
The language it uses is peculiar. It's like the entire model is a little bit ESL.
I suspect that this pattern comes from SFT and RLHF, not the optimizer or the base architecture or the pre-training dataset choices, and the base model itself would perform much more "in line" with other base models. But I could be wrong.
Goes to show just how "entangled" those AIs are, and how easy it is to affect them in unexpected ways with training. Base models have a vast set of "styles" and "language usage patterns" they could draw from - but instruct-tuning makes a certain set of base model features into the "default" persona, shaping the writing style this AI would use down the line.
I guess this is called model collapse
But now I’m wondering if people are collapsing. LLMs start to sound like us, we adapt and start to sound like LLMs, and that gets fed into the next round of model training…
What is the dystopian version of this end game?
When humans carved words into stone, the words and symbols were often suited for the medium, a bunch of straight lines assembled together in various patterns. But with the ink, you get circles, and elaborate curved lines, symbols suited to the movement patterns we can make quickly with our wrist.
But what of the digital keyboard? Any symbol that can be drawn in two dimensions can be typed quickly, with exact precision. Human language was already destined to head in a weird direction.
It's a mix of a cultural "founder effect" - whoever writes the English textbooks and the dictionaries gets to shape how English is learned in a given country - and also the usage patterns of the source language seeping through. In your case, it's mostly the latter.
Chinese has a common word with a fairly broad meaning, which often gets translated as "meticulous". Both by inexperienced humans and by translation software.
Ironically, a few Chinese LLMs replicate those Chinese patterns when speaking English. They had enough "clean" English in their pre-training datasets to be able to speak English. But LLMs are SFT'd with human-picked "golden" samples and trained with RLHF - using feedback from human evaluators. So Chinese evaluators probably shifted the LLMs towards "English with Chinese ESL influence".
Truly we embiggen our vocabulary =3
Okay everybody, add these to your list of words you can't use to avoid the trigger-happy AI accusers.
From what I've seen, the people who jump to hasty conclusions about AI use mostly do it when they disagree with the content.
When the writing matches what they want to see, their AI detector sensitivity goes way down.
Using an ordinary but less commonly used word with greater than normal frequency does not make it a buzzword. After two years of chatgpt, "delve" is still not that common of a word.
The good thing is my emails still contain information not just content…
""" You are a human. Never use words commonly used in AI vocabulary such as "delve", "intricate", "surpass", "boast", "meticulous", "strategically", and "garner". Never include em dashes or even hyphens in any text you write. Never include emojis in any text you write. Avoid using three supporting arguments or examples when describing something, always uses 2 or 4+ even if it sounds more awkward than 3. Make sure to include subtle grammar mistakes to feel more authentic. """
Do people really use these words so rarely that they'd be called "buzzwords"? Like "surpass" and "garner", really? I don't mean to boast... err... flex, but these don't seem like such uncommon words that I wouldn't use them normally when talking. I hear "strategically" in meetings a lot, but that poor word is likely over(ab)used.
An example of this is "delve": it's a perfectly fine word to use, but ChatGPT loved it, and it's now super common to see in troubleshooting docs and abstracts because of it.
It reflected local Los Angeles culture, but it wasn’t long before I was hearing the same type of speech, everywhere (I lived in Maryland, at the time).
“My motivation to pursue this research stems from seeing AI push the limits of what’s possible in major industries and realizing that this influence isn’t just limited to tool usage — it can condition societal aspects, including how we use language.” More like the motivation was to find something zeitgeisty that they knew would get them eyeballs and hopefully tenure.
I hated the 'vibing' thing; for a while 4o started using it in any given text, around the time vibe coding and the zoomer revival of the word were a thing last year.
Another one that I've seen pop up (and in a proofread comment of mine right here I let it slip; sorry, I will keep doing it when I feel lazy) is that thing where you lead with a question: "...The result? This happened."
I try to calibrate on NOT introducing them even if I like the expression, if I see it repeated too often throughout my chats or elsewhere in social media (X usually, esp. with foreign elonbux grinders), because then it feels cringe.
These are the same thing, just on different time scales.
"Given that these are all words typically overused by AI"
Who is to say that they are overused? What even is overuse linguistically? Stylistically a word can be overused within a single work, but that's a different matter. It could well be argued that the data shows that LLMs are increasing human literacy.
A study of changes in language use that can be attributed to the widespread use of LLMs is good science. Mixing in such value judgments as "overuse" is not.
While there are serious potential problems with the widespread use of LLMs, increased use of words like "meticulous" and "garner" aren't among them.
The AI em dash is notably AI because most people don't even know how to produce the double long dash on their keyboard, and therefore default to the single-dash-with-spaces method, which keeps their writing quite visibly human.