Additionally, lots of Chinese and Russian keyboard tools use the em dash as well, when they're switching to the alternative (en-US) layout overlay.
There's also the Chinese idiom symbol in UTF8 which gets used as a dot by those users a lot, so that could be a nice indicator for legit human users.
edit: lol @ downvotes. Must have hit a vulnerable spot, huh?
I'm not trying to negate the fact. I'm just pointing out that a correlation without another indicator is not evidence enough that someone is a bot user, especially in the golden age of DDoS botnets rebranded as residential proxy services, which everyone seems to have started using since ~Q4 2024.
That’s why the analysis was performed over time. All of those em dash sources you mentioned were present before LLM written content became popular.
Bot prevention is a very difficult constant game of cat and mouse, and a lot of bot operators have become very skilled at determining the hidden metrics used by platforms to bless accounts; that's their job, after all. I've become a big fan of lobste.rs' invitation tree approach, where the reputation of new accounts rides on the reputation of older accounts, and risks consequence up the chain. It also creates a very useful graph of account origin, allowing for scorched earth approaches to moderation that would otherwise require a serious (and often one-off) machine learning approach to connect accounts.
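To make the invitation tree idea concrete, here is a minimal sketch of the data structure. It is not lobste.rs' actual implementation; the `InviteTree` class, its method names, and the example accounts are all hypothetical. It only shows why recording who invited whom gives you both the chain of accountability above an account and the full set of accounts below it.

```python
from collections import defaultdict


class InviteTree:
    """Toy invitation tree (hypothetical): every non-root account has one inviter."""

    def __init__(self):
        self.inviter = {}                  # account -> account that invited it
        self.invitees = defaultdict(list)  # account -> accounts it invited

    def add(self, account, invited_by=None):
        self.inviter[account] = invited_by
        if invited_by is not None:
            self.invitees[invited_by].append(account)

    def chain(self, account):
        """Walk upward: the inviters whose reputation rides on this account."""
        out = []
        current = self.inviter.get(account)
        while current is not None:
            out.append(current)
            current = self.inviter.get(current)
        return out

    def subtree(self, account):
        """Walk downward: the account plus everything it invited, transitively."""
        out, stack = [], [account]
        while stack:
            node = stack.pop()
            out.append(node)
            stack.extend(self.invitees.get(node, []))
        return out


# Made-up example accounts.
tree = InviteTree()
tree.add("alice")
tree.add("bob", invited_by="alice")
tree.add("carol", invited_by="alice")
tree.add("spam1", invited_by="bob")
tree.add("spam2", invited_by="spam1")

print(tree.chain("spam2"))            # inviters up the chain from spam2
print(sorted(tree.subtree("bob")))    # everything a "scorched earth" ban of bob would cover
```

The upward walk is what lets consequences travel up the chain, and the downward walk is the account-origin graph: both are plain traversals, with no machine learning needed to connect the accounts.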
Is it possible to differentiate between a bot, and a human using AI to 'improve' the quality of their comment where some of the content might be AI written but not all? I don't think it is.
hm, the whole internet really, youtube, reddit, twitter, facebook, blog posts, food recipes, news articles, it's getting more and more obvious
Unfortunately identity theft now becomes even more damaging.
let's bring back Chrome's WEI while we're at it
/s
And bots reposting a trending post from like 12 years ago to farm internet points... with other bots reposting the top comments of the initial post
I'm more worried about how many people reply to slop and start arguing with it (usually receiving no replies — the slop machine goes to the next thread instead) when they should be flagging and reporting it; this has changed in the last few months.
I'm never suspicious though. One of the strange, and awesome, and incredibly rare things about HN is that I put basically zero stock in who wrote a comment. It's such a minimal part of the UI that it entirely passes me by most of the time. I love that about this site. I don't think I'm particularly unusual in that either; when someone shared a link about the top commenters recently there were quite a few comments about how people don't notice or how they don't recognize the people in the top ranks.
The consequence of this is that a bot could merrily post on here and I'd be absolutely fine not knowing or caring if it was a bot or not. I can judge the content of what the bot is posting and upvote/downvote accordingly. That, in my opinion, is exactly how the internet should work - judge the content of the post, not the character of the poster. If someone posts things I find insightful, interesting, or funny I'll upvote them. It has exactly zero value apart from maybe a little dopamine for a human, and actually zero for a robot, but it makes me feel nice about myself that I showed appreciation.
Brevity is the soul of wit.
Don’t mind me, just skewing the results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
There could be an argument for aggregating by user instead, however, if some bots are found to be particularly active and skewing the data.
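A rough sketch of what aggregating by user could look like, compared with counting per comment. The function names and the sample comments are made up for illustration, and the input shape, `(username, text)` pairs, is an assumption rather than the format of the original dataset.

```python
from collections import defaultdict

EM_DASH = "\u2014"  # the em dash character, written as an escape


def comment_level_rate(comments):
    """Fraction of all comments that contain at least one em dash."""
    comments = list(comments)
    return sum(1 for _, text in comments if EM_DASH in text) / len(comments)


def em_dash_rate_by_user(comments):
    """Map each username to the fraction of their comments containing an em dash."""
    total = defaultdict(int)
    with_dash = defaultdict(int)
    for user, text in comments:
        total[user] += 1
        if EM_DASH in text:
            with_dash[user] += 1
    return {user: with_dash[user] / total[user] for user in total}


def user_level_rate(comments):
    """Fraction of users who used an em dash in at least one comment."""
    rates = em_dash_rate_by_user(comments)
    return sum(1 for rate in rates.values() if rate > 0) / len(rates)


# Made-up sample: one very active account and two quiet ones.
sample = [
    ("busy_bot", "first" + EM_DASH + "post"),
    ("busy_bot", "second" + EM_DASH + "post"),
    ("busy_bot", "third" + EM_DASH + "post"),
    ("human1", "plain text"),
    ("human2", "more plain text"),
]

print(comment_level_rate(sample))  # dominated by the one prolific account
print(user_level_rate(sample))     # each account counts once
```

Counting per user stops a single prolific account from dominating the statistic, at the cost of ignoring how much each account actually posts.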
Why not? I am a descendant of Africans. I am a mildly successful author by tech nerd standards. I was educated in the British Public School tradition, right down to taking Latin in high school and cheering on our Rugby* and Cricket teams.
If someone doesn't want to read my words or employ me because I must be AI, that's their problem. The truth is, they won't like what I have to say any more than they like the way I say it.
I have made my peace with this.
———
Speaking of Rugby, in 1973 another school's Rugby team played ours, and almost the entire school turned out to watch a celebrity on the other school's team.
His name was Andrew, and he is very much in the news today.
I also see AIs use emdashes in places where parentheses, colons, or sentence breaks are simply more appropriate.
Maybe the em dash is the self censorship/deletion mechanism that we've all been waiting for. Better than having to write pill subscription ads, I suppose.
What will/can HN do about it?
If that's worth the cost... probably not?
For now maybe all forums should require some bloody swearing in each comment to at least prove you've got some damn human borne annoyance in you? It might even work against the big players for a little bit, because they have an incentive to have their LLMs not swearing. The monetary reward is after all in sounding professional.
Easy enough for any groups to overcome of course, but at least it'd be amusing for a while. Just watching the swear-farms getting set up in lower paid countries, mistakes being made by the large companies when using the "swearing enabled" models and all that.
(1) I don't recommend focusing disproportionately on one signal. They'll change, and are incredibly easy to optimize for. https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
(2) I do recommend taking one minute to dash a note off to hn@ycombinator.com if you see suspicious patterns. Dang and our other intrepid mods are preternaturally responsive, and appear to appreciate the extra eyeballs on the problem.
I support this dashing recommendation.
This wasn't really intended as a "wow, dang is sure sleeping on the job", more as an interesting observation on the new bot ecosystem.
I also feel like there's a missing discussion about the comment quality on HN lately. It feels like it's dropped like crazy. Wanted to see if I could find some hard data to show I haven't gone full Terry Davis.
Obvious AI-generated posts and articles make it to the front page on a daily basis, and I get the impression that neither the average user nor the moderation team see that as a problem at all anymore.
even though I used to like pointing out the difference between a hyphen and a period.
Spaces like HN then become a cacophony of clankers clanking as their numbers increase
Every time someone states they stop reading when they encounter proper typography, I feel attacked.
I present ⸻ the U+2E3B dash.
There is nothing to fear, MY HUMAN FRIEND!
I know there are legitimate usecases for the em-dash, but a few paragraphs (at most) of text in an HN/Reddit comment? Into the trash it goes.
Incidentally, some folks reported my stuff for potential AI generation and I had to respond to the mods about it. So that was kinda funny, if also sad to hear that some folks thought I was a bot.
I’m a dinosaur, not a robot dinosaur. I’m nowhere near that cool, alas.
The tell here is that you used a hyphen, not an em-dash.
This `-` is a hyphen, which I love, even if I'm fairly sure I'm not using it correctly in grammar a lot of the time.
This `--` is an EM-Dash, apparently, which is also what I never use but I also thought was just a hyphen in a different context (incorrect!).
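For reference, the characters being discussed can be told apart by code point. The sketch below uses only Python's standard `unicodedata` module; the `describe` helper and the `samples` dictionary are names invented for this example. It shows that `--` is two hyphen-minus characters (a common typed stand-in for an em dash), while the em dash proper is a separate character.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Escapes are used so the characters are unambiguous in source form.
samples = {
    "hyphen": "-",
    "double hyphen": "--",
    "en dash": "\u2013",
    "em dash": "\u2014",
    "U+2E3B dash": "\u2e3b",
}

for label, text in samples.items():
    print(label, describe(text))
```

A detector that searches for U+2014 will therefore not match a typed `--`, which is part of why the two habits show up differently in the data.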
If AI starts using the New Yorker style diaeresis (umlaut-looking thing when there are two vowels in words like coöperate) I swear I'm gonna lose it.
Join me in double-dash em proximates. Shows you manually typed it out with total disregard for token count and technical correctness.
Is there any good argument in favor of it, or any other house style quirks for that matter, other than in-group signaling?
I was going to say that I respect it, but find it utterly absurd that they do that. But your comment made me look it up again—I had no idea it was just obsolete/archaïc (except in the New Yorker), I'd thought it was a language feature their 'style' guide had invented.
>this is [summary]
>not just x, it's y
>punchy ending, maybe question
Once you know it's AI it's very obvious they told it to use normal dashes instead of em dashes, type in lowercase, etc., but it's still weirdly formal and formulaic.
For example from https://news.ycombinator.com/threads?id=snowhale
"this is the underreported second-order risk. Micron, Samsung, SK Hynix all allocated HBM capacity based on hyperscaler capex projections. NAND fabs are similarly committed. a 57% reduction in projected OpenAI spend (.4T -> B) doesn't just affect NVIDIA orders -- it ripples into the memory suppliers who shifted capacity to HBM and away from commodity DRAM/NAND. if multiple hyperscalers revise down simultaneously you get a situation similar to the 2019 crypto ASIC overhang: companies tooled up for demand that evaporated. not predicting that, but the purchasing commitments question is real."
I'll actually post a comment or question and I'll get a reply with a bit of a paragraph of what feels like a very "off" (not 'wrong' but strangely vague) summary of the topic ... and then maybe an observation or pointed agenda to push, but almost strangely disconnected from what I said.
One of the challenges is that yeah regular users don't get each other's meaning / don't read well as it is / language barriers. Yet the volume of posts I see where the other user REALLY isn't responding to the other person seems awfully high these days.
I wonder if it is neural networks that are inherently biased, but in blind spots, and that applies to both natural and artificial ones. It may be that to approximate neutrality we or our machines have to leave behind the form of intelligence that depends on intrinsically biased weights and instead depend on logically deriving all values from first principles. I have low confidence that AI's can accomplish that any time soon, and zero confidence that natural intelligence can. And it's difficult to see how first principles regarding human values can be neutral.
I'm also skeptical that succeeding at becoming unbiased is a solution, and that while neutrality may be an epistemic advance, it also degrades social cohesion, and that neutrality looks like rationality, but bias may be Chesterton's Fence and we should be very careful about tearing it down. Maybe it's a blessing that we can't.
Is it ideological?
Is it product marketing in those relevant threads where someone is showcasing?
Or is it pure technical testing, playing around?
Incidentally, how much do they pay for an HN account that is a few years old and has accumulated a few thousand Internet points?
Asking for a friend.
Other accounts might be trying to age accounts and dilute their eventual coordinated voting or commenting rings. It's harder to identify sockpuppet accounts when they've been dutifully commenting slop for months before they start astroturfing for the chosen topic.
To reverse the argument - it would be amateurish and plain stupid to ignore it. The barrier to entry is very low. Politics, ads, mildly swaying opinions about some recent clusterfuck by popular megacorp XYZ, just spying on people: you have it all here.
I don't know how dang and crew protect against this. I'd expect some level of success, but 100% seems unrealistic. Slow and steady mild infiltration, either by AI bots or by humans from the GRU and similar orgs who literally have this in their job description.
Oh, would you look at that?
https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
https://news.ycombinator.com/item?id=45322362
> First impression: I need to dive into this hackernews reply mockup thing thoroughly without any fluff or self-promotion. My persona should be ..., energetic with health/tech insights but casual and relatable.
> Looking at the constraints: short, punchy between 50-80 characters total—probably multiple one-sentence paragraphs here to fit that brevity while keeping it engaging.
> User specified avoiding "Hey" or "absolutely."
Lots more in its other comments.
It's so sad to me that good typographical conventions have been co-opted by the zeitgeist of LLMs.
If AI was writing like everyone else we wouldn't be talking about this. But instead it writes like a subset of people write, many of them just some of the time as a conscious effort. An effort that now makes what they write look like lower quality
Say what you want about marketing-isms of your typical LLM, they have been trained and often succeed at making legible, easy to scan blobs of text. I suspect if more LLM spam was curated/touched up, most people would be unable to distinguish it from human discourse. There are already folks commenting on this article discussing other patterns they use to detect or flag bots using LLMs.
(Until a few years ago I probably mostly only saw them in print, and I suppose it just never occurred to me that I liked them in particular vs. just the whole book being professionally typeset generally.)
This is the first time I've ever heard the character ";" referred to as such. It's always been "semi-colon" to me, is this a region/culture difference?
I'm not saying you're wrong, I find it interesting.
i call it a super comma when its separating a list with commas within the sets.
so if i am listing colors like green, blue, red; foods like apple, orange, strawberry; and seasons like winter, summer, fall.
it's one use case for an em-dash, because whatever you have inside it has commas in the phrase.
square and rectangle situation. a supercomma is a subset of semicolon.
Well, I haven't always—just for maybe 20 years.
This wouldn't be an issue if mobile users or Windows users were exercising it too, but it's just Mac owners and LLMs. And Mac owners are probably the minority of instances where it is used.
But anyways, you can't really control how people see your stuff. If you're human, I think the humanness will come through anyway, even if you have some particular structure or happen to use em-dashes sometimes. They're so easy to prompt around that the really tricky LLM stuff to detect by sense and reading is the stuff where the prompter has been trying to sneakily make it more human.
https://practicaltypography.com/hyphens-and-dashes.html
I will not allow my good practices to get co-opted as AI "smoke tests".
To turn off Smart Punctuation: Home > Settings > General > Keyboard > Smart Punctuation > Off.
Bye bye em-dash, we had a nice run together.
I might start using that⸻one (a bit long...)
There is no real AI detection tool that works.
When we see something like em-dashes, it's simply the average of the text the models were trained on. If you fall into one of the averages of a model, you're basically part of the model output. Yikes.
It is also interesting to note that the comparison is between recent comments and recent comments by new users. So, I guess this would take care of the objection that em-dashes (a perfectly fine piece of punctuation) have just been popularized by bots, and now are used more often by humans as well.
Maybe there is a bot problem. Seems almost impossible to fix for a site like this…
What could help is a careful clique hunting algorithm to accurately identify and delete the entire clique.
Of course, all of the above can be replaced by AI, but it would not significantly alter the status quo.
I just hope my writing carries enough voice and perspective that people respond, even if there's an em dash or two.
Our company is being attacked rn in tech media and at least some of it, gut feeling wise, seems obviously sponsored / promoted by competitors. I know that's not surprising, but never watched it happen from this side before.
What we think others around us think has a big effect on our own behavior
Show HN: Hacker News em dash user leaderboard pre-ChatGPT - https://news.ycombinator.com/item?id=45071722 - Aug 2025 (266 comments)
... which I'm proud to say originated here: https://news.ycombinator.com/item?id=45046883.
No one wants to read your ChatGPT outputs.
- Generate age so spamming a product/service is easier and the account appears more trustworthy
- Influence discussions in a particular direction for monetary gain, e.g. "I got rich on bitcoin, you'd be crazy not to invest".
- Influence discussions in a particular direction for political gain, e.g. "I went to Xinjiang and the Uyghurs couldn't be happier!"
word          noob     new     p-value
--------------------------------------
ai            14.93%   7.87%   p=0.00016
actually      12.53%   5.34%   p=1.1e-05
code          11.47%   6.04%   p=0.00081
real          10.93%   2.95%   p=2.6e-08
built         10.93%   2.11%   p=2.1e-10
data           8.93%   3.51%   p=6.1e-05
tools          7.6%    2.67%   p=5.5e-05
agent          7.47%   2.95%   p=0.00024
app            7.2%    3.09%   p=0.00078
tool           6.8%    1.83%   p=8.5e-06
model          6.8%    2.39%   p=0.00013
agents         6.67%   2.11%   p=5.2e-05
api            6.53%   1.12%   p=2.7e-07
building       6.13%   1.54%   p=1.3e-05
full           6.0%    1.97%   p=0.00017
across         5.87%   1.4%    p=1.3e-05
interesting    5.33%   1.54%   p=0.00014
answer         5.2%    1.4%    p=9.6e-05
simple         4.93%   1.54%   p=0.00043
project        4.8%    1.26%   p=0.00015
You can explore the underlying data using SQL queries in your browser here: https://lite.datasette.io/?url=https%253A%252F%252Fraw.githu... (that's Datasette Lite, my build of the Datasette Python web app that runs in Pyodide in WebAssembly)
Here's a SQL query that shows the users in that data that posted the most comments with at least one em dash - the top ones all look like legitimate accounts to me: https://lite.datasette.io/?url=https%3A%2F%2Fraw.githubuserc...
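The thread does not say how the p-values in the word table were computed, so the following is only one plausible way to compare two rates like those: a two-sided two-proportion z-test using the standard library. The function name and the counts in the example are hypothetical; the real sample sizes are not given in the comment.

```python
import math


def two_proportion_p_value(hits_a, n_a, hits_b, n_b):
    """Two-sided p-value for the difference between two proportions (pooled z-test)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # both groups all-hit or all-miss: no detectable difference
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal, via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: a word appears in 150 of 1000 comments in one group
# and in 80 of 1000 comments in the other.
p = two_proportion_p_value(150, 1000, 80, 1000)
print(f"p = {p:.2g}")
```

With twenty words tested at once, some correction for multiple comparisons would normally be worth considering before reading much into any single row.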
bediger4000•3h ago
embedding-shape•1h ago
loeg•1h ago
squeefers•1h ago
again with the conspiracy theories
loeg•1h ago
marcher•1h ago
But who knows, maybe even 17 year old accounts are being hijacked by AI now too.
5o1ecist•1h ago
Yeah, right? Not one ever actually turned out to be true!
That conspiracy about billionaires, who supposedly own all of western media, having deliberately created an environment in which anyone who expresses even the remote idea of a conspiracy gets discredited, is also not true!
None of them are true!
Not. A. Single. One.
*noms cheese pizza*