I have RSI, so I use voice dictation and an LLM to type: I dictate my thoughts and the model shapes the sentences. I got lazy about where the line was and automated too much.
After getting unbanned, I went through all the comments dang has flagged for LLM posting over the years (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...) and looked for patterns. Some are obvious, some surprised me:
- curly/typographic quotes (“ ” instead of " "), or even ’ vs ' (that’s = LLM, that's = human). Humans typing in a browser text box produce straight ASCII, so finding curly quotes in a plain HN comment usually means the text was generated elsewhere and pasted in
- exactly 3 paragraphs of 1-2 sentences each - an extremely common LLM output shape
- examples always come in threes - "for example, X, Y, and Z"
- → arrows and — em dashes (sometimes swapped for – en dashes or plain - hyphens to evade detection)
- overly sycophantic openers - "great point", "this is really interesting" before saying anything
- fake personal framing - "in practice I've found..." immediately followed by a generic claim
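For anyone curious what checking the surface patterns above might look like, here is a minimal sketch. It is not the detector's code: the function name `surface_signals`, the signal keys, and the crude regexes are all hypothetical, and it only reports which patterns fire rather than producing a score. The "fake personal framing" pattern is left out because it needs semantic judgment, not a regex.

```python
import re

# Hypothetical checks for the surface patterns listed above.
# Names and regexes are illustrative assumptions, not the real detector's rules.
CURLY_QUOTES = re.compile("[\u2018\u2019\u201c\u201d]")  # ‘ ’ “ ”
DASHES_ARROWS = re.compile("[\u2014\u2013\u2192]")       # em dash, en dash, →
SYCOPHANTIC_OPENERS = ("great point", "this is really interesting")
# Crude "X, Y, and Z" triplet: single-word items only.
TRIPLET = re.compile(r"\w+, \w+,? and \w+")


def surface_signals(text: str) -> dict[str, bool]:
    """Return which of the listed surface patterns appear in `text`."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    # Rough sentence count: terminal punctuation followed by whitespace or end.
    sentence_counts = [len(re.findall(r"[.!?](?:\s|$)", p)) for p in paragraphs]
    return {
        "curly_quotes": bool(CURLY_QUOTES.search(text)),
        "dashes_or_arrows": bool(DASHES_ARROWS.search(text)),
        "sycophantic_opener": text.strip().lower().startswith(SYCOPHANTIC_OPENERS),
        "triplet_list": bool(TRIPLET.search(text)),
        "three_short_paragraphs": len(paragraphs) == 3
        and all(1 <= n <= 2 for n in sentence_counts),
    }
```

Each check is cheap and independent, which is why signals like these are easy to evade (swap a dash, straighten the quotes) and why heavier signals are worth layering on top.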
I built a detector around these plus some heavier signals (TF-IDF cosine similarity across a user's comment history, and an optional LLM pass via Anthropic/OpenAI). You can paste any HN comment URL or ID, or just raw text, and see which signals fire.
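The TF-IDF signal can be sketched roughly like this, assuming it means "how similar are a user's comments to each other" (templated output tends to reuse the same phrasing). This is a dependency-free illustration, not the repo's implementation: the function names, the tokenizer, the smoothed IDF variant, and using the mean pairwise similarity as the output are all assumptions.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; apostrophes kept so "that's" stays one token.
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per comment."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many comments contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1, so shared terms keep some weight.
        vectors.append({
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pairwise_similarity(comments: list[str]) -> float:
    """Average cosine similarity over all pairs of a user's comments."""
    vecs = tfidf_vectors(comments)
    pairs = list(combinations(vecs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

A value near 1.0 means the comments are near-duplicates in vocabulary; a value near 0.0 means they share almost nothing. Pairwise comparison is quadratic in the number of comments, which is fine for one user's history but would need sampling for very prolific accounts.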
I ran my own banned comments through it. They score 70-85. Sounds about right.
https://hn-bot-detector.vercel.app/
gh: https://github.com/umairnadeem/hn-bot-detector
I wrote this post myself btw
cd4761•1h ago