LLMs Do Not Grade Essays Like Humans

https://arxiv.org/abs/2603.23714
6•PretzelFisch•1d ago

Comments

ergshankar•1d ago
Interesting point. Do you think this is more about training data limitations or evaluation methods?
nickpsecurity•1d ago
I feel like this is actually human-like but like the average human in the pretraining data. Let's look:

1. They reward short or under-developed essays. I'd say most online content, especially with high upvotes next to the post, fits that. Social media surely does.

2. With longer posts, the system starts nitpicking minor details, like grammar. We see this even on Hacker News, a community valuing quality, with some longer submissions. It's also a debate tactic to derail opponents' better arguments in many discussions which are in their pretraining data.

3. Essays with more praise get higher scores, and those with more criticism get lower scores. "Get on the Bandwagon" Effect. Echo chambers. One person writes a thing followed by 5-20 people confirming it. That's probably in the pretraining data. It might survive some filtering/cleaning strategies, too.

So, no, I think these AIs are acting way too human. They need to fine-tune them to act more like reasonable humans. That will initially take RLHF data for many types of situations. Given pretraining bias, they might also have to train them to drop the bad habits the article mentions.

jerlam•1d ago
School-type long essays only seem to exist in academia. I took a "business communication" class in college and we didn't write essays. My life experience since then has supported the "no essays" conclusion.

A long comment online now means one of two things: it's written by a crank who has strong opinions, usually only tangentially related; or by someone who has deep knowledge about the subject and has a lot of detail to provide. It's usually the former.

nickpsecurity•1d ago
I agree with you on how their quality is spread out. But, this...

"School-type long essays only seem to exist in academia."

Does an AI know what an essay is? Would it consider any long, descriptive post an essay? Especially if pretraining data has many people describing long posts as essays or "essay-like"? Or only actual essays? And what is an actual essay again?

I think AIs might have different interpretations due to the above questions. They might also conflate essays with longer, detailed, or argumentative posts. We'd have to put a bunch of posts into a bunch of AIs to ask how they classify them.
