frontpage.

Made with ♥ by @iamnishanth

Open Source @Github


Why does it look like LLMs consistently overestimate implementation time?

3•bridgettegraham•3h ago•4 comments

Tell HN: Gemini 3.5 Flash breaks in stupid ways

7•XCSme•4h ago•1 comment

Ask HN: Failing interviews for mid-level SWE in UK, advice please

12•mjb8086•11h ago•6 comments

Debatable but likely not insane: there MAY be an issue with SpaceX's hiring

3•adinhitlore•8h ago•0 comments

Tell HN: I'm tired of AI-generated answers

92•theorchid•5h ago•47 comments

Ask HN: Shouldn't Google need to give a public statement about Railway incident?

173•srameshc•1d ago•103 comments

Valgrind-3.27.1 Is Available

9•paulf38•16h ago•1 comment

Ask HN: Is HN Blocking Mullvad VPN?

3•burger_moon•11h ago•2 comments

Ask HN: Anyone else struggling with AI and work?

7•carlgreene•12h ago•4 comments

Ask HN: Are there any serious efforts to organize tech labor now?

28•0rganize•1d ago•23 comments

Alternatives to HN for "tech outside of AI" discussion?

54•summonerOS•2d ago•35 comments

Ask HN: Are there any social media sites that are AI positive?

6•amichail•15h ago•5 comments

Tell HN: Google banned Railway's account. Everything down

30•sergiotapia•2d ago•18 comments

Can one run AI on source code with the prompt "Find below-avg swear rate files"?

3•pcwir•1d ago•2 comments

Ask HN: How does everyone talk about their work when they've used AI?

5•deku2099•2d ago•9 comments

Ask HN: How to manage AI APIs for SaaS application?

4•sbinnee•2d ago•5 comments

Ask HN: Suggest Google Antigravity Alternative

7•Pallavimdb•1d ago•12 comments

Ask HN: How to make a mono-repo AI-Ready?

2•kasnaka•1d ago•4 comments

Ask HN: Sorry, what Was FiveThirtyEight?

9•gagdiez•1d ago•6 comments

Ask HN: Does root have to be uid 0? Does uid 0 have to be root?

5•axismundi•2d ago•3 comments

Did moving to a new place have the intended effect?

12•Jeff2Serve•2d ago•12 comments

Ask HN: What are Stainless users doing now that Anthropic has killed it?

5•ubutler•2d ago•3 comments

Do you enjoy reading any type of AI written text?

4•reed1234•22h ago•11 comments

Anthropic is killing stainless, so we built our own SDK/MCP generator

6•iiviie•2d ago•1 comment

Ask HN: Antigravity 2.0 installer breaks existing Antigravity IDEs

3•jdw64•2d ago•1 comment

Ask HN: Is grpcurl home page compromised?

4•jicea•2d ago•0 comments



Tell HN: Gemini 3.5 Flash breaks in stupid ways

7•XCSme•4h ago
I thought I was going crazy: I was using Gemini 3.5 Flash to rate some answers, and it kept giving correct answers a 7 instead of a 10.

Apparently, once you add a "Grading criteria" section to the prompt, the model collapses into a "compressed toward the center of the scale" hallucination (or it's overfitting to its training set).

Someone on X asked me to try to reproduce it, and I actually got it on the first try in Gemini's own chat app:

https://x.com/XCSme/status/2057613611959279988

I'm not sure what to make of this model (or of most SOTA models). They've gotten a lot smarter at coding and tool use, but a lot dumber in other ways...

Comments

XCSme•4h ago
Direct link to the chat (ignore the story; it's just filler tokens): https://gemini.google.com/share/244af1e74841