frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: Django-rclone: Database and media backups for Django, powered by rclone

https://github.com/kjnez/django-rclone
1•cui•50s ago•0 comments

NY lawmakers proposed statewide data center moratorium

https://www.niagara-gazette.com/news/local_news/ny-lawmakers-proposed-statewide-data-center-morat...
1•geox•2m ago•0 comments

OpenClaw AI chatbots are running amok – these scientists are listening in

https://www.nature.com/articles/d41586-026-00370-w
1•EA-3167•2m ago•0 comments

Show HN: AI agent forgets user preferences every session. This fixes it

https://www.pref0.com/
3•fliellerjulian•4m ago•0 comments

Introduce the Vouch/Denouncement Contribution Model

https://github.com/ghostty-org/ghostty/pull/10559
2•DustinEchoes•6m ago•0 comments

Show HN: SSHcode – Always-On Claude Code/OpenCode over Tailscale and Hetzner

https://github.com/sultanvaliyev/sshcode
1•sultanvaliyev•6m ago•0 comments

Microsoft appointed a quality czar. He has no direct reports and no budget

https://jpcaparas.medium.com/microsoft-appointed-a-quality-czar-he-has-no-direct-reports-and-no-b...
1•RickJWagner•8m ago•0 comments

Multi-agent coordination on Claude Code: 8 production pain points and patterns

https://gist.github.com/sigalovskinick/6cc1cef061f76b7edd198e0ebc863397
1•nikolasi•9m ago•0 comments

Washington Post CEO Will Lewis Steps Down After Stormy Tenure

https://www.nytimes.com/2026/02/07/technology/washington-post-will-lewis.html
3•jbegley•9m ago•0 comments

DevXT – Building the Future with AI That Acts

https://devxt.com
2•superpecmuscles•10m ago•4 comments

A Minimal OpenClaw Built with the OpenCode SDK

https://github.com/CefBoud/MonClaw
1•cefboud•10m ago•0 comments

The silent death of Good Code

https://amit.prasad.me/blog/rip-good-code
2•amitprasad•11m ago•0 comments

The Internal Negotiation You Have When Your Heart Rate Gets Uncomfortable

https://www.vo2maxpro.com/blog/internal-negotiation-heart-rate
1•GoodluckH•12m ago•0 comments

Show HN: Glance – Fast CSV inspection for the terminal (SIMD-accelerated)

https://github.com/AveryClapp/glance
2•AveryClapp•13m ago•0 comments

Busy for the Next Fifty to Sixty Bud

https://pestlemortar.substack.com/p/busy-for-the-next-fifty-to-sixty-had-all-my-money-in-bitcoin-...
1•mithradiumn•14m ago•0 comments

Imperative

https://pestlemortar.substack.com/p/imperative
1•mithradiumn•15m ago•0 comments

Show HN: I decomposed 87 tasks to find where AI agents structurally collapse

https://github.com/XxCotHGxX/Instruction_Entropy
1•XxCotHGxX•19m ago•1 comments

I went back to Linux and it was a mistake

https://www.theverge.com/report/875077/linux-was-a-mistake
3•timpera•20m ago•1 comments

Octrafic – open-source AI-assisted API testing from the CLI

https://github.com/Octrafic/octrafic-cli
1•mbadyl•21m ago•1 comments

US Accuses China of Secret Nuclear Testing

https://www.reuters.com/world/china/trump-has-been-clear-wanting-new-nuclear-arms-control-treaty-...
2•jandrewrogers•22m ago•1 comments

Peacock. A New Programming Language

2•hashhooshy•27m ago•1 comments

A postcard arrived: 'If you're reading this I'm dead, and I really liked you'

https://www.washingtonpost.com/lifestyle/2026/02/07/postcard-death-teacher-glickman/
3•bookofjoe•28m ago•1 comments

What to know about the software selloff

https://www.morningstar.com/markets/what-know-about-software-stock-selloff
2•RickJWagner•32m ago•0 comments

Show HN: Syntux – generative UI for websites, not agents

https://www.getsyntux.com/
3•Goose78•32m ago•0 comments

Microsoft appointed a quality czar. He has no direct reports and no budget

https://jpcaparas.medium.com/ab75cef97954
2•birdculture•33m ago•0 comments

AI overlay that reads anything on your screen (invisible to screen capture)

https://lowlighter.app/
1•andylytic•34m ago•1 comments

Show HN: Seafloor, be up and running with OpenClaw in 20 seconds

https://seafloor.bot/
1•k0mplex•34m ago•0 comments

Tesla turbine-inspired structure generates electricity using compressed air

https://techxplore.com/news/2026-01-tesla-turbine-generates-electricity-compressed.html
2•PaulHoule•36m ago•0 comments

State Department deleting 17 years of tweets (2009-2025); preservation needed

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
5•sleazylice•36m ago•1 comments

Learning to code, or building side projects with AI help, this one's for you

https://codeslick.dev/learn
1•vitorlourenco•36m ago•0 comments
Open in hackernews

High rate of LLM (GPT5) hallucinations in dense stats domains (cricket)

3•sp1982•5mo ago
Disclaimer: I am not a ML researcher, so the terms are informal/wonky. Apologies!

I’m doing a small experiment to see whether models “know when they know” on T20 international cricket scorecards (cricsheet.com for source). The idea is to test models on publicly available data they likely saw during training, and see if they hallucinate or admit they don't know.

Setup: Each question is from a single T20 match. Model must return an answer (numeric or choice from options) or `no_answer`.

Results (N=100 per model):

- gpt-4o-search-preview • Answer rate: 0.96 • Accuracy: 0.88 • Accuracy (answered): 0.91 • Hallucination (answered): 0.09 • Wrong/100: 9

- gpt-5 • Answer rate: 0.35 • Accuracy: 0.27 • Accuracy (answered): 0.77 • Hallucination (answered): 0.23 • Wrong/100: 8

- gpt-4o-mini • Answer rate: 0.37 • Accuracy: 0.14 • Accuracy (answered): 0.38 • Hallucination (answered): 0.62 • Wrong/100: 23

- gpt-5-mini • Answer rate: 0.05 • Accuracy: 0.02 • Accuracy (answered): 0.40 • Hallucination (answered): 0.60 • Wrong/100: 3

Note: most remaining “errors” with search are obscure/disputed cases where public sources disagree.

It seems for domains where models might have seen some data, it’s better to rely on abstention + RAG vs a larger model with more coverage but worse hallucination rate.

Code/Data: https://github.com/jobswithgpt/llmcriceval

Comments

whinvik•5mo ago
Is this exercise done to determine what the model can produce from its training data or is the data shown again to the model?
sp1982•5mo ago
From training data.