
High rate of LLM (GPT5) hallucinations in dense stats domains (cricket)

3•sp1982•5mo ago
Disclaimer: I am not an ML researcher, so the terms are informal/wonky. Apologies!

I’m doing a small experiment to see whether models “know when they know” on T20 international cricket scorecards (source: cricsheet.com). The idea is to test models on publicly available data they likely saw during training, and see whether they hallucinate or admit they don't know.

Setup: Each question is about a single T20 match. The model must return an answer (a number or a choice from the given options) or `no_answer`.
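A rough sketch of how one response could be graded under this setup. The function name `grade`, the three labels, and the case-insensitive / numeric comparison are my own assumptions for illustration, not necessarily what the linked repo does:

```python
NO_ANSWER = "no_answer"  # the abstention token named in the setup


def grade(response: str, expected: str) -> str:
    """Classify one response as 'abstained', 'correct', or 'wrong'.

    Hypothetical helper: the normalization rules here are assumptions.
    """
    cleaned = response.strip().lower()
    if cleaned == NO_ANSWER:
        return "abstained"

    want = expected.strip().lower()
    try:
        # Compare numerically when both sides parse, so "42" matches "42.0".
        return "correct" if float(cleaned) == float(want) else "wrong"
    except ValueError:
        # Otherwise treat it as a multiple-choice option and compare text.
        return "correct" if cleaned == want else "wrong"
```

An abstention is kept separate from a wrong answer here, since the whole point is to distinguish "didn't know and said so" from "didn't know and guessed".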

Results (N=100 per model):

- gpt-4o-search-preview • Answer rate: 0.96 • Accuracy: 0.88 • Accuracy (answered): 0.91 • Hallucination (answered): 0.09 • Wrong/100: 9

- gpt-5 • Answer rate: 0.35 • Accuracy: 0.27 • Accuracy (answered): 0.77 • Hallucination (answered): 0.23 • Wrong/100: 8

- gpt-4o-mini • Answer rate: 0.37 • Accuracy: 0.14 • Accuracy (answered): 0.38 • Hallucination (answered): 0.62 • Wrong/100: 23

- gpt-5-mini • Answer rate: 0.05 • Accuracy: 0.02 • Accuracy (answered): 0.40 • Hallucination (answered): 0.60 • Wrong/100: 3

Note: most of the remaining “errors” with search are obscure or disputed cases where public sources disagree.
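For anyone wanting to recompute the columns, here is a minimal sketch of how I'd expect them to relate. The definitions are assumptions inferred from the reported figures (for example, "Hallucination (answered)" as wrong answers divided by answered questions), and `summarize` is a made-up name:

```python
def summarize(n: int, answered: int, correct: int) -> dict:
    """Derive the reported metrics from raw counts (assumed definitions)."""
    if n <= 0 or not 0 <= correct <= answered <= n:
        raise ValueError("expected 0 <= correct <= answered <= n and n > 0")

    wrong = answered - correct  # answered but incorrect, i.e. hallucinated
    return {
        "answer_rate": answered / n,
        "accuracy": correct / n,
        # Conditional metrics are undefined with zero answers; report 0.0.
        "accuracy_answered": correct / answered if answered else 0.0,
        "hallucination_answered": wrong / answered if answered else 0.0,
        "wrong_per_100": 100 * wrong / n,
    }


# Counts implied by the gpt-5 row: N=100, 35 answered, 27 correct.
gpt5 = summarize(n=100, answered=35, correct=27)
print({k: round(v, 2) for k, v in gpt5.items()})
```

With those counts this gives an accuracy on answered questions of about 0.77, a hallucination rate of about 0.23, and 8 wrong answers per 100 questions, which matches the gpt-5 row.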

It seems that for domains where models might have seen some of the data, it’s better to rely on abstention plus RAG than on a larger model with more coverage but a worse hallucination rate.
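One way to read "abstention + RAG" is a two-step fallback: ask the model directly, and only go to retrieval when it abstains. This is a sketch of that reading, not the code from the repo; `ask_model` and `ask_with_retrieval` are hypothetical callables you would supply yourself:

```python
from typing import Callable

NO_ANSWER = "no_answer"


def answer_with_fallback(
    question: str,
    ask_model: Callable[[str], str],           # hypothetical: closed-book call
    ask_with_retrieval: Callable[[str], str],  # hypothetical: search/RAG call
) -> tuple[str, str]:
    """Return (answer, source), using retrieval only when the model abstains."""
    first = ask_model(question).strip()
    if first.lower() != NO_ANSWER:
        return first, "model"
    return ask_with_retrieval(question).strip(), "retrieval"
```

This only helps if the closed-book model abstains reliably, so a model that answers rarely but is often wrong when it does would still leak errors through the first branch.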

Code/Data: https://github.com/jobswithgpt/llmcriceval

Comments

whinvik•5mo ago
Is this exercise done to determine what the model can produce from its training data or is the data shown again to the model?
sp1982•5mo ago
From training data.