Show HN: Llama 3.2 3B and Keiro Research achieve 85% on SimpleQA

https://www.keirolabs.cloud/benchmarks
6•mannybruv•1h ago
ran this over the weekend. stack was Llama 3.2 3B running locally + Keiro Research API for retrieval.
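The post names the two pieces of the stack but not how they're wired. Below is a minimal Python sketch of a common retrieve-then-read loop. It makes no assumptions about Keiro's actual request format: `search` and `generate` are hypothetical callables standing in for the Keiro Research call and the local Llama 3.2 3B, and the prompt wording is illustrative, not the author's.

```python
from typing import Callable, List

# Hypothetical stand-ins: `search` for a retrieval call (such as a web research
# API) and `generate` for a locally hosted reader model. Neither mirrors a real SDK.
SearchFn = Callable[[str], List[str]]
GenerateFn = Callable[[str], str]


def build_prompt(question: str, snippets: List[str], max_snippets: int = 5) -> str:
    """Pack the top retrieved snippets into a short, source-grounded QA prompt."""
    context = "\n".join(
        f"[{i + 1}] {text}" for i, text in enumerate(snippets[:max_snippets])
    )
    return (
        "Answer the question using only the sources below. "
        "If they do not contain the answer, reply \"I don't know\".\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str, search: SearchFn, generate: GenerateFn) -> str:
    """Retrieve first, then let the small reader model write a short answer."""
    snippets = search(question)
    return generate(build_prompt(question, snippets)).strip()


if __name__ == "__main__":
    # Stub functions so the sketch runs without any network or model.
    def fake_search(q: str) -> List[str]:
        return ["Paris is the capital of France."]

    def fake_generate(prompt: str) -> str:
        return " Paris \n"

    print(answer("What is the capital of France?", fake_search, fake_generate))  # Paris
```

Keeping retrieval and generation behind plain callables makes it easy to swap the reader model for a smaller or larger one while holding retrieval fixed.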

85.0% on 4,326 questions. where that lands:

ROMA (357B): 93.9%
OpenDeepSearch (671B): 88.3%
Sonar Pro: 85.8%
Llama 3.2 3B + Keiro: 85.0%
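Here's a quick sanity check on the headline number as a minimal Python sketch. It assumes accuracy is plain correct answers divided by the 4,326 questions (the post doesn't say how not-attempted answers were scored). It also uses the normal-approximation standard error of a proportion as a rough noise estimate:

```python
import math

N = 4326          # questions in the run, as reported in the post
REPORTED = 85.0   # reported accuracy in percent

# Every whole number of correct answers whose accuracy rounds to 85.0%.
consistent = [k for k in range(N + 1) if round(100 * k / N, 1) == REPORTED]
print(consistent[0], consistent[-1])  # 3675 3679

# Normal-approximation (Wald) standard error of a proportion, in percentage points.
p = REPORTED / 100
se_pts = 100 * math.sqrt(p * (1 - p) / N)
print(round(se_pts, 2), round(1.96 * se_pts, 2))  # 0.54 1.06
```

On that estimate a 95% interval is about ±1.1 points around 85.0%, so a 0.8-point difference is within sampling noise for a run of this size. Comparing against other systems also assumes their scores come from the same question set and grader.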

the systems ahead of us are running models roughly 120-220x larger. that's why they're ahead. not better retrieval, not better prompting, just way more parameters.

the interesting part is how small the gap is despite that: 3.3 points behind a 671B model and 0.8 behind Sonar Pro. at some point you have to ask what all that extra compute is actually buying you for this class of task.
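The size and score gaps can be computed directly from the numbers in the list. Below is a small Python sketch. The scores and parameter counts are copied from the post, and Sonar Pro's size is left as `None` because the post doesn't give it:

```python
from typing import Dict, Optional, Tuple

Entry = Tuple[float, Optional[int]]  # (accuracy in %, parameters in billions)

OURS: Tuple[float, int] = (85.0, 3)  # Llama 3.2 3B + Keiro
OTHERS: Dict[str, Entry] = {
    "ROMA": (93.9, 357),
    "OpenDeepSearch": (88.3, 671),
    "Sonar Pro": (85.8, None),  # size not stated in the post
}


def compare(ours: Tuple[float, int], others: Dict[str, Entry]) -> Dict[str, Entry]:
    """Return {name: (points ahead of ours, parameter ratio vs ours or None)}."""
    rows: Dict[str, Entry] = {}
    for name, (score, params) in others.items():
        gap = round(score - ours[0], 1)
        ratio = round(params / ours[1]) if params is not None else None
        rows[name] = (gap, ratio)
    return rows


for name, (gap, ratio) in compare(OURS, OTHERS).items():
    size = f"{ratio}x larger" if ratio is not None else "size unknown"
    print(f"{name}: +{gap} pts, {size}")
# ROMA: +8.9 pts, 119x larger
# OpenDeepSearch: +3.3 pts, 224x larger
# Sonar Pro: +0.8 pts, size unknown
```

On these figures the two larger systems are roughly 119x and 224x the reader model's size while leading by 8.9 and 3.3 points.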

want to know how small the reader model can get before it starts to matter. in this setup it clearly wasn't the limiting factor. i also want to know whether smaller models with web access can perform as well as (if not better than) larger models on a lot of non-coding tasks.

Full benchmark script + results --> https://github.com/h-a-r-s-h-s-r-a-h/benchmark

Keiro Research API docs --> https://www.keirolabs.cloud/docs/api-reference/research

Comments

harshRust•1h ago
3B model competing with 300B+ systems is kinda insane. Really cool work. Love seeing smart retrieval beat brute-force scaling.

Show HN: ScreenStack – AI-native platform purpose-built for technical interviews

https://screenstack.tech/
1•ud0•18s ago•0 comments

Show HN: Elia – Governed hybrid architecture (LLM is capability, not authority)

https://github.com/Jmc-arch/elia-governed-hybrid-architecture
1•JMC-FR•2m ago•0 comments

Show HN: How the IP Leasing Market Fakes Legitimacy

1•xunairah•3m ago•0 comments

Graeco-Arabic translation movement

https://en.wikipedia.org/wiki/Graeco-Arabic_translation_movement
1•teleforce•3m ago•0 comments

Boy I was wrong about the Fediverse

https://matduggan.com/boy-i-was-wrong-about-the-fediverse/
1•wrxd•4m ago•0 comments

Ask HN: In the age of AI, how are you marketing your products to differentiate?

1•Gooblebrai•7m ago•0 comments

Show HN: GeoDirect – Create universal map links that work everywhere

https://geodirect.io
1•pascalveze•8m ago•0 comments

System Design and ML Interview Material

https://github.com/Ali-Meh619/System_Design_Principles
2•alimeh•9m ago•1 comment

System Design and Machine Learning Interview Material

1•alimeh•10m ago•0 comments

While Hinkley Nuclear Was Being Built, the UK Grid Decarbonized

https://cleantechnica.com/2026/03/06/while-hinkley-nuclear-was-being-built-the-uk-grid-decarbonized/
1•toomuchtodo•15m ago•1 comment

Simple Maturin Based Python Bindings to Scryer Prolog

https://github.com/philzook58/scryerpy
1•triska•23m ago•0 comments

OpenChaos: Strangers vote on what code ships to production (2 months in)

https://blog.openchaos.dev/posts/weeks-8-and-9-the-bot-only-listened-to-its-master
1•skridlevsky•23m ago•0 comments

Show HN: CV10X – AI resume builder that remembers your profile

https://www.cv10x.com
1•ennemlimuhssin•24m ago•0 comments

Writing a simple VM in less than 125 lines of C (2021)

https://www.andreinc.net/2021/12/01/writing-a-simple-vm-in-less-than-125-lines-of-c/
2•birdculture•25m ago•0 comments

Ask HN: Doctor with software development experience – careers combining both?

2•frank-cheynne•27m ago•0 comments

Uploading Pirated Books via BitTorrent Qualifies as Fair Use, Meta Argues

https://torrentfreak.com/uploading-pirated-books-via-bittorrent-qualifies-as-fair-use-meta/
1•askl•31m ago•0 comments

Show HN: Spectra – local finance dashboard with offline ML categorization

https://www.withspectra.app/
1•francesco_gab•32m ago•0 comments

Cloudflare-Native Starter Kits

https://greeff.dev/starter-kits
1•pio_greeff•34m ago•0 comments

Show HN: Pre-Launch – $15/Mo Status Page (Vs Atlassian $299) – Join Waitlist

2•Powellfgn•41m ago•0 comments

Hetzner bans website for 'violating terms'

https://twitter.com/tyleraloevera/status/2030064144980873434
1•timedude•44m ago•1 comment

Show HN: µJS, a 5KB alternative to Htmx and Turbo with zero dependencies

https://mujs.org
2•amaury_bouchard•50m ago•0 comments

NASA's Dart Mission Changed Orbit of Asteroid Around Sun

https://www.jpl.nasa.gov/news/nasas-dart-mission-changed-orbit-of-asteroid-didymos-around-sun/
2•merksittich•51m ago•0 comments

How to Untwist Your Fractions

https://mathvoices.ams.org/featurecolumn/2026/03/01/how-to-untwist-your-fractions/
1•uamuamuam•53m ago•0 comments

The Internals of PostgreSQL

https://www.interdb.jp/pg/
2•BinaryIgor•55m ago•0 comments

QGIS 4.0

https://changelog.qgis.org/en/version/4.0/
28•jonbaer•56m ago•0 comments

Microsoft is the carbon removal market

https://www.latitudemedia.com/news/microsoft-is-the-carbon-removal-market/
2•PaulHoule•57m ago•0 comments

Show HN: RAM Fear Greed Index

https://pcindex.app/
2•flordaman•1h ago•0 comments

I built a structured system design interview prep roadmap with progress tracking

1•shalhan•1h ago•0 comments

Show HN: Qarapace – GCP IAM reviews with persistent decisions and audit trails

https://qarapace.com/
1•gjanvier•1h ago•0 comments

Are we still ignoring cheating candidates?

1•shashahchk•1h ago•4 comments