Precision-Based Sampling of LLM Judges

https://www.sunnybak.net/blog/precision-based-sampling
1•sunny-bak•1d ago
Built a system that automatically determines how many LLM-as-a-judge runs you need for statistically reliable scores.

Key insight: treat each LLM evaluation as a noisy sample, then use confidence intervals to decide when to stop sampling. The math shows reliability is surprisingly cheap (going from 95% to 99% confidence costs only about 1.7x as many samples), but precision is expensive (doubling scale granularity costs 4x as many).
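To make the two ratios concrete, here is a minimal sketch of a confidence-interval stopping rule. It is not taken from the linked repo: the function names (`z_value`, `required_samples`, `sample_until_precise`) are hypothetical, it assumes a normal approximation for the mean score, and it reads "doubling scale granularity" as halving the target half-width of the interval.

```python
import math
import statistics
from statistics import NormalDist
from typing import Callable, Tuple


def z_value(confidence: float) -> float:
    """Two-sided normal critical value, e.g. about 1.96 for confidence=0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def required_samples(sigma: float, half_width: float, confidence: float) -> int:
    """Samples needed so the CI half-width is at most `half_width`.

    Uses n = (z * sigma / d)^2, which assumes sigma (the judge's score
    standard deviation) is known in advance. That is an assumption here.
    """
    return math.ceil((z_value(confidence) * sigma / half_width) ** 2)


def sample_until_precise(
    judge: Callable[[], float],
    half_width: float,
    confidence: float = 0.95,
    min_samples: int = 3,
    max_samples: int = 200,
) -> Tuple[float, int]:
    """Call `judge` until the estimated CI half-width drops below the target.

    `judge` is a stand-in for one LLM-as-a-judge call returning a score.
    Returns (mean score, number of samples used).
    """
    z = z_value(confidence)
    scores = []
    while len(scores) < max_samples:
        scores.append(judge())
        if len(scores) >= min_samples:
            # Standard error of the mean from the samples seen so far.
            sem = statistics.stdev(scores) / math.sqrt(len(scores))
            if z * sem <= half_width:
                break
    return statistics.mean(scores), len(scores)


# Cost of extra confidence: (z_99 / z_95)^2 is about 1.73.
confidence_cost = (z_value(0.99) / z_value(0.95)) ** 2

# Cost of extra precision: halving the half-width roughly quadruples n.
n_coarse = required_samples(sigma=1.0, half_width=0.5, confidence=0.95)
n_fine = required_samples(sigma=1.0, half_width=0.25, confidence=0.95)
```

Both ratios fall out of n being proportional to (z / d)^2: the z term grows slowly with confidence, while the half-width d enters squared. Checking the interval after every sample is a simplification; repeated checks can make the realized coverage slightly lower than the nominal level.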

Also implemented "mixed-expert sampling" - rotating through multiple models (GPT-4, Claude, etc.) in the same batch for better robustness.
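One way the rotation could look, as a rough sketch only: the post does not say how the batch is scheduled, so the round-robin order, the `mixed_expert_batch` name, and the stub judges below are all assumptions. Real judges would wrap API calls to the respective models.

```python
from itertools import cycle
from statistics import mean
from typing import Callable, Dict, List, Tuple

# A judge takes the item to grade and returns a numeric score.
Judge = Callable[[str], float]


def mixed_expert_batch(
    judges: Dict[str, Judge], item: str, n_samples: int
) -> List[Tuple[str, float]]:
    """Collect `n_samples` scores, rotating round-robin through the judges."""
    rotation = cycle(judges.items())
    results = []
    for _ in range(n_samples):
        name, judge = next(rotation)
        results.append((name, judge(item)))
    return results


# Hypothetical stub judges standing in for real model calls.
stub_judges: Dict[str, Judge] = {
    "model_a": lambda item: 4.0,
    "model_b": lambda item: 2.0,
}

batch = mixed_expert_batch(stub_judges, "answer to grade", n_samples=5)
pooled_score = mean(score for _, score in batch)
```

Keeping the judge name next to each score makes it possible to check afterwards whether one model is systematically harsher than the others before pooling the scores.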

Analyzed how latency, cost and reliability scale in this approach.

Typical result: need 5-20 samples instead of guessing. Especially useful for AI safety evals and model comparisons where reliability matters.

Code: https://github.com/sunnybak/precision-based-sampling
Blog: https://www.sunnybak.net/blog/precision-based-sampling

Show HN: Automate complex and time-consuming search tasks

https://seeknwander.com?v=Yb1
1•Xlexander•41s ago•0 comments

Sorting Algorithms Explained with Dance

https://www.youtube.com/playlist?list=PLgkc0hLaMpuYafo094ymkF6JZh3aTHpGD
1•pyuser583•9m ago•0 comments

How do you manage, utilize knowledge with AI?

1•BockErica542•9m ago•0 comments

Quantitative Poems

https://www.aru.ai/books/quantitative-poems/
1•aru•13m ago•0 comments

Reverse engineering of Linear's sync engine

https://github.com/wzhudev/reverse-linear-sync-engine
1•flashblaze•15m ago•0 comments

"We're Cooked"

https://old.reddit.com/r/singularity/comments/1ky1r6z/were_cooked_zerocost_ai_demo/
2•Animats•15m ago•0 comments

Czech Republic says China behind cyberattack on ministry

https://www.reuters.com/world/china/czech-republic-says-china-was-behind-cyberattack-ministry-summons-ambassador-2025-05-28/
2•perihelions•17m ago•0 comments

NeuroSpace – AI employees for business that work like humans

1•NeuroSpace•18m ago•0 comments

Show HN: Typed-FFmpeg 3.0 – Typed Interface to FFmpeg and Visual Filter Editor

https://github.com/livingbio/typed-ffmpeg
13•lucemia51•21m ago•1 comment

Alpine village is largely destroyed when a Swiss glacier collapses

https://apnews.com/article/switzerland-alps-blatten-evacuation-landslide-28e15e240eacb40bcbdb6a8f15d49398
3•palmfacehn•24m ago•0 comments

Four Days in May: The India-Pakistan Crisis of 2025

https://www.stimson.org/2025/four-days-in-may-the-india-pakistan-crisis-of-2025/
2•TMWNN•24m ago•0 comments

Fiber optic drones used in Ukraine

https://www.bbc.com/news/articles/ckgn47e5qyno
1•RyanShook•25m ago•0 comments

Show HN: CalBot – The Fastest Executive Assistant

https://calbotservice.com/beta
1•shirschfield•29m ago•0 comments

Go may require prefaulting MMAP

https://flak.tedunangst.com/post/go-may-require-prefaulting-mmap
2•r4um•31m ago•0 comments

Train Tracker Devlog 02

https://twocentstudios.com/2025/05/29/train-tracker-devlog-02/
1•twocentstudios•36m ago•0 comments

I accidentally built a vector database using video compression

https://github.com/Olow304/memvid
3•saleban1031•37m ago•1 comment

Big Beautiful Bill: software development included as R&E expenditure

https://www.crowell.com/en/insights/client-alerts/house-committee-passes-part-of-big-beautiful-bill-containing-noteworthy-improvements-to-research-and-development-incentives-for-companies
1•useless_eater•38m ago•0 comments

Corporate Memphis

https://en.wikipedia.org/wiki/Corporate_Memphis
1•handfuloflight•39m ago•0 comments

How Do I Evaluate Chunking Strategies for RAGs

https://ai.gopubby.com/how-do-i-evaluate-chunking-strategies-for-rags-561cc5f9798b?sk=139e10dd5ba3f8f802be3aa1a7315894
1•thuwarakesh•43m ago•0 comments

Hacking Pinball High Scores

https://gwern.net/blog/2025/pinball-hacking
2•surprisetalk•55m ago•0 comments

Astronomers discover mysterious object firing signals at Earth every 44 minutes

https://www.livescience.com/space/unlike-anything-we-have-seen-before-astronomers-discover-mysterious-object-firing-strange-signals-at-earth-every-44-minutes
3•erickhill•57m ago•1 comment

Show HN: A Real and Proactive MCP Memory Tool

https://github.com/fredcamaral/mcp-memory
1•fredamaral•59m ago•0 comments

Why is quality so rare?

https://linear.app/blog/why-is-quality-so-rare
3•kaushalvivek•1h ago•2 comments

Breaking the Sorting Barrier for Directed Single-Source Shortest Paths

https://arxiv.org/abs/2504.17033
2•mahmoudimus•1h ago•0 comments

U.S. says it will start revoking visas for Chinese students

https://www.cnbc.com/2025/05/29/us-says-it-will-start-revoking-visas-for-chinese-students.html
3•kamaraju•1h ago•0 comments

Ad blockers are part of the problem (2016)

https://www.troyhunt.com/ad-blockers-are-part-of-the-problem/
5•shlomo_z•1h ago•0 comments

Is AI leading to reduced jobs? What it means for software engineers

https://indianexpress.com/article/technology/tech-news-technology/ai-taking-tech-jobs-sofware-engineers-10035331/
2•MarcoDewey•1h ago•0 comments

The hierarchical hypermedia world of Hyper-G

http://oldvcr.blogspot.com/2025/05/prior-art-dept-hierarchical-hypermedia.html
1•classichasclass•1h ago•0 comments

Making maps with noise functions (2022)

https://www.redblobgames.com/maps/terrain-from-noise/
1•benbreen•1h ago•0 comments

Oxfordshire clock still keeping village on time after 500 years

https://www.bbc.com/news/articles/cz70p0qevlro
2•1659447091•1h ago•0 comments