Ask HN: How do you evaluate an LLM these days?

1•pseudony•1h ago
Hello HN. Recent events, and my being Danish (EU), strongly encourage me to reconsider US services like Anthropic's Claude. I mention this to explain why evaluating LLMs has suddenly become a pressing problem for me. While I don't doubt Claude is nearly ideal for my corner of software development, I would like a better sense of how much I would be giving up by switching.

With that in mind, how do you best go about evaluating LLMs these days, short of going with "gut feel"? My best idea so far is to write various small "design a program/library" tasks with clear functional requirements and let each model try implementing them, probably using OpenCode and OpenRouter as the common components throughout the evaluation.

But this field moves fast and I may well have missed many better or easier approaches. What would you do?
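
To make the task-based idea above concrete, here is a minimal sketch in Python of how such a comparison could be scored. Everything in it is an assumption for illustration: the names Task, score_submission, and evaluate are hypothetical, the "slugify" task and the two hard-coded submissions stand in for real model output, and the step that actually fetches code from each model (for example through OpenRouter) is deliberately left out. Each task states its functional requirements as input/expected-output cases, and each model is scored by the fraction of cases its code passes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Task:
    """A small programming task whose requirements are expressed as test cases."""
    name: str
    entry_point: str                    # function the model is asked to define
    cases: List[Tuple[tuple, object]]   # (args, expected result) pairs


def score_submission(task: Task, source: str) -> float:
    """Return the fraction of the task's test cases that the source code passes."""
    namespace: dict = {}
    try:
        # WARNING: exec runs arbitrary code; sandbox this for real model output.
        exec(source, namespace)
        func = namespace[task.entry_point]
    except Exception:
        return 0.0                      # code that fails to load scores zero
    passed = 0
    for args, expected in task.cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass                        # a crash counts as a failed case
    return passed / len(task.cases)


def evaluate(tasks: List[Task], submissions: Dict[str, Dict[str, str]]) -> Dict[str, float]:
    """Average score per model; submissions[model][task_name] is source text."""
    results = {}
    for model, by_task in submissions.items():
        scores = [score_submission(t, by_task.get(t.name, "")) for t in tasks]
        results[model] = sum(scores) / len(scores)
    return results


# Hypothetical task and hand-written stand-ins for two models' answers.
tasks = [
    Task("slugify", "slugify",
         [(("Hello World",), "hello-world"), (("  A  B ",), "a-b")]),
]
submissions = {
    "model-a": {"slugify": "def slugify(s):\n    return '-'.join(s.lower().split())\n"},
    "model-b": {"slugify": "def slugify(s):\n    return s.lower().replace(' ', '-')\n"},
}

results = evaluate(tasks, submissions)
print(results)  # {'model-a': 1.0, 'model-b': 0.5}
```

Keeping the test cases fixed and varying only the model gives a number that is comparable across runs, which is the main thing "gut feel" lacks. A handful of cases per task is of course a coarse signal; it says nothing about code quality, latency, or cost, which would need their own measurements.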

Should You Be 'Fibermaxxing'?

https://www.nytimes.com/2025/07/08/style/fibermaxxing-tiktok-trend.html
1•brandonb•22s ago•0 comments

Crusade against usury reaches Wall Street

https://www.economist.com/finance-and-economics/2026/01/14/donald-trumps-crusade-against-usury-re...
1•andsoitis•28s ago•0 comments

Show HN: Frigatebird – analytical SQL engine built from first principles

https://github.com/Frigatebird-db/frigatebird
1•nottorus•41s ago•0 comments

JSON-Render: AI –> JSON –> UI

https://json-render.dev/
1•michaelmior•2m ago•0 comments

Multiplicity of the Soul: Time Travel

https://atmankalena.substack.com/p/multiplicity-of-the-soul-time-travel
2•Trifectorium•3m ago•1 comments

Invisibility is the maintainer's reward for competence

https://www.joanwestenberg.com/the-rime-of-the-ancient-maintainer/
1•danielfalbo•4m ago•0 comments

Foreigners' data stolen in hack of French immigration agency

https://www.lemonde.fr/en/pixels/article/2026/01/05/foreigners-data-stolen-in-hack-of-french-immi...
1•eurg•6m ago•0 comments

Exposing muscle tissue to blood from Long Covid patients weakens mitochondria

https://iopscience.iop.org/article/10.1088/1758-5090/adf66c
4•brandonb•6m ago•0 comments

'The Technology Is There': Supreme Court Practitioners Embracing AI

https://www.law.com/nationallawjournal/2026/01/15/the-technology-is-there-supreme-court-practitio...
1•hooverlabs•7m ago•0 comments

Show HN: The Friend Zone, a curated list of web games to play with friends

https://friendzone.games/
1•johnsillings•10m ago•1 comments

BAML is a domain-specific language to generate structured outputs from LLMs

https://docs.boundaryml.com/home
2•tosh•12m ago•0 comments

Earth is warming faster. Scientists are closing in on why

https://www.economist.com/science-and-technology/2024/12/16/earth-is-warming-faster-scientists-ar...
2•andsoitis•15m ago•0 comments

It's ridiculously fun to evolve flies to find food

https://claude.ai/public/artifacts/8f39482c-b2c7-4bd6-8d47-41bc7b678b7e
2•logicallee•18m ago•1 comments

Something Happens When You Straighten the Cursor [video]

https://www.youtube.com/watch?v=8cjLa5aOmsM
1•vinhnx•19m ago•0 comments

For me, Hacker News is probably the best community on the internet

8•DenisDolya•22m ago•1 comments

Aionui – Unified desktop workspace for multiple CLI AI agents

https://github.com/iOfficeAI/AionUi
1•testycool•24m ago•0 comments

Show HN: MindFry – A database engine that implements biological memory decay

1•laphilosophia•24m ago•0 comments

Second and Third Order Effects of Vibe Coding

https://www.enterprisevibecode.com/p/second-and-third-order-effects-of-vibe-coding
1•mlady•25m ago•1 comments

Show HN: Differentiable Quantum Chemistry

https://github.com/lowdanie/hartree-fock-solver
1•lowdanie•27m ago•0 comments

Trump's 'Free Speech' Presidency Has Racked Up 200 Censorship Attempts

https://www.techdirt.com/2026/01/16/trumps-free-speech-presidency-racked-up-200-censorship-attemp...
5•HotGarbage•27m ago•2 comments

Writers and Their Day Jobs

https://lithub.com/the-work-behind-the-writing-on-writers-and-their-day-jobs/
1•keiferski•28m ago•0 comments

OpenAI will start testing ads in ChatGPT free and Go tiers

https://twitter.com/OpenAI/status/2012223373489614951
2•jcfrei•28m ago•0 comments

Show HN: HORenderer3: A C++ software renderer implementing OpenGL 3.3 pipeline

https://github.com/Hobanghann/HORenderer3
2•zghdls•28m ago•0 comments

Dutch parcel deliverers (PostNL, DHL) shamed online raises privacy concerns

https://nltimes.nl/2026/01/16/dutch-parcel-deliverers-publicly-shamed-online-raising-privacy-conc...
3•giuliomagnifico•29m ago•0 comments

Looking for cofounder(s) in next‑gen memory (theory-driven and materials)

1•feynman_quantum•29m ago•0 comments

The Cruel, Conceited Follies of Trump's Foreign Policy: 2026 Edition

https://notesonliberty.com/2026/01/15/the-cruel-conceited-follies-of-trumps-foreign-policy-2026-e...
4•brandonlc•31m ago•0 comments

I'm building the finviz of prediction markets

https://www.polyviz.io/
1•mattmerrick•31m ago•0 comments

Show HN: What if your menu bar was a keyboard-controlled command center?

https://extrabar.app/
12•pugdogdev•31m ago•0 comments

Ask HN: Feedback on an open cognitive framework (design and structure)

1•DELTA-X•32m ago•0 comments

Independent fellowships for post‑PhD next‑gen memory research

1•feynman_quantum•32m ago•0 comments