With new AI models being released what feels like every other week, how are companies running internal evals to determine which model is best for their use case?
Comments
veloryn•32m ago
A lot of teams still seem to rely on ad-hoc eval sets and manual spot checks, especially for domain-specific use cases. The harder problem starts when agents or tool use enter the picture: the evaluation surface expands beyond model output quality to things like tool selection reliability, reasoning loops, cost stability, and cascading failure modes across steps. At that point you're effectively evaluating system behavior rather than just model accuracy.
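To make that concrete, here's a minimal sketch (Python) of what moving from output-only evals to system-level evals can look like. Everything in it is illustrative: run_agent is a hypothetical entry point assumed to return the final answer, the ordered tool-call names, and the run cost, and the scoring is a naive substring match you'd swap for real graders.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    prompt: str
    expected_answer: str
    expected_tools: list[str]  # tool names the agent should call, in order
    max_steps: int = 10        # cap to catch runaway reasoning loops


@dataclass
class EvalResult:
    answer_correct: bool
    tools_correct: bool
    steps: int
    cost_usd: float


def run_case(case: EvalCase, run_agent) -> EvalResult:
    # run_agent is a stand-in for your agent entry point; assumed here to
    # return (final_answer, ordered_tool_call_names, cost_in_usd).
    answer, tool_calls, cost = run_agent(case.prompt, max_steps=case.max_steps)
    return EvalResult(
        answer_correct=case.expected_answer.lower() in answer.lower(),
        tools_correct=list(tool_calls) == case.expected_tools,
        steps=len(tool_calls),
        cost_usd=cost,
    )


def summarize(results: list[EvalResult]) -> dict:
    # Aggregate system-level metrics, not just answer accuracy.
    n = len(results)
    return {
        "answer_accuracy": sum(r.answer_correct for r in results) / n,
        "tool_selection_accuracy": sum(r.tools_correct for r in results) / n,
        "mean_cost_usd": sum(r.cost_usd for r in results) / n,
        "max_steps_observed": max(r.steps for r in results),
    }
```

Even something this crude surfaces regressions in tool selection and cost when you swap models, which manual spot checks of final answers tend to miss.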