How Do You Evaluate Your AI Models?

3•surendersingh•4h ago
How do you run evaluations on your fine-tuned or RL-trained models today? I’m curious about:

Workflow: the tools and scripts you rely on for metrics, drift detection, and other checks.

Headaches: the step that still breaks or slows you down.

Wishlist: if an open-source eval suite existed, what must-have features would land it in your stack?

Real stories (good and ugly) would be super helpful -- thanks in advance for sharing!

Also, please let me know if you'd like to be a very early user of the open-source evals tool we are building. I'll send an invite.

Shipwrecks from John Franklin's Doomed Arctic Expedition Were Where the Inuit Said

https://www.smithsonianmag.com/history/the-shipwrecks-from-john-franklins-doomed-arctic-expedition-were-exactly-where-the-inuit-said-they-would-be-180986627/
1•bookofjoe•36s ago•0 comments

Getting Kids Interested in Coding

https://martiancraft.com/blog/2025/05/kids-coding/
1•ingve•2m ago•0 comments

Autoselect the best AI model for any health question using HealthBench scores

https://twitter.com/yaroshipilov/status/1924926821201805625
1•shipilovya•2m ago•1 comment

Cheating Expert Answers Casino Cheating Questions [video]

https://www.youtube.com/watch?v=0QWP4IZOu0I
1•Bluestein•3m ago•0 comments

Developers: Is training taking a back seat?

https://www.techradar.com/pro/developers-is-training-taking-a-back-seat
1•mooreds•5m ago•0 comments

Arch Linux on Mac Pro 1.1/2.1

https://wiki.ponovo.rs/wiki/Linux_on_Mac_Pro_1.1/2.1
1•marxo•5m ago•0 comments

FastAPI and Next.js User Auth

https://www.david-crimi.com/blog/user-auth
1•Crimid01•5m ago•0 comments

Disable debuginfo to improve Rust compile times

https://kobzol.github.io/rust/rustc/2025/05/20/disable-debuginfo-to-improve-rust-compile-times.html
1•ingve•6m ago•0 comments

Things money can't buy – like happiness and better health

https://news.harvard.edu/gazette/story/2025/05/things-money-cant-buy-like-happiness-and-better-health/
2•gnabgib•7m ago•0 comments

System-wide Technology Outage

https://ketteringhealth.org/system-wide-technology-outage/
1•mattbsheets•11m ago•0 comments

Strands Agents, an Open Source AI Agents SDK

https://aws.amazon.com/blogs/opensource/introducing-strands-agents-an-open-source-ai-agents-sdk/
1•jaredwiener•11m ago•0 comments

Show HN: Copilot Audit – PDF->Excel with AI

https://copilot-audit.com/
1•miwend•14m ago•0 comments

Telekons: Some high performance code in Common Lisp

https://github.com/telekons
2•doener•16m ago•0 comments

The Netfarm Suite: a replicated, mostly-trustless object system

https://gitlab.com/cal-coop/netfarm
1•doener•16m ago•0 comments

Text reminders for court hearings can boost justice system efficiency

https://www.route-fifty.com/digital-government/2025/05/report-text-reminders-court-hearings-can-help-boost-justice-system-efficiency/405352/
2•gnabgib•16m ago•0 comments

Our Journey Through Linux/Unix Landscapes

https://blog.kalvad.com/our-journey-through-linux-unix-landscapes/
2•alekq•16m ago•0 comments

Cheers star George Wendt dies at 76

https://www.bbc.com/news/articles/cx2xx998102o
2•austinallegro•17m ago•1 comment

Jason Padgett

https://en.wikipedia.org/wiki/Jason_Padgett
2•sans_souse•18m ago•1 comment

Show HN: I made a tool to repurpose a TikTok video from another account

https://github.com/best-trading-indicator-tools/10XReach
2•Daveatt•19m ago•0 comments

Matrix Governing Board Elections 2025

https://matrix.org/foundation/governing-board-elections/2025/#nominees
1•doener•22m ago•0 comments

Why Good Programmers Use Bad AI

https://nmn.gl/blog/ai-and-programmers
3•namanyayg•22m ago•1 comment

Project Mariner – Browser-Based AI Agent

https://deepmind.google/models/project-mariner/
6•jenthoven•24m ago•0 comments

SynthID Detector – a new portal to help identify AI-generated content

https://blog.google/technology/ai/google-synthid-ai-content-detector/
1•rob•24m ago•0 comments

Feels great to finish CRUD for my ERP app

https://github.com/oitcode/samarium
2•ignosnim•27m ago•1 comment

Intel explores sale of networking and edge unit

https://www.reuters.com/technology/intel-explores-sale-networking-edge-unit-sources-say-2025-05-20/
2•mfiguiere•28m ago•0 comments

Jupiter was formerly twice its current size and had a much stronger magnetic field

https://phys.org/news/2025-05-jupiter-current-size-stronger-magnetic.html
2•amichail•29m ago•0 comments

A context-aware LLM agent built directly into Grafana Cloud

https://grafana.com/blog/2025/05/07/llm-grafana-assistant/
5•matryer•31m ago•0 comments

The Onion's Ben Collins Knows How to Save Media

https://www.vanityfair.com/hollywood/story/the-onions-ben-collins-knows-how-to-save-media
3•coloneltcb•31m ago•0 comments

Reachability analysis tool for Linux kernel CVEs

https://github.com/udibabaskydeck/ralk
1•ATechGuy•32m ago•0 comments

Pupil dilation can reveal the accuracy of your memories

https://www.popsci.com/health/eyes-memory-study/
1•geox•34m ago•0 comments