Ask HN: How are you evaluating your LLMs in production?

2•ReDeiPirati•6h ago
Hello HN! Which tools do you use to evaluate your LLMs and agents in production?

Comments

znpy•6h ago
Sysadmin here ("cloud engineer" is what's in my contract).

> Which tools do you use to evaluate your LLMs and agents in production?

None for my work. I still use LLMs from time to time to generate boring Terraform code or boring SQL queries, but I'm essentially not going to let some AI bs near the infrastructure I curate.

It's all fun and games until prod is down, or the cloud bill is 10x the previous month's bill (or both).

So unless I can blame it on the AI and take no responsibility, I'm not going to let anything AI-powered near production.
