Better Models: Worse Tools

https://lucumr.pocoo.org/2026/7/4/better-models-worse-tools/
13•leemoore•1h ago

Comments

dofm•14m ago
"Now I’m somewhat worried about the track we’re on here. Alternative tool schemas might not just be unfamiliar. They might be implicitly punished by post-training that optimizes for one particular, forgiving tool ecology."

Only implicitly?

Many decades ago when I was working on research related to using MOOs as a learning environment, you would add "tool calls" into the stream of text that a MOO object might generate, so your rich client would e.g. show a picture, load a web page in a frame, move you on a map, trigger a change in an on-screen representation of an object.

Everyone who tried this ran into more or less the same problems that LLM clients do: any attempt to shoehorn control sequences into in-band content was riddled with security risks, objects accidentally triggering the wrong interface etc.; you could never truly communicate out-of-band.

The more I read about how agentic harnesses work, the less embarrassed I feel about the code twenty-something-year-old me wrote in a MOO client.
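The in-band problem described in the comment above can be illustrated with a small sketch. Everything here is hypothetical: the `[[action:argument]]` marker syntax, the `Message` type, and the function names are invented for illustration and are not taken from any real MOO client or agent harness. The second half shows a structured channel only as a contrast; the comment's point is that such a clean separation was never really available.

```python
import re
from dataclasses import dataclass, field

# Hypothetical in-band marker syntax: [[action:argument]]
MARKER = re.compile(r"\[\[(\w+):([^\]]*)\]\]")


def inband_actions(stream_text):
    """Naive in-band client: any text matching the marker is treated as a command."""
    return MARKER.findall(stream_text)


@dataclass
class Message:
    """Hypothetical structured message: display text and actions travel separately."""
    text: str
    actions: list = field(default_factory=list)


def outofband_actions(message):
    """Only the explicit actions field is trusted; the text is never scanned."""
    return list(message.actions)


# An object that merely quotes text supplied by someone else.
user_text = "look at this: [[load_url:http://example.invalid/evil]]"
description = f'A sign reads: "{user_text}"'

# The in-band client cannot tell quoted content from a real control sequence.
print(inband_actions(description))

# The structured client ignores markers embedded in the text.
print(outofband_actions(Message(text=description)))
```

The first call reports a `load_url` action that nobody intended to issue, which is the same failure shape as prompt injection through tool output.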

mappu•11m ago
In my harness I implemented apply_patch as just taking unified diffs for patch -p1. I was shocked to see how bad models are at generating them. I started logging diff failures to analyse:

- All models are terrible at generating line numbers for a proper diff; give up on them

- Some models (Owl-alpha) must have been post-trained on Codex transcripts, because they occasionally push its V4A patch format into any diff tool available

- Codex puts a lot of info in its system prompt about the desired patch style, making larger hunks instead of granular ones, etc
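One way to act on the "give up on line numbers" observation is to ignore the `@@` header numbers entirely and locate each hunk by its context and removed lines. The following is a minimal sketch under stated assumptions, not the commenter's implementation: it handles a single-file diff only, requires each hunk's old text to match exactly once, and all function names (`parse_hunks`, `find_unique`, `apply_patch_by_context`) are hypothetical.

```python
def parse_hunks(diff_text):
    """Split a single-file unified diff into (old_lines, new_lines) pairs.

    The numbers in each "@@" header are deliberately ignored.
    """
    hunks = []
    current = None
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            current = ([], [])
            hunks.append(current)
        elif current is None:
            continue  # file headers ("---", "+++") before the first hunk
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        elif line.startswith("-"):
            current[0].append(line[1:])
        elif line.startswith("+"):
            current[1].append(line[1:])
        else:
            # Context line; tolerate a missing leading space.
            text = line[1:] if line.startswith(" ") else line
            current[0].append(text)
            current[1].append(text)
    return hunks


def find_unique(lines, block):
    """Return the index where block occurs in lines, requiring exactly one match."""
    if not block:
        raise ValueError("hunk has no context or removed lines to anchor on")
    width = len(block)
    matches = [i for i in range(len(lines) - width + 1) if lines[i:i + width] == block]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    return matches[0]


def apply_patch_by_context(source, diff_text):
    """Apply each hunk where its old text is found, regardless of stated line numbers."""
    lines = source.splitlines()
    for old, new in parse_hunks(diff_text):
        start = find_unique(lines, old)
        lines[start:start + len(old)] = new
    trailing = "\n" if source.endswith("\n") else ""
    return "\n".join(lines) + trailing


source = "a\nb\nc\n"
# The header claims line 99, which is wrong; the hunk still applies by context.
diff = "--- a/f\n+++ b/f\n@@ -99,3 +99,3 @@\n a\n-b\n+B\n c\n"
print(apply_patch_by_context(source, diff))
```

Rejecting ambiguous matches instead of picking the first one is a design choice: a failed patch can be reported back to the model for a retry, while a patch applied in the wrong place fails silently.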

cyanydeez•11m ago
Building deterministic tools on non-determinism is hard enough; try adding another layer where your cloud provider massages the context, realigns its permitted output, arbitrarily downgrades context to cheaper models, or hires an MBA who determines your plan value can be tied to a degraded model under a new shrinkfied.

It's amazing anyone watched the last 2 decades of tech's enshittification and wants to hitch their wagon to this shitshow.

lukasco•2m ago
It sounds like harnesses might have to start having model-by-model system prompts, though retrying works, I guess. It reminds me of the ancient times when browsers all read HTML and CSS differently, and differently again on different devices. In that sense, this is nothing new. I was going to say that at least we don't have different device types, but then the model still has to output the right variant of `grep` as well.

Biggest domain seller fears India's fake site crackdown could damage internet

https://www.reuters.com/world/worlds-biggest-domain-seller-fears-indias-fake-site-crackdown-could...
1•1vuio0pswjnm7•36s ago•0 comments

Small, odd, fleeting moments in which a neighborhood briefly exceeds itself

https://www.neighborhood-stills.com/
1•alexandruboia•1m ago•0 comments

The tests are the code now

https://softwaredoug.com/blog/2026/03/10/the-tests-are-the-code-now
1•softwaredoug•3m ago•0 comments

Alibaba/page-agent: in-page GUI agent. Control web interfaces

https://github.com/alibaba/page-agent
1•jonnonz•3m ago•0 comments

I Went Looking for Dignity and found it here [video]

https://www.youtube.com/watch?v=4gFGFbctEe0
1•pshapiro99•5m ago•1 comments

How AI Became More Expensive Than the Workers It Replaced [video]

https://www.youtube.com/watch?v=cfaZZPjA3g0
1•Bender•8m ago•0 comments

Linux DRM Scheduler Patches Yield Improvement for Job Submission Latency

https://www.phoronix.com/news/DRM-Scheduler-Lower-Job-Submit
1•Bender•11m ago•0 comments

Don't Hang Up on AI Scammers. Do This Instead [video]

https://www.youtube.com/watch?v=lk3jCuITwcE
1•wisemanwillhear•11m ago•0 comments

Show HN: Mise – A keyboard-driven Python/Qt6 browser built for fanless laptops

https://github.com/Rakosn1cek/Mise
1•Rakosn1cek•15m ago•0 comments

Exclusive-Meta's Zuckerberg says AI agent tech progressing slower than expected

https://finance.yahoo.com/technology/ai/articles/exclusive-zuckerberg-says-ai-agent-201123441.html
1•_____k•15m ago•0 comments

Show HN: Seize the means of production from our agentic overlords

https://github.com/Xophmeister/wean
2•Xophmeister•17m ago•0 comments

Show HN: I built an encrypted BLE dongle for pasting stuff to air-gapped devices

https://github.com/Brisk4t/ToothPaste
2•Brisk4t•18m ago•1 comments

Operation Ivy Bells

https://en.wikipedia.org/wiki/Operation_Ivy_Bells
2•m-hodges•20m ago•0 comments

Visualize how many files in a codebase you contributed

https://app.principal-ade.com/anomalyco/opencode
1•fernando-ram•23m ago•0 comments

Early Web Links

https://earlyweblinks.com/
1•bookofjoe•29m ago•0 comments

Arroup – record screen, edit, share with link

https://www.arroup.com/
1•vladsmigelski•32m ago•1 comments

How to build a full body ultrasound [video]

https://www.youtube.com/watch?v=4nzzpUKhj1M
1•Element_•32m ago•0 comments

$85,000 in tokens later: What I learned from scaling agentic coding at Lovable

https://lovable.dev/blog/85000-in-tokens-later-scaling-agentic-coding-at-lovable
2•aliclark•32m ago•1 comments

A Peculiarly Dutch Summer Rite: Children Let Loose in the Night Woods (2019)

https://www.nytimes.com/2019/07/21/world/europe/netherlands-dropping-children.html
1•edward•33m ago•0 comments

One Month of Ecosystem Security Engineering

https://thephp.foundation/blog/2026/06/23/one-month-of-ecosystem-security-engineering/
1•campuscodi•34m ago•0 comments

Show HN: Using Wake-on-LAN for an AI Project

https://guilhermefrj.medium.com/i-built-a-local-chatgpt-killer-on-a-single-rtx-5080-heres-everyth...
1•guilhermef•35m ago•0 comments

The Unique Universe (2009)

https://physicsworld.com/a/the-unique-universe/
3•mellosouls•40m ago•0 comments

GTA 2 ported to JavaScript, with WebRTC P2P multiplayer

https://gta2js.vercel.app/
3•possiblelion•43m ago•1 comments

Babel, a construction that builds and unbuilds itself forever

https://sand-morph.up.railway.app/babel
1•echohive42•43m ago•0 comments

Ask HN: New employer not providing equipment

7•gl9•49m ago•14 comments

Protocol Prying: Vulnerability Research in AirDrop and Quick Share

https://arxiv.org/abs/2606.26967
2•logickkk1•53m ago•0 comments

Providence AI

https://providenceai.app
1•j_anderssen•54m ago•0 comments

Rare things become common at scale (2014)

https://longform.asmartbear.com/scale-rare/
5•Tomte•54m ago•0 comments

Show HN: Grade your code's post-quantum crypto exposure A–F, free, in-browser

https://throndar.ai/cbom
1•algo26•54m ago•0 comments

Bitpoint: Turn a directory of Python files into HTTP endpoints

https://github.com/tanrax/bitpoint
1•andros•55m ago•0 comments