Arguing with Agents

https://blowmage.com/2026/04/14/arguing-with-agents/
51•asaaki•1h ago

Comments

roxolotl•1h ago
This is very well written and told. It’s worth reading all the way through.

> If you try to refute it, you’ll just get another confabulation.

> Not because the model is lying to you on purpose, and not because it’s “resistant” or “defensive” in the way a human might be. It’s because the explanation isn’t connected to anything that could be refuted. There is no underlying mental state that generated “I sensed pressure.” There is a token stream that was produced under a reward function that prefers human-sounding, emotionally framed explanations. If you push back, the token stream that gets produced next will be another human-sounding, emotionally framed explanation, shaped by whatever cues your pushback provided.

“It’s because the explanation isn’t connected to anything that could be refuted.” This is one of the key understandings that comes from working with these systems. They are remarkably powerful, but there’s no there there. Knowing this, I’ve found, enables more effective usage because, as the article describes, you move from arguing with “a person” to shaping an output.

jaggederest•1h ago
Reminds me of https://news.ycombinator.com/item?id=15886728

Do not argue with the LLM, for it is subtle and quick to anger, and finds you crunchy with ketchup.

These are, broadly, all context management issues: when you see it start to go off track, it's because it has too much, too little, or the wrong context, and you have to fix that, usually by resetting it and priming it correctly the next time. This is why it's advantageous not to "chat" with the robots. Treat them as an English-to-code compiler, not a coworker.

Chat to produce a spec, save the spec, clear the context, and feed only the spec in as context. If there are issues, adjust the spec, rinse, and repeat. Steering the process mid-flight a) is not repeatable and b) exacerbates the issue with lots of back and forth and "you're absolutely correct" that dilutes the instructions you wanted to give.
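
A minimal sketch of the spec loop described above, not anyone's actual tooling. The names `run_from_spec`, `fake_agent`, and `fake_review` are hypothetical, and the automatic "append the issue to the spec" step is an assumption standing in for a human editing the spec by hand. The point it illustrates is that each round starts from the spec alone, with no chat history carried over.

```python
from typing import Callable, List


def run_from_spec(
    spec: str,
    agent: Callable[[str], str],
    find_issues: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> str:
    """Call the agent with only the spec as context; fold fixes back into the spec."""
    output = ""
    for _ in range(max_rounds):
        # Fresh call each round: no chat history, the spec is the whole context.
        output = agent(spec)
        issues = find_issues(output)
        if not issues:
            break
        # Adjust the spec itself instead of arguing mid-flight (assumed policy).
        spec += "\n" + "\n".join(f"- Must: {issue}" for issue in issues)
    return output


# Hypothetical stand-ins so the sketch runs without a real model.
def fake_agent(spec: str) -> str:
    return "code with tests" if "tests" in spec else "code"


def fake_review(output: str) -> List[str]:
    return [] if "tests" in output else ["include tests"]


result = run_from_spec("Build a parser.", fake_agent, fake_review)
```

With a real model, `agent` would wrap whatever client is in use, as long as it sends a new conversation each time rather than appending to an existing one.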

en-tro-py•40m ago
Exactly, never argue with an LLM unless the debate is the point...

It's just speedrunning context rot.

girvo•59m ago
Very well written? It’s a bunch of AI generated stuff around an interesting point. It repeats its points over and over again, and meanders.

It’s an interesting thesis, but it’s not well written or well told.

sleazebreeze•25m ago
This was my reading too. Interesting idea, but it took 10 pages of fluff to get to it and I didn't even believe the final idea when we got there. I started off reading the first part and thought he would get to the part where he realized he was managing context wrong. He never got there; instead he thought it was about the shape of the prompt.

JSR_FDED•1h ago
Great article, best insight into autistic<->neurotypical communication styles.

Couldn’t you have a “communications” LLM massage your prompts to the “main” LLM so that it removes the cues that cause the main LLM to mistakenly infer your state of mind?

cr125rider•59m ago
I’ve definitely used the “meta LLM” to do research into how LLMs need information to help me get to the next step.

lovich•1h ago
I got about halfway through this article until I started wondering why it was so long and going in loops. Then I ctrl+f'd.

` just ` (spaces on either side matter): 11 instances, most of which seem to follow the `isnt just`, `wasnt just`, `doesnt just` pattern.

`-`, an en dash instead of an em dash, but 59 instances.

This article is either from a clanker and I am pissed off at wasting my time reading it, or from someone who writes like a clanker, and I am pissed off at wasting my time reading it.
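
A minimal sketch of the same ctrl+f check as a script. The function name `count_tells` and the sample string are made up for illustration, and it assumes the dash in question is U+2013 (EN DASH). A raw count like this is only a rough signal, not proof of authorship.

```python
def count_tells(text: str) -> dict:
    """Count two surface patterns in a piece of prose."""
    return {
        # " just " with a space on both sides, matching the search above
        "just": text.count(" just "),
        # U+2013 EN DASH (assumed to be the dash character being counted)
        "en_dash": text.count("\u2013"),
    }


# Made-up sample text, not taken from the article.
sample = "It isn't just text \u2013 it doesn't just read. Just so."
counts = count_tells(sample)
```

Note that `str.count` is case-sensitive and counts non-overlapping matches, so the capitalized "Just" in the sample is not included.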

akprasad•1h ago
Maybe it's just the frequency illusion, but "X. Not Y." in particular is a pattern I strongly associate with LLM writing.

> That’s confabulation. Not a metaphor. The same phenomenon.

> Published. Replicated. Not fringe.

> Not to validate it. Not to refute it. Not to engage with its content at all.

girvo•1h ago
It’s absolutely a signal. As is the constant repeating of points. It’s AI slop for sure

Which is a shame coz the premise is interesting.

rubslopes•4m ago
There's a Wikipedia article with a nice list of LLM writing patterns:

https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

8bitbeep•1h ago
Remember when programming was fun?

To me, after the novelty of seeing a computer program execute (more or less) what I ask in plain English wears off, what’s left is the chore of managing a bunch of annoying bots.

I don’t know yet if we’re more productive or not, if the resulting code is as good. But the craft in itself is completely different, much more akin to product managing, psychology, which I never enjoyed as much.

ori_b•49m ago
It's micromanaging an idiot savant. Except the fun part of management, the reward for a job well done, is seeing the personal growth of the managee.

In this case, there's no person to grow. It's an overly talkative calculator.

I never expected to see this number of engineers aspiring to emulate Dilbert's pointy haired boss.

rubslopes•8m ago
> I can imagine a future in which some or even most software is developed by witches, who construct elaborate summoning environments, repeat special incantations (“ALWAYS run the tests!”), and invoke LLM daemons who write software on their behalf. These daemons may be fickle, sometimes destroying one’s computer or introducing security bugs, but the witches may develop an entire body of folk knowledge around prompting them effectively—the fabled “prompt engineering”. Skills files are spellbooks.

https://aphyr.com/posts/418-the-future-of-everything-is-lies...

erdaniels•1h ago
I love how much time, money, and energy we are wasting on trying to trick these machines. Each day someone has a new bag of tricks.

boxedemp•1h ago
>A recurring experience: I say something explicit, the other person hears something implicit.

I've experienced this my entire life and have all but given up trying to have actual conversations with people.

cr125rider•59m ago
How’s life on the spectrum? Have you been diagnosed?

jameslk•1h ago
> I queued the work and let it run. First task came back good. Second came back good. Somewhere around hour four the quality started sliding. By hour six the agent was cutting corners I’d specifically told it not to cut, skipping steps I’d explicitly listed, behaving like I’d never written any of the rules down.

> …

> When I write a prompt, the agent doesn’t just read the words. It reads the shape. A short casual question gets read as casual. A long precise document with numbered rules gets read as… not just the rules, but also as a signal. “The user felt the need to write this much.” “Why?” “What’s going on here?” “What do they really want?”

This is an interesting premise but based on the information supplied, I don’t think it’s the only conclusion. Yet the whole essay seems to assume it is true and then builds its arguments on top of it.

I’ve run into this dilemma before. It happens when there’s a TON of information in the context. LLMs start to lose track of the details when there are a lot of them (e.g. context rot[0]). LLMs also keep making the same mistakes once the information is in the prompt, regardless of attempts to convey that it is undesired[1].

I think these issues are just as viable an explanation for what the author was facing, unless this is happening with much less information.

0. https://www.trychroma.com/research/context-rot

1. https://arxiv.org/html/2602.07338v1

perrygeo•38m ago
It's more than context-rot.

If you ask a vague, ignorant question, you get back authoritative summaries. If you make a specific request, each statement is taken literally. The quality of the answer depends on the quality of the question.

And I'm not using "quality" to mean good/bad. I mean literally qualitative, not quantifiable. Tone. Affect. Personality. Whatever you call it. Your input tokens shape the pattern of the output tokens. It's a model of human language, is that really so surprising?

js8•1h ago
I recently came across this presentation https://youtu.be/QxkRf-xSfgI, and it changed my view of AI quite significantly. (There is also a paper https://arxiv.org/html/2510.12066v2 .)

The fundamental idea is that "intelligence" really means trying to shorten the time to figure out something. So it's a tradeoff, not a quality. And AI agents are doing it.

Therefore, if that perspective is right, the issues that the OP describes are inherent to intelligent agents. They will try to find shortcuts, because that's what they do, it's what makes them intelligent in the first place.

People with ASD, ADHD, or OCD are idiot-savants in the sense of that paper. They insist on searching for solutions that are not easy to find, despite common sense (aka intelligence) telling them otherwise.

It's a paradox that it is valuable to do this, but it is not smart. And it's probably why CEOs beat geniuses in the real world.

Terr_•58m ago
> The fundamental idea is that "intelligence" really means trying to shorten the time to figure out something.

"Figure out" implies awareness and structured understanding. If we relax the definition too much, then puddles of water are intelligent and uncountable monkeys on typewriters are figuring out Shakespeare.

en-tro-py•45m ago
CEOs beat geniuses in the real world because they often have other pathologies, like enough moral flexibility to ignore the externalities of their profit centers.

I'd also argue there's some training bias in the performance, it's not just smart shortcuts... Claude especially seems prone to getting into a 'wrap it up' mode even when the plan is only half way completed and starts deferring rather than completing tasks.

CGamesPlay•57m ago
Is there a name for this style of writing? Where it's composed exclusively of simple sentences. Short and punchy.

Paragraphs with just a single sentence.

I know it's associated with LLM writing. This article probably wasn't written by an LLM. But still. It has a kind of rhythm to it. Like poetry. But poetry designed to put me to sleep.

txzl•56m ago
It's written by an LLM.

sleazebreeze•27m ago
Yes, this was super annoying to read. It was a few core ideas expanded into a way-too-long essay that boiled down to: this guy doesn't know how to run agents.

Rekindle8090•54m ago
It's called parataxis, and it will fail English Comp 1.

stevenkkim•52m ago
"Broetry". See: https://fenwick.media/rewild/magazine/dead-broets-society-be...

docheinestages•56m ago
The article looks like an AI generated novel to me. So I didn't bother reading it in detail. But I see telltale signs of long conversations leading to the agent cutting corners.

To the author (and those who write novel-like blogs): I suggest publishing the raw prompt you used to generate such slop instead. We'll have more respect for you if you respect the reader's time.

atlex2•41m ago
It probably still took way more time to write than it did to read.

It's also kind-of their point that they find the information delivery more important than the prose; they're leaning into their situation :-D
