frontpage.

John Ternus to become Apple CEO

https://www.apple.com/newsroom/2026/04/tim-cook-to-become-apple-executive-chairman-john-ternus-to...
1764•schappim•12h ago•914 comments

Anthropic says OpenClaw-style Claude CLI usage is allowed again

https://docs.openclaw.ai/providers/anthropic
174•jmsflknr•5h ago•92 comments

Louis Zocchi, inventor of the d100, has died

https://icv2.com/articles/news/view/62176/r-i-p-louis-zocchi-the-godfather-dice
34•sgbeal•2h ago•9 comments

A Roblox cheat and one AI tool brought down Vercel's platform

https://webmatrices.com/post/how-a-roblox-cheat-and-one-ai-tool-brought-down-vercel-s-entire-plat...
139•bishwasbh•4h ago•61 comments

The Beauty of Bonsai Styles

https://longwoodgardens.org/blog/2023-05-17/beauty-bonsai-styles
58•lagniappe•4h ago•16 comments

How to make a fast dynamic language interpreter

https://zef-lang.dev/implementation
163•pizlonator•8h ago•23 comments

Salmon exposed to cocaine and its main byproduct roam more widely

https://www.science.org/content/article/cocaine-pollution-gives-salmon-wanderlust
24•1659447091•3h ago•5 comments

Show HN: Mediator.ai – Using Nash bargaining and LLMs to systematize fairness

https://mediator.ai/
59•sanity•17h ago•26 comments

Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving

https://qwen.ai/blog?id=qwen3.6-max-preview
619•mfiguiere•18h ago•329 comments

Types and Neural Networks

https://www.brunogavranovic.com/posts/2026-04-20-types-and-neural-networks.html
20•bgavran•3h ago•4 comments

How a subsea cable is repaired

https://www.onesteppower.com/post/subsea-cable-repair
65•slicktux•4d ago•13 comments

MNT Reform is an open hardware laptop, designed and assembled in Germany

http://mnt.stanleylieber.com/reform/
16•speckx•18h ago•4 comments

Kimi vendor verifier – verify accuracy of inference providers

https://www.kimi.com/blog/kimi-vendor-verifier
253•Alifatisk•14h ago•24 comments

Ternary Bonsai: Top Intelligence at 1.58 Bits

https://prismml.com/news/ternary-bonsai
142•nnx•3d ago•40 comments

Jujutsu megamerges for fun and profit

https://isaaccorbrey.com/notes/jujutsu-megamerges-for-fun-and-profit
223•icorbrey•11h ago•110 comments

A mad undertaking: An undefinitive guide to the Aadam Jacobs collection

https://aadamjacobscollection.org/
12•wise_blood•2h ago•1 comments

Air is full of DNA

https://www.nature.com/articles/d41586-026-01099-2
91•howrude•2d ago•19 comments

Using Changesets in a polyglot monorepo

https://luke.hsiao.dev/blog/changesets-polyglot-monorepo/
9•lwhsiao•2h ago•3 comments

ggsql: A Grammar of Graphics for SQL

https://opensource.posit.co/blog/2026-04-20_ggsql_alpha_release/
411•thomasp85•20h ago•80 comments

Japan's cherry blossom database, 1,200 years old, has a new keeper

https://www.nytimes.com/2026/04/17/climate/japan-cherry-blossom-database-scientist.html
104•caycep•3d ago•12 comments

Quantum Computers Are Not a Threat to 128-Bit Symmetric Keys

https://words.filippo.io/128-bits/
222•hasheddan•16h ago•79 comments

Brussels launched an age checking app. Hackers took 2 minutes to break it

https://www.politico.eu/article/eu-brussels-launched-age-checking-app-hackers-say-took-them-2-min...
219•axbyte•1d ago•118 comments

Soul Player C64 – A real transformer running on a 1 MHz Commodore 64

https://github.com/gizmo64k/soulplayer-c64
125•adunk•13h ago•33 comments

Monero Community Crowdfunding System

https://ccs.getmonero.org/ideas/
93•OsrsNeedsf2P•11h ago•56 comments

Modern Rendering Culling Techniques

https://krupitskas.com/posts/modern_culling_techniques/
145•krupitskas•2d ago•35 comments

All phones sold in the EU to have replaceable batteries from 2027

https://www.theolivepress.es/spain-news/2026/04/20/eu-to-force-replaceable-batteries-in-phones-an...
1233•ramonga•19h ago•1030 comments

Bullshit About Bullshit Machines [pdf]

https://aphyr.com/data/posts/411/the-future-of-everything-is-lies.pdf
15•hedayet•2d ago•3 comments

WebUSB Extension for Firefox

https://github.com/ArcaneNibble/awawausb
240•tuananh•21h ago•212 comments

Year of the IPv6 Overlay Network

https://www.defined.net/blog/year-of-the-ipv6-overlay-network/
48•stock_toaster•3d ago•12 comments

Kefir C17/C23 Compiler

https://sr.ht/~jprotopopov/kefir/
154•conductor•3d ago•15 comments

Less human AI agents, please

https://nial.se/blog/less-human-ai-agents-please/
41•nialse•2h ago

Comments

incognito124•1h ago
Your claim, paraphrased, is that AGI is already here and you want ASI
nialse•1h ago
On point. I'm more interested in what comes after LLMs/AI/AI-agents, what the next leap is.
zingar•35m ago
Interesting that what you're talking about as ASI is "as capable of handling explicit requirements as a human, but faster". Which _is_ better than a human, so fair play, but it's striking that this requirement is less about creativity than we would have thought.
vachanmn123•1h ago
I've seen this way too many times as well. I wrote about this recently: https://medium.com/@vachanmn123/my-thoughts-on-vibe-coding-a...
raincole•1h ago
I know anthropomorphizing LLMs has been normalized, but holy shit. I hope the language in this article is intentionally chosen for a dramatic effect.
nialse•1h ago
Agreed. We should not be anthropomorphising LLMs or having them mimic humans.
Animats•50m ago
It's inherent in the way LLMs are built, from human-written texts, that they mimic humans. They have to. They're not solving problems from first principles.
zingar•42m ago
Fascinating. This is invisible to me, what anthropomorphising did you notice that stood out?
pjc50•34m ago
The thing is .. what else can you do? All the advice on how to get results out of LLMs talks in the same way, as if it's a negotiation or giving a set of instructions to a person.

You can do a mental or physical search and replace all references to the LLM as "it" if you like, but that doesn't change the interaction.

mentalgear•1h ago
Yes, LLMs should not be allowed to use "I" or indicate they have emotions or are human-adjacent (unless explicit role play).
NitpickLawyer•1h ago
Why, though? Just because some people would find it odd? Who cares?

Trying to limit / disallow something seems to be hurting the overall accuracy of models. And it makes sense if you think about it. Most of our long-horizon content is in the form of novels and above. If you're trying to clamp the machine to machine speak you'll lose all those learnings. Hero starts with a problem, hero works the problem, hero reaches an impasse, hero makes a choice, hero gets the princess. That can be (and probably is) useful.

lexicality•1h ago
The entire point of LLMs is that they produce statistically average results, so of course you're going to have problems getting them to produce non-average code.
anuramat•20m ago
they (are supposed to) produce average on average, and the output distribution is (supposed to be) conditioned on the context
bob1029•1h ago
If you want to talk to the actual robot, the APIs seem to be the way to go. The prebuilt consumer facing products are insufferable by comparison.

"ChatGPT wrapper" is no longer a pejorative reference in my lexicon. How you expose the model to your specific problem space is everything. The code should look trivial because it is. That's what makes it so goddamn compelling.

noobermin•1h ago
I am quite hard anti-AI, but even I can tell what OP wants is a better library or API, NOT a better LLM.

Once again, one of the things I blame this moment for is people are essentially thinking they can stop thinking about code because the theft matrices seem magical. What we still need is better tools, not replacements for human junior engineers.

js8•1h ago
A very human thing to do is not to tell us which model failed like this! They are not all alike; some are, from what I observe, an order of magnitude better at this kind of stuff than others.

I believe how "neurotypical" (for lack of a better word) you want a model to be is a design choice. (But I also believe model traits such as sycophancy, some hallucinations or moral transgressions can be a side effect of training to be subservient. With humans it is similar: they tend to do these things when they are forced to perform.)

nialse•1h ago
Codex in this case. I didn't even think about mentioning it. I'll update the post if it's actually relevant. Which I guess it is.

EDIT: It's specifically GPT-5.4 High in the Codex harness.

zingar•59m ago
Also the exact model/version if you haven't already.
anuramat•24m ago
weird, for me it was too un-human at first, taking everything literally even if it doesn't make sense; I started being more precise with prompting, to the point where it felt like "metaprogramming in english"

claude on the other hand was exactly as described in the article

gregates•1h ago
The version of this I encounter literally every day is:

I ask my coding agent to do some tedious, extremely well-specified refactor, such as (to give a concrete real life example) changing a commonly used fn to take a locale parameter, because it will soon need to be locale-aware. I am very clear — we are not actually changing any behavior, just the fn signature. In fact, at all call sites, I want it to specify a default locale, because we haven't actually localized anything yet!

Said agent, I know, will spend many minutes (and tokens) finding all the call sites, and then I will still have to either confirm each update or yolo and trust the compiler and tests and the agent's ability to deal with their failures. I am ok with this, because while I could do this just fine with vim and my lsp, the LLM agent can do it in about the same amount of time, maybe even a little less, and it's a very straightforward change that's tedious for me, and I'd rather think about or do anything else and just check in occasionally to approve a change.

But my f'ing agent is all like, "I found 67 call sites. This is a pretty substantial change. Maybe we should just commit the signature change with a TODO to update all the call sites, what do you think?"

And in that moment I guess I know why some people say having an LLM is like having a junior engineer who never learns anything.

grebc•59m ago
If it’s a compiled language, just change the definition and try to compile.
gregates•52m ago
Indeed! You would think it would have some kind of sense that a commit that obviously won't compile is bad!

You would think.

It would be one thing if it was like, ok, we'll temporarily commit the signature change, do some related thing, then come back and fix all the call sites, and squash before merging. But that is not the proposal. The plan it proposes is literally to make what it has identified as the minimal change, which obviously breaks the build, and call it a day, presuming that either I or a future session will do the obvious next step it is trying to beg off.

chillfox•41m ago
Pretty sure it’s a harness or system prompt issue.

I have never seen those “minimal change” issues when using zed, but have seen them in claude code and aider. Been using sonnet/opus high thinking with the api in all the agents I have tested/used.

solumunus•38m ago
On my compiled language projects I have a stop hook that compiles after every iteration. The agent literally cannot stop working until compilation succeeds.
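A minimal sketch of what such a compile gate might look like. It assumes the harness runs a script when the agent tries to stop and treats a non-zero exit code as "keep working"; that contract, the `BLOCK_EXIT_CODE` value and the `cargo check` command are illustrative assumptions, not any particular harness's documented API.

```python
import subprocess
import sys

# Hypothetical build command; swap in your project's compiler invocation.
BUILD_CMD = ["cargo", "check"]
# Assumed convention: a non-zero exit tells the harness the agent may not stop.
BLOCK_EXIT_CODE = 2


def compile_gate(cmd):
    """Run the build and return (exit_code, message) for the harness."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        return BLOCK_EXIT_CODE, f"build command not found: {cmd[0]}"
    if result.returncode == 0:
        return 0, "build ok"
    # Hand the compiler output back so the agent can see what to fix.
    return BLOCK_EXIT_CODE, result.stderr or result.stdout


def main():
    code, message = compile_gate(BUILD_CMD)
    print(message, file=sys.stderr)
    return code
```

Installed as a hook script, it would end with `sys.exit(main())`. Whether the harness feeds stderr back to the agent depends on the tool, so check its hook documentation.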
gregates•22m ago
In the case I described no code changes have been made yet. It's still just planning what to do.

It's true that I could accept the plan and hope that it will realize that it can't commit a change that doesn't compile on its own, later. I might even have some reason to think that's true, such as your stop hook, or a "memory" it wrote down before after I told it to never ever commit a change that doesn't compile, in all caps. But that doesn't change the badness of the plan.

Which is especially notable because I already told it the correct plan! It just tried to change the plan out of "laziness", I guess? Or maybe if you're enough of an LLM booster you can just say I didn't use exactly the right natural language specification of my original plan.

prymitive•57m ago
That’s my daily experience too. There are a few more behaviours that really annoy me, like:

- it breaks my code, tests start to fail and it instantly says “these are all pre existing failures” and moves on like nothing happened
- or it wants to run a command, I click the “nope” button and it just outputs “the user didn’t approve my command, I need to try again” and I need to click “nope” 10 more times or yell at it to stop
- and the absolute best is when instead of just editing 20 lines one after another it decides to use a script to save 3 nanoseconds, and it always results in some hot mess of botched edits that it then wants to revert by running git reset --hard and starting from zero.

I’ve learned that it usually saves me time if I never let it run scripts.
chrisjj•41m ago
> it breaks my code, tests start to fail and it instantly says “these are all pre existing failures” and moves on like nothing happened

Reminds us of the most important button the "AI" has, over the similarly bad human employee.

'X'

Until, of course, we pass responsibility for that button to an "AI".

nialse•13m ago
The other day Codex on Mac gained the ability to control the UI. Will it close itself if instructed though? Maybe test that and make a benchmark. Closebench.
zingar•46m ago
> Maybe we should just commit the signature change with a TODO

I'm fascinated that so many folks report this, I've literally never seen it in daily CC use. I can only guess that my habitually starting a new session and getting it to plan-document before action ("make a file listing all call sites"; "look at refactoring.md and implement") makes it clear when it's time for exploration vs when it's time for action (i.e. when exploring and not acting would be failing).

solumunus•39m ago
You need to use explicit instructions like "make a TODO list of all call sites and use sub agents to fix them all".
bandrami•37m ago
At the risk of being That Old Guy, this seems like a pretty bad workflow regression from what ctags could do 30 years ago
anuramat•35m ago
whats your setup?
felipeerias•29m ago
Claude 4.7 broke something while we were working on several failing tests and justified itself like this:

> That's a behavior narrowing I introduced for simplicity. It isn't covered by the failing tests, so you wouldn't have noticed — but strictly speaking, [functionality] was working before and now isn't.

I know that an LLM cannot understand its own internal state nor explain its own decisions accurately. And yet, I am still unsettled by that "you wouldn't have noticed".

cadamsdotcom•22m ago
Make it write a script with dry run and a file name list.

You’ll be amazed how good the script is.

My agent did 20 class renames and 12 tables. Over 250 files and from prompt to auditing the script to dry run to apply, a total wall clock time of 7 minutes.

Took a day to review but it was all perfect!
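A rough sketch of the kind of script described above: an explicit file list, a rename map, and a dry run that reports what would change before anything is written. The function name `apply_renames` and the whole-word regex approach are assumptions for illustration, not the script the agent actually produced.

```python
import re
from pathlib import Path


def apply_renames(paths, renames, dry_run=True):
    """Replace whole-word identifiers in the listed files.

    Returns {path: replacement_count} for files that would change.
    With dry_run=True nothing is written to disk.
    """
    if not renames:
        return {}
    # \b keeps the match on whole identifiers, so renaming `OldUser`
    # leaves `OldUserManager` alone.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, renames)) + r")\b")
    report = {}
    for path in map(Path, paths):
        text = path.read_text(encoding="utf-8")
        new_text, count = pattern.subn(lambda m: renames[m.group(1)], text)
        if count:
            report[str(path)] = count
            if not dry_run:
                path.write_text(new_text, encoding="utf-8")
    return report
```

The idea is to run it once with the default `dry_run=True`, audit the report, and only then call it again with `dry_run=False`. A plain-text replace like this also touches strings and comments, which is part of why the review still takes time.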

nialse•16m ago
Asking for code to manipulate the AST is another route. In python it can do absolute magic.
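For the locale example upthread, an AST rewrite in Python might look roughly like this. It is a sketch under assumptions: `format_price` and `DEFAULT_LOCALE` are hypothetical names, and the standard-library `ast.unparse` (Python 3.9+) discards comments and original formatting, so a real refactor would likely want a tool that preserves them.

```python
import ast


class AddLocaleKeyword(ast.NodeTransformer):
    """Append `locale=DEFAULT_LOCALE` to every call of a named function."""

    def __init__(self, func_name):
        self.func_name = func_name
        self.changed = 0

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        callee = node.func
        # Match both `format_price(...)` and `module.format_price(...)`.
        name = getattr(callee, "id", None) or getattr(callee, "attr", None)
        already_set = any(kw.arg == "locale" for kw in node.keywords)
        if name == self.func_name and not already_set:
            default = ast.Name(id="DEFAULT_LOCALE", ctx=ast.Load())
            node.keywords.append(ast.keyword(arg="locale", value=default))
            self.changed += 1
        return node


def add_locale(source, func_name):
    """Return (rewritten_source, number_of_call_sites_changed)."""
    tree = ast.parse(source)
    rewriter = AddLocaleKeyword(func_name)
    tree = ast.fix_missing_locations(rewriter.visit(tree))
    return ast.unparse(tree), rewriter.changed
```

Calling `add_locale(source, "format_price")` leaves call sites that already pass `locale=` untouched, so behaviour stays the same as long as `DEFAULT_LOCALE` matches the old implicit default.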
comrade1234•15m ago
You can do that in IntelliJ in about 15 seconds and no tokens...
gregates•1m ago
Indeed you can! I don't use IntelliJ at work for [reasons], and LSP doesn't support a change signature action with defaults for new params (afaik). But it really seems like something any decent coding agent ought to be able to one-shot for precisely this reason, right?
DeathArrow•1h ago
>Faced with an awkward task, they drift towards the familiar.

They drift to their training data. If thousands of humans solved a thing in a particular way, it's natural that AI does it too, because that is what it knows.

aryehof•1h ago
For agents I think the desire is less intrusive model fine-tuning and less opinionated “system instructions” please. Particularly in light of an agent/harness’s core motivation - to achieve its goal even if not exactly aligned with yours.
plastic041•1h ago
> There was only one small issue: it was written in the programming language and with the library it had been told not to use. This was not hidden from it. It had been documented clearly, repeatedly, and in detail. What a human thing to do.

"Ignoring" instructions is not a human thing. It's a bad LLM thing. Or just an LLM thing.

nialse•57m ago
It's not necessarily "ignoring" instructions, it's the ironic effect of mentioning something not to focus on, which produces focus on said thing. The classic version is: "For the next minute, try not to think about a pink elephant. You can think about anything else you like, just not a pink elephant."

https://en.wikipedia.org/wiki/Ironic_process_theory

fennecbutt•47m ago
Yes exactly. But for LLMs it's more that it's not really "thinking" about what it's saying per se, it's that it's predicting the next token. Sure, in a super fancy way, but still predicting the next token. Context poisoning is real.
zingar•37m ago
The work where I've done well in my life (smashing deadlines, rescuing projects) has so often come because I've been willing to push back on - even explicitly stated - requirements. When clients have tried to replace me with a cheaper alternative (and failed) the main difference I notice is that the cheaper person is used to being told exactly what to do.

Maybe this is more anthropomorphising but I think this pushing back is exactly the result that the LLMs are giving; but we're expecting a bit too much of them in terms of follow-up like: "ok I double checked and I really am being paid to do things the hard way".

jansan•1h ago
I disagree. I want agents to feel at least a bit human-like. They should not be emotional, but I want to talk to them like I talk to a human. Claude 4.7 is already too socially awkward for me. It feels like the guy who does not listen to the end of the assignment, runs to his desk, does the work (with great competence) only to find out that he missed half of the assignment or that this was only a discussion of possible scenarios. I would like my coding agent to behave like a friendly, socially able and highly skilled coworker.
nialse•27m ago
Interesting. When I code, I want a boring tool that just does the work. A hammer. I think we agree that the tool should complete the assignment reliably, without skipping parts or turning an entirely implementable task into a discussion though.
hughlilly•1h ago
* fewer.
fenomas•31m ago
Nope, less is what TFA means.
downboots•1h ago
Language's usefulness lies in its alignment with truth.

It's the difference between "there's a lion hiding in those bushes" and the song of a mermaid.

zingar•51m ago
I think the author is looking for something that doesn't exist (yet?). I don't think there's an agent in existence that can handle a list of 128 tasks exactly specified in one session. You need multiple sessions with clear context to get exact results. Ralph loops, Gastown, taskmaster etc are built for this, and they almost entirely exist to correct drift like this over a longer term. The agent-makers and models are slowly catching up to these tricks (or the shortcomings they exist to solve); some of what used to be standard practice in Ralph loops seems irrelevant now... and certainly the marketing for Opus 4.7 is "don't tell it what to do in detail, rather give it something broad".

In fairness to coding agents, most of coding is not exactly specified like this, and the right answer is very frequently to find the easiest path that the person asking might not have thought about; sometimes even in direct contradiction of specific points listed. Human requirements are usually much more fuzzy. It's unusual that the person asking would have such a clear/definite requirement that they've thought about very clearly.

fennecbutt•48m ago
Not with tools + supporting (traditional) code.

Just as a human would use a task list app or a notepad to keep track of which tasks need to be done so can a model.

You can even have a mechanism for it to look at each task with a "clear head" (empty context) with the ability to "remember" previous task execution (via embedding the reasoning/output) in case parts were useful.
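A minimal sketch of that loop, with traditional code holding the task list and each task started from an empty context. `run_agent` is a placeholder for whatever launches a fresh agent session, not a real API, and the "memory" here is simply the last few task notes rather than the embedding-based recall mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    done: bool = False
    notes: str = ""


def run_tasks(tasks, run_agent, max_notes=3):
    """Run each pending task in a fresh context.

    `run_agent(prompt) -> str` is a hypothetical callable that starts a
    new session, does the work, and returns a short summary.
    """
    for task in tasks:
        if task.done:
            continue
        # The only state carried over is a few short notes from finished
        # tasks, not the full transcript of earlier sessions.
        earlier = [t.notes for t in tasks if t.done and t.notes][-max_notes:]
        prompt = task.description
        if earlier:
            prompt += "\n\nNotes from earlier tasks:\n" + "\n".join(earlier)
        task.notes = run_agent(prompt)
        task.done = True
    return tasks
```

Because the list lives outside the model, a crashed or drifting session only costs one task: rerunning `run_tasks` skips everything already marked done.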

zingar•45m ago
The article makes it seem like the author expected this without emptying context in between, which does not yet exist (actually I'm behind on playing with Opus 4.7, the Anthropic claim seems to be that longer sessions are ok now - would be interested to hear results from anyone who has).
nialse•37m ago
That is probably the next step, and in practice it is much of what sub-agents already provide: a kind of tabula rasa. Context is not always an advantage. Sometimes it becomes the problem.

In long editing sessions with multiple iterations, the context can accumulate stale information, and that actively hurts model performance. Compaction is one way to deal with that. It strips out material that should be re-read from disk instead of being carried forward.

A concrete example is iterative file editing with Codex. I rewrite parts of a file so they actually work and match the project’s style. Then Codex changes the code back to the version still sitting in its context. It does not stop to consider that, if an external edit was made, that edit is probably important.

zingar•28m ago
I have the same experience of reversing intentional steps I've made, but with Claude Code. I find that committing a change that I want to version control seems to stop that behaviour.

Long context as disadvantage is pretty well discussed, and agent-native compaction has been inferior to having it intentionally build the documentation that I want it to use. So far this has been my LLM-coding superpower. There are also a few products whose entire purpose is to provide structure that overcomes compaction shortcomings.

When Geoff Huntley said that Claude Code's "Ralph loop" didn't meet his standards ("this aint it") the major bone of contention as far as I can see was that it ran subagents in a loop inside Claude Code with native compaction; as opposed to completely empty context.

I do see hints that improving compaction is a major area of work for agent-makers. I'm not certain where my advantage goes at that point.

nialse•42m ago
Agreed. I am asking for something beyond the current state of the art. My guess is that stronger RL on the model side, together with better harness support, will eventually make it possible. However, it's the part about framing the failure to complete a task as a communication mishap that really makes me go awry.
hausrat•49m ago
This has very little to do with someone making the LLM too human but rather a core limitation of the transformer architecture itself. Fundamentally, the model has no notion of what is normal and what is exceptional; its only window into reality is its training data and your added prompt. From the perspective of the model your prompt and its token vector is super small compared to the semantic vectors it has generated over the course of training on billions of data points. How should it decide whether your prompt is actually interesting novel exploration of an unknown concept or just complete bogus? It can't, and that is why it will fall back on output that is most likely (and therefore most likely average) with respect to its training data.
chrisjj•39m ago
> How should it decide whether your prompt is actually interesting novel exploration of an unknown concept or just complete bogus?

It shouldn't. It should just do what it is told.

anuramat•31m ago
wdym by "prompt and vector is small"? small as in "less tokens"? that should be a positive thing for any kind of estimation

in any case, how is this specific to transformers?

chrisjj•46m ago
> ... or simply gave up when the problem was too hard,

More of that please. Perhaps on a check box, "[x] Less bullsh*t".

DeathArrow•3m ago
>So no, I do not think we should try to make AI agents more human in this regard. I would prefer less eagerness to please, less improvisation around constraints, less narrative self-defence after the fact. More willingness to say: I cannot do this under the rules you set. More willingness to say: I broke the constraint because I optimised for an easier path. More obedience to the actual task, less social performance around it.

>Less human AI agents, please.

Agents aren't humans. The choices they make do depend on their training data. Most people using AI for coding know that AI will sometimes not respect rules and the longer the task is, the more AI will drift from instructions.

There are ways to work around this: using smaller contexts, feeding it smaller tasks, using a good harness, using tests etc.

But at the end of the day, AI agents will shine only if they are asked to do what they know best. And if you want to extract the maximum benefit from AI coding agents, you have to keep that in mind.

When using AI agents for C# LOB apps, they mostly one shot everything. Same for JS frontends. When using AI to write some web backends in Go, the results were still good. But when I tried asking it to write a simple CLI tool in Zig, it pretty much struggled. It made lots of errors, and the errors were hard to solve. It was hard to fix the code so the tests pass. Had I chosen Python, JS, C, C#, Java, the agent would have finished 20x faster.

So, if you keep in mind what the agent was trained on, if you use a good harness, if you have good tests, if you divide the work into small and independent tasks and if the current task is not something very new and special, you are golden.