Where Do AI Coding Agents Fail?

2•kioku•1w ago

Comments

stevefan1999•1w ago

Coding agents are already failing me when it comes to Rust best practices. Even Opus 4.5, which I'm using, tends to split logic out into free functions when it could go under an `impl Into` or `TryInto`, or simply be a method in an `impl` block: not `pub fn do_this(user: &User)` but `User::do_this(&self)`.
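A minimal sketch of the style being asked for. The `User` type, its fields, and the `greeting` method are hypothetical and exist only for illustration. Note that the usual way to get `TryInto` is to implement `TryFrom`, since the standard library provides the matching `TryInto` automatically.

```rust
use std::convert::TryFrom;

/// Hypothetical domain type, used only for illustration.
#[derive(Debug, PartialEq)]
pub struct User {
    name: String,
    age: u8,
}

impl User {
    /// Method form: called as `user.greeting()` rather than a free
    /// function `greeting(user: &User)`.
    pub fn greeting(&self) -> String {
        format!("Hello, {} ({})", self.name, self.age)
    }
}

/// Fallible conversion expressed as a trait impl instead of a free
/// `parse_user(name, age)` function.
impl<'a> TryFrom<(&'a str, i32)> for User {
    type Error = String;

    fn try_from(value: (&'a str, i32)) -> Result<Self, Self::Error> {
        let (name, age) = value;
        // Reject ages that do not fit in a u8.
        let age = u8::try_from(age).map_err(|_| format!("age out of range: {}", age))?;
        Ok(User { name: name.to_owned(), age })
    }
}
```

With this in place, callers write `User::try_from(("alice", 30))` and `user.greeting()`, so both the conversion and the behavior are discoverable from the type itself.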

I constantly have to tell the agent to use functional iterators and itertools, but it still prefers a primitive `for` loop that pushes into a mutable array. It gets the job done, but when you can `iter` and `collect`, why not use them?
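To make the contrast concrete, here is a small sketch of the same computation written both ways. The function names and the even-squares task are made up for illustration, and it uses only the standard library rather than itertools.

```rust
/// Imperative version: a mutable Vec filled by a `for` loop.
pub fn even_squares_loop(values: &[u32]) -> Vec<u32> {
    let mut out = Vec::new();
    for &v in values {
        if v % 2 == 0 {
            out.push(v * v);
        }
    }
    out
}

/// Iterator version: same result, with no mutable accumulator.
pub fn even_squares_iter(values: &[u32]) -> Vec<u32> {
    values
        .iter()
        .filter(|&&v| v % 2 == 0) // keep even inputs
        .map(|&v| v * v)          // square each one
        .collect()
}
```

Both functions return the same vector for any input; the second states the filter and the transformation as separate steps instead of interleaving them with bookkeeping.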

It also struggles to use higher-level data structures such as bit vectors and matrices. I doubt it could handle harder ones such as a B+ tree, red-black tree, or Fibonacci heap...

This comes from my experiment building vibewasm, a wasm engine. I wrote the low-level sljit binding manually, then instructed Opus 4.5 to build a wasm engine on top of it, taking design cues from other sljit-based wasm engines written in C or C++ (pwart and walrus, to be specific).

One specific case is the bit vector: Opus, at least, always tends to use a hash set or B-tree set instead.

I had to explicitly explain to Opus that the register marking can be represented as a bit vector, because the number of registers is bounded (15) and a bit vector saves the most space. But in the next refactor, rewriting the SSA layer into RTL (a register transfer language that targets sljit), it made the same mistake again. Either my prompt to follow the previous design didn't work, or Opus simply couldn't accept that a bit vector is the best choice despite the user's requirements.

I had to revert that change and do it myself.
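For readers unfamiliar with the idea, a rough sketch of a register set as a bit vector might look like the following. This is not vibewasm's actual code: `RegSet`, `NUM_REGS`, and the method names are hypothetical, and the only detail taken from the comment is the bound of 15 registers, which fits in a single `u16`.

```rust
/// Number of registers, taken from the comment above (15).
pub const NUM_REGS: u8 = 15;

/// Hypothetical register set: one bit per register, packed in a u16.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct RegSet(u16);

impl RegSet {
    /// Mark a register as in use.
    pub fn insert(&mut self, reg: u8) {
        assert!(reg < NUM_REGS, "register index out of range");
        self.0 |= 1u16 << reg;
    }

    /// Mark a register as free.
    pub fn remove(&mut self, reg: u8) {
        assert!(reg < NUM_REGS, "register index out of range");
        self.0 &= !(1u16 << reg);
    }

    /// Check whether a register is in the set.
    pub fn contains(self, reg: u8) -> bool {
        reg < NUM_REGS && self.0 & (1u16 << reg) != 0
    }

    /// Number of registers currently in the set.
    pub fn len(self) -> u32 {
        self.0.count_ones()
    }

    /// Lowest-numbered register not in the set, if any.
    pub fn first_free(self) -> Option<u8> {
        let idx = (!self.0).trailing_zeros() as u8;
        if idx < NUM_REGS { Some(idx) } else { None }
    }
}
```

The whole set is two bytes, is `Copy`, and needs no heap allocation or hashing, which is the space argument made above against a `HashSet` or `BTreeSet` for a small, bounded universe of registers.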

kioku•1w ago

Sounds like your experience matches what's being described in the paper. Even if we get correct code, it might not carry design intent.

cbyteai•1w ago

Not surprising that AI PRs fail when they touch lots of files or try to implement features nobody asked for. Human context still matters a lot.
