The real annoying thing about Opus 4.5 is that it's impossible to tell most people "Opus 4.5 is an order of magnitude better than coding LLMs released just months before it" without sounding like an AI hype booster clickbaiting, but it's the counterintuitive truth. To my continual personal frustration.
That's not the fault of Opus 4.5, because like all AI nonsense it's still not worth the cost. The privacy given up by having to authenticate with services like GitHub that used to be publicly available, before getting constantly DDoSed by AI bots. The reliability and freedom that evaporated into the ether as folks ran to the shelter of Cloudflare to mitigate the endless DDoS attacks at the hands of AI data scrapers. The emotional and social development stunted by having AI chatbots pretend to be a significant other and only say what folks want to hear. Whether Opus "can" code is immaterial.
It's good enough for things I can define well and write okay code for.
But it is far from perfect.
It does too much, like any LLM. For example, I had some test cases for methods that had been deleted, and I was being lazy and didn't want to read a huge test file, so I asked it to fix them.
It did. Tests were green, because it mocked the non-existent methods, when it should have just deleted the test cases as they were no longer needed.
Luckily, I read the code it produced.
The same thing happened with some decorators I asked it to write in Python. It produced working code, tests were fine, but I reworked the code manually to 1/10 of the size proposed by Opus.
It seems magical, even like it's thinking, but like all LLMs, it is not. It is just a trap.
When an LLM tries to do the wrong thing, don't correct it with a new instruction. Instead, edit your last prompt and give more details there.
LLMs have a limited context length, and they love to stick to their previous error. Just edit the previous prompt. Don't let the failed attempt pollute your context.
And the code-size thing is not fixed by a better prompt.
It also likes to ignore a reasonable plan it has written itself, just to add more code.
yeah it writes so much code it's crazy - when, like you mentioned, it could be solved with 1/10th
I mean, they are in the token business, so expect this to continue for as long as they can get away with it - as long as they are a bit better than the competition.
This is what 99% of devs who praise Claude Code don't notice. The real productivity gains are much lower than 10x.
Maybe they are like 2x tops.
The real gain is that you can be lazy now.
In reality, most tasks you do with an LLM (not talking about greenfield projects; those are vanity metrics) can be completed by a human in mostly the same time with 1/10th of the code. The catch is that you need to actually think and work instead of talking to a chat or watching YouTube while the prompt is running - which becomes 100x harder after you use LLMs extensively for a week or so.
The problem is that these increases in model performance are like the boy who cried wolf. There's only so many times you can say "this model is so much better, and does X/Y/Z more/less" and have it _still_ not be good enough for general use.
From my experience Opus is only so-so at writing Rust. But it's great at something like TS, because the amount of code it has been trained on is probably orders of magnitude bigger for the latter language.
I still use Codex high/xhigh for planning and once the plan is sound I give it to Opus (also planning). That plan I feed back to Codex for sign-off. It takes an average additional 1-2 rounds of this before Opus makes a plan that Codex says _really_ ticks all the boxes of the plan it made itself and which we gave to Opus to start with ...
That tells you something.
Also when Opus is "done" and claims so, I let Codex check. Usually it has skipped the last 20% (stubs/todos/logic bugs), so Codex makes a fixup plan that then again goes through the Codex<->Opus loop of back and forth 2-3 rounds before Codex gives the thumbs up. Only after that has Opus managed to do what the initial plan said that Codex made in the first place.
When I have Opus write TS code (or Python) I do not have to jump through those hoops. Sometimes one round of back and forth is needed but never three, as with Rust.
Interesting. I thought C++ interop was one of the top priorities right now.
It’s one of the top items mentioned in recent language progress reports, the Rust foundation received a million dollar grant to work on it, and there was a talk at the most recent RustConf about how Google is doing Rust/C++ interop.
Curious to know what discussions led to that conclusion.
The latest on this project is in this update - https://github.com/rust-lang/rust-project-goals/issues/388. They’re working on it, but no progress yet.
I think the author wanted something now, which is a completely acceptable reason to start his own project.
I don't know if they later changed their minds. From the meeting notes it seemed they didn't want to implement a C++ frontend in rustc.
How do you handle function lifetimes then? Those are generally non-local to infer, and Rust requires annotating functions with that information. I tried taking a look at the mako db's refactor but I didn't see any lifetime annotation being added there.
Functions need annotations like "return value lives as long as argument 1" or "return value lives as long as both arguments are alive"
It does, but it's under the "External Annotations" section:
// External annotations go in a header file
// @external: {
// strlen: [safe, (const char* str) -> owned]
// strcpy: [unsafe, (char* dest, const char* src) -> char*]
// strchr: [safe, (const char* str, int c) -> const char* where str: 'a, return: 'a]
//
// // Third-party libraries work the same way
// sqlite3_column_text: [safe, (sqlite3_stmt* stmt, int col) -> const char* where stmt: 'a, return: 'a]
// nlohmann::json::parse: [safe, (const string& s) -> owned json]
// }
> The where clause specifies lifetime relationships - like `where stmt: 'a, return: 'a`, which means the returned pointer lives as long as the statement handle. This lets the analyzer catch dangling pointers from external APIs.

The GitHub repo also has an annotations guide with some more info [0]. The general syntax appears to be:
// @lifetime: (parameters) -> return_type where constraints
[0]: https://github.com/shuaimu/rusty-cpp/blob/main/docs/annotati...

The ones in `@external` seem to be limited to C++ definitions outside the user's control.
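To make the "catch dangling pointers from external APIs" part concrete, here is a sketch of the bug class such an annotation describes; the functions are hypothetical and not from the repo:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// strchr's result points into its argument, so it must not outlive it.
// That is exactly the relationship the quoted annotation encodes:
//   strchr: [safe, (const char* str, int c) -> const char* where str: 'a, return: 'a]
const char* first_e(const std::string& s) {
    return std::strchr(s.c_str(), 'e');   // OK: the caller still owns s
}

const char* dangling_e() {
    std::string tmp = "hello";
    return std::strchr(tmp.c_str(), 'e'); // BUG: tmp is destroyed at return;
                                          // this is what the analyzer flags
}
```

An analyzer with only the `@external` annotation and this file in scope has enough information to accept `first_e` and reject `dangling_e`.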
- Pragmatic syntax choice: The comment-based annotation system is indeed a clever solution that minimizes friction for adoption
- Avoiding past pitfalls: By learning from previous safety proposals, this approach sidesteps the intrusive syntax issues that hindered earlier efforts
- Incremental adoption pathway: The ability to gradually introduce safety guarantees without requiring a complete rewrite is a game-changer for legacy codebases
- Democratizing compiler expertise: Leveraging LLMs to tackle problems that traditionally required specialized knowledge is an exciting development
Overall, this represents a promising step forward in bridging the gap between C++ and Rust's safety guarantees. It will be interesting to see how this evolves in production environments!
Any time a coder modifies a function, the safe/unsafe-ness of the function will have to be re-audited.
People complain about comments getting out of sync with the code - seems like the same thing will occur with safe/unsafe comments attached to functions unless the developers are diligent enough to verify nothing has changed on any PR.
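That drift risk can be sketched like this (a hypothetical function, reusing the project's annotation style from the quoted example):

```cpp
#include <cassert>
#include <cstring>

// The comment annotation still claims:
//   // @external: { make_copy: [safe, (const char* s) -> owned] }
// ...but a later "optimization" added a fast path that returns a borrowed
// pointer. The comment compiles either way; only diligent review or a
// re-run of the checker catches the mismatch.
const char* make_copy(const char* s) {
    if (*s == '\0') return s;             // drift: borrowed, not owned!
    char* copy = new char[std::strlen(s) + 1];
    std::strcpy(copy, s);
    return copy;                          // owned, as the annotation claims
}
```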
My other quibble from the article concerns:
...It requires going through the AST. And since the static analysis is mostly statically scoped, it doesn’t require heavy cross-file analysis. It can be single-file based, very limited scope.
The large C++ codebases I've seen have not been diligent wrt object ownership. Objects may get accessed in different files - not saying this is correct, just that it happens. Objects can be accessed/modified by other non-owning objects in inconsistent ways, which leads to inconsistent behaviour, especially when errors occur.

The most impressive C++ static analyzer I've seen is Intrinsa's PREfix product, bought by Microsoft back in the 1990s. They parsed the C++ code using a purchased C++ frontend parser (can't recall the company name, but there are only a handful of companies that sell this stuff) and stored the data references in a database. Then they'd do dataflow analysis of the codebase looking for bugs.
They came out with PREfast which does simpler realtime static analysis (more like lint really) and VC contains a version of this. I think the MS DDK also includes a static code analyzer based on this.
But considering the effort put into guiding the AI versus rolling your own code in your spare time and having to reload the context for your static analyzer while dumping out work-related information, we're taking baby steps into a new age/paradigm for software development.
Just think if this article had been posted five or ten years ago. The technology isn't perfect and it has a long ways to go. Let's hope we don't go down too many wrong paths.
However, looking at the recent commits it doesn't quite look like the most solid foundation: https://github.com/shuaimu/rusty-cpp/commit/480491121ef9efec...
fn is_interior_mutability_type(type_name: &str) -> bool {
type_name.starts_with("rusty::Cell<") ||
type_name.starts_with("Cell<") ||
type_name.starts_with("rusty::RefCell<") ||
type_name.starts_with("RefCell<") ||
// Also check for std::atomic which has interior mutability
type_name.starts_with("std::atomic<") ||
type_name.starts_with("atomic<")
}
… which then 30 minutes later is being removed again because it turns out to be completely dead code: https://github.com/shuaimu/rusty-cpp/commit/84aae5eff72bb450...

There's also quite a lot of dead code. All of these warnings are about unused variables, functions, structs, fields:
warning: `rusty-cpp` (bin "rusty-cpp-checker") generated 90 warnings (44 duplicates)

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
This isn't just vibe code. It's mobile vibe code. No logic, no coherence, just inconsistency.
---
Note: This is an experimental shitpost. Fork it. Share it. Use it. 🚀
It also looks like it's skipping some lifetime checks in some sketchy way
And I don't think it's constructive to cherrypick commits in this context.
> I even started trying out the fully autonomous coding: instead of examining its every action, I just write a TODO list with many tasks, and ask it to finish the tasks one by one.
> I never had to fully understand the code. What I had to do is: I asked it to give me a plan of changes before implementation, it gave me a few options, and then I chose the option that seemed most reasonable to me. Remember, I’m not an expert on this. I think most of the time, anybody who has taken some undergraduate compiler class would probably make the right choice.
The idea has merits. Take it as a PoC.
I don't understand why you feel it's not "constructive" to review the quality of code of a project. Are people supposed to just blindly believe in the functionality without peeking under the hood?
Initial praise does not preclude rudeness. And complaining about a commit that was undone 30 minutes later is not only pointless in the presented context, it's a cheap attempt at insulting.
> Are people supposed to just blindly believe in the functionality without peeking under the hood
False dichotomy. No one said that. And we both know this is not the way regardless of the codebase.
I think the idea has merits and given the honesty of the post, it's rather more productive to comment on it instead.
Does it? There have been a gazillion such static analyzers. They all do one of two things: ignore the hard parts or tackle the hard parts. If you ignore the hard parts, your tool is useless. If you tackle the hard parts, your tool is orders of magnitude more complex and it still struggles to work well on real-world projects. This is in the former category.
The article says "And since the static analysis is mostly statically scoped, it doesn’t require heavy cross-file analysis."
Oops. Suddenly you either handle aliasing soundly and your tool is plagued with zillions of false positives or you handle aliasing unsoundly and... you aren't getting what makes rust different. Separate compilation has been a problem for C++ analyzers for ages. Just declaring it to not actually be a big deal is a huge red flag.
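Here is a minimal illustration of why aliasing breaks single-file reasoning (my own sketch, not from the project). In Rust, `fn step(a: &mut i32, b: &mut i32)` guarantees the two references never alias; in C++ nothing stops the caller, and the result silently changes:

```cpp
#include <cassert>

// A single-file analyzer must either assume a and b might alias everywhere
// (drowning in false positives) or assume they never do (unsound).
void step(int* a, int* b) {
    *a += 1;
    *b *= 2;
}

int distinct() { int x = 5, y = 5; step(&x, &y); return y; } // 5 * 2 = 10
int aliased()  { int x = 5;        step(&x, &x); return x; } // (5 + 1) * 2 = 12
```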
Heck, even just approaching this as an AST-level analysis is going to struggle when you encounter basic things like templates.
The article says this: "Everybody tries to fix the language, but nobody tries to just analyze it." This is just flagrantly false. What's bizarre is that there are people at Stony Brook who have done this. Also, introducing new syntax (even if they are annotations) is more-or-less the same thing as "fixing the language" except that there is almost no chance that your dependencies (including the standard library) are annotated in the way you need.
I'm not sure if it's a good or a bad thing that people expect the robots to produce proper code on the first attempt?
That problem seems even more prevalent in Rust, where I see Arc used everywhere, presumably as a cop-out not to have to figure out how to satisfy the borrow checker in smarter ways.
Both languages lack a good way to handle circular references should you need them (again, my Rust isn't strong, but I think that is right). You are correct to say avoid them - but sometimes you need them.
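For what it's worth, the usual (admittedly clunky) workaround in both languages is to break one direction of the cycle with a weak reference; `shared_ptr`/`weak_ptr` below, and Rust's `Rc`/`Weak` pair works the same way. A sketch:

```cpp
#include <cassert>
#include <memory>

struct Parent;
struct Child {
    std::weak_ptr<Parent> parent;   // weak: does not keep Parent alive
};
struct Parent {
    std::shared_ptr<Child> child;   // strong: Parent owns Child
};

// Without the weak link, Parent and Child would keep each other alive forever.
bool parent_alive(const std::shared_ptr<Child>& c) {
    return !c->parent.expired();
}
```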
C++ is not limited to unique_ptr, the language (unlike Rust) allows you to define your own semantics of what a value is. You can then work in terms of copying or moving values, which makes lifetime management trivial as they are scope-bound.
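A minimal sketch of that scope-bound value style: `Document` is a pure value, so copies and moves carry their own storage, everything dies with its scope, and there is no shared ownership to track (the type is illustrative, not from the thread's project):

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Document {
    std::string title;
    std::vector<std::string> lines;
};

Document annotate(Document d) {            // taken by value: ours to mutate
    d.lines.push_back("-- reviewed --");
    return d;                              // moved out; nothing dangles
}
```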
C++ gives you more things, but none of them are enforced. (I'm sure Rust wants those same things at times - but since I'm not aware of anyone with ideas on how to enforce them, Rust has decided not to allow them. A reasonable choice overall, but sometimes annoying when it means you can't do something that you "know" is correct just because it can't be proved correct in the language.)
Valid programs don't need guardrails, since you need to satisfy those requirements for the program to be valid in the first place.
I want guard rails to ensure that I got everything right, not just 99.99% of the cases right.
Rust has inherited mutability, while I believe const in C++ is shallow. I don't think it's a perfect match.
Now you obviously can still have escape hatches and cast the const away whenever you want.
Yes, but if your struct contains references, the constness doesn't apply to what those references point to. In Rust it does.
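The shallowness is easy to demonstrate (my own sketch): `c` below is const, yet the int its member points at is freely mutable, whereas in Rust a struct seen through `&` freezes what its references reach (inherited mutability).

```cpp
#include <cassert>

struct Counter {
    int* value;
};

void bump(const Counter& c) {
    ++*c.value;     // compiles fine: const stops at the pointer itself
}
```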
Nothing prevents you from building a smart pointer with those semantics though, std::indirect is an example of this (arguably closer to Rust's Box).
So often the question AI-related pieces ask is "can AI do X?" when by far the more important question is "should AI do X?" As written, the piece reads as though the author has learned helplessness around C++, and their answer is to adopt a technology that leaves them even more helpless, which they indeed lament. I'd challenge the author to actually reflect on why they are so attached to this legacy software and why they cannot abandon it if it is causing this level of angst.
Many comments are right that this is a prototype and there isn't any correctness guarantee! It is purely test-case driven.
But this prototype sort of proves that to have Rust-equivalent memory safety, you don't really need to completely ditch C++, and all those "rewrite in Rust" clones of C++ repos. The time I spent on this project is very limited; I did maybe half of the dev on my phone through Happy. If Microsoft or Google, who have lots of C++ code, were willing to put some serious resources behind this idea, I am sure they could have something a lot more solid. And they wouldn't have to give up C++ (they shouldn't; ditching it would be very unclever engineering-wise).
To me personally this prototype is a "usable" alternative to Circle C++. It saved me a lot of hard debugging time.
Simple and elegant solution.