What makes Claude Code so damn good

https://minusx.ai/blog/decoding-claude-code/

469•samuelstros•5mo ago

Comments

LaGrange•5mo ago

[flagged]

dang•5mo ago

Please don't post unsubstantive comments to Hacker News, and especially not putdowns.

The idea here is: if you have a substantive point, make it thoughtfully. If not, please don't comment until you do.

https://news.ycombinator.com/newsguidelines.html

dingnuts•5mo ago

I appreciate the vague negative takes on tools like this where it feels like there is so much hype it's impossible to have a different opinion. "It's bad" is perfectly substantiative in my opinion; this person tried it, didn't like it, and doesn't have much more to say because of that, but it's still a useful perspective.

Is this why HN is so dang pro-AI? the negative comments, even small ones, are moderated away? explains a lot TBH

h4ch1•5mo ago

I think this comment would be a little better by specifying WHY it's bad instead of just a "it's bad" like it's a Twitter thread.

LaGrange•5mo ago

The subject is pretty exhausted. The reason I post "it's bad" because, honestly, expending on it just feels like a waste of time and energy. The point is demonstrating that this _isn't_ a consensus, and not much more than that.

Edit: bonus points if this gets me banned.

dang•5mo ago

(We don't ban people for posting like this!)

If it felt like a waste of time and energy to post something substantive, rather than the GP comment (https://news.ycombinator.com/item?id=44998577), then you should have just posted nothing. That comment was obviously neither substantive nor thoughtful. This is hardly a borderline call!

We want substantive, thoughtful comments from people who do have the time and energy to contribute them.

Btw, to avoid a misunderstanding that sometimes shows up: it's fine for comments to be critical; that is, it's possible to be substantive, thoughtful, and critical all at the same time. For example, I skimmed through your account's most recent comments and saw several of that kind, e.g. https://news.ycombinator.com/item?id=44299479 and https://news.ycombinator.com/item?id=42882357. If your GP comment had been like that, it would have been fine; you don't have to like Claude Code (or whatever the $thing is).

exe34•5mo ago

that wasn't a negative comment though. a negative comment would explain what they didn't like about it. this was the digital equivalent of flytipping.

danielbln•5mo ago

There is no value in a single poster saying "it's bad". I don't know this person, there is zero context on why I should care that this user thinks it's bad. Unless they state why they think it's bad, it adds nothing to the conversation and is just noise

dang•5mo ago

HN is by no means "pro-AI". It's sharply divided, and (as always with these things) each side assumes the other side is dominant.

FergusArgyll•5mo ago

"After viewing identical samples of major network television coverage of the Beirut massacre, both pro-Israeli and pro-Arab partisans rated these programs, and those responsible for them, as being biased against their side."

https://users.ssc.wisc.edu/~jpiliavi/965/hwang.pdf

•5mo ago

alex1138•5mo ago

What do people think of Google's Gemini (Pro?) compared to Claude for code?

I really like a lot of what Google produces, but they can't seem to keep a product that they don't shut down and they can be pretty ham-fisted, both with corporate control (Chrome and corrupt practices) and censorship

KaoruAoiShiho•5mo ago

It sucks.

KaoruAoiShiho•5mo ago

Lol downvoted, come on anyone who has used gemini and claude code knows there's no comparison... gimme a break.

bitpush•5mo ago

You're getting down voted because of the curt "it sucks" which shows a level of shallowness in your understanding.

Nothing in the world is simply outright garbage. Even the seemingly worst products exist for a reason and is used for a variety of use cases.

So, take a step back and reevaluate whether your reply could have been better. Because, it simply "just sucks"

polotics•5mo ago

can you detail the differences you see that substantiate your judgement?

ezfe•5mo ago

Gemini frequently didn't write code for me for no explicable reason, and just talked about a hypothetical solution. Seems like a tooling issue though.

djmips•5mo ago

Sounds almost human!

brabel•5mo ago

LLMs are built on human content and they do behave similarly to humans sometimes, including both the good and the bad.

yomismoaqui•5mo ago

According to the guys from Amp Claude Sonnet/Opus are better at tool use.

_ea1k•5mo ago

For the web ui (chat)? I actually really like gemini 2.5 pro.

For the command line tool (claude code vs gemini code)? It isn't even close. Gemini code was useless. Claude code was mostly just slow.

Herring•5mo ago

Yeah I was also getting much better results on the Gemini web ui compared to the Gemini terminal. Haven't gotten to Claude yet.

upcoming-sesame•5mo ago

You mean Gemini CLI. Yeah it's confusing

_ea1k•5mo ago

Thanks, that's the one!

lifthrasiir•5mo ago

Yeah, the main strength of gemini-cli is being open-sourced and it still needs much polishing. I ended up building my own web-based interactive agent based on gemini-cli [1] out of frustration.

[1] https://github.com/lifthrasiir/angel

stabbles•5mo ago

In my experience it's better at lower level stuff, like systems programming. A pass afterwards with claude makes the code more readable.

Keyframe•5mo ago

It's doing rather well at thinking, but not at coding. When it codes, often enough it runs in circles and ignores input. Where I find it useful is to read through larger codebases and distill what I need to find out from it. Even using gemini from claude to consult it for certain things. Opus is also like that btw, but a bit better at coding. Sonnet though, excels at coding.. from my experience though.

koakuma-chan•5mo ago

I don't think Gemini Pro is necessarily worse at coding, but in my experience Claude is substantially better at "terminal" tasks (i.e. working with the model through a CLI in the terminal) and most of the CLIs use Claude, see https://www.tbench.ai/leaderboard.

jonfw•5mo ago

Gemini is better at helping to debug difficult problems that require following multiple function calls.

I think Claude is much more predictable and follows instructions better- the todo list it manages seems very helpful in this respect.

nicce•5mo ago

If you could control the model with system command, it would be very good. But at last I have failed miserably. Model is too verbose and helpful.

divan•5mo ago

In my recent tests I found it quite smart at analyzing bigger picture (i.e. "hey, test failing not because of that, but because of whole assumption has changed and let me rewrite this test from scratch". But it also got stuck few times "I can't edit file, I'm stuck, let me try completely differently". But the biggest difference so far is the communication style - it's a bit.. snarky? I.e. comments like "yeah, tests are failing - as I suspected". Why the f it suspected failing test on the project it sees for the first time? :D

CuriouslyC•5mo ago

Gemini is amazing for taking a merge file of your whole repo, dropping it in there, and chatting about stuff. The level of whole codebase understanding is unreal, and it can do some amazing architectural planning assistance. Claude is nowhere near able to do that.

My tactic is to work with Gemini to build a dense summary of the project and create a high level plan of action, then take that to gpt5 and have it try to improve the plan, and convert it to a hyper detailed workflow xml document laying out all the steps to implement the plan, which I then hand to claude.

This avoids pretty much all of Claude's unplanned bumbling.

seanwessmith•5mo ago

mind typing this up? i've got a basic GPT -> Claude workflow going for now

CuriouslyC•5mo ago

https://gist.github.com/githubcustomerserviceistrash/c716e76...

I should mention I made that one for my research/stats workflow, so there's some specific stuff in there for that, but you can prompt chat gpt to generalize it.

threecheese•5mo ago

I mean, damn. Are terms like “executable oracles” and “hermetic boots” related to your domain, or are you using these as terms of art for an agent? Oracle being a source of truth, hermetic meaning no external dependencies or side effects - definitions in furtherance of your request for concise language. Would love to understand more.

CuriouslyC•5mo ago

This prompt is for scientific research. In general my goal is to instruct the agent to build as much validation scaffolding as possible, so rather than holding its hand I can just give it a series of concrete hurdles and tell it not to come back until they're met. I don't want it finishing the basic tasks and coming back to me saying the app is "production ready," I want to come back after a few hours to the agent having "proven a spec" with a demo or a paper that I can iterate on.

filchermcurr•5mo ago

The Gemini CLI tool is atrocious. It might work sometimes for analyzing code, but for modifying files, never. The inevitable conclusion of every session I've ever tried has been an infinite loop. Sometimes it's an infinite loop of self-deprecation, sometimes just repeating itself to failure, usually repeating the same tool failure until it catches it as an infinite loop. Tool usage frequently (we're talking 90% of the time) fails. It's also, frankly, just a bummer to talk to. The "personality" is depressed, self-deprecating, and just overall really weird.

That's been my experience, anyway. Maybe it hates me? I sure hate it.

klipklop•5mo ago

This matches my experience with it. I won’t let it touch any code I have not yet safely checked in before firing up Gemini. It will commonly get into a death loop mid session that can’t be recovered from.

esafak•5mo ago

Once it repeatedly printed shame in all caps. I was worried until I figured out it was talking to itself.

polotics•5mo ago

this is so weird I am not at all getting the same experience, its tools work, it changes typescript and python confidently, makes mistakes, understands them and fixes them. I had a case of it giving up and admitting failure, but not in the way you describe

esafak•5mo ago

I used to like it a lot but I feel like it got dumber lately. Am I imagining things or has anyone else observed this too?

donperignon•5mo ago

Personally gemini has been giving me better results. Claude keeps trying to generate react code even when the whole context and my command is svelte, and failing constantly to give me something that can at least run, gemini, on the other hand has been pretty good with styling, and useful with the bussines logic. I dont get all the hype around claude.

gedy•5mo ago

Claude Code is just a nicer dev experience, especially with simpler stuff. I've seen Gemini do much better with Svelte as well.

poniko•5mo ago

Pretty much every time Claude code is stuck or more or less just coding in circles i use Gemini PRO to analyze the code/data and feed the response into Claude to solve it. I also have much more success with Gemini when creating big sql transforming scripts or similar. Both are quite bad on bigger tasks, they get you 60% and then i spend days and days to trying to get to 100% .. its such a time sink when i select the wrong task for the llm.

siva7•5mo ago

It's more interesting to compare what gemini cli and codex cli did wrong? (though i haven't used both of them for weeks to months)

syntaxing•5mo ago

I don’t know if I’m doing something wrong. I was using Sonnet 4 with GitHub Copilot. Recently a week ago switched to Claude Code. I find GitHub Copilot solves problem and bugs way better than Claude Code. For some reason, Claude Code seems very lazy. Has anyone experience something similar?

cosmic_cheese•5mo ago

I haven’t tried other LLMs but have a fair amount of experience with Claude Code, and there definitely times when you have to be explicit about the route you want it to take and tell it to not take shortcuts.

It’s not consistent, though. I haven’t figured out what they are but it feels like there are circumstances where it’s more prone to doing ugly hacky things.

StephenAshmore•5mo ago

It may be a configuration thing. I've found quite the opposite. Github Copilot using Sonnet 4 will not manage context very well, quite frequently resorting to running terminal commands to search for code even when I gave it the exact file it's looking for in the copilot context. Claude code, for me, is usually much smarter when it comes to reading code and then applying changes across a lot of files. I also have it integrated into the IDE so it can make visual changes in the editor similar to GitHub Copilot.

syntaxing•5mo ago

I do agree with you, Github Copilot uses more tokens like you mentioned with redundant searches. But at the end of the day, it solves the problem. Not sure if the cost out weights the benefit though compared to Claude Claude. Going to try Claude Code more and see if I'm prompting it incorrectly.

libraryofbabel•5mo ago

The consensus is the opposite: most people find copilot does less well than Claude with both using sonnet 4. Without discounting your experience, you’ll need to give us more detail about what exactly you were trying to do (what problem, what prompt) and what you mean by “lazy” if you want any meaningful advice though.

sojournerc•5mo ago

Where do you find this "consensus"?

rsanek•5mo ago

read HN threads, talk to people using AI alot. I have the same perception

sojournerc•5mo ago

Got it, anecdata.

wordofx•5mo ago

I have most of the tools setup so I can switch between them and test which is better. So far Amp and Claude Code are on top. GH Copilot is the worst. I know MS is desperately trying to copy its competitors but the reality is, they are just copying features. They haven’t solved the system prompts. So the outcomes are just inferior.

riazrizvi•5mo ago

I use ChatGPT, and I have used Claude several times. I’ve not found Claude to be better. I’ve come to the conclusion that all these posts asking why Claude is so good at coding, are all part of some marketing approach. I think it’s tied to how Claude prefers to hook into repos, I think maybe it’s tied to a business strategy of acquiring a mega code dataset. So they are especially motivated to push this narrative vs say OpenAI or other players.

(I don’t use any clients that answer coding questions by using the context of my repos).

tcoff91•5mo ago

If you aren’t using a client that automatically uses the context of your repos then you don’t understand why people like Claude. You need to use the Claude Code CLI in order to really get the best results.

riazrizvi•5mo ago

I’m not in the market for a solution where I need to trust some company with my IP. I understand that Claude also wants to serve my use case, so I take these headlines at face value as also applying to me, since they don’t qualify themselves with ‘Claude Code CLI’.

tcoff91•5mo ago

‘Claude Code’ refers to the CLI.

riazrizvi•5mo ago

Oh. I stand corrected.

diego_sandoval•5mo ago

It shocks me when people say that LLMs don't make them more productive, because my experience has been the complete opposite, especially with Claude Code.

Either I'm worse than then at programming, to the point that I find an LLM useful and they don't, or they don't know how to use LLMs for coding.

dsiegel2275•5mo ago

Agreed. I only started using Claude Code about a week and a half ago and I'm blown away by how productive I can be with it.

pawelduda•5mo ago

I've had occasions where a relatively short prompt solved me an entire day of debugging and fixing things, because it was tech stack I barely knew. Most impressive part was when CC knew the changes may take some time to be applied and just used `sleep 60; check logs;` 2-3 times and then started checking elsewhere if something's stuck. It was, CC cleaned it up and a minute later someone pinged me that the it works.

ta12653421•5mo ago

Productivity boost is unbelieveable! If you handle it right, its a boon - its like having 3 junior devs at hand. And I'm talking about using the web interface.

I guess most people are not paying and cant therefore apply the project-space (which is one of the best features), which unleashes its full magic.

Even if I'm currently without a job, I'm still paying because it helps me.

ta12653421•5mo ago

LOL why do I get downvoted for explaining my experience? :-D

pawelduda•5mo ago

Because you posted a success story about LLM usage on HN

ta12653421•5mo ago

Well, understood, but that part between the lines is not my fault?

pawelduda•5mo ago

Nah, never implied that

fourthark•5mo ago

So describe your experience without being a booster

tjr•5mo ago

What do you work on, and what do LLMs do that helps?

(Not disagreeing, but most of these comments -- on both sides -- are pretty vague.)

SXX•5mo ago

For once LLMs are good for building game prototypes. When all you care is to check whatever something is fun to play it really doesn'a matter how much of tech debt you generate in process.

And you start from the stratch all the time so you can generate all the documentation before you ever start to generate code. And when LLM slop become overwhelming you just drop it and go to check next idea.

_ea1k•5mo ago

What is performance like for you? I've been shocked at how many simple requests turn into >10 minutes of waiting.

If people are getting faster responses than this regularly, it could account for a large amount of the difference in experiences.

totalhack•5mo ago

Agree with this, though I've mostly been using Gemini CLI. Some of the simplest things, like applying a small diff, take many minutes as it loses track of the current file state and takes minutes to figure it out or fail entirely.

wredcoll•5mo ago

The best part about llm coding is that you feel productive even when you aren't, makes coding a lot more fun.

timr•5mo ago

It depends very much on your use case, language popularity, experience coding, and the size of your project. If you work on a large, legacy code base in COBOL, it's going to be much harder than working on a toy greenfield application in React. If your prior knowledge writing code is minimal, the more amazing the results will seem, and vice-versa.

Despite the persistent memes here and elsewhere, it doesn't depend very much on the particular tool you use (with the exception of model choice), how you hold it, or your experience prompting (beyond a bare minimum of competence). People who jump into any conversation with "use tool X" or "you just don't understand how to prompt" are the noise floor of any conversation about AI-assisted coding. Folks might as well be talking about Santeria.

Even for projects that I initiate with LLM support, I find that the usefulness of the tool declines quickly as the codebase increases in size. The iron law of the context window rules everything.

Edit: one thing I'll add, which I only recently realized exists (perhaps stupidly) is that there is a population of people who are willing to prompt expensive LLMs dozens of times to get a single working output. This approach seems to me to be roughly equivalent to pulling the lever on a slot machine, or blindly copy-pasting from Stack Overflow, and is not what I am talking about. I am talking about the tradeoffs involved in using LLMs as an assistant for human-guided programming.

ivan_gammel•5mo ago

Overall I would agree with you, but I start feeling that this „iron law“ isn’t as simple as that. After all, humans have limited „context window“ too — we don’t remember every small detail on a large project we have been working on for several years. Loose coupling and modularity helps us and can help LLM to make the size of the task manageable if you don’t ask it to rebuild the whole thing. It’s not the size that makes LLMs fail, but something else, probably the same things where we may fail.

timr•5mo ago

Humans have a limited short-term memory. Humans do not literally forget everything they've ever learned after each Q&A cycle.

(Though now that I think of it, I might start interrupting people with “SUMMARIZING CONVERSATION HISTORY!” whenever they begin to bore me. Then I can change the subject.)

ivan_gammel•5mo ago

LLMs do not „forget“ everything completely either. Probably all major tools by now consume information from some form of memory (system prompt, Claude.md, project files etc) before your prompt. Claude Code rewrites the Claude.md, ChatGPT may modify the chat memory if it finds it necessary etc.

timr•5mo ago

Writing stuff in a file is not “memory” (particularly if I have to do it), and in any case, it consumes context. Overrun the context window, and the tool doesn’t know about what is lost.

There are various hacks these tools take to cram more crap into a fixed-size bucket, but it’s still fundamentally different than how a person thinks.

ivan_gammel•5mo ago

> Writing stuff in a file is not “memory”

Do you understand yourself what you just said? File is a way to organize data in memory of a computer by definition. When you write instructions to LLM, they persistently modify your prompts making LLM „remember“ certain stuff like coding conventions or explanations of your architectural choices.

> particularly if I have to do it

You have to communicate with LLM about the code. You either do it persistently (must remember) or contextually (should know only in context of a current session). So word „particularly“ is out of place here. You choose one way or another instead of bring able to just tell that some information is important or unimportant long-term. This communication would happen with humans too. LLMs have different interface for it, more explicit (giving the perception of more effort, when it is in fact the same; and let’s not forget that LLM is able to decide itself on whether to remember something or not).

> and in any case, it consumes context

So what? Generalization is an effective way to compress information. Because of it persistent instructions consume only a tiny fraction of context, but they reduce the need for LLM to go into full analysis of your code.

> but it’s still fundamentally different than how a person thinks.

Again, so what? Nobody can keep in short-term memory the entire code base. It should not be the expectation to have this ability neither it should not be considered a major disadvantage not to have it. Yes, we use our „context windows“ differently in a thinking process. What matters is what information we pack there and what we make of it.

BeetleB•5mo ago

Both true and irrelevant.

I've yet had the "forgets everything" to be a limiting factor. In fact, when using Aider, I aggressively ensure it forgets everything several times per session.

To me, it's a feature, not a drawback.

I've certainly had coworkers who I've had to tell "Look, will you forget about X? That use case, while it look similar, is actually quite different in assumptions, etc. Stop invoking your experiences there!"

faangguyindia•5mo ago

the "context" is the short term memory equivalent of LLM.

Long term memory is its training data.

SXX•5mo ago

This heavily depends on what project and stack you working on. LLMs are amazing for building MVPs or self-contained micro-services on modern, popular and well-defined stacks. Every single dependency, legacy or proprietary library and every extra MCP make it less usable. It get's much worse if codebase itself is legacy unless you can literally upload documentation for each used API into context.

A lot of programmers work on maintaining huge monolith codebases, built on top of 10-years old tech using obscure proprietary dependencies. Usually they dont have most of the code to begin with and APIs are often not well documented.

cpursley•5mo ago

I feel like I could have written this myself; I'm truly dumbfounded. Maybe I am just a crappy coder but I don't think I'd be getting such good results with Claude Code if I were.

socalgal2•5mo ago

I’m trying to learn jj. Both Gemini and ChatGPT gave me incorrect instructions 4 of 5 times

https://jj-vcs.github.io/jj/

BeetleB•5mo ago

That's because jj is relatively new, and constantly changing. The official tutorial is (by their own admission), out of date. People's blog posts are fairly different in what commands/usage they recommend, as well.

I know it, because I recently learned jj, with a lot of struggling.

If a human struggles learning it, I wouldn't expect LLMs to be much better.

esafak•5mo ago

That's ironic considering jj is supposed to make version control easier.

BeetleB•5mo ago

It does make it easier. Don't conflate documentation with the tool itself.

exe34•5mo ago

it makes me very productive with new prototypes in languages/frameworks that I'm not familiar with. conversely, a lot of my work involves coding as part of understanding the business problem in the first place. think making a plot to figure out how two things relate, and then based on the understanding trying out some other operation. it doesn't matter how fast the machine can write code, my slow meat brain is still the bottleneck. the coding is trivial.

Aurornis•5mo ago

I’ve found LLMs useful at some specific tasks, but a complete waste of time at others.

If I only ever wrote small Python scripts, did small to medium JavaScript front end or full stack websites, or a number of other generic tasks where LLMs are well trained I’d probably have a different opinion.

Drop into one of my non-generic Rust codebases that does something complex and I could spent hours trying to keep the LLM moving in the right direction and away from all of the dead ends and thought loops.

It really depends on what you’re using them for.

That said, there are a lot of commenters who haven’t spent more than a few hours playing with LLMs and see every LLM misstep as confirmation of their preconceived ideas that they’re entirely useless.

lambda•5mo ago

It can be more than one reason.

First of all, keep in mind that research has shown that people generally overestimate the productivity gains of LLM coding assistance. Even when using a coding assistant makes them less productive, they feel like they are more productive.

Second, yeah, experience matters, both with programming and LLM coding assistants. The better you are, the less helpful the coding assistant will be, it can take less work to just write what you want than convince an LLM to do it.

Third, some people are more sensitive to the kind of errors or style that LLMs tend to use. I frequently can't stand the output of LLMs, even if it technically works; it doesn't live to to my personal standards.

pton_xd•5mo ago

> Third, some people are more sensitive to the kind of errors or style that LLMs tend to use. I frequently can't stand the output of LLMs, even if it technically works; it doesn't live to to my personal standards.

I've noticed the stronger my opinions are about how code should be written or structured, the less productive LLMs feel to me. Then I'm just fighting them at every step to do things "my way."

If I don't really have an opinion about what's going on, LLMs churning out hundreds of lines of mostly-working code is a huge boon. After all, I'd rather not spend the energy thinking through code I don't care about.

Uehreka•5mo ago

> research has shown that people generally overestimate the productivity gains of LLM coding assistance.

I don’t think this research is fully baked. I don’t see a story in these results that aligns with my experience and makes me think “yeah, that actually is what I’m doing”. I get that at this point I’m supposed to go “the effect is so subtle that even I don’t notice it!” But experience tells me that’s not normally how this kind of thing works.

Perhaps we’re still figuring out how to describe the positive effects of these tools or what axes we should really be measuring on, but the idea that there’s some sort of placebo effect going on here doesn’t pass muster.

phyzome•5mo ago

I mean, you're one person (so it doesn't have to match) and you're not carefully measuring everything (so you don't have a basis for comparison).

d-lisp•5mo ago

Basic engineering skills (frontend development, python, even some kind of high level 3d programming) are covered. If you do C/C++, or even Java in a preexisting project then you will have a hard time constantly explaining the LLM why <previous answer> is absolute nonsense.

Everytime I tried LLMs, I had the feeling of talking with a ignorant trying to sound VERY CLEVER: terrible mistakes at every line, surrounded with punchlines, rocket emojis and tons of bullshit. (I'm partly kidding).

Maybe there are situations where LLMs are useful e.g. if you can properly delimit and isolate your problem; but when you have to write code that is meant to mess up with the internal of some piece of software then it doesn't do well.

It would be nice to know from each part of the "happy users" and "mecontent usere" of LLMs in what context they experimented with it to be more informed on this question.

AaronAPU•5mo ago

If you’re working with a massive complicated C++ repository, you have to take the time to collect the right context and describe the problem precisely enough. Then you should actually read the code to verify it even makes sense. And at that point, if you’re a principle level developer, you could just as easily do it yourself.

But the situation is very different if you’re coding slop in the first place (front end stuff, small repo simple code). The LLMs can churn that slop out at a rapid clip.

breuleux•5mo ago

Speaking for myself, LLMs are reasonably good at writing tests or adapting existing structures, but they are not very good at doing what I actually want to do (design, novelty, trying to figure out the very best way to do a thing). I gain some productivity from the reduction of drudgery, but that's never been much of a bottleneck to begin with.

The thing is, a lot of the code that people write is cookie-cutter stuff. Possibly the entirety of frontend development. It's not copy-paste per se, but it is porting and adapting common patterns on differently-shaped data. It's pseudo-copy-paste, and of course AI's going to be good at it, this is its whole schtick. But it's not, like, interesting coding.

majormajor•5mo ago

> It is extremely important to identify the most important task the LLM needs to perform and write out the algorithm for it. Try to role-play as the LLM and work through examples, identify all the decision points and write them explicitly. It helps if this is in the form of a flow-chart.

I get lost a bit at things like this, from the link. The lessons in the article match my experience with LLMs and tools around them (see also: RAG is a pain in the ass and vector embedding similarity is very far from a magic bullet), but the takeaway - write really good prompts instead of writing code - doesn't ring true.

If I need to write out all the decision points and steps of the change I'm going to make, why am I not just doing it myself?

Especially when I have an editor that can do a lot of automated changes faster/safer than grep-based text-first tooling? If I know the language the syntax isn't an issue; if I don't know the language it's harder to trust the output of the model. (And if I 90% know the language but have some questions, I use an LLM to plow through the lines I used to have to go to Google for - which is a speedup, but a single-digit-percentage one.)

My experience is that the tools fall down pretty quickly because I keep trying to make them to let me skip the details of every single task. That's how I work with real human coworkers. And then something goes sideways. When I try to pseudocode the full flow vs actually writing the code I lose the speed advantage, and often end up with a nasty 80%-there-but-I-don't-really-know-how-to-fix-the-other-20%-without-breaking-the-80% situation because I noticed a case I didn't explicitly talk about that it guessed wrong on. So then it's either slow and tedious or `git reset` and try again.

(99% of these issues go away when doing greenfield tooling or scripts for operations or prototyping, which is what the vast majority of compelling "wow" examples I've seen have been, but only applies to my day job sometimes.)

OtherShrezzing•5mo ago

I think it’s just that the base model is good at real world coding tasks - as opposed to the types of coding tasks in the common benchmarks.

If you use GitHub Copilot - which has its own system level prompts - you can hotswap between models, and Claude outperforms OpenAI’s and Google’s models by such a large margin that the others are functionally useless in comparison.

ec109685•5mo ago

Anthropic has opportunities to optimize their models / prompts during reinforcement learning, so the advice from the article to stay close to what works in Claude code is valid and probably has more applicability for Anthropic models than applying the same techniques to others.

With a subscription plan, Anthropic is highly incentivized to be efficient in their loops beyond just making it a better experience for users.

badestrand•5mo ago

I read all the praise about Claude Code, tried it for a month and was very disappointed. For me it doesn't work any better than Cursor's sidebar and has worse UX on top. I wonder if I am doing something wrong because it just makes lots of stupid mistakes when coding for me, in two different code bases.

mnvrth•5mo ago

I'll suggest giving it another shot. It really is a game changer (I can't tell what you're doing wrong, but in a few people I've seen it has been about doing a psychological switch. I wrote about it a bit here - https://mnvr.in/beginners-mind, sharing in case it helps you see how you might approach it differently)

paool•5mo ago

It's not just the base model

Try using opus with cline in vs code. Then use Claude code.

I don't know the best way to quantify the differences, but I know I get more done in CC.

afarah1•5mo ago

But is it a game changer vs CoPilot in Agent mode with Claude 4 Sonnet?

Because it's twice the price and doesn't even have a trial.

I feel like if it were a game changer, like Cursor once was vs Ask mode with GPT, it would be worth it, but CoPilot has come a long way and the only up-to-date comparisons I've read point to it being marginally better or the same, but twice the price.

sdsd•5mo ago

Oof, this comes at a hard moment in my Claude Code usage. I'm trying to have it help me debug some Elastic issues on Security Onion but after a few minutes it spits out a zillion lines of obfuscated JS and says:

  Error: kill EPERM
      at process.kill (node:internal/process/per_thread:226:13)
      at Ba2 (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19791)
      at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19664
      at Array.forEach (<anonymous>)
      at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19635
      at Array.forEach (<anonymous>)
      at Aa2 (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19607)
      at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19538
      at ChildProcess.W (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:20023)
      at ChildProcess.emit (node:events:519:28) {
    errno: -1,
    code: 'EPERM',
    syscall: 'kill'
  }

I'm guessing one of the scripts it runs kills Node.js processes, and that inadvertantly kills Claude as well. Or maybe it feels bad that it can't solve my problem and commits suicide.

In any case, I wish it would stay alive and help me lol.

sixtyj•5mo ago

Jump to another LLM helps me to find what happened. *This is not a official advice :)

idontwantthis•5mo ago

I have had zero good results with any LLM and elastic search. Everything it spits out is a hallucination because there aren’t very many examples of anything complete and in context on the internet.

triyambakam•5mo ago

I would try upgrading or wiping away your current install and re-installing it. There might be some cached files somewhere that are in a bad state. At least that's what fixed it for me when I recently came across something similar.

yc-kraln•5mo ago

I get this issue when it uses sudo to run a process with root privileges, and then times out.

schmookeeg•5mo ago

Claude and some of the edgier parts of localstack are not friends either. It's pretty okay at rust which surprised me.

It makes me think that the language/platform/architecture that is "most known" by LLMs will soon be the preferred -- sort of a homogenization of technologies by LLM usage. Because if you can be 10x as successfully vibey in, say, nodejs versus elixir or go -- well, why would you opt for those in a greenfield project at all? Particularly if you aren't a tech shop and that choice allows you to use junior coders as if they were midlevel or senior.

actsasbuffoon•5mo ago

This mirrors a weird thought I’ve had recently. It’s not a thing I necessarily agree with, but just an idea.

I hear people say things like, “AI isn’t coming for my job because LLMs suck at [language or tech stack]!”

And I wonder, does that just mean that other stacks have an advantage? If a senior engineer with Claude Code can solve the problem in Python/TypeScript in significantly less time than you can solve it in [tech stack] then are you really safe? Maybe you still stack up well against your coworkers, but how well does your company stack up against the competition?

And then the even more distressing thought accompanies it: I don’t like the code that LLMs produce because it looks nothing like the code I write by hand. But how relevant is my handwritten code becoming in a world where I can move 5x faster with coding agents? Is this… shitty style of LLM generated code actually easier for code agents to understand?

Like I said, I don’t endorse either of these ideas. They’re just questions that make me uncomfortable because I can’t definitively answer them right now.

dgunay•5mo ago

Letting go of the particulars of the generated code is proving difficult for me. I hand edit most of the code my agents produce for taste even if it is correct, but I feel that in the long term that's not the optimal use of my time in agent-driven programming. Maybe the models will just get so good that they know how I would write it myself.

bilekas•5mo ago

I would argue this approach will help you in the long term with code maintainability. Which I feel will be one of the biggest issues down the line with AI generated codebases as they get larger.

monkpit•5mo ago

The solution is to codify these sorts of things in prompts and tool use and gateways like linters etc. you have to let go…

bilekas•5mo ago

What do you mean "you have to let go".

I use some ai tools and sometimes they're fine, but I won't in my lifetime anyway hand over everything to an AI, not out of some fear or anything, but even purely as a hobby. I like creating things from scratch, I like working out problems, why would I need to let that go?

jaggederest•5mo ago

Well, the point is, if it's not a hobby, you have to encode your preferences in lint and formatters, rather than holding onto manually messing with the output.

It's really freeing to say "Well, if the linter and the formatter don't catch it, it doesn't matter". I always update lint settings (writing new rules if needed) based on nit PR feedback, so the codebase becomes easier to review over time.

It's the same principle as any other kind of development - let the machine do what the machine does well.

dgunay•5mo ago

I have been doing this, and it does sort of work, but the problem is that for things that can't easily be turned into deterministic lints, prompting isn't 100% reliable. Every bit you go against the LLM's training data, it's more likely to forget to do it.

fragmede•5mo ago

LLMs write python and typescript well, because of all the examples in their training data. But what if we made a new programming language whos goal was to be optimal for an LLM to generate it? Would it be closer to assembly? If we project that the future is vibe coded, and we scarcely look at the outputted code, testing, instead, that the output matches the input correctly, not looking at the code, what would that language look like?

alankarmisra•5mo ago

They’d presumably do worse. LLMs have no intrinsic sense of programming logic. They are merely pattern matching against a large training set. If you invent a new language that doesn’t have sufficient training examples for a variety of coding tasks, and is syntactically very different from all the existing languages, the LLMs wouldn’t have enough training data and would do very badly.

metrix•5mo ago

I have thought the same thing. How is it created? is it an idea by an LLM to make the language, or a dev to create a language designed for an llm.

How do we get the LLM to gain knowledge on this new language that we have no example usage of?

fragmede•5mo ago

Same way we do for any other language. Give it docs and a runtime, and have it go off and generate code and use that training data.

hoyo1s•5mo ago

Strict type-checking and at least with some dependent type and inductive type

majormajor•5mo ago

What is it that you think would make a certain non-Python language "more optimal" for an LLM? Is there something inherently LLM-friendly about certain language patterns or is "huge sets of training examples" and "a robust standard library" (the latter to conserve tokens/attention vs having to spit out super-verbose 20x longer assembly all day) all "optimality" means?

fragmede•5mo ago

It's fair to point out that I didn't define exactly what we're optimizing for, but I can have the LLM generate assembly if I ask. It'll be faster than python, at the expense of readability, but if we're no longer writing code, then why not straight up use assembly? ARM vs x86 becomes an issue, but are there other reasons not to use assembly?

majormajor•5mo ago

All the disadvantages of those stacks still exist.

So if you need to avoid GC issues, or have robust type safety, or whatever it is, to gain an edge in a certain industry or scenario, you can't just switch to the vibe tool of choice without (best case) giving up $$$ to pay to make up for the inefficiency or (worst case) having more failures that your customers won't tolerate.

But this means the gap between the "hard" work and the "easy" work may become larger - compensation included. Probably most notably in FAANG companies where people are brought in expected to be able to do "hard" work and then frequently given relatively-easy CRUD work in low-ROI ancillary projects but with higher $$$$ than that work would give anywhere else.

And the places currently happy to hire disaffected ex-FAANG engineers who realized they were being wasted on polishing widgets may start having more hiring difficulty as the pipeline dries up. Like trying to hire for assembly or COBOL today.

hoyo1s•5mo ago

Sometimes one just need [language or tech stack] to do something, especially for some performance/security considerations.

For now LLMs still suffers from hallucination and lack of generalizability, The large amount of code generated is sometimes not necessarily a benefit, but a technical debt.

LLMs are good for open and fast, prototype web applications, but if we need a stable, consistent, maintainable, secure framework, or scientific computing, pure LLMs are not enough, one can't vibe everything without checking details

SpaceNoodled•5mo ago

Looks like you're better off.

gervwyk•5mo ago

We’re considering building a coding agent for Lowdefy[1], a framework that lets you build web apps with YAML config.

For those who’ve built coding agents: do you think LLMs are better suited for generating structured config vs. raw code?

My theory is that agents producing valid YAML/JSON schemas could be more reliable than code generation. The output is constrained, easier to validate, and when it breaks, you can actually debug it.

I keep seeing people creating apps with vibe coder tools but then get stuck when they need to modify the generated code.

Curious if others think config-based approaches are more practical for AI-assisted development.

[1] https://github.com/lowdefy/lowdefy

ec109685•5mo ago

I wouldn’t get hung up on one shotting anything. Output to a format that can be machine verified, ideally in a format there is plenty of industry examples for.

Then add a grader step to your agentic loop that is triggered after the files are modified. Give feedback to the model if there any errors and it will fix them.

amelius•5mo ago

How do you specify callbacks?

Config files should be mature programming languages, not Yaml/Json files.

gervwyk•5mo ago

Callback: Blocks (React components) can register events with action chains (a sequential list of async functions) that will be called when the event is triggered. So it is defined in the react component. This abstraction of blocks, events, actions, operations and requests are the only abstraction required in the schema to build fully functional web apps.

Might sound crazy but we built full web apps in just yaml.. Been doing this for about 5 years now and it helps us scale to build many web apps, fast, that are easy to maintain. We at Resonancy[1] have found many benefits in doing so. I should write more about this.

[1] - https://resonancy.io

hamandcheese•5mo ago

> easier to validate

This is essential to productivity for humans and LLMs alike. The more reliable your edit/test loop, the better your results will be. It doesn't matter if it's compiling code, validating yaml, or anything else.

To your broader question. People have been trying to crack the low-code nut for ages. I don't think it's solvable. Either you make something overly restrictive, or you are inventing a very bad programming language which is doomed to fail because professional coders will never use it.

gervwyk•5mo ago

Good point. i’m making the assumption that if the LLM has a more limited feature space to produce as output, then the output is more predictable, and thus faster to comprehend changes. Similar to when devs use popular libraries, there is a well known abstraction, therefore less “new” code to comprehend as i see familiar functions, making the code predictable to me.

hamandcheese•5mo ago

I think we are essentially describing the same thing. You just want to achieve it by constraining the input space at a significantly higher level (yaml schema defines the output space instead of a compiler and/or test suite).

I still think you'll be at a significant disadvantage since the LLM has been trained on millions of lines of all mainstream languages, and 0 lines of gervwyks funny yaml lang.

pglevy•5mo ago

Mine is a much simpler use case but sharing in case it's useful. I wanted to be able to quickly generate and iterate on user flows during design collaboration. So I use some boilerplate HTML/CSS and have the LLM generate an "outline" (basically a config file) and then generate the HTML from that. This way I can make quick adjustments in the outline and just have it refresh the code when needed to avoid too much back forth with the chat.

Overall, it has been working pretty well. I did make a tweak I haven't pushed yet to make it always writes the outline to a file first (instead of just terminal). And I've also started adding slash commands to the instructions so I can type things like "/create some flow" and then just "/refresh" (instead of "pardon me, would you mind refreshing that flow now?").

https://github.com/pglevy/breadboarding-kit

riwsky•5mo ago

> For those who’ve built coding agents: do you think LLMs are better suited for generating structured config vs. raw code?

Raw code. Use case was configuring a mapping of health data JSON from heterogeneous sources to a standard (also JSON) format. Initial prototype was a YAML DSL, based on the same theory as yours. LLMs had difficulty using the DSL’s semantics correctly, or even getting its syntax (not YAML-level syntax, but the schema: nesting levels for different constructs, and so on). It’s possible that better error loops or something would have cracked it, but a second prototype generating jq worked so much better out of the box that we basically never looked back.

_1tem•5mo ago

CC is so damn good I want to use its agent loop in my agent loop. I'm planning to build a browser agent for some specialized tasks and I'm literally just bundling a docker image with Claude Code and a headless browser and the Playwright MCP server.

apwell23•5mo ago

cool

HacklesRaised•5mo ago

Delusional asshats trying to draft the grift?

the_mitsuhiko•5mo ago

Unfortunately, Claude Code is not open source, but there are some tools to better figure out how it is working. If you are really interested in how it works, I strongly recommend looking at Claude Trace: https://github.com/badlogic/lemmy/tree/main/apps/claude-trac...

It dumps out a JSON file as well as a very nicely formatted HTML file that shows you every single tool and all the prompts that were used for a session.

CuriouslyC•5mo ago

https://github.com/anthropics/claude-code

You can see the system prompts too.

It's all how the base model has been trained to break tasks into discrete steps and work through them patiently, with some robustness to failure cases.

the_mitsuhiko•5mo ago

> https://github.com/anthropics/claude-code

That repository does not contain the code. It's just used for the issue tracker and some example hooks.

CuriouslyC•5mo ago

It's a javascript app that gets installed on your local system...

the_mitsuhiko•5mo ago

I'm aware of how it works since I have been spending a lot of time over the last two months working with Claude's internals. If you have spent some time with it, you know that it is a transpiled and minified mess that is annoyingly hard to detangle. I'm very happy that claude-trace (and claude-bridge [1]) exists because it makes it much easier to work with the internals of Claude than if you have to decompile it yourself.

[1]: https://github.com/badlogic/lemmy/tree/main/apps/claude-brid...

koakuma-chan•5mo ago

https://github.com/dnakov/claude-code :trollface:

throwaway314155•5mo ago

That's been DMCA'd since you posted it. Happen to know where I can find a fork?

koakuma-chan•5mo ago

> That's been DMCA'd since you posted it.

I know, thus the :trollface:

> Happen to know where I can find a fork?

I don't know where you can find a fork, but even if there is a fork somewhere that's still alive, which is unlikely, it would be for a really old version of Claude Code. You would probably be better off reverse engineering the minified JavaScript or whatever that ships with the latest Claude Code.

throwaway314155•5mo ago

Gotcha, I misunderstood.

mlrtime•5mo ago

Just search dnakov/claude-code mirror and there is a path to the source code, I found it in 2 minutes.

rbren•5mo ago

If you’re looking for an OSS alternative check out OpenHands CLI: https://github.com/All-Hands-AI/OpenHands?tab=readme-ov-file

athrowaway3z•5mo ago

> "THIS IS IMPORTANT" is still State of the Art

Had a similar problems until I saw the advice "Dont say what it shouldn't but focus on what it should".

i.e. make sure when it reaches for the 'thing', it has the alternative in context.

Haven't had those problems since then.

amelius•5mo ago

I mean, if advice like this worked, then why wouldn't Anthropic let the LLM say it, for instance?

donperignon•5mo ago

Because it’s embarrassing, and probably nobody understands why this works, depending on such heuristics that can completely change in the next model is really bad…

amelius•5mo ago

I'd say exactly because behavior might change you have to include proper instructions for each model.

And depending on people in forums to provide these instructions is of course not great.

sergiotapia•5mo ago

Is Claude Code better than Amp?

radleta•5mo ago

I’d be curious to know what MCPs you’ve found useful with CC. Thoughts?

nuwandavek•5mo ago

(blogpost author here) I actually found none of them useful. I think MCP is an incomplete idea. Tools and the system prompt cannot be so cleanly separated (at least not yet). Just slapping on tools hurts performance more than it helps.

I've now gone back to just using vanilla CC with a really really rich claude.md file.

faangguyindia•5mo ago

One area of improvement is being able to plug the github issues.

I run into bugs which are not documented in documentation or anywhere except github issues.

Is it legal to search github issues using LLM? if yes how?

on_the_train•5mo ago

The lengths people will go through to avoid to code is astonishing

apwell23•5mo ago

writing code is not the fun part of coding. I only realized that after using claude code.

monkaiju•5mo ago

Hard disagree

dragonfax•5mo ago

HARD AGREE (to your disagree)

phyzome•5mo ago

Well, I suppose we might have found one of the discriminators in why some people love LLMs and some hate them...

recursive•5mo ago

This explains a lot.

yumraj•5mo ago

I made insane progress with CC over last several weeks, but lately have noticed progress stalling.

I’m in the middle of some refactoring/bug fixing/optimization but it’s constantly running into issues, making half baked changes, not able to fix regressions etc. Still trying to figure out how to make do a better job. Might have to break it into smaller chunks or something. Been pretty frustrating couple of weeks.

If anyone has pointers, I’m all ears!!

imiric•5mo ago

> If anyone has pointers, I’m all ears!!

Give programming a try, you might like it.

yumraj•5mo ago

Yeah, have been doing that for 30 years.

Next…

jampa•5mo ago

I felt that, too. It turns out I was getting 'too comfortable' while using CC. The best way is to treat CC like a junior engineer and overexplain things before letting it do anything. With time, you start to trust CC, but you shouldn't do that because it is still the same LLM when you started.

Another thing is that before, you were in a greenfield project, so Claude didn't need any context to do new things. Now, your codebase is larger, so you need to point out to Claude where it should find more information. You need to spoon-feed the relevant files with "@" where you want it to look up things and make changes.

If you feel Claude is lazy, force it to use more thinking budget "think" < "think hard" < "think harder" < "ultrathink.". Sometimes I like to throw "ultrathink" and do something else while it codes. [1]

[1]: https://www.anthropic.com/engineering/claude-code-best-pract...

swader999•5mo ago

Sometimes I take a repomix dump of a slice where there's issues and then get chat gpt to analyze it and come up with a step by step guide to fix it for Claude to follow. That has worked.

fourthark•5mo ago

I ran into this too.

In my case it was exactly the kind of situation where I would also run into trouble on my own - trying to change too many things at once.

It was doing superbly for smaller, more contained tasks.

I may have to revert and approach each task on its own.

I find I need to know better than Claude what is going on, and guide it every step. It will figure out the right code if I show it where it should go, that kind of thing.

I think people may be underestimating / underreporting how much they have to be in the loop, guiding it.

It’s not really autonomous or responsible. But it can still be very useful!

rirze•5mo ago

It's the plateau; once your codebase hits a certain size/complexity, CC struggles to make exponential progress. To make good progress, you have to dive in and guide it very finely.

1zael•5mo ago

I've literally built the entire MVP of my startup on Claude Code and now have paying customers. I've got an existential worry that I'm going to have a SEV incident that will trigger a house of falling cards, but until then I'm constantly leveraging Claude for fixing security vulnerabilities, implementing test-driven-development, and planning out the software architecture in accordance with my long-term product roadmap. I hope this story becomes more and more common as time passes.

lajisam•5mo ago

“Implementing test-driven development, and planning out software architecture in accordance with my long-term product roadmap” can you give some concrete examples of how CC helped you here?

1zael•5mo ago

Yeah, so I continuously maintain a claude.md file with the feature roadmap for my product (which changes every week but acts as a source of truth). I feed that into a claude software architecture agent that I created, which reviews proposed changes for my current feature build against the longer-term roadmap to ensure I don't 1\ create tech debt with my current approach and 2\ identify opportunities to parallelize work that could help with multiple upcoming features at once.

I have also a code reviewer agent in CC that writes all my unit and integration tests, which feeds into my CI/CD pipeline. I use the "/security" command that Claude recently released to review my code for security vulnerabilities while also leveraging a red team agent that tests my codebase for vulnerabilities to patch.

I'm starting to integrate Claude into Linear so I can assign Linear tickets to Claude to start working on while I tackle core stuff. Hope that helps!

foobarbecue•5mo ago

[flagged]

BeetleB•5mo ago

> I bet you could replace your slop pile with a tenth of the lines of clean code, and chances are it'd be less work than you think.

Actually, no. When LLMs produce good, working code, it also tends to be efficient (in terms of lines, etc).

May vary with language and domain, though.

stavros•5mo ago

Eh, when is that, though? I'm always worrying about the bugs that I haven't noticed if I don't review the changes. The other day, I gave it a four-step algorithm to implement, and it skipped three of the steps because it didn't think they were necessary (they were).

BeetleB•5mo ago

Hmm...

It may be the size of the changes you're asking for. I tend to micromanage it. I don't know your algorithm, but if it's complex enough, I may have done 4 separate prompts - one for each step.

foobarbecue•5mo ago

Isn't it easier to just write the code???

BeetleB•5mo ago

Depends on the algorithm. When you've been coding for a few decades, you really, really don't want to write yet another trivial algorithm you've written multiple tens of times in your life. There's no joy in it.

Let the LLM do the boring stuff, and focus on writing the fun stuff.

Also, setting up logging in Python is never fun.

foobarbecue•5mo ago

Right-- it's only really capable of trivial code and boilerplate, which I usually just copy from one of my older programs, examples in docs, or a highly-ranked recent SO answer. Saves me from having to converse with an expensive chatbot, and I don't have to worry about random hallucinations.

If it's a new, non-trivial algorithm, I enjoy writing it.

BeetleB•5mo ago

For me, it's a lot easier getting the LLM to do it than browsing through multiple SO answers, or even finding some old code of mine.

Oh, and the chatbot is cheap. I pay for API usage. On average I'm paying less than $5 per month.

> and I don't have to worry about random hallucinations.

For boilerplate code, I don't think I've ever had to fix anything. It's always worked the first time. If it didn't, my prompt was at fault.

a5c11•5mo ago

> Also, setting up logging in Python is never fun.

import logging

BeetleB•5mo ago

Not fun at all.

Configuring it to produce useful stuff (e.g. timestamps, autologging exceptions, etc). Very boilerplate and tedious.

stavros•5mo ago

It was really simple, just traversing a list up and down twice. It just didn't see the reason why, so it skipped it all (the reason was to prevent race conditions).

1zael•5mo ago

Congratulations, you replace my pile of "slop" (which really is functional, tight code written by AI in 1/1000th of the time it would take me to write it) with your "shorter" code that has the exact same functionality and performance. Congrats? The reality is no one (except in the case of like competitive programming) cares about the length of your code so long as it's maintainable.

foobarbecue•5mo ago

But that's the thing -- because I WROTE my code, I know there's nothing bizarre in it. Your code is almost guaranteed to have something bizarre and unexpected in it, at least somewhere.

PUSH_AX•5mo ago

Clean code? Fewer lines? Found the intermediate.

dang•5mo ago

This is a belated reply, but you broke the site guidelines badly here. It's not ok to attack other users like this, no matter how right you are or feel you are. We ban accounts that do, so please don't do it again.

It's fine, of course, to make your substantive points thoughtfully, but that is a very different kind of comment.

https://news.ycombinator.com/newsguidelines.html

foobarbecue•5mo ago

Apologies, forgot to check my reaction.

imiric•5mo ago

Well, don't be shy, share what CC helped you build.

orsorna•5mo ago

[flagged]

turnsout•5mo ago

There’s still a stigma. I think people are worried that if it gets out that their startup was built with the help of an LLM, they’ll lose customers who don’t want to pay for something “vibe coded.”

Honestly I don’t think customers care.

mlrtime•5mo ago

I used the analogy to how online dating started. I remember [some] people were embarrassed to say they met online so would make up a story. We're in that phase of AI development, it will pass.

jaggederest•5mo ago

I generally work as openly as possible on github, and I am deliberately avoiding manual coding for a while to try to learn these (infuriating/wonderful) tools more thoroughly.

Unfortunately I can't always share all of my work, but everything on github after perhaps 2025-06-01 is as vibe-coded as I can get it to be. (I manually review commits before they're pushed, and PRs once in a complete state, but I always feed those reviews back into the tooling, not fix them manually, unless I get completely fed up.)

bopbopbop7•5mo ago

Everyone wants to see your very profitable startup, it’s simply free advertising. Why not share it?

jaggederest•5mo ago

I definitely don't have a very profitable startup. I'm simply a working programmer trying to learn new tools.

Workaccount2•5mo ago

I learned 20 years ago to never share code online with programmers when it comes to making a point (or, sadly, asking for help.)

I promise if someone posted human made code and said it was LLM generated, it would still be nit-picked to death. I swear 75% of developers ride around on a high horse that their style of doing things is objectively the best and everyone else is a knuckle dragger.

1zael•5mo ago

Answered above, but to be concrete on features --> it helped me build an end-to-end multi-stage pipeline architecture for video and audio transcription, LLM analysis, content generation, and evals. It took care of stuff like Postgres storage and pgvector for RAG-powered semantic search, background job orchestration with intelligent retry logic, Celery workers for background jobs, and MCP connectors.

dimgl•5mo ago

We get it; it helped you build a bunch of stuff. Why not just post your company?

1zael•5mo ago

We're in the govtech x AI space (building software for local governments and government-adjacent customers). I don't feel comfortable linking my direct startup yet - it serves me no benefit here (I just get judged by a bunch of angry programmers) and we're in the middle of fundraising with investors.

bopbopbop7•5mo ago

Then why bring it up in the first place if you’re not even willing to show one shred of evidence of your vibe coding output, even a link to you companies landing page?

lifestyleguru•5mo ago

duh, I ordered Claude Code to simply transfer money monthly to my bank account and it does.

ComputerGuru•5mo ago

> but until then I'm constantly leveraging Claude for fixing security vulnerabilities

That it authored in the first place?

dpe82•5mo ago

Do you ever fix your own bugs?

ComputerGuru•5mo ago

Bugs, yes. Security vulnerabilities? Rarely enough that it wouldn’t make my HN list. It’s not remotely hard to avoid the most common issues.

fluidcruft•5mo ago

Granted I've only been using Claude for a short time, but in my experience it tends to write code that matches the style of the code it's editing. Which is sort of a "no duh" thing in retrospect but I hadn't considered that. That is to say old crappy code I wrote long ago when I was dumber gets old-crappy dumber code style suggestions from Claude. And newer better-practices code that is very careful gets Claude to follow that style as well.

It wasn't something I considered at first but it makes sense if you think about text prediction models and infilling and training by reading code. The statistics of style matching what you are doing against similar things. You're not going to paint a photorealistic chunk into a hole of an impressionist painting, ya know?

So in my experience if you give it "code that avoids the common issues" that works like a style it will follow. But if you're working with a codebase that looks like it doesn't "avoid those common issues" I would expect it to follow suit and suggest code that you would expect from codebases that don't "avoid those common issues". If the input code looks like crappy code, I would expect it to statistically predict output code that looks like crappy code. And I'm not talking about formatting (formatting is for formatters), it's things like which functions and steps are used to accomplish whatever. That sort of thing. At least without some sort of specific prompting it's not going to jump streams.

Edit: one amusing thing you can do is ask Claude to predict attributes of the developers of the code and their priorities and development philosophy (i.e. ask Claude to write a README that includes these cultural things). I have a theory it gives you an idea about the overall codesmell Claude is assigning to the project.

Again I am very new to these tools and have only used claude-code because the command line interface and workflow didn't make me immediately run for the hills the way other things have. So no idea how other systems work, etc because I immediately bounced on them in the past. My use of claude-code started as an "okay fine why not give these things the young guns can't shut up about a shot on the boring shit and maybe clear out some backlog" for making chores in projects that I usually hate doing at least a little interesting but I've expanded my use significantly after gaining experience with it. But I have noticed it behave very differently in different code bases and the above is how I currently interpret that.

ComputerGuru•5mo ago

Thanks for sharing; that sounds rather reasonable. But I was under the impression that this new "vibe coding" thing was where you start with a "clean slate" altogether (the llm itself generates/picks the "initial state" in terms of idiomatic or not-so-idiomatic handling of whatever conditions rather than copying it from existing code)?

fluidcruft•5mo ago

I haven't tried any of that sort of thing yet... but I would expect the prompt to probably colors expectations.

Overall "meta" commands seem to work much more effectively that I expected. I'm still getting used to it and letting it run more freely lately but there's some sort of a loop you can watch as it runs where it will propose code given logic that is dumb and makes you want to stop it and intervene... but on the next step it evaluates what it just wrote and rejects for the same reason I would have rejected it and then tries something else. It's somewhat interesting to watch.

If you asked a new "I need you to write XYZ stat!" vs "We care a lot about security, maintainability and best practices. Create a project that XYZ." you would expect different product from the new hire. At least that's how I am treating it.

Basically I would give it a sort of job description. And you can even do things like pick a project you like as a model and have it write a file describing development practices used in that project. Then in the new project ask it to refer to that file as guidance and design a plan for writing the program. And then let it implement that plan. That would probably give a good scaffold, but I haven't tried. It seems like how I would approach that right now as an experiment. It's all speculation but I can see how it might work.

Maybe I'll get there and try that, but at the moment I'm just doing things I have wanted to do forever but that represented massive amounts of my time that I couldn't justify. I'm still learning to trust it and my projects are not large. Also I am not primarily a programmer (physicist who builds integrations, new workflows and tools for qc and data handling at a hospital).

janice1999•5mo ago

Humans have the capacity to learn from their own mistakes without redoing a lifetime of education.

davepeck•5mo ago

> I've literally built the entire MVP of my startup on Claude Code and now have paying customers.

Would you mind linking to your startup? I’m genuinely curious to see it.

(I won’t reply back with opinions about it. I just want to know what people are actually building with these tools!)

jaggederest•5mo ago

My github has examples of work I've done recently that are open source.

I'm deliberately trying not to do too much manual coding right now so I can figure out these (infuriating/wonderful) tools.

davepeck•5mo ago

Thanks, I’ll take a look. Everyone uses these tools differently, so I find AI-generated repos (and AI live-coding streams) to be useful learning material.

FWIW: “Infuriating/wonderful” is exactly how I feel about LLM copilots, too! Like you, I also use them extensively. But nothing I’ve built (yet?) has crossed the threshold into salable web services and every time someone makes the claim that they’ve primarily used AI to launch a new business with paid customers, links are curiously absent from the discussion… too bad, since they’d be great learning material too!

jaggederest•5mo ago

I will have one for you, most likely, later this week! Fingers crossed anyway!

bopbopbop7•5mo ago

He won’t, everyone that says they made a profitable startup with some AI code generator 3000 never seems to link their startup. Interesting.

davepeck•5mo ago

There are many reasons that "I used AI to do it all and now I've got $REAL ARR" strikes me as unlikely. To name just two:

1. I code with LLMs (Copilot, Claude Code). Like anyone who has done so, I know a lot about where these tools are useful and where they're hopeless. They can't do it all, claims to the contrary aside.

2. I've built a couple businesses (and failed tragicomically at building a couple more). Like anyone who has done so, I know the hard parts of startups are rarely the tech itself: sales, marketing, building a team with values, actually listening to customers and responding to their needs, making forward progress in a sea of uncertainty, getting anyone to care at all... sheesh, those are hard! Last I checked, AI doesn't singlehandedly solve any of that.

Which is not to say LLMs are useless; on the contrary, used well and aimed at the right tasks, my experience is that they can be real accelerants. They've undoubtedly changed the way I approach my own new projects. But "LLMs did it all and I've got a profitable startup"... I mean, if that's true, link to it because we should all be celebrating the achievement.

1zael•5mo ago

conception•5mo ago

Ive seen context forge has a way to use hooks to keep CC going after context condensing. Are there any other patterns or tools people are using with CC to keep it on task, with current context until it has a validated completion of its task? I feel like we have all these tools separately but nothing brings it all together and also isn’t crazy buggy.

kroaton•5mo ago

Load up the context with your information + task list (broken down into phases). Have Sonnet implement phase one tasks and mark phase 1 as done. Go into planning mode, have Opus review the work (you should ideally also review it at this point). Double press escape and go back to the point in the conversation where you loaded up the context with your information + task list. Tell it to do phase 2. Repeat until you run out of usage.

kroaton•5mo ago

From time to time, go into Opus planning mode, have it review your entire codebase and tell it to go file by file and look for bugs, security issues, logical problems, etc. Have it make a list. Then load up the context + task list...

conception•5mo ago

Yes, i can manage CC through a task list but there’s nothing technically stopping all your steps from happening automatically. That tool just doesn’t exist yet as far as I can tell but it’s not a very advanced tool to build. I’m surprised no one has put those steps together.

Also if the task runs out of context it will get progressively worse rather than refresh its own context from time to time.

rolls-reus•5mo ago

What’s context forge?

conception•5mo ago

https://github.com/webdevtodayjason/context-forge

whoknowsidont•5mo ago

It's not that good, most developers are just really that subpar lol.

roflyear•5mo ago

Claude Code is hilarious because often it'll say stuff that's basically "that's too hard, here's a bandaid fix" and implement it lol

ahmedhawas123•5mo ago

Thanks for sharing this. At a time where this is a rush towards multi-agent systems, this is helpful to see how an LLM-first organization is going after it. Lots of the design aspects here are things I experiment with day to day so it's good to see others use it as well

A few takeaways for me from this (1) Long prompts are good - and don't forget basic things like explaining in the prompt what the tool is, how to help the user, etc (2) Tool calling is basic af; you need more context (when to use, when not to use, etc) (3) Using messages as the state of the memory for the system is OK; i've thought about fancy ways (e.g., persisting dataframes, parsing variables between steps, etc, but seems like as context windows grow, messages should be ok)

nuwandavek•5mo ago

(author of the blogpost here) Yeah, you can extract a LOT of performance from the basics and don't have to do any complicated setup for ~99% of use cases. Keep the loop simple, have clear tools (it is ok if tools overlap in function). Clarity and simplicity >>> everything else.

samuelstros•5mo ago

does a framework like vercel's ai sdk help, or is handling the loop + tool calling so straightforward that a framework is overcomplicating things?

for context, i want to build a claude code like agent in a WYSIWYG markdown app. that's how i stumbled on your blog post :)

ahmedhawas123•5mo ago

Function / tool calling is actually super simple. I'd honestly recommend either doing it through a single LLM provider (e.g., OpenAI or Gemini) without a hard framework first, and then moving to one of the simpler frameworks if you feel the need to (e.g., LangChain). Frameworks like LangGraph and others can get really complicated really quickly.

nuwandavek•5mo ago

There may be other reasons to use ai sdk, but I'd highly recommend starting with a simple loop + port most relevant tools from Claude Code before using any framework.

Nice, do share a link, would love to check out your agent!

brabel•5mo ago

Check the OpenAI REST API reference. Most engines implement that and you can see how tool calls work. It’s just a matter of understanding the responses they give you, how to put them in the messages history and how to invoke a tool when the LLM asks for it.

chazeon•5mo ago

I want to note that: long prompts are good only if the model is optimized for it. I have tried to swap the underlying model for Claude Code. Most local models, even those claimed to work with long context and tool use, don't work well when instruction becomes too long. This has become an issue for tool use, where tool use works well in small ChatBot-type conversation demos, but when Claude's code-level prompt length increases, it just fails, either forgetting what tools are there, forgetting to use them, or returning in the wrong formats. Only the model by OpenAI, Google's Gemini, kind of works, but not as well as Anthropic's own models. Besides they feel much slower.

marmalade2413•5mo ago

I would be remis if after reading this I didn't point people towards talk box ( https://github.com/rich-iannone/talk-box) from one of the creators of great tables.

erelong•5mo ago

> The main takeaway, again, is to keep things simple.

if true this seems like a bloated approach but tbh I wouldn't claim to know totally how to use Claude like the author here...

I find you can get a lot of mileage out of "regular" prompts, I'd call them?

Just asking for what you need one prompt at a time?

I still can't visualize how any of the complexity on top of that like discussed in the article adds anything to carefully crafted prompts one at a time

I also still can't really visualize how claude works compared to simple prompts one at a time.

Like, wouldn't it be more efficient to generate a prompt and then check it by looping through the appendix sections ("Main Claude Code System Prompt" and "All Claude Code Tools"), or is that basically what the LLM does somewhat mysteriously (it just works)? So like "give me while loop equivalent in [new language I'm learning]" is the entirety of the prompt... then if you need to you can loop through the appendix section? Otherwise isn't that a massive over-use of tokens, and the requests might even be ignored because they're too complex?

The control flow eludes me a bit here. I otherwise get the impression that the LLM does not use the appendix sections correctly by adding them to prompts (like, couldn't it just ignore them at times)? It would seem like you'd get more accurate responses by separating that from whatever you're prompting and then checking the prompt through looping over the appendix sections.

Does that make any sense?

I'm visualizing coding an entire program as prompting discrete pieces of it. I have not needed elaborate .md files to do that, you just ask for "how to do a while loop equivalent in [new language I'm learning]" for example. It's possible my prompts are much simpler for my uses, but I still haven't seen any write-ups on how people are constructing elaborate programs in some other way.

Like how are people stringing prompts together to create whole programs? (I guess is one question I have that comes to mind)

I guess maybe I need to find a prompt-by-prompt breakdown of some people building things to get a clearer picture of how LLMs are being used

zackify•5mo ago

How you see and use it is the same way I do. So interested to hear other replies

zackify•5mo ago

Wow. Auto correct. I meant “interested”

erelong•5mo ago

thanks, that's helpful to hear

I know this is a new "space" so I've just been going off what I can find on here and other places and...

it all seems a little confusing to me besides what I otherwise tried to describe (and which apparently resonates with you, which is good to see)

ftyuiiooool•5mo ago

Qetyuuioooooddfhj

itbeho•5mo ago

I use Claude code with Elixir and Phoenix. It's been mostly great but after a short time into a project it seems to break something unrelated to the task at hand.

mike1o1•5mo ago

If you haven’t yet, you should try out usage_rules mix package. I mostly use Ash, which has great support for usage rules and it’s a night and day difference in effectiveness. Tidewave is also really nice as an MCP as it lets the agent query hexdocs or your schema directly.

https://hexdocs.pm/usage_rules/readme.html

itbeho•5mo ago

Thank you! I'll definitely check that out.

arcanemachiner•5mo ago

Also check out the AGENTS.MD file that's been added to Phoenix 1.8.

Make sure you read it first though... I believe it expected Req to be present as a dependency when generating code that makes HTTP requests.

system2•5mo ago

As expected, many graybeard gatekeepers are telling others not to use LLM for any type of coding or assistance.

kristianp•5mo ago

Just fyi, at the end of the article there is a link to minusx.com which has an expired certificate.

This server could not prove that it is minusx.com; its security certificate expired 553 days ago

nuwandavek•5mo ago

Oops, fixed it, thanks!

BobSonOfBob•5mo ago

KISS always win. Great breakdown article. Thanks!

brokegrammer•5mo ago

I don't get it. The title says "What makes Claude Code so damn good", which implies that they will show how Claude Code is better than other tools, or just better in general. But they go about repeating the Claude Code documentation using different wording.

Am I missing something here? Or is this just Anthropic shilling?

nuwandavek•5mo ago

(blogpost author here) Haha, that's totally fair. I've read a whole bunch of posts comparing CC to other tools, or with a dump of the the architecture. This post was mainly for people who've used CC extensively, know for a fact that it is better and wonder how to ship such an experience in their own apps.

brokegrammer•5mo ago

I've used Claude Code, Cursor, and Copilot is Vscode and I don't "know" that Claude Code is better apart from the fact that it runs in the terminal, which makes it a little faster but less ergonomic than tools running inside the editor. All of the context tricks can be done with Copilot instructions as well, so I simply can't see how Claude Code is superior.

techwiz137•5mo ago

For code generation, nothing so far beats Opus. More likely than not it generated working code and fixed bugs that Gemini 2.5 pro couldn't solve or even Gemini Code Assist. Gemini Code Assist is better than 2.5 pro, but has way more limits per prompt and often truncates output.

baq•5mo ago

I found Anthropic’s models untrustworthy with SQL (e.g. confused AND and OR operator precedence - or simply forgot to add parens, multiple times), Gemini 2.5 pro has no such issues and identified Claude’s mistakes correctly.

jonasft•5mo ago

Let’s say that is correct, you can still just use Opus in Cursor or whatever.

rendx•5mo ago

The article is not comparing models, but how the models are used by tools, in this case Claude Code. It's not merely a thin wrapper around an API.

faangguyindia•5mo ago

for me gemini 2.5 pro with thinking tokens enabled blows Opus out of the water for "difficult problems".

d4rkp4ttern•5mo ago

Don’t sleep on Codex-CLI + gpt-5. While the Codex-CLI scaffolding is far behind CC, the gpt-5 code seems solid from what I’ve seen (you can adjust thinking level using /model).

brookst•5mo ago

I’ve been so into Claude code that I haven’t used cursor or copilot in vs code in a while.

Do they also allow you to view the thinking process and planning, and hit ESC to correct if it’s going down a wrong path? I’ve found that to be one of my favorite features of Claude code. If it says “ah, the the implementation isn’t complete, I’ll update test to use mocks” I can interrupt it and say no, it’s fine for the test to fail until the implementation is finished, so not mock anything. Etc.

It may be that I just discovered this after switching, but I don’t recall that being an interaction pattern on cursor or copilot. I was always having to revert after the fact (which might have been me not seeing the option).

WithinReason•5mo ago

you can in VScode for about a month now

wrs•5mo ago

Cursor does show the “thinking” in smaller greyer text, then hides it behind a small grey “thought for 30 seconds” note. If it’s off track, you just hit the stop button and correct the agent, or scroll up and restart from an earlier interaction (same thing as double-ESC in Claude Code).

whazor•5mo ago

I think this article is targeted towards readers who subjectively agree that Claude Code is the best.

slimebot80•5mo ago

Nowhere in the title does it compare to other tools? Just that's it's damn good.

dotancohen•5mo ago

The phrase "so damn good" implies a benchmark, which itself is implied to be the average of comparable tools.

Without these premises, one could state that the 1996 Yugo was so damn good. I mean, it was better than a horse.

SpaceNoodled•5mo ago

I dunno, horses are pretty great.

dotancohen•5mo ago

Sure, but the Yugo had the power of 45 of them, and didn't leave dung on the city streets. ))

PessimalDecimal•5mo ago

What makes horses so damn good?

psychoslave•5mo ago

The sausage obviously.

escapecharacter•5mo ago

nowhere in your comment do you compare them to anime cat buses though

mxmilkiib•5mo ago

best of all the animals

patates•5mo ago

not in the title but, one of the opening sentences is this:

> I find Claude Code objectively less annoying to use compared to Cursor, or Github Copilot agents even with the same underlying model! What makes it so damn good?

dtagames•5mo ago

The difference between Claude Code and Cursor is that one is a command line tool and the other an IDE. You can use Claude models in both and all these techniques can be applied with Cursor and its rules, too.

It's Coke vs. Pepsi.

kissgyorgy•5mo ago

Not even close. An agentic tool can be fully autonomous, an IDE like Cursor is, well it's "just" an editor. Quite the opposite. Sure it does some heavy lifting too, but still the user writes the code. They start to implement fully agentic tools and models, but they are nowhere near work as good as Claude Code does.

willsmith72•5mo ago

not at all, it's just not a "claude model". All these companies add their own prompts hints on top. it's a totally different experience. Trying using kiro which is also a "claude model" and tell me it's the same

tomashubelbauer•5mo ago

There is also Cursor Agent CLI which is a TUI exactly like CC. I switched to it because I don't like GUI AI assistants, but I also couldn't stand CC always being overloaded and having many bugs that were affecting me. I'm not on Cursor Agent CLI with GPT5 and happy to have an alternative to CC.

rapind•5mo ago

How is Cursor Agent + GPT5 holding up? I've been on Claude code for a while, but there's been an increase in timeouts and slowdowns recently.

tomashubelbauer•5mo ago

For my personal projects it has completely replaced the need for CC with Anthropic models for me. At work, I am waiting for native Windows support. I don't like using AI assistants via WSL. Since both CA and CC are Node apps and CC has since shipped native Windows support, I don't foresee it taking CA long either. Especially since it can be hacked to work that way today as I've experimented with here: github.com/TomasHubelbauer/cursor-agent-windows

rapind•5mo ago

Good to know. I’ll try it out on an upcoming greenfield. Thanks.

jofla_net•5mo ago

Its hype, the answer is hype. Please buy a slot.

Can i shill my business on here too or will it get canned because i'm a nobody?

nojs•5mo ago

I’ve noticed that custom subagents in CC often perform noticeably worse than the main agent, even when told to use Opus and despite extreme prompt tuning. This seems to concur with the “keep it flat” logic here. But why should this be the case?

nuwandavek•5mo ago

(blogpost author here) I've noticed this too. My top guess for any such thing would be that this type of sub-agent routing is outside the training distribution. Its possible that this gets better overnight with a model update. The second reason is that sub-agents make it very hard to debug - was the issue with the router prompt or the agent prompt? Flat tools and loop make this a non-issue without loss of any real capability.

whazor•5mo ago

I think the key success to the success of Claude Code is unix.

Claude can run commands to search code, test compilation, and perform various other operations.

Unix is great because its commands are well-documented, and the training data is abundant with examples.

revskill•5mo ago

Smart tool use.

gauravvppnd•5mo ago

Honestly, Claude’s code feels so good because it’s clean, logical, and easy to follow. It doesn’t just work—it makes sense when you read it, which saves a ton of time when debugging or building on top of it.

0xpgm•5mo ago

So, what great new products or startups have these amazing coding agents helped create so far (and not on the AI supply side).

Anywhere to check?

anonzzzies•5mo ago

You really should not check that... I saw some dude on reddit saying that you can build your own saas in 20 days and launch and sell it. I checked out some of his; Claude Code can do that in a few hours. So can I without AI as I have a batteries included framework ready that has all the plumbing done. But Claude can do those from scratch in hours. So 1 day with me doing some testing and fixing. That is not a product or a startup: it's a grift. But glory to him for getting it done anyway. Not many people launch and then actually make a few bucks.

noduerme•5mo ago

>> launch and sell it

What AI can definitely not do is launch or sell anything.

I can write some arbitrary SaaS in a few hours with my own framework, too - and know it's much more secure than anything written by AI. I also know how to launch it. (I'm not so good at the "selling" part).

But if anyone can do all of this - including the launching the selling - then they would not be selling themselves on Reddit or Youtube. Once you see someone explaining to you how to get rich quickly, you must assume that they have failed or else they would not be wasting their time trying to sell you something. And from that you should deduce that it's not wise to take their advice.

anonzzzies•5mo ago

> What AI can definitely not do is launch or sell anything.

Sure but he was particularly talking about the technical side of things.

> (I'm not so good at the "selling" part).

In person I am, but this new fangled 'influencer' selling or what not I do not understand and cannot do (yet) (i'm in my 50s so I can still learn).

> But if anyone can do all of this - including the launching the selling - then they would not be selling themselves on Reddit or Youtube

Yeah but most don't actually name the url of the product and he does. So that's a difference.

willsmith72•5mo ago

literally every single startup in the past year is helped by these. Of course you haven't heard of them, they're year-old startups

bopbopbop7•5mo ago

Name one non failing startup then.

anonzzzies•5mo ago

What's the best current cli (with a non interactive option) that is on par with Claude code but can work with other llms like ollama, openrouter etc? I tried stuff like aider but it cannot discover files, the open source gemini one but it was terrible; what is a good one that maybe is the same as CC if you plug in Opus?

elbear•5mo ago

See if this is it. I haven't used it yet, just know about it: https://github.com/block/goose

faangguyindia•5mo ago

I am curious if any good existing solution exist for this tool:

`Tool name: WebFetch Tool description: - Fetches content from a specified URL and processes it using an AI model - Takes a URL and a prompt as input - Fetches the URL content, converts HTML to markdown - Processes the content with the prompt using a small, fast model - Returns the model's response about the content - Use this tool when you need to retrieve and analyze web content`

I came up with this one:

`import asyncio from playwright.async_api import async_playwright from readability import Document from markdownify import markdownify as md

async def web_fetch_robust(url: str, prompt: str) -> str: """ Fetches content from a URL using a headless browser to handle JS-heavy sites, processes it, and returns a summary. """ try: async with async_playwright() as p: # Launch a headless browser (Chromium is a good default) browser = await p.chromium.launch() page = await browser.new_page()

            # --- Avoiding Blocks ---
            # Set a realistic User-Agent to mimic a real browser
            await page.set_extra_http_headers({
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
            })

            # Navigate to the URL
            await page.goto(url, wait_until='networkidle', timeout=15000) # wait_until='networkidle' is key

            # --- Extracting Content ---
            # Get the fully rendered HTML content
            html_content = await page.content()
            await browser.close()

            # --- Processing for Token Minimization ---
            # 1. Extract main content using Readability.js
            doc = Document(html_content)
            main_content_html = doc.summary()

            # 2. Convert to clean Markdown
            markdown_content = md(main_content_html, strip=['a', 'img']) # Strip links/images to save tokens

            # 3. Use the small, fast model to process the clean content
            # summary = small_model.process(prompt, markdown_content) # Placeholder for your model call

            # For demonstration, we'll just return a message
            summary = f"A summary of the JS-rendered content from {url} would be generated here."

            return summary

    except Exception as e:
        return f"Error fetching or processing URL with headless browser: {e}"

# To run this async function # result = asyncio.run(web_fetch_robust("https://example.com", "Summarize this.")) # print(result) `

noduerme•5mo ago

Claude Code has definitely attracted me as in, I would like to try it on a new project. But just speaking as a lone coder, it absolutely terrifies me to give something access to my whole system and CLI. I have one main laptop and everything is on it. All my repos and API keys and SSH keys, my carefully tuned dev environment...I have no idea what it might read or upload, let alone what it might try to execute. I'm tempted enough to try it that I might set up a completely walled-off virtual machine for the purpose, but then I don't know how much benefit I'd get from it.

Do you just let it run rampant on your system and do whatever it thinks it should, installing whatever it wants and sucking all your config files into the cloud or what?

furyofantares•5mo ago

By default you have to approve every command it runs. I think most people end up allowing certain tools through unconditionally, like grep, but which is technical not bullet proof but feels pretty safe. The agent program also has some guardrails to prevent the model from working outside of the working directory you launched it from, that is also not bulletproof but in practice works pretty well.

You could set up a docker image and run it in that if you wanted.

12ian34•5mo ago

claude code is a nightmare compared to cursor. terminal is not an appropriate UX unless you want to do stuff from your phone in a pinch. the main thing they got right is selling the idea of vibing to skeptical engineers by making it a CLI. i think it has more sensible defaults than cursor though which is another reason folks like it out of the box. cursor with a planner/executor system prompt works much nicer and is way less destructive. cc more for vibing IMO

rob_c•5mo ago

Context window length with further fine tuning and a good system prompt... Nothing too much different other than anthro correctly banked on larger context windows (and stability with training/eval) is the key to massive improvements.

lightedman•5mo ago

And it still fails at basic HTML.

Go back to school Anthropic.

ath3nd•5mo ago

The hype from hype fanboys? /s

In all seriousness, at the times of LLMs I am not surprised to see an article that can basically be summarized into: "This product is good because it's good and I am not gonna compare it to others because why do you expect critical thinking in the era of LLMs"

Danborg•5mo ago

The reliance on “IMPORTANT” and “NEVER” tags feels like a necessary evil that points to current model limitations. It works, but it’s not elegant. I’m curious how this will evolve as models become more steerable.

AceJohnny2•5mo ago

late stupid question, perhaps, but is there any meaning to emphasis in the prompts? Like, FTA:

> - IMPORTANT: DO NOT ADD ***ANY*** COMMENTS unless asked

> - VERY IMPORTANT: You MUST avoid using search commands like `find` and `grep`.

Does using caps, or the stars, really carry meaning through the tokenization process?

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

Haskell for all: Beyond agentic coding

SectorC: A C Compiler in 512 bytes (2023)

Software factories and the agentic moment

LLMs as the new high level language

Speed up responses with fast mode

The Architecture of Open Source Applications (Volume 1) Berkeley DB

LineageOS 23.2

Hoot: Scheme on WebAssembly

Brookhaven Lab's RHIC concludes 25-year run with final collisions

Stories from 25 Years of Software Development

Vocal Guide – belt sing without killing yourself

Roger Ebert Reviews "The Shawshank Redemption"

Substack confirms data breach affects users’ email addresses and phone numbers

uLauncher

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

First Proof

Wood Gas Vehicles: Firewood in the Fuel Tank (2010)

Vouch

Start all of your commands with a comma (2009)

Al Lowe on model trains, funny deaths and working with Disney

Show HN: A luma dependent chroma compression algorithm (image compression)

The AI boom is causing shortages everywhere else

The Scriptovision Super Micro Script video titler is almost a home computer

Where did all the starships go?

FDA intends to take action against non-FDA-approved GLP-1 drugs

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

Selection rather than prediction

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

Haskell for all: Beyond agentic coding

SectorC: A C Compiler in 512 bytes (2023)

Software factories and the agentic moment

LLMs as the new high level language

Speed up responses with fast mode

The Architecture of Open Source Applications (Volume 1) Berkeley DB

LineageOS 23.2

Hoot: Scheme on WebAssembly

Brookhaven Lab's RHIC concludes 25-year run with final collisions

Stories from 25 Years of Software Development

Vocal Guide – belt sing without killing yourself

Roger Ebert Reviews "The Shawshank Redemption"

Substack confirms data breach affects users’ email addresses and phone numbers

uLauncher

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

First Proof

Wood Gas Vehicles: Firewood in the Fuel Tank (2010)

Vouch

Start all of your commands with a comma (2009)

Al Lowe on model trains, funny deaths and working with Disney

Show HN: A luma dependent chroma compression algorithm (image compression)

The AI boom is causing shortages everywhere else

The Scriptovision Super Micro Script video titler is almost a home computer

Where did all the starships go?

FDA intends to take action against non-FDA-approved GLP-1 drugs

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

Selection rather than prediction

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Unseen Footage of Atari Battlezone Arcade Cabinet Production

What makes Claude Code so damn good

Comments