
The problems that accountability can't fix

https://surfingcomplexity.blog/2025/08/23/the-problems-that-accountability-cant-fix/
1•gpi•4m ago•0 comments

Florida Paints over Rainbow Memorial for Victims of Pulse Nightclub Shooting

https://www.nytimes.com/2025/08/23/us/politics/orlando-pulse-shooting-rainbow-crosswalk-memorial....
1•duxup•8m ago•0 comments

Will we ever get enough housing? The future holds promise

https://www.latimes.com/california/story/2025-08-10/home-2050
1•PaulHoule•12m ago•0 comments

Silicon Valley is full of wealthy men who think they're victims, says Nick Clegg

https://www.theguardian.com/politics/2025/aug/23/nick-clegg-silicon-valley-self-pity-wealthy-men
1•breve•14m ago•1 comments

Jo Nesbø

https://en.wikipedia.org/wiki/Jo_Nesb%C3%B8
1•wslh•17m ago•0 comments

Ask HN: Do you find ChatGPT 5 to be condescending?

1•amichail•25m ago•1 comments

3D printing a building with 756 windows

https://jero.zone/posts/cbr-building
2•jer0me•26m ago•0 comments

Why did Joan Didion abandon her book about the Manson murders?

https://www.washingtonpost.com/books/2025/08/16/joan-didion-archives-manson-murders/
1•samclemens•28m ago•0 comments

Discovery of Strong Water Ice Absorption, Extended Carbon Dioxide Coma-3I/Atlas

https://arxiv.org/abs/2508.15469
1•bikenaga•30m ago•0 comments

Quantum computing breakthrough at room temperature

https://news.ssbcrack.com/new-breakthroughs-in-physics-based-computing-could-efficiently-solve-co...
1•nitia•33m ago•1 comments

Churches use data and AI as engines of surveillance

https://www.technologyreview.com/2025/08/19/1121389/ai-data-church-surveillance-america/
6•gnabgib•35m ago•1 comments

Chipotle restaurant chain is testing drone delivery starting in Dallas

https://dallas.culturemap.com/news/restaurants-bars/chipotle-zipline-drone/
1•bookofjoe•36m ago•1 comments

3I/Atlas: Direct Spacecraft Exploration of Possible Relic of Planetary Formation

https://arxiv.org/abs/2508.15768
1•bikenaga•38m ago•0 comments

Codeku: A lightweight, embeddable code execution widget for the web

https://github.com/alvii147/codeku
1•alvii147•41m ago•0 comments

Weijian Shan Went from Hard Labor to Private Equity Pioneer

https://www.bloomberg.com/features/2025-weijian-shan-weekend-interview/
2•wslh•43m ago•1 comments

Mystery object lights up night skies across western Japan

https://mainichi.jp/english/articles/20250820/p2g/00m/0sc/031000c
1•giuliomagnifico•46m ago•1 comments

Show HN: ApplyBoop – No sign up free job application tracker

https://applyboop.com
1•Yuvehu•50m ago•0 comments

The West is bored to death

https://www.newstatesman.com/ideas/2025/04/the-west-is-bored-to-death
6•CharlesW•1h ago•0 comments

The 1970s Gave Us Industrial Decline. A.I. Could Bring Something Worse.

https://www.nytimes.com/2025/08/19/opinion/ai-job-loss-deindustrialization.html
8•goplayoutside•1h ago•1 comments

Is Google behind a mysterious new AI image generator?

https://www.businessinsider.com/bananas-google-viral-ai-model-2025-8
3•ivape•1h ago•2 comments

People Beg Developers "Don't Grant AI Sentience, Please."

https://totalapexgaming.com/tech/ai-fear-may-be-real/
2•juanviera23•1h ago•1 comments

Actually Good Regulations

https://www.actuallygoodregulations.eu/
2•saubeidl•1h ago•0 comments

A Decentralised, Tamper-Proof Electronic Voting System

https://www.jionex.com/blog/bd-votenet-a-fully-decentralised-tamper-proof-electronic-voting-syste...
2•omar-marosh•1h ago•5 comments

Today's vehicles have bigger blind spots but not where you think

https://newatlas.com/automotive/iihs-vehicle-visibility-study/
8•domofutu•1h ago•3 comments

An Easy Way to Capture Completions Data for Fine Tuning

https://github.com/GridLLM/MicroModel
1•jwstanwick03•1h ago•0 comments

The Carbon Cycle (2011)

https://www.earthobservatory.nasa.gov/features/CarbonCycle
2•rolph•1h ago•0 comments

Interruptions cost 23 minutes 15 seconds, right?

https://blog.oberien.de/2023/11/05/23-minutes-15-seconds.html
3•_vaporwave_•1h ago•0 comments

Psilocybin and the dynamics of gaze fixations during visual aesthetic perception

https://www.nature.com/articles/s41598-025-10206-8
1•domofutu•1h ago•1 comments

Prior-authorization denial letter received shortly before surgery

https://www.nytimes.com/2025/08/22/your-money/insurance-prior-authorization-surgery-unitedhealthc...
2•CaliforniaKarl•1h ago•0 comments

Why WADA Has Its Eye on Ozempic

https://www.triathlete.com/culture/news/does-ozempic-affect-athletic-performance-wada-to-study-do...
1•austinallegro•1h ago•0 comments

What makes Claude Code so damn good

https://minusx.ai/blog/decoding-claude-code/
143•samuelstros•3h ago

Comments

LaGrange•3h ago
[flagged]
dang•3h ago
Please don't post unsubstantive comments to Hacker News, and especially not putdowns.

The idea here is: if you have a substantive point, make it thoughtfully. If not, please don't comment until you do.

https://news.ycombinator.com/newsguidelines.html

dingnuts•3h ago
I appreciate the vague negative takes on tools like this, where there's so much hype it feels impossible to hold a different opinion. "It's bad" is perfectly substantive in my opinion: this person tried it, didn't like it, and doesn't have much more to say because of that, but it's still a useful perspective.

Is this why HN is so dang pro-AI? The negative comments, even small ones, are moderated away? Explains a lot, TBH.

h4ch1•2h ago
I think this comment would be a little better by specifying WHY it's bad instead of just a "it's bad" like it's a Twitter thread.
LaGrange•2h ago
The subject is pretty exhausted. The reason I post "it's bad" is that, honestly, expanding on it just feels like a waste of time and energy. The point is demonstrating that this _isn't_ a consensus, and not much more than that.

Edit: bonus points if this gets me banned.

exe34•2h ago
That wasn't a negative comment though. A negative comment would explain what they didn't like about it. This was the digital equivalent of fly-tipping.
danielbln•2h ago
There is no value in a single poster saying "it's bad". I don't know this person, and there is zero context for why I should care that this user thinks it's bad. Unless they state why they think it's bad, it adds nothing to the conversation and is just noise.
dingnuts•3h ago
the article says CC doesn't use RAG, but then describes how it uses tools to Retrieve context to Aid Generation... RAG

what am I missing here?

edit: lol I "love" that I got downvoted for asking a simple question that might have an open answer. "Be curious," say the rules. Stay classy, HN.

ebzlo•3h ago
Yes, technically it is RAG, but much of the community associates RAG with vector search specifically.
dingnuts•2h ago
It does? Why? The term RAG, as I understand it, leaves the methodology for retrieval vague so that different techniques can be used depending on the, er, context. Which makes a lot more sense to me.
koakuma-chan•2h ago
> why?

Hype. There's nothing wrong with using, e.g., full-text search for RAG.

BoorishBears•2h ago
If you want to be really stringent, RAG originally referred to going from a user query to retrieving information directly based on that query, then passing it to an LLM. With CC, the LLM takes the raw user query and then crafts its own searches.

But realistically, lots of RAG systems have LLM calls interleaved for various reasons, so what they probably mean is that it's not doing the usual chunking + embeddings thing.

theptip•2h ago
Yeah, TFA clearly explains their point. They mean RAG=vector search, and contrast this with tool calling (eg Grep).
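The distinction this sub-thread is drawing can be sketched in a few lines. This is a toy illustration with a made-up in-memory "repo" and tool name — real agents shell out to ripgrep or similar. Classic RAG pre-indexes chunks by embedding; the tool-calling style instead lets the model issue literal searches against the live codebase, refining its query across turns.

```python
# Hypothetical repo contents, standing in for files on disk.
REPO = {
    "auth.py": "def login(user, pw):\n    return check_password(user, pw)",
    "billing.py": "def charge(card, amount):\n    return gateway.charge(card, amount)",
}

def grep_tool(pattern: str) -> list[tuple[str, str]]:
    """What a 'Grep' tool call might return: (filename, matching line) pairs.

    Unlike embedding search, nothing is indexed up front; the model decides
    what to search for, sees the hits, and can search again with a better query.
    """
    hits = []
    for name, text in REPO.items():
        for line in text.splitlines():
            if pattern in line:
                hits.append((name, line))
    return hits

# The model might call this several times mid-conversation:
print(grep_tool("charge"))
```

The point of contention is only terminology: both approaches retrieve context to aid generation, but one builds a vector index ahead of time while the other searches on demand.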
alex1138•3h ago
What do people think of Google's Gemini (Pro?) compared to Claude for code?

I really like a lot of what Google produces, but they can't seem to keep a product that they don't shut down and they can be pretty ham-fisted, both with corporate control (Chrome and corrupt practices) and censorship

KaoruAoiShiho•3h ago
It sucks.
KaoruAoiShiho•40m ago
Lol downvoted, come on anyone who has used gemini and claude code knows there's no comparison... gimme a break.
bitpush•30m ago
You're getting downvoted because of the curt "it sucks", which shows a level of shallowness in your understanding.

Nothing in the world is simply outright garbage. Even the seemingly worst products exist for a reason and are used for a variety of use cases.

So, take a step back and reevaluate whether your reply could have been better. Because, as it stands, it simply "just sucks".

ezfe•3h ago
Gemini frequently didn't write code for me for no explicable reason, and just talked about a hypothetical solution. Seems like a tooling issue though.
djmips•3h ago
Sounds almost human!
yomismoaqui•3h ago
According to the guys from Amp, Claude Sonnet/Opus are better at tool use.
jsight•2h ago
For the web UI (chat)? I actually really like Gemini 2.5 Pro.

For the command line tools (Claude Code vs Gemini Code)? It isn't even close. Gemini Code was useless. Claude Code was mostly just slow.

Herring•18m ago
Yeah, I was also getting much better results from the Gemini web UI compared to the Gemini terminal. Haven't gotten to Claude yet.
stabbles•2h ago
In my experience it's better at lower level stuff, like systems programming. A pass afterwards with claude makes the code more readable.
Keyframe•2h ago
It's doing rather well at thinking, but not at coding. When it codes, often enough it runs in circles and ignores input. Where I find it useful is reading through larger codebases and distilling what I need to find out from them. I'm even calling Gemini from Claude to consult it on certain things. Opus is also like that, btw, but a bit better at coding. Sonnet, though, excels at coding, in my experience.
koakuma-chan•2h ago
I don't think Gemini Pro is necessarily worse at coding, but in my experience Claude is substantially better at "terminal" tasks (i.e. working with the model through a CLI in the terminal) and most of the CLIs use Claude, see https://www.tbench.ai/leaderboard.
jonfw•2h ago
Gemini is better at helping to debug difficult problems that require following multiple function calls.

I think Claude is much more predictable and follows instructions better- the todo list it manages seems very helpful in this respect.

nicce•2h ago
If you could control the model with a system command, it would be very good. But so far I have failed miserably. The model is too verbose and helpful.
divan•2h ago
In my recent tests I found it quite smart at analyzing the bigger picture (e.g., "hey, the test is failing not because of that, but because the whole assumption has changed; let me rewrite this test from scratch"). But it also got stuck a few times ("I can't edit the file, I'm stuck, let me try something completely different"). The biggest difference so far, though, is the communication style: it's a bit... snarky? I.e., comments like "yeah, tests are failing, as I suspected". Why the f did it suspect failing tests on a project it was seeing for the first time? :D
CuriouslyC•1h ago
Gemini is amazing for taking a merge file of your whole repo, dropping it in there, and chatting about stuff. The level of whole codebase understanding is unreal, and it can do some amazing architectural planning assistance. Claude is nowhere near able to do that.

My tactic is to work with Gemini to build a dense summary of the project and create a high level plan of action, then take that to gpt5 and have it try to improve the plan, and convert it to a hyper detailed workflow xml document laying out all the steps to implement the plan, which I then hand to claude.

This avoids pretty much all of Claude's unplanned bumbling.

filchermcurr•1h ago
The Gemini CLI tool is atrocious. It might work sometimes for analyzing code, but for modifying files, never. The inevitable conclusion of every session I've ever tried has been an infinite loop. Sometimes it's an infinite loop of self-deprecation, sometimes just repeating itself to failure, usually repeating the same tool failure until it catches it as an infinite loop. Tool usage frequently (we're talking 90% of the time) fails. It's also, frankly, just a bummer to talk to. The "personality" is depressed, self-deprecating, and just overall really weird.

That's been my experience, anyway. Maybe it hates me? I sure hate it.

siva7•3h ago
It would be more interesting to compare what Gemini CLI and Codex CLI did wrong (though I haven't used either of them for weeks to months).
syntaxing•2h ago
I don't know if I'm doing something wrong. I was using Sonnet 4 with GitHub Copilot, then a week ago I switched to Claude Code. I find GitHub Copilot solves problems and bugs way better than Claude Code. For some reason, Claude Code seems very lazy. Has anyone experienced something similar?
cosmic_cheese•2h ago
I haven't tried other LLMs but have a fair amount of experience with Claude Code, and there are definitely times when you have to be explicit about the route you want it to take and tell it not to take shortcuts.

It’s not consistent, though. I haven’t figured out what they are but it feels like there are circumstances where it’s more prone to doing ugly hacky things.

StephenAshmore•2h ago
It may be a configuration thing. I've found quite the opposite. Github Copilot using Sonnet 4 will not manage context very well, quite frequently resorting to running terminal commands to search for code even when I gave it the exact file it's looking for in the copilot context. Claude code, for me, is usually much smarter when it comes to reading code and then applying changes across a lot of files. I also have it integrated into the IDE so it can make visual changes in the editor similar to GitHub Copilot.
syntaxing•2h ago
I do agree with you: GitHub Copilot uses more tokens, like you mentioned, with redundant searches. But at the end of the day, it solves the problem. Not sure the cost outweighs the benefit compared to Claude Code, though. Going to try Claude Code more and see if I'm prompting it incorrectly.
libraryofbabel•2h ago
The consensus is the opposite: most people find copilot does less well than Claude with both using sonnet 4. Without discounting your experience, you’ll need to give us more detail about what exactly you were trying to do (what problem, what prompt) and what you mean by “lazy” if you want any meaningful advice though.
sojournerc•31m ago
Where do you find this "consensus"?
wordofx•2h ago
I have most of the tools set up so I can switch between them and test which is better. So far Amp and Claude Code are on top. GH Copilot is the worst. I know MS is desperately trying to copy its competitors, but the reality is they are just copying features. They haven't solved the system prompts, so the outcomes are just inferior.
diego_sandoval•2h ago
It shocks me when people say that LLMs don't make them more productive, because my experience has been the complete opposite, especially with Claude Code.

Either I'm worse than them at programming, to the point that I find an LLM useful and they don't, or they don't know how to use LLMs for coding.

dsiegel2275•2h ago
Agreed. I only started using Claude Code about a week and a half ago and I'm blown away by how productive I can be with it.
pawelduda•2h ago
I've had occasions where a relatively short prompt saved me an entire day of debugging and fixing things, because it was a tech stack I barely knew. The most impressive part was when CC knew the changes might take some time to be applied and just ran `sleep 60; check logs;` 2-3 times, then started checking elsewhere for something stuck. Something was; CC cleaned it up, and a minute later someone pinged me that it works.
ta12653421•2h ago
The productivity boost is unbelievable! If you handle it right, it's a boon: it's like having 3 junior devs at hand. And I'm talking about using the web interface.

I guess most people are not paying and therefore can't use the project space (which is one of the best features), which unleashes its full magic.

Even though I'm currently without a job, I'm still paying because it helps me.

ta12653421•2h ago
LOL why do I get downvoted for explaining my experience? :-D
pawelduda•14m ago
Because you posted a success story about LLM usage on HN
tjr•2h ago
What do you work on, and what do LLMs do that helps?

(Not disagreeing, but most of these comments -- on both sides -- are pretty vague.)

SXX•2h ago
For one, LLMs are good for building game prototypes. When all you care about is checking whether something is fun to play, it really doesn't matter how much tech debt you generate in the process.

And since you start from scratch every time, you can generate all the documentation before you ever start to generate code. And when the LLM slop becomes overwhelming, you just drop it and go check the next idea.

jsight•2h ago
What is performance like for you? I've been shocked at how many simple requests turn into >10 minutes of waiting.

If people are getting faster responses than this regularly, it could account for a large amount of the difference in experiences.

totalhack•2h ago
Agree with this, though I've mostly been using Gemini CLI. Some of the simplest things, like applying a small diff, take many minutes: it loses track of the current file state and either takes minutes to figure it out or fails entirely.
wredcoll•2h ago
The best part about llm coding is that you feel productive even when you aren't, makes coding a lot more fun.
timr•2h ago
It depends very much on your use case, language popularity, experience coding, and the size of your project. If you work on a large, legacy code base in COBOL, it's going to be much harder than working on a toy greenfield application in React. If your prior knowledge writing code is minimal, the more amazing the results will seem, and vice-versa.

Despite the persistent memes here and elsewhere, it doesn't depend very much on the particular tool you use (with the exception of model choice), how you hold it, or your experience prompting (beyond a bare minimum of competence). People who jump into any conversation with "use tool X" or "you just don't understand how to prompt" are the noise floor of any conversation about AI-assisted coding. Folks might as well be talking about Santeria.

Even for projects that I initiate with LLM support, I find that the usefulness of the tool declines quickly as the codebase increases in size. The iron law of the context window rules everything.

Edit: one thing I'll add, which I only recently realized exists (perhaps stupidly) is that there is a population of people who are willing to prompt expensive LLMs dozens of times to get a single working output. This approach seems to me to be roughly equivalent to pulling the lever on a slot machine, or blindly copy-pasting from Stack Overflow, and is not what I am talking about. I am talking about the tradeoffs involved in using LLMs as an assistant for human-guided programming.

ivan_gammel•2h ago
Overall I would agree with you, but I'm starting to feel that this "iron law" isn't as simple as that. After all, humans have a limited "context window" too: we don't remember every small detail of a large project we have been working on for several years. Loose coupling and modularity help us, and can help an LLM, by making the size of the task manageable, as long as you don't ask it to rebuild the whole thing. It's not the size that makes LLMs fail, but something else, probably the same things where we may fail.
timr•2h ago
Humans have a limited short-term memory. Humans do not literally forget everything they've ever learned after each Q&A cycle.

(Though now that I think of it, I might start interrupting people with “SUMMARIZING CONVERSATION HISTORY!” whenever they begin to bore me. Then I can change the subject.)

ivan_gammel•2h ago
LLMs do not "forget" everything completely either. Probably all major tools by now consume information from some form of memory (system prompt, CLAUDE.md, project files, etc.) before your prompt. Claude Code rewrites the CLAUDE.md, ChatGPT may modify the chat memory if it finds it necessary, etc.
timr•2h ago
Writing stuff in a file is not "memory" (particularly if I have to do it), and in any case, it consumes context. Overrun the context window, and the tool doesn't know about what was lost.

There are various hacks these tools use to cram more crap into a fixed-size bucket, but it's still fundamentally different from how a person thinks.

ivan_gammel•51m ago
> Writing stuff in a file is not “memory”

Do you even understand what you just said? A file is a way to organize data in the memory of a computer, by definition. When you write instructions to an LLM, they persistently modify your prompts, making the LLM "remember" certain stuff like coding conventions or explanations of your architectural choices.

> particularly if I have to do it

You have to communicate with the LLM about the code. You either do it persistently (it must remember) or contextually (it should know something only in the context of the current session). So the word "particularly" is out of place here. You choose one way or the other, instead of being able to just say that some information is important or unimportant long-term. This communication would happen with humans too. LLMs have a different interface for it, more explicit (giving the perception of more effort, when it is in fact the same; and let's not forget that an LLM is able to decide itself whether to remember something or not).

> and in any case, it consumes context

So what? Generalization is an effective way to compress information. Because of it persistent instructions consume only a tiny fraction of context, but they reduce the need for LLM to go into full analysis of your code.

> but it’s still fundamentally different than how a person thinks.

Again, so what? Nobody can keep an entire code base in short-term memory. It should not be the expectation to have this ability, nor should it be considered a major disadvantage not to have it. Yes, we use our "context windows" differently in the thinking process. What matters is what information we pack there and what we make of it.

BeetleB•1h ago
Both true and irrelevant.

I've yet to see "forgets everything" be a limiting factor. In fact, when using Aider, I aggressively ensure it forgets everything several times per session.

To me, it's a feature, not a drawback.

I've certainly had coworkers I've had to tell: "Look, will you forget about X? That use case, while it looks similar, is actually quite different in its assumptions. Stop invoking your experiences there!"

SXX•2h ago
This heavily depends on what project and stack you're working on. LLMs are amazing for building MVPs or self-contained microservices on modern, popular, well-defined stacks. Every single dependency, legacy or proprietary library, and every extra MCP makes them less usable. It gets much worse if the codebase itself is legacy, unless you can literally upload documentation for each used API into context.

A lot of programmers work on maintaining huge monolithic codebases, built on top of 10-year-old tech using obscure proprietary dependencies. Usually they don't have most of the code to begin with, and the APIs are often not well documented.

cpursley•2h ago
I feel like I could have written this myself; I'm truly dumbfounded. Maybe I am just a crappy coder but I don't think I'd be getting such good results with Claude Code if I were.
socalgal2•2h ago
I'm trying to learn jj. Both Gemini and ChatGPT gave me incorrect instructions 4 out of 5 times.

https://jj-vcs.github.io/jj/

BeetleB•1h ago
That's because jj is relatively new and constantly changing. The official tutorial is (by their own admission) out of date. People's blog posts also differ fairly widely in the commands/usage they recommend.

I know this because I recently learned jj, with a lot of struggling.

If a human struggles to learn it, I wouldn't expect LLMs to be much better.

exe34•2h ago
It makes me very productive with new prototypes in languages/frameworks I'm not familiar with. Conversely, a lot of my work involves coding as part of understanding the business problem in the first place. Think making a plot to figure out how two things relate, and then, based on that understanding, trying out some other operation. It doesn't matter how fast the machine can write code; my slow meat brain is still the bottleneck. The coding is trivial.
Aurornis•2h ago
I’ve found LLMs useful at some specific tasks, but a complete waste of time at others.

If I only ever wrote small Python scripts, did small to medium JavaScript front end or full stack websites, or a number of other generic tasks where LLMs are well trained I’d probably have a different opinion.

Drop into one of my non-generic Rust codebases that does something complex, and I could spend hours trying to keep the LLM moving in the right direction and away from all of the dead ends and thought loops.

It really depends on what you’re using them for.

That said, there are a lot of commenters who haven’t spent more than a few hours playing with LLMs and see every LLM misstep as confirmation of their preconceived ideas that they’re entirely useless.

lambda•2h ago
It can be more than one reason.

First of all, keep in mind that research has shown people generally overestimate the productivity gains of LLM coding assistance. Even when using a coding assistant makes them less productive, they feel more productive.

Second, yeah, experience matters, both with programming and with LLM coding assistants. The better you are, the less helpful the coding assistant will be; it can take less work to just write what you want than to convince an LLM to do it.

Third, some people are more sensitive to the kind of errors or style that LLMs tend to use. I frequently can't stand the output of LLMs, even if it technically works; it doesn't live up to my personal standards.

pton_xd•2h ago
> Third, some people are more sensitive to the kind of errors or style that LLMs tend to use. I frequently can't stand the output of LLMs, even if it technically works; it doesn't live up to my personal standards.

I've noticed the stronger my opinions are about how code should be written or structured, the less productive LLMs feel to me. Then I'm just fighting them at every step to do things "my way."

If I don't really have an opinion about what's going on, LLMs churning out hundreds of lines of mostly-working code is a huge boon. After all, I'd rather not spend the energy thinking through code I don't care about.

Uehreka•1h ago
> research has shown that people generally overestimate the productivity gains of LLM coding assistance.

I don’t think this research is fully baked. I don’t see a story in these results that aligns with my experience and makes me think “yeah, that actually is what I’m doing”. I get that at this point I’m supposed to go “the effect is so subtle that even I don’t notice it!” But experience tells me that’s not normally how this kind of thing works.

Perhaps we’re still figuring out how to describe the positive effects of these tools or what axes we should really be measuring on, but the idea that there’s some sort of placebo effect going on here doesn’t pass muster.

d-lisp•2h ago
Basic engineering tasks (frontend development, Python, even some kinds of high-level 3D programming) are covered. If you do C/C++, or even Java in a preexisting project, then you will have a hard time constantly explaining to the LLM why <previous answer> is absolute nonsense.

Every time I tried LLMs, I had the feeling of talking with an ignoramus trying to sound VERY CLEVER: terrible mistakes on every line, surrounded by punchlines, rocket emojis, and tons of bullshit. (I'm partly kidding.)

Maybe there are situations where LLMs are useful, e.g. if you can properly delimit and isolate your problem; but when you have to write code that is meant to mess with the internals of some piece of software, they don't do well.

It would be nice to hear from both the "happy users" and the "unhappy users" of LLMs about the contexts in which they experimented with them, to be better informed on this question.

AaronAPU•2h ago
If you're working with a massive, complicated C++ repository, you have to take the time to collect the right context and describe the problem precisely enough. Then you should actually read the code to verify it even makes sense. And at that point, if you're a principal-level developer, you could just as easily do it yourself.

But the situation is very different if you’re coding slop in the first place (front end stuff, small repo simple code). The LLMs can churn that slop out at a rapid clip.

OtherShrezzing•2h ago
I think it’s just that the base model is good at real world coding tasks - as opposed to the types of coding tasks in the common benchmarks.

If you use GitHub Copilot - which has its own system level prompts - you can hotswap between models, and Claude outperforms OpenAI’s and Google’s models by such a large margin that the others are functionally useless in comparison.

ec109685•2h ago
Anthropic has opportunities to optimize their models/prompts during reinforcement learning, so the article's advice to stay close to what works in Claude Code is valid, and probably applies more to Anthropic models than the same techniques would to others.

With a subscription plan, Anthropic is highly incentivized to be efficient in their loops beyond just making it a better experience for users.

sdsd•2h ago
Oof, this comes at a hard moment in my Claude Code usage. I'm trying to have it help me debug some Elastic issues on Security Onion but after a few minutes it spits out a zillion lines of obfuscated JS and says:

  Error: kill EPERM
      at process.kill (node:internal/process/per_thread:226:13)
      at Ba2 (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19791)
      at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19664
      at Array.forEach (<anonymous>)
      at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19635
      at Array.forEach (<anonymous>)
      at Aa2 (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19607)
      at file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:19538
      at ChildProcess.W (file:///usr/local/lib/node_modules/@anthropic-ai/claude-code/cli.js:506:20023)
      at ChildProcess.emit (node:events:519:28) {
    errno: -1,
    code: 'EPERM',
    syscall: 'kill'
  }
I'm guessing one of the scripts it runs kills Node.js processes, and that inadvertently kills Claude as well. Or maybe it feels bad that it can't solve my problem and commits suicide.

In any case, I wish it would stay alive and help me lol.
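If the self-kill guess above is right (a cleanup step doing the moral equivalent of `pkill node`, which also signals the Claude Code CLI's own Node process), the fix is to exclude the script's own process tree from the kill list. A minimal sketch of that guard — the candidate list is assumed to come from something like a pgrep for node:

```python
import os

def pids_to_kill(candidates: list[int]) -> list[int]:
    """Filter a kill list so a cleanup script never signals itself.

    Drops our own PID and our parent's PID (the shell or CLI that spawned
    us) — the processes a blanket 'kill all node' sweep would otherwise
    take down along with the actual targets.
    """
    protected = {os.getpid(), os.getppid()}
    return [pid for pid in candidates if pid not in protected]

# Our own process is filtered out; unrelated entries pass through.
safe = pids_to_kill([os.getpid(), -42])
print(safe)
```

The EPERM in the trace is consistent with this: `process.kill` raising when it signals a process it shouldn't (or no longer can).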

sixtyj•2h ago
Jumping to another LLM helps me find out what happened. *This is not official advice :)
idontwantthis•2h ago
I have had zero good results with any LLM and Elasticsearch. Everything it spits out is a hallucination, because there aren't very many complete, in-context examples of anything on the internet.
triyambakam•1h ago
I would try upgrading or wiping away your current install and re-installing it. There might be some cached files somewhere that are in a bad state. At least that's what fixed it for me when I recently came across something similar.
yc-kraln•50m ago
I get this issue when it uses sudo to run a process with root privileges, and then times out.
gervwyk•2h ago
We’re considering building a coding agent for Lowdefy[1], a framework that lets you build web apps with YAML config.

For those who’ve built coding agents: do you think LLMs are better suited for generating structured config vs. raw code?

My theory is that agents producing config that validates against YAML/JSON schemas could be more reliable than code generation. The output is constrained, easier to validate, and when it breaks, you can actually debug it.

I keep seeing people create apps with vibe-coding tools but then get stuck when they need to modify the generated code.

Curious if others think config-based approaches are more practical for AI-assisted development.

[1] https://github.com/lowdefy/lowdefy
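One reason constrained output is easier to debug: a schema check turns a bad generation into a precise, re-promptable error. A stdlib-only sketch (the block types and keys here are hypothetical, not actual Lowdefy schema):

```python
import json

# Hypothetical minimal schema for a Lowdefy-style page config --
# names are illustrative only.
ALLOWED_BLOCK_TYPES = {"Button", "TextInput", "Table"}

def validate_page(config: dict) -> list[str]:
    """Return human-readable errors; an empty list means valid.

    Because the output space is a fixed schema, a failed generation
    produces a precise error the agent can be re-prompted with --
    unlike arbitrary code, where failures surface at runtime.
    """
    errors = []
    if "id" not in config:
        errors.append("page is missing required key 'id'")
    for i, block in enumerate(config.get("blocks", [])):
        if block.get("type") not in ALLOWED_BLOCK_TYPES:
            errors.append(f"blocks[{i}]: unknown type {block.get('type')!r}")
    return errors

# An LLM's raw output can be parsed and checked before it ever runs:
generated = json.loads('{"id": "home", "blocks": [{"type": "Button"}]}')
print(validate_page(generated))  # → []
```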

ec109685•2h ago
I wouldn’t get hung up on one-shotting anything. Output to a format that can be machine verified, ideally one for which there are plenty of industry examples.

Then add a grader step to your agentic loop that is triggered after the files are modified. Give feedback to the model if there are any errors and it will fix them.
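A minimal version of that grader loop, assuming Python files and using byte-compilation as the machine check (any compiler, linter, or schema validator slots in the same way):

```python
import subprocess
import sys

def grade(files):
    """Run a machine-verifiable check over the modified files."""
    feedback = []
    for path in files:
        # Byte-compile as a cheap stand-in for "does it even parse".
        result = subprocess.run(
            [sys.executable, "-m", "py_compile", path],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            feedback.append(f"{path}: {result.stderr.strip()}")
    return feedback

def agent_loop(generate, files, max_rounds=3):
    """Generate, grade, feed errors back; stop when checks pass."""
    feedback = []
    for _ in range(max_rounds):
        generate(feedback)   # the model writes/edits the files
        feedback = grade(files)
        if not feedback:
            return True      # all checks pass
    return False             # give up after max_rounds
```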

amelius•2h ago
How do you specify callbacks?

Config files should be mature programming languages, not Yaml/Json files.

gervwyk•2h ago
Callback: Blocks (React components) can register events with action chains (a sequential list of async functions) that will be called when the event is triggered. So it is defined in the React component. This abstraction of blocks, events, actions, operations, and requests is the only abstraction required in the schema to build fully functional web apps.

Might sound crazy, but we build full web apps in just YAML. We've been doing this for about 5 years now, and it helps us scale to build many web apps, fast, that are easy to maintain. We at Resonancy[1] have found many benefits in doing so. I should write more about this.

[1] - https://resonancy.io

hamandcheese•2h ago
> easier to validate

This is essential to productivity for humans and LLMs alike. The more reliable your edit/test loop, the better your results will be. It doesn't matter if it's compiling code, validating yaml, or anything else.

To your broader question. People have been trying to crack the low-code nut for ages. I don't think it's solvable. Either you make something overly restrictive, or you are inventing a very bad programming language which is doomed to fail because professional coders will never use it.

gervwyk•2h ago
Good point. I’m making the assumption that if the LLM has a more limited feature space to produce as output, then the output is more predictable, and thus changes are faster to comprehend. Similar to when devs use popular libraries: there is a well-known abstraction, and therefore less “new” code to comprehend; since I see familiar functions, the code is predictable to me.
myflash13•2h ago
CC is so damn good I want to use its agent loop in my agent loop. I'm planning to build a browser agent for some specialized tasks and I'm literally just bundling a docker image with Claude Code and a headless browser and the Playwright MCP server.
apwell23•2h ago
cool
HacklesRaised•2h ago
Delusional asshats trying to draft the grift?
the_mitsuhiko•2h ago
Unfortunately, Claude Code is not open source, but there are some tools to better figure out how it is working. If you are really interested in how it works, I strongly recommend looking at Claude Trace: https://github.com/badlogic/lemmy/tree/main/apps/claude-trac...

It dumps out a JSON file as well as a very nicely formatted HTML file that shows you every single tool and all the prompts that were used for a session.

CuriouslyC•1h ago
https://github.com/anthropics/claude-code

You can see the system prompts too.

It's all how the base model has been trained to break tasks into discrete steps and work through them patiently, with some robustness to failure cases.

the_mitsuhiko•1h ago
> https://github.com/anthropics/claude-code

That repository does not contain the code. It's just used for the issue tracker and some example hooks.

CuriouslyC•1h ago
It's a javascript app that gets installed on your local system...
the_mitsuhiko•1h ago
I'm aware of how it works since I have been spending a lot of time over the last two months working with Claude's internals. If you have spent some time with it, you know that it is a transpiled and minified mess that is annoyingly hard to detangle. I'm very happy that claude-trace (and claude-bridge [1]) exists because it makes it much easier to work with the internals of Claude than if you have to decompile it yourself.

[1]: https://github.com/badlogic/lemmy/tree/main/apps/claude-brid...

koakuma-chan•1h ago
https://github.com/dnakov/claude-code :trollface:
throwaway314155•31m ago
That's been DMCA'd since you posted it. Happen to know where I can find a fork?
koakuma-chan•4m ago
> That's been DMCA'd since you posted it.

I know, thus the :trollface:

> Happen to know where I can find a fork?

I don't know where you can find a fork, but even if there is a fork somewhere that's still alive, which is unlikely, it would be for a really old version of Claude Code. You would probably be better off reverse engineering the minified JavaScript or whatever that ships with the latest Claude Code.

athrowaway3z•2h ago
> "THIS IS IMPORTANT" is still State of the Art

Had similar problems until I saw the advice "Don't say what it shouldn't; focus on what it should".

i.e. make sure when it reaches for the 'thing', it has the alternative in context.

Haven't had those problems since then.
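A concrete before/after of that advice, so the alternative is already in context when the model reaches for the forbidden thing (the example instructions are made up):

```python
# Negative-only instruction: the model knows what's banned but has
# no replacement in context, so under pressure it falls back to it.
BEFORE = "IMPORTANT: never use print() for debugging."

# Positive instruction: states what to do instead, so the
# alternative is right there when the model "reaches for the thing".
AFTER = (
    "When you need to inspect runtime state, use the logging module "
    "at DEBUG level (logger.debug(...)) so output can be filtered."
)
```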

amelius•2h ago
I mean, if advice like this worked, then why wouldn't Anthropic let the LLM say it, for instance?
sergiotapia•2h ago
Is Claude Code better than Amp?
radleta•2h ago
I’d be curious to know what MCPs you’ve found useful with CC. Thoughts?
on_the_train•2h ago
The lengths people will go through to avoid to code is astonishing
apwell23•2h ago
writing code is not the fun part of coding. I only realized that after using claude code.
monkaiju•1h ago
Hard disagree
yumraj•2h ago
I made insane progress with CC over last several weeks, but lately have noticed progress stalling.

I’m in the middle of some refactoring/bug fixing/optimization, but it’s constantly running into issues, making half-baked changes, not able to fix regressions, etc. Still trying to figure out how to make it do a better job. Might have to break it into smaller chunks or something. Been a pretty frustrating couple of weeks.

If anyone has pointers, I’m all ears!!

imiric•1h ago
> If anyone has pointers, I’m all ears!!

Give programming a try, you might like it.

yumraj•26m ago
Yeah, have been doing that for 30 years.

Next…

1zael•2h ago
I've literally built the entire MVP of my startup on Claude Code and now have paying customers. I've got an existential worry that I'm going to have a SEV incident that will bring the house of cards down, but until then I'm constantly leveraging Claude for fixing security vulnerabilities, implementing test-driven development, and planning out the software architecture in accordance with my long-term product roadmap. I hope this story becomes more and more common as time passes.
lajisam•2h ago
“Implementing test-driven development, and planning out software architecture in accordance with my long-term product roadmap” can you give some concrete examples of how CC helped you here?
foobarbecue•1h ago
> I hope this story becomes more and more common as time passes.

Why????????????

Why do you want devs to lose cognizance of their own "work" to the point that they have "existential worry"?

Why are people like you trying to drown us all in slop? I bet you could replace your slop pile with a tenth of the lines of clean code, and chances are it'd be less work than you think.

Is it because you're lazy?

Mallowram•1h ago
second
BeetleB•1h ago
> I bet you could replace your slop pile with a tenth of the lines of clean code, and chances are it'd be less work than you think.

Actually, no. When LLMs produce good, working code, it also tends to be efficient (in terms of lines, etc).

May vary with language and domain, though.

stavros•1h ago
Eh, when is that, though? I'm always worrying about the bugs that I haven't noticed if I don't review the changes. The other day, I gave it a four-step algorithm to implement, and it skipped three of the steps because it didn't think they were necessary (they were).
BeetleB•1h ago
Hmm...

It may be the size of the changes you're asking for. I tend to micromanage it. I don't know your algorithm, but if it's complex enough, I may have done 4 separate prompts - one for each step.

foobarbecue•1h ago
Isn't it easier to just write the code???
BeetleB•20m ago
Depends on the algorithm. When you've been coding for a few decades, you really, really don't want to write yet another trivial algorithm you've written multiple tens of times in your life. There's no joy in it.

Let the LLM do the boring stuff, and focus on writing the fun stuff.

Also, setting up logging in Python is never fun.
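Speaking of which, this is the sort of boilerplate worth delegating: a minimal console logging setup (one common-case sketch, not the only way to do it):

```python
import logging

def setup_logging(level=logging.INFO):
    """Console logger with a sensible format; safe to call twice."""
    logger = logging.getLogger("app")
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s: %(message)s"
        ))
        logger.addHandler(handler)
    return logger
```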

stavros•1h ago
It was really simple, just traversing a list up and down twice. It just didn't see the reason why, so it skipped it all (the reason was to prevent race conditions).
imiric•1h ago
Well, don't be shy, share what CC helped you build.
orsorna•1h ago
You're speaking to a wall. For whatever reason, the type of people to espouse the wonders of their LLM workflow never reveal what kind of useful output they get from it, never mind substantiate their claims.
lifestyleguru•1h ago
duh, I ordered Claude Code to simply transfer money monthly to my bank account and it does.
ComputerGuru•1h ago
> but until then I'm constantly leveraging Claude for fixing security vulnerabilities

That it authored in the first place?

dpe82•1h ago
Do you ever fix your own bugs?
ComputerGuru•1h ago
Bugs, yes. Security vulnerabilities? Rarely enough that it wouldn’t make my HN list. It’s not remotely hard to avoid the most common issues.
janice1999•1h ago
Humans have the capacity to learn from their own mistakes without redoing a lifetime of education.
conception•2h ago
I've seen that context forge has a way to use hooks to keep CC going after context condensing. Are there any other patterns or tools people are using with CC to keep it on task, with current context, until it has a validated completion of its task? I feel like we have all these tools separately, but nothing brings it all together and also isn't crazy buggy.
kroaton•1h ago
Load up the context with your information + task list (broken down into phases). Have Sonnet implement phase one tasks and mark phase 1 as done. Go into planning mode, have Opus review the work (you should ideally also review it at this point). Double press escape and go back to the point in the conversation where you loaded up the context with your information + task list. Tell it to do phase 2. Repeat until you run out of usage.
kroaton•1h ago
From time to time, go into Opus planning mode, have it review your entire codebase and tell it to go file by file and look for bugs, security issues, logical problems, etc. Have it make a list. Then load up the context + task list...
conception•11m ago
Yes, I can manage CC through a task list, but there’s nothing technically stopping all your steps from happening automatically. That tool just doesn’t exist yet as far as I can tell, but it’s not a very advanced tool to build. I’m surprised no one has put those steps together.

Also if the task runs out of context it will get progressively worse rather than refresh its own context from time to time.

rolls-reus•1h ago
What’s context forge?
conception•14m ago
https://github.com/webdevtodayjason/context-forge
whoknowsidont•2h ago
It's not that good, most developers are just really that subpar lol.
roflyear•1h ago
Claude Code is hilarious because often it'll say stuff that's basically "that's too hard, here's a bandaid fix" and implement it lol
ahmedhawas123•1h ago
Thanks for sharing this. At a time when there is a rush towards multi-agent systems, this is helpful to see how an LLM-first organization is going after it. Lots of the design aspects here are things I experiment with day to day, so it's good to see others use them as well.

A few takeaways for me from this: (1) Long prompts are good - and don't forget basic things like explaining in the prompt what the tool is, how to help the user, etc. (2) Tool calling is basic af; you need more context (when to use, when not to use, etc.). (3) Using messages as the state of the memory for the system is OK; I've thought about fancy ways (e.g., persisting dataframes, parsing variables between steps, etc.), but it seems like as context windows grow, messages should be OK.
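Takeaway (2) in concrete form: a tool definition that carries when-to-use context, not just a name and parameters. The field names follow the common JSON-schema tool-call shape; the tool itself is hypothetical:

```python
# A hypothetical log-search tool whose description encodes usage
# context: when to reach for it, when not to, and its limits.
search_logs_tool = {
    "name": "search_logs",
    "description": (
        "Search application logs by keyword. "
        "WHEN TO USE: the user reports an error and you need the stack trace. "
        "WHEN NOT TO USE: for metrics or dashboards -- use query_metrics instead. "
        "Results are capped at 50 lines; narrow the query if truncated."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keyword or error code"},
            "since_minutes": {"type": "integer", "description": "Lookback window"},
        },
        "required": ["query"],
    },
}
```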

nuwandavek•3m ago
(author of the blogpost here) Yeah, you can extract a LOT of performance from the basics and don't have to do any complicated setup for ~99% of use cases. Keep the loop simple, have clear tools (it is ok if tools overlap in function). Clarity and simplicity >>> everything else.
marmalade2413•1h ago
I would be remiss if after reading this I didn't point people towards talk-box (https://github.com/rich-iannone/talk-box) from one of the creators of great tables.