
Claude Code feels like magic because it is iterative

https://omarabid.com/claude-magic
105•todsacerdoti•7mo ago

Comments

sylware•7mo ago
Is claude able to write rv64 assembly code?

For instance, can you ask it for a vector-based quicksort? Say, with a "vector size unit" of a "standard" cache line, namely 512 bits/64 bytes (rv22+ profile).

Veen•7mo ago
You could just ask it:

https://claude.ai/public/artifacts/5f4cb680-9a99-4781-8803-9...

(No idea how good that is. I just gave it your comment)

tomashubelbauer•7mo ago
And if you use Claude Code you can also tell it to compile it and test it and it will keep fixing problems until it gets it right or gives up or spirals to a dead end.
sylware•7mo ago
You can give a set of test cases?
tomashubelbauer•7mo ago
With Claude Code you can; it is aware of your code base. Or you can have it generate them and then manually check them. If there are existing tests you can tell it to use those. I usually have it work on the thing I want it to do, keep a journal Markdown file capturing what it did and why (in case I want to review something later), and tell it to build and test its changes after each edit.
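That "build and test after each edit, journal the outcome" routine can be sketched as a small wrapper; the commands and the journal filename below are hypothetical stand-ins, just to show the shape of the loop:

```python
import datetime
import subprocess

def check_and_journal(build_cmd, test_cmd, journal="JOURNAL.md"):
    """Run the build, then the tests, and append the outcome to a journal
    file so there is a reviewable record of what each edit did."""
    build = subprocess.run(build_cmd, capture_output=True, text=True)
    tests = None
    if build.returncode == 0:
        tests = subprocess.run(test_cmd, capture_output=True, text=True)
    ok = tests is not None and tests.returncode == 0
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M")
    with open(journal, "a") as f:
        f.write(f"- {stamp}: build {'ok' if build.returncode == 0 else 'failed'}, "
                f"tests {'ok' if ok else 'failed or skipped'}\n")
    return ok
```

An agent-driven workflow would run something like this after every change and feed the journal plus any failure output back into the next prompt.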
rahoulb•7mo ago
My favourite way to use it is to write tests first, then say "make these pass". It will generate some code, run the tests, say "oh, there's an error here ... let's fix that ... oh, there's an error there, let's fix that ..." and (most of the time) it will reach a solution where the tests pass.

I already do TDD a lot of the time, and this way I can be sure that the actual requirements are covered by the tests. Whereas asking it to add tests to existing code often gets over-elaborate in areas that aren't important and misses cases for the vital stuff.

Sometimes, when asking Claude to pass existing tests, it comes up with better implementations than I would have done. Other times the implementation is awful, but I know I can use it, because the tests prove it works. And then I (or Claude) can refactor with confidence later.
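That "write tests first, then make these pass" loop looks roughly the same in any test runner; here is a minimal pytest-style sketch, where `slugify` and its spec are hypothetical, purely to show the shape of the workflow:

```python
import re

# test_slugify.py -- written by the human first, before any implementation,
# so the actual requirements are pinned down as failing tests.
def test_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("Ship it, now!") == "ship-it-now"

# slugify.py -- the part handed to the agent with "make these pass"; it runs
# the tests, reads the failures, and revises until everything is green.
def slugify(title: str) -> str:
    # Keep letters and digits, collapse everything else into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

test_lowercases_and_hyphenates()
test_strips_punctuation()
```

Because the tests encode the requirements rather than the implementation, any code that passes them can later be refactored (by the human or by the agent) with some confidence.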

sylware•7mo ago
I use noscript/basic (x)html browsers: I get only an 'enable javascript' thingy. Is there a clean web portal for this? Or could you pastebin that stuff on a decently implemented online service like https://paste.c-net.org ? Thx!
mavhc•7mo ago
https://github.com/simonw/llm
sylware•7mo ago
AMAZING! YT-DLP FOR AIS!

Well, I am going to test that very soon. The thing which may be a blocker: if it requires the credentials of an already-created account, with account creation gated behind one of the web engines from the WHATWG cartel... back to square one.

marliechiller•7mo ago
I find the use of the word intelligence to be a bit of a misnomer. Is something intelligent if all it's doing is pattern matching? Is the evolution that led to owl butterflies appearing like an owl intelligent? I'm not sure.

As an aside, it's amusing that we simultaneously have this article on the front page as well as [Generative AI coding tools and agents do not work for me](https://news.ycombinator.com/item?id=44294633). LLMs are really dividing the community at the moment and it's exhausting to keep up with what I (as a dev) should be doing to stay sharp.

0x416c6578•7mo ago
> it's exhausting to keep up with what I (as a dev) should be doing to stay sharp

That, for me, is the biggest thing I'm feeling about LLMs at the moment: things are moving so quickly, but to what end? I know this industry is constantly evolving, and in some ways that is very exciting, but it also feels like an exponential runaway that requires very deliberate attention on the bleeding edge to stay relevant, when a lot of my time in my day job doesn't facilitate this (which I have identified, and I will be changing company in a month).

My own two cents on LLMs (as a junior / low mid level early career software engineer) is that they work best as a better version of Google for any well explored issue, and being able to talk through problems in a conversational manner has been a game changer. But I do fear sometimes that I am not gaining the same amount of knowledge as I would before LLMs became mainstream, it's a shortcut that in the long run I fear is going to reduce the average problem solving ability and original / novel thinking ability of software engineers (whether that is even a requirement in most SWE jobs is up for debate).

ozim•7mo ago
I think this staying-sharp anxiety is FOMO instilled by influencers and people selling guides/courses. Most of the stuff will be implemented by Anthropic, OpenAI etc.

You can run local models but it is like playing matchbox cars in your backyard and imagining you will be F1 driver some day.

The big guys have APIs you pay for to do serious work; that's all you need to know.

conartist6•7mo ago
Running models at all is playing with matchbox cars. If you want to play in the big leagues, you have to become the model.
jackstraw42•7mo ago
A bit unfair to call local models Matchbox cars compared to F1. There are plenty of local uses for LLMs that don't require the largest models; it's not like it has to be all-or-nothing. For example, as a general browser assistant to help summarize articles, explain context, etc., the gemma-3-4B model does very well and is lightning fast on my old 3060 Ti.
ozim•7mo ago
You just wrote the exact confirmation. Running gemma-3-4B on a 3060 Ti as your local assistant is toying around.

Do the same as a startup or a company and you will most likely be out of business in 3 to 6 months, because the big guys will have everything faster and better in no time. The o3 price drop of 80% most likely made running a 3060 Ti more expensive, if you check your energy bill.

jackstraw42•7mo ago
> Make the same as a startup or a company

Not looking to do that though! You can call it toying around if you want, but I think you're really limiting your perspective by dismissing smaller models.

vaylian•7mo ago
> its exhausting to keep up with what I (as a dev) should be doing to stay sharp

I have observed the JavaScript ecosystem producing one new framework after another. I decided to wait for the dust to settle. Turns out vanilla.js is still fine for the things I need to do.

csomar•7mo ago
> I find the use of the word intelligence to be a bit of a misnomer. Is something intelligent if all its doing is pattern matching? Is the evolution that led to owl butterflies appearing like an owl intelligent? Im not sure.

Is a random number generator intelligent? I don't think people perceive or understand intelligence equally. I don't think we have an answer to what exactly is intelligence or how to create it.

> LLMs are really dividing the community at the moment and its exhausting to keep up with what I (as a dev) should be doing to stay sharp

You could try at your comfortable pace. I only started using agents very recently. The dangerous thing is to go to extremes (all in on AI or completely refusing the tech)

viraptor•7mo ago
Keep in mind that as usual, mostly the extreme views are getting posted. The urge to both write and click on "sometimes I find LLMs useful for partial solutions in the right context" is low compared to "AI will replace all developers in 2 years". It may not be as dividing as we read here. It certainly isn't, when looking at what my co-workers do. You can chill and learn it like any other new tech. (Without following every detail day to day)
jorvi•7mo ago
I will die on the hill that for the foreseeable future, LLMs inside an IDE are just fancy autocomplete.

In a more general interface they're also nice for getting a birds-eye view on a topic you're unfamiliar with.

However, just as a counterexample of how dumb they really are: I asked both Gemini 2.5 Pro and Opus 4 if there were any extra settings for VSCode's UI density and without hesitation both of them made up a bunch of 'window.density' settings.

If they can't even get something so extremely basic and well-documented right, how are you going to trust them with giving you flawless C or Typescript?

rolisz•7mo ago
I trust them more with Typescript because there's a compiler that gives them feedback and that has been used for training LLMs.
ojosilva•7mo ago
Well, the article briefly addresses this: it's about the iteration. Given a problem and sufficient processing power, we can attain an intelligent, correct answer by quickly iterating from prompt to results.

There's also a measurement vector for zero-shot LLM responses. But excelling at zero-shot is not a requirement for making LLMs useful.

The market is pointing the way: agents increase iteration capabilities, increasing usefulness. Reasoning models/architectures are another example where iteration makes advances - the LLM iterates "in-band" and self-evaluates so that there's a better chance of a correct outcome.

All that in a mere 3.5 years since launch. To call it an autocomplete is very short-sighted. Even if we have reached the LLMs' ceiling, the choice of AI-oriented workflows (TTS, TDD, YOLO...), tooling, protocols and additional architecture adjustments (gigantic context windows, instant adaptors, speed, etc.) will make up for any lack of precision, the same way we work around human flaws to help us succeed in most tasks.

jorvi•7mo ago
> will make up for any lack of precision the same way we work around human flaws to help us succeed in most tasks.

A human won't flip-flop from "You're right! That doesn't exist" and then straight back to "You're right, that does exist!" based on how a question is asked.

People really hold LLMs and their capabilities in way, waaaay too high esteem. You have to walk a tightrope with them, and you always will unless they fix the hallucination problem, which is quite unlikely given how LLMs work.

throw234234234•7mo ago
Both are true. What the parent poster is saying is that the models don't have to be perfect and can "hallucinate" because software is a unique domain in that:

* Validation: You can validate against objective signals either you or your tooling define (e.g. unit tests, compile errors, etc).

* Cost of Failure is Low: You can undo bad work, and feed errors back into the model as a signal to reduce future errors. It's not like physical domains (e.g. building a house or a bridge) where "undoing" is expensive and wasteful.

The models just need to be "good enough" that, with enough tries, the error accumulated over long jobs doesn't grow -> by feeding data back into the model at each step you can curb this. These agent tools sometimes achieve that by integrating with your build tooling, your IDE, your unit tests, etc. -> they have a lot of "guard rails" to curb the risk of hallucinations, and, when one does occur, to bring things back in line, because they are long-running processes.

TL;DR: if you can't reduce the risk of bad model outputs, you can mitigate the impact through retries and guard rails that feed back into the model. That's what these tools do to reduce the error rate. People complain about the risk of bad model output without looking at the other side: is there a way to make the effective consequence of that minimal and to correct course? That's what these agent tools try to do - they want to work "like a human".
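That retry-plus-guard-rails loop is easy to sketch in the abstract; `generate` and `validate` below are hypothetical stand-ins for a model call and for build/test tooling:

```python
def iterate_until_valid(generate, validate, max_attempts=5):
    """Feed validator errors back into the generator until the output
    passes or we give up -- the guard-rail loop in miniature."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)      # e.g. a model call with error context
        ok, feedback = validate(candidate)  # e.g. compile + run the test suite
        if ok:
            return candidate, attempt
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {feedback}")

# Toy demo: the "model" only produces an even number once told what was wrong.
def toy_generate(feedback):
    return 4 if feedback else 3

def toy_validate(n):
    return (n % 2 == 0, None if n % 2 == 0 else "expected an even number")

result, attempts = iterate_until_valid(toy_generate, toy_validate)
```

The point of the sketch is that the validator, not the generator, is what makes the loop converge: any objective signal you can check mechanically can play that role.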

Don't get me wrong; there's a lot of signals that are hard to capture related to taste (e.g. I add a feature, it subtly changes the design of another X features already done) - and I personally find it easier to fine tune/go manual after a certain point but YMMV.

hammyhavoc•7mo ago
You can feed errors back in, but on anything of meaningful scale, context windows are going to be the largest bottleneck.

Context window size limits aside, Claude Code seems to atrophy or misinterpret existing context very frequently, even prior to compactions. The more tokens in the context window, the shittier it performs - not that it's great to begin with.

throw234234234•7mo ago
That there is the issue. Right now they may not be good enough and create issues - but they are still a lot more useful than they were. Any improvements will feed straight into the tooling without much effort.

Don't get me wrong; I would love to be wrong. But I do think models will get better. There's just too much money being thrown at the problem, and SWE seems to be the target, especially for Anthropic, where that's their main market/use case. They don't seem to have the same diversified user base.

If I wasn't a SWE I would think there's no need to become one tbh. Just need to wait a little bit more.

hombre_fatal•7mo ago
"Fancy autocomplete" doesn't seem very scathing when the autocomplete you're talking about is "give me feedback and find bugs in this 3000 line file: <paste>".

Ah yes, it's just using pattern recognition from its training data to generalize abstract concepts about software so that it can apply them to my specific, complex file to find a bug that eluded multiple software engineers for a month.

It's a stochastic parrot!

bluefirebrand•7mo ago
> LLMs inside an IDE are just fancy autocomplete.

Not only are they just fancy autocomplete, they are so intrusive that it increases friction for me instead of lowering it

Having to pause a second to decide if I want to tab complete a line is one thing

Having to pause a minute to evaluate an entire suggested function is jarring and completely wrecks my momentum, especially if I wind up rejecting the suggestion

And especially because if I reject the suggestion, the damn thing keeps re-suggesting stuff while I'm typing the rest out

Constant interrupting and irritating

gtani•7mo ago
(IMO) The 2 sigmas are getting receptive audiences, as pre-announcements of layoffs at Amazon, MS and others yield the analog of a 50 VIX in the options markets, where enumerating mid/good/bad scenarios goes from difficult to impossible.
talles•7mo ago
When talking about AI, intelligence is meaningless if you don't define it beforehand. The common-sense meaning of intelligence fails in this kind of discussion.
monista•7mo ago
Would it surprise you to see on the front page articles about both Nobel Prize winners and Darwin Award winners? What is intelligence, after all? We expect AI to be as smart as Einstein or Terence Tao, but so far we see that LLMs are pretty good at behaving just like humans, that is, most of the time, stupid.
j_crick•7mo ago
> Is something intelligent if all its doing is pattern matching?

Aren’t we humans doing just that too? If yes, then what?

marliechiller•7mo ago
Personally, I don't think so. I can understand a mathematical axiom and reason with it. In a sequence of numbers I will be able to tell you N + 1, regardless of where N appears in the sequence. An LLM does not "know" this the way a human does. It just applies whatever is the most likely thing the training data suggests.
j_crick•7mo ago
But technically you can do that only because you recognize the pattern: the pattern (sequence) is there, and you were taught that it's a pattern and how to recognize it. Today's publicly available LLMs are taught different patterns, and are also constrained by how they are made.

Maybe there’s something for LLMs in reflection and self-reference that has to be “taught” to them (or has to be not blocked from them if it’s already achieved somehow), and once it becomes a thing they will be “cognizant” in the way humans feel about their own cognition. Or maybe the technology, the way we wire LLMs now simply doesn’t allow that. Who knows.

Of course humans are wired differently, but the point I’m trying to make is that it’s pattern recognition all the way down both for humans and LLMs and whatnot.

Uehreka•7mo ago
I keep seeing the same “middlebrow dismissals” of LLMs in HN comments, it’s getting pretty repetitive to have to cover all of this over and over, but here goes (I recognize GP is only saying one of these, I’m just trying to preempt the others).

- “LLMs don’t have real intelligence” - We as a society don’t have a rigorous+falsifiable consensus on what “intelligence” is to begin with. Also many things that we all agree are not intelligent (cars, CPUs, egg timers, etc.) are still useful.

- “But people are claiming they’re intelligent and that they’re AGI” - OK, well what if those people are wrong but LLMs are still useful for many things? Not all LLM users are AGI believers, many aren’t.

- “But people are forcing me to use them.” - They shouldn’t do that, that’s bad. It doesn’t mean LLMs are bad.

- “They’re just pattern-matchers, stochastic parrots, they can’t generalize outside their training data.” - All the academic arguments I’ve seen about this become irrelevant when I ask an LLM to write me code in a really esoteric programming language and it succeeds. I personally don’t think this is true, but if in fact they are categorically no more than pattern-matchers, then Pattern Matching Is All You Need to do many many jobs.

- “I have an argument why they are categorically useless for all tasks” - the existence of smart people using these things of their own accord, observing the results and continuing to use them should put a serious dent in this theory.

- “They can’t do my whole job” - OK, what if they can help you with part of your job?

- “I’m a programmer. If I use an AI Assistant, but still have to review its code, I haven’t saved any time.” - This can’t be categorically disproven, but also isn’t totally true, and in the gaps in this argument lie amazing things if you’re willing to keep an open mind.

- “They can’t do arithmetic, how can they be expected to do everyday tasks.” - I’ll admit that it’s weird that LLMs are useful despite failing at arithmetic, but they are. Rain Man had trouble with everyday tasks, how could he be expected to do arithmetic? The world is counterintuitive sometimes.

- “They can’t help me with any of my job, I do surgery all day” - Thank you and my condolences. Please be aware though that many jobs out there aren’t surgery.

- “The people who promote them are annoying. I call them ‘influencers’ to signal that they are not hackers like us.” - Many good things have annoying fans, if you follow this logic to its conclusion you will miss out on many good things.

- “I’ve tried them, I’ve tried them in a variety of ways, they’re just really not for me.” - That’s fine. I’d still recommend checking in on the field later on, but I can totally admit that these things can take some finagling to get right, and not everyone has time. They will get easier to use in the future.

- “No they won’t, we’ve hit a plateau! Attention isn’t all you need!” - If all LLM development were to stop today, all AI cloud services shut down and only the open weights LLMs were left, I predict we’d still be finding novel usage patterns for them for the next 3-5 years.

kypro•7mo ago
Kinda reminds me of something my old AI professor used to say, "every problem is a search problem".

Intelligence is really just a measure of one's ability to accurately filter and iterate over the search space.

Evolution is one extreme, where the heuristic is poor, so it must do a huge amount of iteration over many bad solutions to find reasonably good ones. On the other hand you have expert systems, which are great at refining the search space to always deliver quality answers, but filter too much and are therefore too narrow, so they lack the creativity and nuance of real intelligence.

LLMs provide good heuristics, and agents with verifiable goals allow for iteration. This combination results in a system which is significantly more intelligent than either of its parts.
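The "filter and iterate over the search space" framing can be made concrete with a best-first search, where the heuristic plays the role this comment assigns to the LLM; the number-guessing goal below is just a toy stand-in:

```python
import heapq

def best_first_search(start, neighbors, h, is_goal, max_steps=10_000):
    """Expand the most promising candidate first; a good heuristic h means
    far fewer iterations than blind enumeration of the search space."""
    frontier = [(h(start), start)]
    seen = {start}
    steps = 0
    while frontier and steps < max_steps:
        _, state = heapq.heappop(frontier)
        steps += 1
        if is_goal(state):
            return state, steps
        for nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (h(nxt), nxt))
    return None, steps

# Toy search space: integers reachable by +1/-1 moves, goal is 42.
target = 42
found, steps = best_first_search(
    start=0,
    neighbors=lambda n: [n - 1, n + 1],
    h=lambda n: abs(n - target),   # good heuristic: walks straight to the goal
    is_goal=lambda n: n == target,
)
```

With the distance heuristic the search pops only the states on the direct path; replace `h` with a constant (evolution's "poor heuristic") and the same loop degenerates into blind expansion in both directions.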

conartist6•7mo ago
Search is eventually an existential and philosophical problem. How do you know if you have found what you are searching for? How do you know how long you can afford to keep searching if you don't find it? An LLM lacks even the intelligence of a cat or a mouse if you stop treating the intelligence of the human using it as its intelligence.

To that I add this:

Every single LLM user is a hyperintelligent ultraproductive centaur if I understand correctly, so how is it possible that I, as a made-of-meat individual, am kicking the ass of several whole world-class teams of these LLM-using centaur-y juggernauts? It shouldn't be possible, right?

But I'm human, so it is

ed_mercer•7mo ago
> What if Claude Code operated autonomously with massive parallel compute?

Afaik this is not possible, as LLMs have linear conversations.

weiliddat•7mo ago
I guess if we interpret it charitably, maybe every time there's a decision to be made, it just forks itself and runs with the possible inputs it expects?

I would say that's how some devs operate too. Instead of waiting for the product/customer to come back, let's predict how they might think and make a couple of possible solutions and iterate over them. Some might be dead ends, we can effectively prune them, some might lead to more forks, some might lead down linear paths. But we can essentially get more coverage before really needing some input.

We might argue that it already does that in its chain-of-thought, or agent mode, but having a dedicated "forked" checkpoint lets us humans then check and rewind time in that sense.
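That fork-at-every-decision idea is essentially a tree exploration; here is a hedged sketch, where the decision points and the scoring function are made up purely for illustration:

```python
def explore(decisions, path=(), score=lambda p: sum(p)):
    """Fork at every open decision and return every complete path with its
    score, so a human (or a stronger model) can compare the branches."""
    if not decisions:
        return [(path, score(path))]
    first, *rest = decisions
    results = []
    for option in first:  # fork: one branch per possible input
        results.extend(explore(rest, path + (option,), score))
    return results

# Two hypothetical decision points with two options each -> four branches.
branches = explore([(0, 1), (0, 1)])
best_path, best_score = max(branches, key=lambda b: b[1])
```

A real agent would prune obvious dead ends instead of enumerating everything, and each surviving branch becomes a checkpoint a human can rewind to, as the comment suggests.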

dbbk•7mo ago
Of course it's possible, you can do this today... just create 100 GitHub PRs and assign Copilot Coding Agent
lokimedes•7mo ago
Well, intelligence is arguably represented in a "prior" that skews the result toward an optimum faster, with fewer iterations. What the article is describing as intelligence is exactly the opposite; it's just brute force.
talles•7mo ago
Technology feels like magic when you don't understand it.
amelius•7mo ago
Even more so when even the creators of the technology don't understand it.
revskill•7mo ago
Where did you master humor from?
mmh0000•7mo ago
The creators understand it well. The math is a lot, but you can literally do it with pen and paper. There are plenty of blog posts[1] showing the process.

Anyone claiming AI is a black box no one understands is a marketing-level drone trying to sell something that THEY don't understand.

[1] https://explainextended.com/2023/12/31/happy-new-year-15/

amelius•7mo ago
No, they only understand it at a superficial level. The behavior of these systems emerges from simpler stuff, yes, but the end result is difficult to reason about. Just have a look at Claude's system prompt [1] that leaked some time ago, which is an almost desperate attempt by the creators to nudge the system in a certain direction and make it not say the wrong things.

We probably need a New Kind of Soft Science™ to fill this gap.

[1] https://simonwillison.net/2025/May/25/claude-4-system-prompt...

revskill•7mo ago
LLM reflects YOUR intelligence, it's the secret truth.
rvnx•7mo ago
Many of the complainers don't know how to use them or how to write prompts, and then blame the LLMs.

Or simply use LLMs that struggle at writing good code (GPT, Gemini Pro, etc).

You need to be in the shoes of a product owner, able to express your requirements clearly and to drive the LLM in your direction; this requires learning new skills (like kids learning how to use search engines).

timr•7mo ago
> Or simply use LLMs that struggle at writing good code (GPT, Gemini Pro, etc).

I love how one side of this debate seems to have embraced "No True Scotsman" as the preferred argument strategy. Anyone who points out that these things have practical limitations gets a litany of "oh you aren't using it right" or "oh, you just aren't using the cool model" in response. It reminds me of the hipsters in SF who always felt your music was a little too last week.

As someone who is currently using these every day, Gemini Pro is right up there with the very best models for writing code -- and "GPT" is not a single thing -- so I have no idea what you're talking about. These things have practical limitations.

rvnx•7mo ago
<removed dismissive answer />
timr•7mo ago
Your impression of hipsters is certainly dead-on.
tpmoney•7mo ago
I don’t think anyone* denies that these things have practical limitations. It’s more that a lot of the negative stories and results have an “I asked a random stranger on the internet to balance my checkbook and got money stolen” vibe to them, which very much does fall into the “you’re using it wrong” category**. They absolutely have limitations, they absolutely get things wrong and still need a real human involved. They’re also a lot more capable, when used in the right contexts and with the right tooling, than some people give them credit for. It’s a new tool, and like most new and exciting tools it’s going to be shoved everywhere; it will take time to learn to use effectively, and everyone and their dog is going to try building “Thing but with AI”, just like when the internet was new and exciting everything was “Thing but online”. Some ideas will turn out to be terrible, some will turn out to be too early but viable as the tech improves, and some (like AI code completion and some of the better agent-based tools) will prove to be useful now and get better as the tech gets better and users get more skilled.

* Yes there are some truly unhinged boosters and CEOs out there that think they’re going to replace their entire support staff with chat models tomorrow. They’re wrong and the best thing we can do to discourage that is make it painful and hold them to the promises their AI makes, even if it means they need to give away a year of free flights because the chat bot said they would.

** corollary to the above note, plenty of the negative articles / experiences are also just exploring the real limitations and reporting on them. But the internet is the internet and as always it’s the extremes that get amplified and make for the biggest clicks. It is frustrating that even in spaces like HN where you might expect more nuanced discussion a lot of the discussion seems to fall into the extreme booster/detractor headlines and re-hashes of those positions (and strawmen of both) rather than interesting explorations of the extents and limits. I suppose you just can’t really have a good nuanced discussion with you and a thousand of your closest friends.

WorldMaker•7mo ago
> Or simply use LLMs that struggle at writing good code (GPT

As still the default for GitHub Copilot, GPT doesn't seem to "struggle" at all with writing good code. Anecdotally, Claude seems woefully under-trained compared to GPT in areas such as PowerShell and cross-platform solutions. (Which also shows directly in Claude Code's awful cross-platform support. If Claude is so good, why doesn't it fix Claude Code's Windows support? Add more PowerShell support instead of just bashing out bash-isms?)

A lot of impressions of the LLMs are hugely subjective, and I'm inclined toward the above poster's suggestion that a lot of what you get out of an LLM is a reflection of who you are and what you put into the LLM. (They are massively optimized GIGO machines, after all.)

thi2•7mo ago
Would you mind sharing good and bad examples of prompts? I always read comments like yours and miss examples.
goodpoint•7mo ago
If anything it reflects the intelligence of the people whose work is being stolen.
rvnx•7mo ago
Y Combinator is an accomplice to this, and, you know, all they will get as punishment is billions of tainted money. But I guess they can live with that.
cainxinth•7mo ago
Just like all the people who think their LLM is sentient or an alien or a god are really just talking to themselves.
GardenLetter27•7mo ago
This feels a bit too optimistic; in practice it often gets stuck going down a rabbit hole (and burns up your requests / tokens doing it!).

Like even when I tested it on a clean assessment (albeit with Cursor in this case) - https://jamesmcm.github.io/blog/claude-data-engineer/ - it did very well in agent mode, but the questions it got wrong were worrying because they're the sort of things that a human might not notice either.

That said I do think you could get a lot more accuracy between the agent checking and running its own answers, and then also sending its diff to a very strong LLM like o3 or Gemini Pro 2.5 to review it - it's just a bit expensive to do that atm.

The main issue on real projects is that just having enough context to even approach problems, and to build and run tests, is very difficult when you have 100k+ lines of code and a clean build plus test run takes 15 minutes. And it feels like we're still years away from having all of the above, plus a large enough context window that this is a non-issue, for a reasonable price.

cyanydeez•7mo ago
Like, it's a nerd slot machine: it shows you small wins, gets you almost-big wins, and seduces you into thinking "just one more perfect prompt and surely I'll hit the jackpot".
stpedgwdgfhgdd•7mo ago
The recent developments are impressive. I’m now using my IDE as a diff viewer. Everything goes through the terminal. If there is an error, CC can analyse and fix it.

Still needs a lot of handholding. I do not (yet) think big upfront plans will suddenly start working in the enterprise world. Let it write a failing test first.
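
The "write a failing test first" workflow described above can be sketched like this; `slugify` is a made-up example function for illustration, not anything from the thread:

```python
import re

# Step 1: write the failing test first. slugify doesn't exist yet,
# so running this immediately fails -- which is the point.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"

# Step 2: hand the failing test to the agent (or write the code yourself)
# and iterate on the implementation until the test passes, without
# letting the agent touch the test itself.
def slugify(title):
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)

test_slugify()  # passes once the implementation is in place
```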

_dark_matter_•7mo ago
I'm still not convinced. I spent a few hours today trying to get it to add linting to a SQL repository, _given another repository that already had what I wanted_.

At one point it got a linting error and just added that error to the ignore list. I definitely spent more time reviewing this code and prompting than it would have taken me to do it myself. And it's still not merged!

ed_mercer•7mo ago
Are you saying TDD works best with CC? Write a failing test first? I read an article about that recently but can't find it...

EDIT: https://www.anthropic.com/engineering/claude-code-best-pract...

bgwalter•7mo ago
> What other tasks could be automated today with the current LLMs performance?

CEO speeches and pro-LLM blogs come to mind.

Again, there is a vague focus on "updating dependencies" where allegedly some time was saved. Take that to the extreme and we don't need any new software. Freeze Linux and Windows, do only security updates and fire everyone. Because the ultimate goal of LLM shills or self-hating programmers appears to be to eliminate all redundant work.

Be careful what you wish for. They won't reward you for shilling or automating, they'll just fire you.

msgodel•7mo ago
The primary use seems to be satisfying administrative demands that were never productive anyway.
Eddy_Viscosity2•7mo ago
This. They've been pushing these at my workplace, and the only thing I can think to use them for is having the LLMs generate empty, long-winded corporate-speak emails that I can send to managers when they ask for things that seem best answered by an empty, long-winded corporate-speak email. Like: "How are you using all these AI tools we are forcing on you without asking if you needed or wanted them?"
hammyhavoc•7mo ago
And so how exactly are you using them? ;- )
ajkjk•7mo ago
normal English would be "Why does Claude Code feel like magic?"

edit: or "Why Claude Code feels like magic" without the ?.

arpowers•7mo ago
Has anyone actually gotten productivity improvements from Claude Code?

What’s the use case?

(I tried some things, and it blew up. Thus far my experience w agents in general)

ryandvm•7mo ago
I have used it on a fairly simple Kotlin Android application and was blown away. I have previously been using paid ChatGPT, Github Copilot, and Gemini. In my opinion, it's the complete access to your repo that really makes it powerful, whereas with the other plugins you kind of have to manually feed it the files in your workspace and keep them in sync.

I asked it to add Google Play subscription support to my application and it did, it required minimal tweaking.

I asked it to add a screen for requesting location permissions from the user and it did it perfectly. No adjustment.

I also asked it add a query parameter to my API (GoLang) which should result in a subtle change several layers deep and it had no problems with that.

None of this is rocket science, and I think the key is that it's all been done and documented a million times on the Internet. At this point, Claude Code is at least as effective as a junior developer.

Yes, I understand that this is a Faustian bargain.

jki275•7mo ago
FYI -- Windsurf, Cline, Cursor will all do this also, using Claude models if you set them up that way.
anonzzzies•7mo ago
It gives us great productivity. If you write the tests yourself and insist it delivers 100% passing without touching the tests themselves (just running them), it is very nice. We wrote a little bit of tooling around it so it instructs and loops until 100% succeed. Even for stuff that's complex enough for seniors to struggle with (parsers/compilers), it delivers results in hours instead of days or weeks. But if you miss some tests, you can all but guarantee those things won't work, even where an experienced human would automatically get them right because the alternative is illogical. We would write tests like this for humans as well, so there is not much difference in our workflow; CC just delivers faster and far, far cheaper. And we tried it all: especially NOT having it integrated into an IDE is brilliant. Before this we used aider instead of Cursor etc., as we can control it: we don't want a human sitting there tapping "yes, please do" or whatnot. We want it to finish, commit a PR, and then we review.
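
A minimal sketch of that kind of loop-until-green harness. The agent and test hooks here are stand-ins for illustration, not the poster's actual tooling:

```python
def loop_until_green(invoke_agent, run_tests, max_rounds=10):
    """Re-prompt the agent with the failing output until all tests pass.

    invoke_agent(feedback): asks the agent to fix the code (it may only
        run the tests, never edit them).
    run_tests(): returns (passed, output), e.g. by shelling out to pytest.
    """
    passed, output = run_tests()
    rounds = 0
    while not passed and rounds < max_rounds:
        invoke_agent(output)  # e.g. pipe the failure log to the agent's CLI
        passed, output = run_tests()
        rounds += 1
    return passed, rounds

# Stubbed demo: the "agent" only fixes the bug on its second attempt.
state = {"fixed": False, "calls": 0}

def fake_agent(feedback):
    state["calls"] += 1
    if state["calls"] >= 2:
        state["fixed"] = True

def fake_tests():
    return state["fixed"], ("" if state["fixed"] else "1 test failed")

ok, rounds = loop_until_green(fake_agent, fake_tests)  # succeeds after 2 rounds
```

In practice `run_tests` would shell out to the real suite and `invoke_agent` would drive the coding agent non-interactively, which is what lets it run to completion and open a PR without a human tapping "yes" at every step.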
octo888•7mo ago
It's great at mocking up some HTML pages with eg Tailwind and static site generators. Give it some ideas, a bit of copy, a few colours and it'll create some pages filled with plausible sounding text. I can imagine using it in front of clients to give them an idea of what a new site could look like.

Easily adjusted with things like "the colour palette is a bit bright, use more pastels" or "make it more SEO friendly" and it often easily generates a large todo list/set of changes based on minimal input

My friend was mulling over a product concept and I used it to design a landing page and it helped her see how easily you can create a website to sell the product. It took ~15 minutes and I'm a web dev noob. (Obviously setting up a real ecommerce site is a little bit more work)

It makes sense it's good at HTML because of the huge body of public data available.

memorylane•7mo ago
I use CC in existing code bases to build out new GUI - VueJS/Quasar and it blows me away! For back end Rust code it excels at boilerplate crud handlers back to the db - it copies the style of existing code… I’ll happily pay for it if my boss does not, just work less hours…
datpuz•7mo ago
The productivity gains decrease with user experience. A high-performing senior engineer won't get a lot, but I think they've reached a point now where even seniors will benefit a fair amount. For me it's not really that they increase my productivity directly, but they let me offload a lot of the cognitive load. I'm getting a similar amount of work done and I don't feel as drained at the end of the day.
atlgator•7mo ago
I've been very successful pointing it to a backlog of manual test cases, using Playwright MCP to execute the test cases against dev as a black box, and generating the corresponding Playwright scripts to add to our automated test repo.

I had hired an actual automation tester with years of experience to write Playwright scripts for us. After 3 months he had not produced a single passing test. I managed to build the entire scaffolding myself in 2 weeks, having no prior Playwright experience.

guluarte•7mo ago
In my experience, using agents has just wasted my time and money. They are good for small things if you are lazy and watching a movie, looking at the results every 10 minutes, reverting and trying again.
vital_beach•7mo ago
I really enjoyed Claude Code. I was using it on some side projects for about a month with API credits, and I signed up for the Max subscription shortly after it started working with Code. Overnight, my account was banned, and I have no idea why.

It sucks getting banned from such a cool and helpful tool :(

bn-l•7mo ago
Did the program need to kill child processes a lot?
vital_beach•7mo ago
nope, just running and stopping dev servers. It may have done a pfkill once or twice if something was hanging?

Either way, using it with the API credits was fine for a little over a month, so I don't know if it was that. I got autobanned only a few hours after paying for Max and reauthing the client to use the subscription. My actual usage of it didn't change.

tbcj•7mo ago
I had two accounts banned - one for Claude and one for the API. I tried to appeal both, asking for more information. The response from Anthropic was non-specific, saying only that it violates usage. One account had only been minimally used. One was never used. The accounts used email addresses on a domain I control - e.g., anthropic-claude@domain.xyz. I think that might have something to do with it.

I have a new account now using a Google account and it hasn’t been banned.

datpuz•7mo ago
Wait 'til you see the bill.
volkk•7mo ago
there's no bill if you're paying the $200/mo for unlimited use, right?
Tomte•7mo ago
It's not unlimited.
floydnoel•7mo ago
I was hitting throttling a lot on the $100 plan, but I haven't been throttled even once on the $200, so for me it's pretty unlimited. I haven't gotten past using two agents at a time, though, so maybe that has something to do with it.
aussieguy1234•7mo ago
Usually, I'll go through my coding like I would have pre-LLMs.

Then, when I see something that looks like it can be reliably automated by an AI agent, I'll open up Cline and put Claude or Gemini Flash to work. This has a 90% success rate so far and has saved me hours of work.

sublinear•7mo ago
There is no long view on this crap.

You can think otherwise in a Western world that still imported illegal immigrants to pick your peaches.

Doesn't mean jack. Produce now or fuck off.