frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

Open models by OpenAI

https://openai.com/open-models/
1354•lackoftactics•8h ago•527 comments

Genie 3: A new frontier for world models

https://deepmind.google/discover/blog/genie-3-a-new-frontier-for-world-models/
1105•bradleyg223•11h ago•403 comments

Spotting base64 encoded JSON, certificates, and private keys

https://ergaster.org/til/base64-encoded-json/
216•jandeboevrie•5h ago•98 comments

Ollama Turbo

https://ollama.com/turbo
238•amram_art•6h ago•145 comments

Create personal illustrated storybooks in the Gemini app

https://blog.google/products/gemini/storybooks/
69•xnx•4h ago•25 comments

Consider using Zstandard and/or LZ4 instead of Deflate

https://github.com/w3c/png/issues/39
125•marklit•7h ago•70 comments

Claude Opus 4.1

https://www.anthropic.com/news/claude-opus-4-1
638•meetpateltech•8h ago•240 comments

Things that helped me get out of the AI 10x engineer imposter syndrome

https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/
694•coltonv•11h ago•532 comments

Scientific fraud has become an 'industry,' analysis finds

https://www.science.org/content/article/scientific-fraud-has-become-industry-alarming-analysis-finds
270•pseudolus•14h ago•230 comments

What's wrong with the JSON gem API?

https://byroot.github.io/ruby/json/2025/08/02/whats-wrong-with-the-json-gem-api.html
36•ezekg•4h ago•8 comments

The First Widespread Cure for HIV Could Be in Children

https://www.wired.com/story/the-first-widespread-cure-for-hiv-could-be-in-children/
61•sohkamyung•3d ago•11 comments

Ask HN: Have you ever regretted open-sourcing something?

108•paulwilsonn•3d ago•143 comments

uBlock Origin Lite now available for Safari

https://apps.apple.com/app/ublock-origin-lite/id6745342698
963•Jiahang•16h ago•383 comments

Kyber (YC W23) is hiring enterprise account executives

https://www.ycombinator.com/companies/kyber/jobs/6RvaAVR-enterprise-account-executive-ae
1•asontha•4h ago

Show HN: Stagewise (YC S25) – Front end coding agent for existing codebases

https://github.com/stagewise-io/stagewise
31•juliangoetze•10h ago•34 comments

Build Your Own Lisp

https://www.buildyourownlisp.com/
216•lemonberry•13h ago•58 comments

US reportedly forcing TSMC to buy 49% stake in Intel to secure tariff relief

https://www.notebookcheck.net/Desperate-measures-to-save-Intel-US-reportedly-forcing-TSMC-to-buy-49-stake-in-Intel-to-secure-tariff-relief-for-Taiwan.1079424.0.html
293•voxadam•7h ago•341 comments

Quantum machine learning via vector embeddings

https://arxiv.org/abs/2508.00024
8•adbabdadb•2h ago•0 comments

Los Alamos is capturing images of explosions at 7 millionths of a second

https://www.lanl.gov/media/publications/1663/dynamics-of-dynamic-imaging
104•LAsteNERD•10h ago•86 comments

Under the Hood of AFD.sys Part 1: Investigating Undocumented Interfaces

https://leftarcode.com/posts/afd-reverse-engineering-part1/
24•omegadev•2d ago•5 comments

The mystery of Winston Churchill's dead platypus was finally solved

https://www.bbc.com/news/articles/cglzl1ez283o
43•benbreen•2d ago•7 comments

Cannibal Modernity: Oswald de Andrade's Manifesto Antropófago (1928)

https://publicdomainreview.org/collection/manifesto-antropofago/
19•Thevet•2d ago•3 comments

AI is propping up the US economy

https://www.bloodinthemachine.com/p/the-ai-bubble-is-so-big-its-propping
111•mempko•5h ago•126 comments

No Comment (2010)

https://prog21.dadgum.com/57.html
60•ColinWright•10h ago•49 comments

Tell HN: Anthropic expires paid credits after a year

176•maytc•23h ago•87 comments

Cow vs. Water Buffalo Mozzarella

http://itscheese.com/reviews/mozzarella
18•indigodaddy•3d ago•17 comments

Eleven Music

https://elevenlabs.io/blog/eleven-music-is-here
163•meetpateltech•9h ago•203 comments

Apache ECharts 6

https://echarts.apache.org/handbook/en/basics/release-note/v6-feature/
261•makepanic•18h ago•30 comments

GitHub pull requests were down

https://www.githubstatus.com/incidents/6swp0zf7lk8h
113•lr0•9h ago•150 comments

Using Dspy to Detect Document Boundaries

https://kmad.ai/Using-DSPy-to-Detect-Document-Boundaries
4•aberoham•2d ago•1 comments
Open in hackernews

Claude Opus 4.1

https://www.anthropic.com/news/claude-opus-4-1
636•meetpateltech•8h ago

Comments

jasonlernerman•8h ago
Has anyone tested it yet? How's it acting?
smerrill25•8h ago
waiting for this, too.
usaar333•8h ago
No obvious gains I feel from quick chats, but too early to tell.

These benchmark gains aren't that high, so I doubt it is that obvious.

jedisct1•7h ago
Tested it on a refactor of Zig code. It worked fine, but was very slow.
minimaxir•8h ago
This likely won't move the needle for Opus use over Sonnet while the cost remains the same. Using OpenRouter rankings (https://openrouter.ai/rankings) as a proxy, Sonnet 3.7 and Sonnet 4 combined generates 17x more tokens than Opus 4.
qsort•8h ago
All three major labs released something within hours of each other. This anime arc is insane.
x187463•8h ago
Given the GPT5 rumors, August is just getting started.
kridsdale3•6h ago
Given the Gregorian Calendar and the planet's path through its orbit, August is just getting started.
tomrod•5h ago
This legitimately made me chuckle.
wunderg•1h ago
Good one, made my day
ozgung•8h ago
What a time to be alive
tonyhart7•7h ago
as if they wait competitor first then launch it at the same time to make market decide which one is best
torginus•5h ago
I think this means that GPT5 is better - you can't launch a worse model after the competitor supersedes you - you have to show that you're in the lead even if its just for a day.
rapind•3h ago
Not sure that this is true. Are there a lot of people waiting anxiously to adopt the next model on the day of release and expecting some huge work advantage?
azan_•3h ago
Absolutely.
dnh44•1h ago
If you’re using an LLM near the limits of what it can do then a small improvement in performance is noticeable.
vFunct•7h ago
None of them seem to have published any papers associated with them on how these new models advanced the state-of-the-art though. =^(
hugodan•3h ago
china will do that for them
candiddevmike•7h ago
It's definitely a coincidence
wilg•7h ago
It's not a coincidence or a cartel, it's PR counterprogramming.
BudaDude•2h ago
Agree 100%

If you look at the past, whenever Google announces something major, OpenAI almost always releases something as well.

People forget realize that OpenAI was started to compete with Google on AI.

Etheryte•6h ago
This is why you have PR departments. Being on top of the HN front page, news sites, etc matters a lot. Even if you can't be the first, it's important to dilute the attention as much as possible to reduce the limelight your competitors get.
paulryanrogers•1h ago
"Prep the next three point releases now, but don't release any until I say so. None needs to be noticably better or even different, just has to have a higher number." -CEO of AI companies
andai•25m ago
How do they know when it's time? Corporate espionage? Or do they just have Next Thing queued up months in advance and ready to go.
j45•4m ago
They likely sit on releases ready to go.
qoez•4m ago
There's so many leakers in every lab
steveklabnik•8h ago
This is the bit I'm most interested in:

> We plan to release substantially larger improvements to our models in the coming weeks.

machiaweliczny•8h ago
This is so people don't immediately migrate for GPT5
NitpickLawyer•8h ago
Cheekily announcing during oAI's oss model launch :D
haaz•8h ago
it is barely an improvement according to their own benchmarks. not saying thats a bad thing, but not enough for anybody to notice any difference
waynenilsen•8h ago
i think its probably mostly vibes but that still counts, this is not in the charts

> Windsurf reports Opus 4.1 delivers a one standard deviation improvement over Opus 4 on their junior developer benchmark, showing roughly the same performance leap as the jump from Sonnet 3.7 to Sonnet 4.

ttoinou•8h ago
That's why they named it 4.1 and not 4.5
zamadatix•7h ago
When it's "that's why they incremented the version by a tenth instead of a half" you know things have really started to slow for the large models.
phonon•7h ago
Opus 4 came out 10 weeks ago. So this is basically one new training run improvement.
zamadatix•7h ago
And in 52 weeks we've gone 3.5->4.1 with this training improvement, meanwhile the 52 weeks prior to that were Claude -> Claude 3. The absolute jumps per version delta also used to be larger.

I.e. it seems we don't get much more than new training run levels of improvement anymore. Which is better than nothing, but a shame compared to the early scaling.

globalise83•5h ago
Is it really a bigger jump to go from plausible to frequently useful, than from frequently useful to indispensable?
zamadatix•3h ago
Why is there supposed to be no step between frequently useful and indispensable? Quickly going from nothing to frequently useful (which involved many rapid hops between) was certainly surprising, and that's precisely the lost momentum.
mclau157•6h ago
They released this because competitors are releasing things
leetharris•8h ago
Good! I'm glad they are just giving us small updates. Opus 4 just came out, if you have small improvements, why not just release them? There's no downside for us.
AstroBen•8h ago
I don't think this could even be called an improvement? It's small enough that it could just be random chance
j_bum•7h ago
I’ve always wondered about this actually. My assumption is that they always “pick the best” result from these tests.

Instead, ideally they’d run the benchmark tests many times, and share all of the results so we could make statistical determinations.

gloosx•6h ago
They need to leave some room to release 10 more models. They could crank benchmarks to 100% but then no new model is needed lol? Pretty sure these pretty benchmark graphs are all completely staged marketing numbers since they do solve the same problems they are being trained on – no novel or unknown problematic is presented to them.
levocardia•5h ago
"You pay $20/mo for X, and now I'm giving you 1.05*X for the same price." Outrageous!
onlyrealcuzzo•5h ago
I will only add that it's interesting that in the results graphic, they simply highlighted Opus 4.1 - choosing not to display which models have the best scores - as Opus 4.1 only scored the best on about half of the benchmarks - and was worse than Opus 4.0 on at least one measure.
Topfi•4h ago
I am still very early, but output quality wise, yes, there does not seem to be any noticeable improvement in my limited personal testing suite. What I have noticed though is subjectively better adherence to instructions and documentation provided outside the main prompt, though I have no way to quantify or reliably test that yet. So beyond reliably finding Needles-in-the-Haystack (which Frontier models have done well on lately), Opus 4.1 seems to do better in following those needles even if not explicitly guided to compared to Opus 4.
jzig•8h ago
I'm confused by how Opus is presented to be superior in nearly every way for coding purposes yet the general consensus and my own experience seem to be that Sonnet is much much better. Has anyone switched to entirely using Opus from Sonnet? Or maybe switching to Opus for certain things while using Sonnet for others?
datameta•8h ago
I now eagerly await Sonnet 4.1, only because of this release.
rtfeldman•8h ago
Yes, Opus is very noticeably better at programming in both Rust and Zig in my experience. I wish it were cheaper!
MostlyStable•8h ago
Opus seems better to me on long tasks that require iterative problem solving and keeping track of the context of what we have already tried. I usually switch to it for any kind of complicated troubleshooting etc.

I stick with Sonnet for most things because it's generally good enough and I hit my token limits with it far less often.

unshavedyak•8h ago
Same. I'm on the $200 plan and I find Opus "better", but Sonnet is more straight forward. Sonnet is, to me, a "don't let it think" model. It does great if you give it concrete and small goals. Anything vague or broad and it starts thinking and it's a problem.

Opus gives you a bit more rope to hang yourself with imo. Yes, it "thinks" slightly better, but still not good enough to me. But it can be good enough to convince you that it can do the job.. so i dunno, i almost dislike it in this regard. I find Sonnet just easier to predict in this regard.

Could i use Opus like i do Sonnet? Yes definitely, and generally i do. But then i don't really see much difference since i'm hand-holding so much.

adastra22•8h ago
Every time that Sonnet is acting like it has brain damage (which is once or twice a day), I switch to Opus and it seems to sort things out pretty fast. This is unscientific anicdata though, and it could just be that switching models (any model) would have worked.
anonzzzies•8h ago
Exactly that.
j45•8h ago
They both seem to behave differently depending on how loaded the system seems to be.
api•8h ago
I have suspected for a long time that hosted models load shed by diverting some requests to lesser models or running more quantized versions under high load.
parineum•7h ago
I think OpenRouter saves tokens by summarizing queries through another model, IIRC.
monatron•8h ago
This is a great use case for sub-agents IMO. By default, sub-agents use sonnet. You can have opus orchestrate the various agents and get (close to) the best of both worlds.
adastra22•6h ago
Is there a way to get persistent sub-agents? I'd love to have a bunch of YAML files in my repository, one for each sub-agent, and have those automatically used across all Claude Code instances I have on multiple machines (I dev on laptop and desktop), or across the team.
mwigdahl•5h ago
Yep: https://docs.anthropic.com/en/docs/claude-code/sub-agents
rapind•3h ago
In this case I don't think the controller needs to be the smartest model. I use sonnet as the main driver and pass the heavy thinking (via zen mcp) onto Gemini pro for example, but I could use openai or opus or all of them via OpenRouter.

Subagents seem pretty similar to using zen mcp w/ OpenRouter but maybe better or at least more turnkey? I'll be checking them out.

mark_undoio•2h ago
Amp (ampcode.com) uses Sonnet as its main model and has GPT o3 as a special purpose tool / subagent. It can call into that when it needs particularly advanced reasoning.

Interestingly I found that prompting it to ask the o3 submodel (which they call The Oracle) to check Sonnet's working on a debugging solution was helpful. Extra interesting to me was the fact that Sonnet appeared to do a better job once I'd prompted that (like chain of thought prompting, perhaps asking it to put forward an explanation to be checked actually triggered more effective thinking).

riwsky•16m ago
Great, now even computers need to leave the IC track if they want continued career progression.
gpm•7h ago
This seems like a case of reversion to the mean. When one model is performing below average, changing anything (like switching to another model) is likely to improve it by random chance...
keeeba•4h ago
Anthropic say Opus is better, benchmarks & evals say Opus is better, Opus has more parameters and parameters determine how much a NN can learn.

Maybe Opus just is better

HarHarVeryFunny•7h ago
Maybe context rot? If model's output seems to be getting worse or in a rut, then try just clearing context / starting a new session.
adastra22•6h ago
Switching models with the same context, in this case.
dested•8h ago
If I'm using cursor then sonnet is better, but in claude code Opus 4 is at least 3x better than Sonnet. As with most things these days, I think a lot of it comes down to prompting.
jzig•8h ago
This is interesting. I do use Cursor with almost exclusively Sonnet and thinking mode turned on. I wonder if what Cursor does under the hood (like their indexing) somehow empowers Sonnet more. I do not have much experience with using Claude Code.
seunosewa•8h ago
It's ridiculously overpriced in the API. Just like o3 used to be.
brenoRibeiro706•8h ago
I feel the same way. I usually use Opus to help with coding and documentation, and I use Sonnet for emails and so on.
biinjo•7h ago
Im on the Max plan and generally Opus seems to do better work than Sonnet. However, that’s only when they allow me to use Opus. The usage limits, even on the max plan, are a joke. Yesterday I hit the limits within MINUTES of starting my work day.
epolanski•7h ago
Yeah, you need to actively cherry pick which model to use in order to not waste tokens on stuff that would be easily handed by a simpler model.
furyofantares•7h ago
I'm a bit confused by people hitting usage limits so quickly.

I use Opus exclusively and don't hit limits. ccusage reports I'm using the API-equivalent of $2000/mo

rirze•7h ago
You always have to ask which plan they're paying for. Sometimes people complain about the $20 per month plan...
stavros•7h ago
There's no Opus quota on that plan at all.
furyofantares•6h ago
In this case I'm replying to someone who lead with "I'm on the Max plan" but I realize now that's ambiguous, maybe they are on 5x while I'm on 20x.
Bolwin•6h ago
That's insane. Are you accounting for caching? If not, there's no way this is going to last
furyofantares•6h ago
I'm using ccusage to get the number, I think it just looks at your history and calculates based on tokens vs API pricing. So I think it wouldn't account for caching.

But I totally agree there's no way it lasts. I'm mostly only using this for side projects and I'm sitting there interacting with it, not YOLO'ing, I do sometimes have two sessions going at the same time but I'm not firing off swarms or anything crazy. Just have it set to Opus and I chat with it.

Aeolun•1h ago
Claude Code definitely reports cached tokens, and I think CCusage does too, so it wouldn’t make sense for the calculation to be based on full pricing when they have the cached values.
dsrtslnd23•5h ago
same here constantly hit the Opus limits after minutes on Max plan
Aeolun•1h ago
Is this on x5? Because ever since they booted all the freeloaders I’ve not once seen the “you are approaching usage limits” message. Anyway, the “you are approaching usage limits” message shows up when you are over 50% of your tokens for that timeframe, so it’s not sure useful.
gpm•7h ago
I notice that on the "Agentic Coding" benchmark cited in the article Sonnet 4 outperformed Opus 4 (by 0.2%), and under performs Opus 4.1 (by -1.8%).

So this release might change that consensus? If you believe the benchmarks are reflective of reality anyways.

jimbo808•7h ago
> If you believe the benchmarks are reflective of reality anyways.

That's a big "if." But yeah, I can't tell a difference subjectively between Opus and Sonnet, other than maybe a sort of placebo effect. I'm more careful to write quality prompts when using Opus, because I don't want to waste the 5x more expensive tokens.

Uehreka•7h ago
> yet the general consensus and my own experience seem to be that Sonnet is much much better

Given that there’s nothing close to scientific analysis going on, I find it hard to tell how big the “Sonnet is overall better, not just sometimes” crowd is. I think part of the problem is that “The bigger model is better” feels obvious to say, so why say it? Whereas “the smaller model is better actually” feels both like unobvious advice and also the kind of thing that feels smart to say, both of which would lead to more people who believe it saying it, possibly creating the illusion of consensus.

I was trying to dig into this yesterday, but every time I come across a new thread the things people are saying and the proportions saying what are different.

I suppose one useful takeaway is this: If you’re using Claude Max and get downgraded from Opus to Sonnet for a few hours, you don’t have to worry too much about it being a harsh downgrade in quality.

taormina•7h ago
Just more ancedata, but I entirely agree. I can't say that I am happy with Sonnet's output at any point, really, but it still occasionally works, whereas Opus has been a dumpster fire every single time.
SkyPuncher•7h ago
I don't doubt Opus is technically superior, but it's not practically superior for me.

It's still pretty much impossible to have any LLM one-shot a complex implementation. There's just too many details to figure out and too much to explain for it to get correct. Often, there's uncertainty and ambiguity that I only understand the correct answer (or rather less bad answer) after I've spent time deep in the code. Having Opus spit out a possibly correct solution just isn't useful to me. I need to understand _why_ we got to that solution and _why_ it's a correct solution for the context I'm working in.

For me, this means that I largely have an iteratively driven implementation approach where any particular task just isn't that complex. Therefore, Sonnet is completely sufficient for my day-to-day needs.

ssk42•5h ago
You can also always have it create design docs and mermaid diagrams for each task. Outline the why much easier earlier, shifting left
bdamm•4h ago
I've been having a great time with Windsurf's "Planning" feature. Have a nice discussion with Cascade (Claude) all about what it is that neerds to happen - sometimes a very long conversation including test code. Then when everything is very clear, make it happen. Then test and debug the results with all that context. Pretty nice.
jstummbillig•3h ago
Can you explain what you do exactly? Do you enable plan mode and use with chat...?
trenchpilgrim•1h ago
In Zed I switch the AI panel to ask mode and chat with the agent about different approaches and have it draft patches. Then when I think there's a design worth trying, switch to Write mode and have it implement that change + run the tests and diagnostics to verify the code at least compiles, tests pass and follows our style guides. Finally a line by line review + review of the test coverage (in terms of interface surface area) before submitting a PR for another human review.
Larrikin•1h ago
After watching a few videos trying to understand how people were using LLMs and getting useful results I found that even making a simpler version of the fancy planning mode in the LLM IDEs via the instructions.md produced hugely better productivity gains.

I started adding an instruction file along the lines of "Always tell me your plan to solve the issue first with short example code, never edit files without explicit confirmation of your plan" at the start and it is like a day and night difference in how useful it becomes. It also starts to feel like programming again where you can read through various files and instead of thinking in your head, you write out your thoughts. You end up getting confirmation or push back on errors that you can clean up.

Reading through a sort of wrong sort of right implementation spread across various files after every prompt just really sucked.

I'm not one shotting massive amounts of files, but I am enjoying the lack of grunt work.

jm4•7h ago
I use both. Sonnet is faster and more cost efficient. It's great for coding. Where Opus is noticeably better is in analysis. It surpasses Sonnet for debugging, finding patterns in data, creativity and analysis in general. It doesn't make a lot of sense to use Opus exclusively unless you're on a max20 plan and not hitting limits. Using Opus for design and troubleshooting and Sonnet for everything else is a good way to go.
astrostl•7h ago
With aggressive Claude Code use I didn't find Sonnet better than Opus but I did find it faster while consuming far fewer tokens. Once I switched to the $100 Max plan and configured CC to exclusively use Sonnet I haven't run into a plan token limit even once. When I saw this announcement my first thing was to CMD-F and see when Sonnet 4.1 was coming out, because I don't really care about Opus outside of interactive deep research usage.
ssss11•3h ago
That’s very strange. Sonnet is hot garbage and Opus is a miracle, for me. I also don’t see anyone praising sonnet anywhere.
sky2224•2h ago
I've found with limited context provided in your prompt, opus is just awful compared to even gpt-4.1, but once I give it even just a little bit more of an explanation, it jumps leagues ahead.
sothatsit•2h ago
Opus really shines for completing long-running tasks with no supervision. But if you are using Claude Code interactively and actively steering it yourself, Sonnet is good enough and is faster.

I don't believe anyone saying Sonnet yields better results than Opus though, as my experience has been exactly the opposite. But trade-off wise, I can definitely see it being a better experience when used interactively because of its speed and lower cost.

Aeolun•1h ago
My opinion of Opus is that it takes the correct action 19/20 times, where Sonnet takes the correct action 18/20 times. It’s not strictly necessary to use Opus, but if you have the subscription already it’s just a pure win.
paxys•8h ago
Why is everything releasing today?
datameta•8h ago
Could it be nobody wanted to be first and overshadowed, nor the only one left out - and it cascaded after the first announcement? My first hunch, though, was that it had been agreed upon. Game theory I think tells us that releasing same day in the pattern ABC BCA CAB etc would be lowest risk and highest average gain?
highfrequency•6h ago
If they release before GPT-5, they don't have to compare to GPT-5 in their benchmarks. It's a big PR win to be able to plausibly claim that your model is the best coding model at the time of release.
gusmally•8h ago
They restarted Claude Plays Pokemon with the new model: https://www.twitch.tv/claudeplayspokemon

(He had been stuck in the Team Rocket hideout (I believe) for weeks)

alrocar•7h ago
just ran the LLM to SQL benchmark over opus-4.1 and it didn't top previous version :thinking: => https://llm-benchmark.tinybird.live/
epolanski•7h ago
How does running it multiple times performs?

LLMs are non-deterministic, I think benchmarks should be more about averages of N runs, rather than single shot experiments.

jedisct1•7h ago
Is it just me or is it super slow?
taormina•7h ago
Alright, well, Opus 4.1 seems exactly as useless as Opus 4 was, but it's probably eating my tokens faster. Wish they let you tell somehow.

At least Sonnet 4 is still usable, but I'll be honest, it's been producing worse and worse slob all day.

I've basically wasted the morning on Claude Code when I should've just been doing it all myself.

AlecSchueler•4h ago
I've also noticed Sonnet starting to degrade. It's developing some of the behaviours that put me off the competition in the first place. Needless explanations, filler in responses, wanting to put everything in lists, even increased sycophancy.
bavell•4h ago
> I've basically wasted the morning on Claude Code when I should've just been doing it all myself.

Welcome to the machine

https://www.youtube.com/watch?v=tBvAxSx0nAM&t=45s

Aeolun•1h ago
I feel like this is just related to my projects getting bigger. Claude Code is trying to keep up with my project evolving from 2k lines of code to 100k lines. Of course it’s going to feel worse.
UncleEntity•48m ago
Other than it starting out trying to produce a full and complete web app (or whatever) for my daily yak shaving session instead of the normal "let's talk about and work through this thing" the new Opus 4.1 seems to 'get it' a lot quicker than the old daffy robot did. It asked pertinent questions to understand the system we are working on and accomplished the goal of updating the design document so I don't have to keep explaining details at the start of every chat session. Something, by the way, it always previously failed to do causing me to have to explain stuff each and every time before forward progress could be made.

I do agree it did hit the token limit a lot quicker than before where I could chat for hours without worrying about it.

Either way, still have one last yak to shave for this project so we'll see how efficient it is with that. If it accomplishes the task before burning through all the tokens then win, win, I suppose.

rvz•7h ago
Notice how Anthropic has never open sourced any of their models.

This makes them (Anthropic) worse than OpenAI in terms of openness.

Since in this case as we all know. [0]

"What will permanently change everything is open source and transparent AI models that are smaller and more powerful than GPT-3 or even GPT-4."

[0] https://news.ycombinator.com/item?id=34865626

jjani•7h ago
On the other hand, they have always exposed their raw chain of thought, so you know exactly what you're paying for, unlike OpenAI who hides it. Similarly they allow an actual thinking budget rather than vague "low, medium, high", again unlike OpenAI. They also allow API access to all their models without draconic send-us-your-personal-data-KYC, once more unlikely OpenAI.

They might not fit your personal definition of "openness", but they do fit many other equally valid interpretations of that contept.

ryandrake•7h ago
Am I the only one super confused about how to even get started trying out this stuff? Just so I wouldn't be "that critic who doesn't try the stuff he criticizes," I tried GitHub Copilot and was kind of not very impressed. Someone on HN told me Copilot sucks, use Claude. But I have no idea what the right way to do it is because there are so many paths to choose.

Let's see: we have Claude Code vs. Claude the API vs. Claude the website, and they're totally different from each other? One is command line, one integrates into your IDE (which IDE?) and one is just browser based, I guess. Then you have the different pricing plans, Free, Pro, and Max? But then there's also Claude Team and Claude Enterprise? These are monthly plans that only work with Claude the Website, but Claude Code is per-request? Or is it Claude API that's per-request? I have no idea. Then you have the models: Claude Opus and Claude Sonnet, with various version numbers for each?? Then there's Cline and Cursor and GOOD GRIEF! I just want to putz around with something in VSCode for a few hours!

adamors•7h ago
Download Cursor and try it through that, IMO that's currently the most polished experience especially since you can change models on the fly. For more advanced usecases, CLI is better but for getting your feet wet I think Cursor is the best choice.
ryandrake•6h ago
Thanks. Too bad you need to switch editors to go that path. I assume the Cursor monthly plans are not the same as the Claude monthly plans and you can't use one for the other if you want to experiment...
kingnothing•6h ago
Cursor is built on VSCode.
olalonde•7h ago
Claude Code CLI.
ryandrake•7h ago
Thanks. With the CLI, can you get Copilot-ish things like tab-completion and inline commands directly in your IDE? Or do you need to copy/paste to and from a terminal? It feels like running a command on the IDE and then copying the output into your IDE is a pretty primitive way to operate.
cultureulterior•6h ago
Claude does the coding, and edits your files. You just sit back and relax. You don't do any tab completion etc.
avemg•6h ago
My advice is this:

1) Completely separate in your mind the auto-completion features from the agentic coding features. The auto-completion features are a neat trick but I personally find those to be a bit annoying overall, even if they sometimes hit it completely right. If I'm writing the code, I mostly don't want the LLM autocompletion.

2) Pay the $20 to get a month of Claude Pro access and then install Claude Code. Then, either wait until you have a small task in mind or your stuck on some stupid issue that you've been banging your head on and then open your terminal and fire up Claude Code. Explain to it in plain English what you want it to do. Pretend it's a colleague that you're giving a task to over Slack. And then watch it go. It works directly on your source code. There is no copying and pasting code.

3) Bookmark the Claude website. The next time you'd Google something technical, ask it Claude instead. General questions like "how does one typically implement a flizzle using the floppity-do framework"? "I'm trying to accomplish X, what are my options when using this stack?". General questions like that.

From there you'll start to get it and you'll get better at leverage the tool to do what you want. Then you can branch out the rest of the tool ecosystem.

ryandrake•6h ago
Interesting about the auto-completion. That was really the only Copilot feature I found to be useful. The idea of writing out an English prompt and telling Copilot what to write sounded (and still sounds) so slow and clunky. By the time I've articulated what I want it to do, I might as well have written the code myself. The auto-completion was at least a major time-saver.

"The card game state is a structure that contains a Deck of cards, represented by a list of type Card, and a list of Players, each containing a Hand which is also a list of type Card, dealt randomly, round-robin from the Deck object." I could have input the data structure and logic myself in the amount of time it took to describe that.

avemg•6h ago
I think you should embrace a bit of ambiguity. Don't treat this like a stupid computer where you have to specify everything in minute detail. Certainly the more detail you give, the better to an extent. But really: Treat it like you're talking to a colleague and give it a shot. You don't have to get it right on the first prompt. You see what it did and you give it further instructions. Autocomplete is the least compelling feature of all of this.

Also, I don't remember what model Copilot uses by default, especially the free version, but the model absolutely makes a difference. That's why I say to spend the $20. That gives you access to Sonnet 4 which is where, imo, these models took a giant leap forward in terms of quality of output.

ryandrake•6h ago
Thanks, I shall give it a try.
rstupek•4h ago
Is Opus as big a leap as sonnet4 was?
stillpointlab•6h ago
One analogy I have been thinking about lately is GPUs. You might say "The amount of time it takes me to fill memory with the data I want, copy from RAM to the GPU, let the GPU do it's thing, then copy it back to RAM, I might as well have just done the task on the CPU!"

I hope when I state it that way you start to realize the error in your thinking process. You don't send trivial tasks to the GPU because the overhead is too high.

You have to experiment and gain experience with agent coding. Just imagine that there are tasks where the overhead of explaining what to do and reviewing the output are dwarfed by the actual implementation. You have to calibrate yourself so you can recognize those tasks and offload them to the agent.

potatolicious•6h ago
There's a sweet spot in terms of generalization. Yes, painstakingly writing out an object definition in English just so that the LLM can write it out in Java is a poor use of time. You want to give it more general tasks.

But not too general, because then it can get lost in the sauce and do something profoundly wrong.

IMO it's worth the effort to know these tools, because once you have a more intuitive sense for the right level of abstraction it really does help.

So not "make this very basic data structure for me based on my specs", and more like "rewrite this sequential logic into parallel batches", which might take some actual effort but also doesn't require the model to make too many decisions by itself.

It's also pretty good at tests, which tends to be very boilerplate-y, and by default that means you skip some cases, do a lot of brain-melting typing, or copy-and-paste liberally (and suffer the consequences when you missed that one search and replace). The model doesn't tire, and it's a simple enough task that the reliability is high. "Generate test cases for this object, making sure to cover edges cases A, B, and C" is a pretty good ROI in terms of your-time-spent vs. results.

collinvandyck76•7h ago
Claude Code is the superior interface in my opinion. Definitely start there.
Filligree•7h ago
You need Claude Pro or Max. The website subscription also allows you to use the command line tool—the rate limits are shared—and the command line tool includes IDE integration, at least for VSCode.

Claude Code is currently best-in-class, so no point in starting elsewhere, but you do need to read the documentation.

wahnfrieden•5h ago
Correct. Claude Code Max with Opus. Don’t even bother with Sonnet.
kelnos•3h ago
I wouldn't be too prescriptive. I have Pro, and it's fine. I'm not an incredibly heavy user (yet?); I've hit the rate limits a couple times, but not to the point where I'm motivated to spend more.

I haven't tried it myself, but I've heard from people that Opus can be slow when using it for coding tasks. I've only been using Sonnet, and it's performed well enough for my purposes.

Filligree•2h ago
Sonnet works fine in many cases. Opus is smarter, and custom 'agents' can be set to use either.

I prefer configuring it to use Sonnet for things that don't require much reasoning/intelligence, with Opus as the coordinator.

wahnfrieden•1h ago
Opus is slow, so sessions should be used in parallel, likely across work trees. You shouldn't sit and wait on an Opus agent.
47282847•1h ago
> You need Claude Pro or Max.

Actually, to try it out, prepaid token billing is fine. You are not required to have a subscription for claude code cli. Even just $5 gave me enough breathing room to get a feeling for its potential, personally. I do not touch code often these days so I was relieved not to have to subscribe and cancel again just to play around a little and have it write some basic scripts for me.

vlade11115•7h ago
Claude Code has two usage modes: pay-per-token or subscription. Both modes are using API under the hood, but with subscription mode you are only paying a fixed amount a month. Each subscription tier has some undisclosed limits, cheaper plans have lower usage limits. So I would recommend paying $20 and trying the Claude Code via that subscription.
dennisy•6h ago
No Opus in the $20 tier though sadly
oblio•6h ago
What does Opus do extra?
lxgr•6h ago
It's a much larger, more capable LLM than Claude Sonnet.
andyferris•2h ago
As far as I can tell - that seems to have changed today!
kace91•6h ago
I’m looking for cursor alternatives after confusing pricing changes. Is Claude code an option? Can be integrated on an editor/ide for similar results?

My use case so far is usually requesting mechanic work I would rather describe than write myself like certain test suites, and sometimes discovery on messy code bases.

andyferris•2h ago
Claude Code is really good for this situation.

If you like an IDE, for example VS Code you can have the terminal open at the bottom and run Claude Code in that. You can put your instructions there and any edits it makes are visibile in the IDE immediately.

Personally I just keep a separate terminal open and have the terminal and VSCode open on two monitors - seems to work OK for me.

prinny_•7h ago
What exactly did you try with GitHub copilot? It’s not an LLM itself, just in interface for an LLM. I have copilot in my professional GitHub account and I can choose between chat-gpt and Claude.
AlecSchueler•6h ago
I'm not sure what's complicated about what you're describing? They offer two models and you can pay more for higher usage limits, then you can choose if you want to run it in your browser or in your terminal. Like what else would you expect?

Fwiw I have a Claude pro plan and have no interest in using other offerings so I'm not sure if they're super simple (one model, one interface, one pricing plan)?

onlyrealcuzzo•6h ago
When people post this stuff, it's like, are you also confused that Nike sells shoes AND shorts AND shirts, and there's different colors and skus for each article of clothing, and sometimes they sell direct to consumer and other times to stores and to universities, and also there's sales and promotions, etc, etc?

It's almost as if companies sell more than one product.

Why is this the top comment on so many threads about tech products?

Imustaskforhelp•6h ago
Because I think that claude has gone beyond tech niche at this point..

Or maybe that's me, but still whether its through the likes of those vibe coding apps like lovable bolt etc.

at the end of the day, Most people are using the same tool which is claude since its mostly superior in coding (questionable now with oss models, but I still use it through kiro).

People expect this stuff to be simple when in reality its not and there is some frustation I suppose.

furyofantares•6h ago
In this case, they tried something and were told they were doing it wrong, and they know there's more than one way to do it wrong - wrong model, wrong tool using the model, wrong prompting, wrong task that you're trying to use it for.

And of course you could be doing it right but the people saying it works great could themselves be wrong about how good it is.

On top of that it costs both money and time/effort investment to figure out if you're doing it wrong. It's understandable to want some clarity. I think it's pretty different from buying shoes.

AlecSchueler•6h ago
Is it though? People complain about sore feet and hear they wear the wrong kind of shoes so they go to the store where they have to spend money to find out while trying to navigate between dress shoes, minimal shoes, running shoes, hiking shoes etc etc., they have to know their size, ask for assistance in trying them on...
evilduck•6h ago
> I think it's pretty different from buying shoes.

Shoe shopping is pretty complex, more so than trialing an AI model in my opinion.

Are you a construction worker, a banker, a cashier or a driver? Are you walking 5 miles everyday or mostly sedentary? Do you require steel toed shoes? How long are you expecting them to last and what are you willing to pay? Are you going to wear them on long runs or take them river kayaking? Do they need to be water resistant, waterproof or highly breathable? Do you want glued, welted, or stitch down construction? What about flat feet or arch support? Does shoe weight matter? What clothing are you going to wear them with? Are you going to be dancing with them? Do the shoes need a break in period or are they ready to wear? Does the available style match your preferences? What about availability, are you ok having them made to order or do you require something in stock now?

By comparison I can try 10 different AI services without even needing to stand up for a break while I can't buy good dress shoes in the same physical store as a pair of football cleats.

kelnos•4h ago
> Shoe shopping is pretty complex, more so than trialing an AI model in my opinion.

Oh c'mon, now you're just being disingenuous, trying to make an argument for argument's sake.

No, shoe shopping is not more complicated than trialing a LLM. For all of those questions about shoes you are posing, either a) a purchaser won't care and won't need to ask them, or b) they already know they have specific requirements and will know what to ask.

With an LLM, a newbie doesn't even know what they're getting into, let alone what to ask or where to start.

> By comparison I can try 10 different AI services without even needing to stand up for a break

I can't. I have no idea how to do that. It sounds like you've been following the space for a while, and you're letting your knowledge blind you to the idea that many (most?) people don't have your experience.

UncleEntity•27m ago
Just play with the 'free tier' on whatever website does the AI thing and figure it out.

Maybe there's a need to try ten different ones but I just stuck with one and can now convince it to do what I want it to do pretty successfully.

UncleEntity•37m ago
Ya know, in the over half a century I've been on this planet, choosing a new pair of shoes is so low on my 'life's little annoyances' list that it doesn't even rise above the noise of all the stupid random things which actually do annoy me.

Maybe the problem is I don't take shoes seriously enough? Something to work on...

ryandrake•6h ago
Hey, I'm open to the idea that I'm just stupid. But, if people in your target market (software developers) don't even understand your product line and need a HOWTO+glossary to figure it out, maybe there's also a branding/messaging/onboarding problem?
DougBTX•4h ago
My hot take is that your friend should show you what they’re using, not just dismiss Copilot and leave you hanging!
gmueckl•6h ago
When you walk into a store, you can see and touch all of these products. It's intuitive.

With all this LLM cruft all you get is essentially the same old chat interface that's like the year 2000 called and wants its on-line chat websites back. The only thing other than a text box that you usually get is a model selector dropdown squirreled away in a corner somewhere. And that dropdown doesn't really explain the differences between the cryptic sounding options (GPT-something, Claude Whatever...). Of course this confuses people!

derefr•6h ago
Claude.ai, ChatGPT, etc. are finished B2C products. They're black boxes, encapsulated experiences. Consumers don't want to pick a model, or know what model they're using; they just want to "talk to AI", and for the system to know which model is best to answer any given question. I would bet that for these companies, if their frontend observes you using the little model override button, that gets instrumented as an "oops" event in their metrics — something they aim to minimize.

What you're looking for, are the landing pages of the B2B API products underlying these B2C experiences. That would be https://www.anthropic.com/claude, https://openai.com/api/, etc. (In general, search "[AI company] API".)

From those B2B landing pages, you can usually click through to pages with details about each of their models.

Here's the model page corresponding to this news announcement, for example: https://www.anthropic.com/claude/opus

(Also, note how these B2B pages are on the AI companies' own corporate domains; whereas their B2C products have their own dedicated domains. From their perspective, their B2C offerings are essentially treated as separate companies that happen to consume their APIs — a "reference use-case" — rather than as a part of what the B2B company sells.)

margalabargala•6h ago
If anything, Anthropic has the product lineup that makes the most sense. Higher numbers mean better model. Haiku < Sonnet < Opus which translates to length/size. Free < Pro < Max.

Contrast to something like OpenAI. They've got gpt4.1, 4o, and o4. Which of these are newer than one another? How do people remember which of o4 and 4o are which?

hvb2•6h ago
Not sure is this is sarcasm I'm assuming not.

You're comparing well understood products that are wildly different to products with code names. Even someone who has never wore a t-shirt will see it on a mannequin and know where it goes.

I'm sorry but I cannot tell what the difference is between sonnet and opus. Unless one is for music...

So in this case you read the docs. Which is, in your analogy, you going to the Nike store and reading up on if a tshirt goes on your upper or lower body.

potatolicious•6h ago
Eh, this seems like a take that reeks a bit of "everyone is stupid except me".

I do know the answer to OP's question but that's because I pickle my brain in this stuff. It is legitimately confusing.

The analogy to different SKUs strikes me also inaccurate. This isn't the difference between shoes, shirts, and shorts - it's more as if a company sells three t-shirts but you can't really tell what's different about them.

It's Claude, Claude, and Claude. Which ones code for you? Well, actually, all of them (Code, web/desktop Claude, and the API can all do this)

Which ones do you ask about daily sundry queries? Well, two of them (web/desktop Claude, but also the API, but not Code). Well, except if your sundry query is about a programming topic, in which case Code can also do that!

Ok, if I do want to use this to write code, which one should I use? Honestly, any of them, and the company does a poor job of explaining why you would use each option.

"Which of these very similar-seeming t-shirts should I get?" "You knob. How are posts like this even being posted." is just an extremely poor way to approach other people, IMO.

ryandrake•5h ago
> It's Claude, Claude, and Claude. Which ones code for you?

Thanks for articulating the confusion better than I could! I feel it's a similar branding problem as other tech companies have: I'm watching Apple TV+ on my Apple TV software running on my Apple TV connected to my Google TV that isn't actually manufactured by Google. But that Google TV also has an Apple TV app that can play Apple TV+.

potatolicious•5h ago
It's a bit worse than a branding problem honestly, since there's legitimate overlap between products, because ultimately they're different expressions of the same underlying LLMs.

I'm not sure if you ever got a good rundown, but the tl;dr is that the 3 products ("Desktop", Code, and API) all expose the same underlying models, but are given different prompts, tools, and context management techniques that make them behave fairly differently and affect how you interact with them.

- The API is the bare model itself. It has some coding ability because that's inherent to the model - you can ask it to generate code and copy and paste it for example. You normally wouldn't use this except that if you're using some Copilot-type IDE integration where the IDE is doing the work of talking to the model for you and integrating it into your developer experience. In that case you provide API key and the IDE does the heavy lifting.

- The desktop app is actually a half-decent coder. It's capable of producing specific artifacts, distinguishing between multiple "files" it's writing for you, and revisiting previously-written code. "Oh, actually rewrite this in Go." is for example a thing it can totally do. I find it useful for diagnosing issues interactively.

- "Claude Code" is a CLI-only wrapper around the model. Think of it like Anthropic's first-party IDE integration, except there's not an IDE, just the CLI. In this case the integration gives the tool broad powers to actually navigate your filesystem, read specific files, write to specific files, run shell commands like builds and tests, etc. These are all functions that an IDE integration would also give you, but this is done in a Claude-y way.

My personal take is: try Claude Code, since as long as you're halfway comfortable with a CLI it's pretty usable. If you really want a direct IDE integration you can go with the IDE+API key route, though keep in mind that you might end up paying more (Claude Code is all-you-can-eat-with-rate-limits, where API keys will... just keep going).

ryandrake•5h ago
Wow. After 50 replies to what I thought wasn't such a weird question, your rundown is the most enlightening. Thank you very much.
Karrot_Kream•4h ago
FWIW it's probably because a lot of us have been following along and trying these things from the start so the nuances seem more obvious but also I feel that some folks feel your question is a bit "stupid", like "why are you suddenly interested in the frontier of these models? where were you for the last 2 years?"

And to some extent it is like the PC race. Imagine going to work and writing software for whatever devices your company writes software for in whatever toolchain your company uses. Then 2-3 years after the PC race began heating up, asking "Hey I only really write code for whatever devices my employer gives me access to. Now I want to buy one of these new PCs but I don't really understand why I'd choose an Intel over a Motorolla chipset or why I'd prioritize more ROM or more RAM, and I keep hearing about this thing called RISC that's way better than CISC and some of these chips claim to have different addressing modes that are better?"

slackpad•4h ago
Claude Code running in a terminal can connect to your IDE so you can review its proposed changes there. I’ve found this to be a nice drop in way to try it out without having to change your core workflow and tools too much. Check out the /ide command for details.
Karrot_Kream•4h ago
Also when it comes to API integrations, I find some better than others. Copilot has been pretty crummy for me but Zed's Agent Mode seems to be almost as good as Claude Code. I agree with the general take that Claude Code is a good default place to start.
tomrod•5h ago
> Why is this the top comment on so many threads about tech products?

Because you overestimate the difference that the representative person understands.

A more accurate analogy is that Nike sells green-blue shoes and Nike sells blue-green shoes, but the blue-green shoes add 3 feet to your jump and green-blue shoes add 20 mph to your 100 yard dash sprint.

You know you need one of them for tomorrow's hurdles race but have no idea which is meaningful for your need.

ryandrake•5h ago
Also, the green-blue shoes charge per-step, but the blue-green shoes are billed monthly by signing up for BlueGreenPro+ or BlueGreenMax+, each with a hidden step limit but BlueGreenMax+ is the one that gives you access to the Cyan step model which is better; plus the green-blue shoes are only useful when sprinting, but the blue-green shoes can be used in many different events, but only through the Nike blue-green API that only some track&field venues have adopted...
true_religion•5h ago
This is like being told to buy Nike shoes. Then when you proudly display your new cleats, they tell you "no, I meant you should by basketball shoes. The cleats are terrible."
squeaky-clean•5h ago
Which Nike shoe is best for basketball? The Nike Dunk, Air Force 1, Air Jordan, LeBron 20, LeBron XXI Prime 93, Kobe IX elite, Giannis Freak 7, GT Cut, GT Cut 3, GT Cut 3 Turbo, GT Hustle 3, or the KD18?

At least with those you can buy whatever you think is coolest. Which Claude model and interface should the average programmer use?

AlecSchueler•4h ago
What's the average programmer? Is it someone who likes CLI tools? Or who likes IDE integration? Different strokes for different folks and surely the average programmer understands what environment they will be most comfortable in.
nawgz•3h ago
> Different strokes for different folks and surely the average programmer understands what environment they will be most comfortable in.

That's a silly claim to me, we're talking about a completely new environment where you prompt an AI to develop code, and therefore an "average programmer" is unlikely to have any meaningful experience or intuition with this flow. That is exactly what GP is talking about - where does he plug in the AI? What tradeoffs are there to different options?

The other day I had someone judge me for asking this question by dismissively saying "dont say youve still been using ChatGPT and copy/paste", which made me laugh - I don't use AI at all, so who was he looking down on?

kelnos•4h ago
Because the offerings are not simple. Your Nike example is silly; everyone knows what to do with shoes and shorts and shirts, and why they might want (or not want) to buy those particular items from Nike.

But for someone who hasn't been immersed in the "LLM scene", it's hard to understand why you might want to use one particular model of another. It's hard to understand why you might want to do per-request API pricing vs. a bucketed usage plan. This is a new technology, and the landscape is changing weekly.

I think maybe it might be nice if folks around here were a bit more charitable and empathetic about this stuff. There's no reason to get all gatekeep-y about this kind of knowledge, and complaining about these questions just sounds condescending and doesn't do anyone any good.

pdntspa•4h ago
Because few seem to want to expend the effort to dive in and understand something. Instead they want the details spoonfed to them by marketing or something.

I absolutely loathe this timeline we're stuck in.

windsignaling•4h ago
On the contrary, I'm confused about why you're confused.

This is a well-known and documented phenomenon - the paradox of choice.

I've been working in machine learning and AI for nearly 20 years and the number of options out there is overwhelming.

I've found many of the tools out there do some things I want, but not others, so even finding the model or platform that does exactly what I want or does it the best is a time-consuming process.

joshmarlow•6h ago
VSCode has a pretty good Gemini integration - it can pull up a chat window from the side. I like to discuss design changes and small refactorings ("I added this new rpc call in my protobuf file, can you go ahead and stub out the parts of code I need to get this working in these 5 different places?") and it usually does a pretty darn good job of looking at surrounding idioms in each place and doing what I want. But gemini can be kind of slow here.

But I would recommend just starting using Claude in the browser, talk through an idea for a project you have and ask it to build it for you. Go ahead and have a brain storming session before you actually ask it to code - it'll help make sure the model has all of the context. Don't be afraid to overload it with requirements - it's generally pretty good at putting together a coherent plan. If the project is small/fits in a single file - say a one page web app or a complicated data schema + sql queries - then it can usually do a pretty good job in one place. Then just copy+paste the code and run it out of the browser.

This workflow works well for exploring and understanding new topics and technologies.

Cursor is nice because it's an AI integrated IDE (smoother than the VSCode experience above) where you can select which models to use. IMO it seems better at tracking project context than Gemini+VSCode.

Hope this helps!

spaceman_2020•6h ago
Download Claude Code

Create a new directory in your terminal

Open that directory, type in "Claude" to run Claude

Press Shit + Tab to go into planning mode

Tell Claude what you want to build - recommend something simple to start with. Specify the languages, environment, frameworks you want, etc.

Claude will come up with a plan. Modify the plan or break it into smaller chunks if necessary

Once plan is approved, ask it to start coding. It will ask you for permissions and give you the finished code

It really is something when you actually watch it go.

zarzavat•6h ago
Github Copilot and Claude code are not exactly competitors.

Github Copilot is autocomplete, highly useful if you use VS Code, but if you are using e.g. Jetbrains then you have other options. Copilot comes with a bunch of other stuff that I rarely use.

Claude code is project-wide editing, from the CLI.

They complement each other well.

As far as I'm concerned the utility of the AI-focused editors has been diminished by the existence of Claude code, though not entirely made redundant.

fkyoureadthedoc•6h ago
> Github Copilot is autocomplete... comes with a bunch of other stuff that I rarely use.

That bunch of other stuff includes the chat, and more recently "Agent Mode". I find it pretty useful, and the autocomplete near useless.

qingcharles•5h ago
This isn't correct. GitHub Copilot now totally competes with Claude Code. You can have it create an entire app for you in "Agent" mode if you're feeling brave. In fact, seeing as Copilot is built directly into Visual Studio when you download it, I guess they have a one-up.

Copilot isn't locked to a specific LLM, though. You can select the model from a panel, but I don't think you can plug in your own right now, and the ones you can select might not be SOTA because of that.

alienbaby•2h ago
Sonnet 4 in copilot agent mode has been doing great work for me lately. Especially once you realise that at least 50% of the work is done before you get to copilot, as architectural and product specs and implementations plans.
tomwojcik•3h ago
Opencode https://github.com/sst/opencode provides a CC like interface for copilot. It's a slightly worse tool, but since copilot with Claude 4 is super cheap, I ended up preferring it over CC. Almost no limits, cheaper, you can use all the Copilot models, GH is not training on your data.
andsoitis•5h ago
> use Claude. But I have no idea what the right way to do it is because there are so many paths to choose.

Anthropic has this useful quick start guide: https://docs.anthropic.com/en/docs/claude-code/quickstart

StephenHerlihyy•5h ago
Kilo Code for VSCode is pretty solid. Give it a try.
wintermutestwin•5h ago
Yes. You basically need an LLM to provide guidance on product selection in this brave new world.

It is actually one of my most useful use cases of this tech. Nice to have a way to ask in private so you don’t get snarky answers like: it’s just like buying shoes!

vanillax•5h ago
All the tools, copilot,claude, gemini in vscode are all completely worthless unless in Agent Mode. I have no idea why none of these tools dont default to Agent mode.
ActorNightly•5h ago
If you want your own cheap IDE integration, you can set up VSCode with Continue extension, ollama running locally, and a small agent model. https://docs.continue.dev/features/agent/model-setup.

If you want to understand how all of this works, the best way is to build a coding agent manually. Its not that hard

1. Start with Ollama running locally and Gemma3 QAT models. https://ollama.com/library/gemma3

2. Write a wrapper around Ollama using your favorite language. The idea is that you want to be able to intercept responses coming back from the model.

3. Create a system prompt that tells the model things like "if the user is asking you to create a file, reply in this format:...". Generally to start, you can specify instructions for read file, write file, and execute file

4. In your wrapper, when you send the input chat prompt, and get the model response back, you look for those formats, and make the wrapper actually execute the action. For example if the model replies back with the format to read file, you read the file from your wrapper code and send it back to the model.

Every coding assistant is basically this under the hood with just a lot more fluff and their own IDE integration.

The benefit of doing your own is that you can customize it to your own needs, and when you direct a model with more precision even the small models perform very well with much faster speed.

afro88•5h ago
OP is asking for where to get started with Claude for coding. They're confused. They just want to mess around with it in VSCode. And you start talking about Ollama, PAT, coding your own wrapper, composing a system prompt etc.!?
jimbo808•4h ago
You just described all of your options in detail - what's the problem? Pick one. Seems like you've got a very thorough grasp on how to get started trying the stuff out, but it requires you to choose how you want to do that.
kelnos•4h ago
If you're looking for a coding assistant, get Claude Code, and give it a try. I think you need the Pro plan at a minimum for that ($20/mo; I don't think Free includes Claude Code). Don't do the per-request API pricing as it can get expensive even while just playing around.

Agree that the offering is a bit confusing and it's hard to know where to start.

Just FYI: Claude Code is a terminal-based app. You run it in the working directory of your project, and use your regular editor that you're used to, but of course that means there's no editor integration (unlike something like Cursor). I personally like it that way, but YMMV.

robluxus•4h ago
> I just want to putz around with something in VSCode for a few hours!

I just googled "using claude from vscode" and the first page had a link that brought me to anthropic's step by step guide on how to set this up exactly.

Why care about pricing and product names and UI until it's a problem?

> Someone on HN told me Copilot sucks, use Claude.

I concur, but I'm also just a dude saying some stuff on HN :)

zaphirplane•4h ago
try asking it ?
screye•3h ago
Cursor + Claude 4 = best quality + UX balance. Pay up for 20/month subscription.

Cursor imports in your VSCode setup. Setting it up should be trivial.

Use Agent mode. Use it in a preexisting repo.

You're off the races.

There is a lot more you can do, but you should start seeing value at this point.

w0m•3h ago
honestly - copilot free mode; and just play with the agentic stuff can give you a good idea. Attach it to Roo and you'll get a good idea. Realize that if you paid to use a better model; you'd get better results as free doesn't have a ton of premium tokens.
ramesh31•7h ago
Will the price for 4 go down? I still find Opus completely unusable for the cost/performance, as someone who spends thousands per month on tokens. There's really no noticeable difference from Sonnet, at nearly 10x the price.
_vaporwave_•7h ago
It's interesting that Anthropic maintains current prices for prior state of the art models when doing a new release. Why offer a model with worse performance for the same price? What incentives are they trying to create?
dysoco•7h ago
I'm guessing it's mostly for legacy reasons. When 3.7 came out many people were not happy with it and went back to 3.5; I guess supporting older models for a while makes sense.
gwd•4h ago
> What incentives are they trying to create?

One obvious explanation is that pricing is strongly related to the price to them, and that their only incentive is for people to use an expensive model of they really need it.

I forget which one of the GPT models was better, faster, and cheaper than the previous model. The incentive there is obviously, "If you want to use the old model for whatever reason, fine, but we really want you to use the new one because costs us less to run."

mrcwinn•6h ago
o3 and o3-pro are just so good. Sonnet goes off the deep end too often and Opus, in my experience, is not as strong at reasoning compared to OpenAI, despite the higher costs. Rarely do we see a worse, more expensive product win - but competition is good and I’m rooting for Anthropic nonetheless!
AlecSchueler•6h ago
Off the deep end?
UncleEntity•20m ago
Probably referring to it's tendency to over-complicate things to the point you have to step in and be like "WTF are you even talking about... Wouldn't it be a lot simpler to just use the original, well planned out design?"

Which it does a lot...

WXLCKNO•5h ago
o3 feels pretty good to me as well but o3-pro has consistently one shotted problems other LLMs got stuck on.

I'm talking multiple tries of claude 4 opus, Gemini 2.5 pro, o3 etc resulting in sometimes hundreds of lines of code.

Versus o3-pro (very slowly) analyzing and then fixing something that seemed completely unrelated in a one or two line change and truly fixing the root cause.

o3-pro level LLMs at reduced cost and increased speed will already be amazing..

bayesianbot•12m ago
OpenAI also has Flex processing[1] for o3. I've spent most of my time with Gemini 2.5, but lately been trying out a ton of o3 as it seems to work quite well and I get really cheap tokens (~95% of my agentic tokens are cached which is 75% discount and flex mode adds 50% for $0.25 / million input tokens)

[1] https://platform.openai.com/docs/guides/flex-processing?api-...

thoop•6h ago
The article says "We plan to release substantially larger improvements to our models in the coming weeks."

Sonnet 4 has definitely been the best model for our product's use case, but I'd be interested in trying Haiku 4 (or 4.1?) just due to the cost savings.

I'm surprised Anthropic hasn't mentioned anything about Haiku 4 yet since they released the other models.

mocmoc•6h ago
Their limits are just … a real road blocker
bananapub•5h ago
huh?

Claude Mad is tens of hours of opus a month, or you can pay per token and have unlimited.

Or did you mean “I wish it was cheaper”?

andyferris•1h ago
Ha - the $200 plan should be renamed to "Claude Mad Max" :)
OldGreenYodaGPT•6h ago
Claude Code has honestly made me at least 10x more productive. I’ve burned through about 3 billion tokens and have been consistently merging 5+ PRs a day, tackling tons of tech debt, improving GitHub Actions, and making crazy progress on product work
totaa•6h ago
can you share your workflow?
steinvakt2•6h ago
I also have this feeling that I'm 2-10x more productive. But isn't it curious how a lot of devs feel this way, but no devs that I know have the experience that any of their colleagues have become 2-10x more productive?
nevertoolate•5h ago
10x means to me that i can finish a month of work in max 2 days and go cloud watching. What does it mean for you?
mwigdahl•5h ago
<raises hand> Our automated test folks were chronically behind, struggling to keep up with feature development. I got the two assigned to the team that was the most behind set up with Claude Code. Six weeks later they are fully caught up, expanding coverage, and integrating AI code review into our build pipeline.

It's not 10x, but those guys do seem like they've hit somewhere around 2x improvement overall.

samtp•5h ago
What type of work do you do and what type of code do you produce?

Because I've found it to work pretty amazingly for things that don't need to be exact (like data modeling) or don't have any security implications (public apps). But for everything else I end up having to find all the little bugs by reading the code line by line, which is much slower than just writing the code in the first place.

AstroBen•5h ago
only 10x? I'm at least 100x as productive. I only type at a measly 100wpm, whereas Claude can output 100+ tokens a second

I'm outputting a PR every 6 minutes. The reviewers are using Claude to review everything. It used to take a day to add 100 lines to the codebase.. now I can add 100 lines in one prompt

If I want even more productivity (at risk of making the rest of my team look slow) I can tell Claude to output double the lines and ship it off for review. My performance metrics are incredible

samtp•5h ago
So no human reads the actual code that you push to production? Are you not worried about security risks, spaghetti code, and other issues? Or does Claude magically make all of those concerns go away?
AstroBen•5h ago
forgot the /s
samtp•5h ago
Sorry lol, sometimes difficult to separate the hype boys from actual sarcasm these days
qingcharles•3h ago
Not sure if joking...?
AstroBen•3h ago
This is only the beginning. I can see myself having 100 Claude tasks running concurrently - the only problem is edits clash between files. I'm working on having Claude solve this by giving each instance its own repo to work with, then I ask the final Claude to mash it all together as best it can

What's 100x productivity multiplied by 100 instances of Claude? 10,000x productivity

Now to be fair and a bit more realistic it's not actually 10000x because it takes longer to push the PR because the file sizes are so big. Let's call it 9800x. That's still a sizable improvement

trallnag•3h ago
Big if true
screye•3h ago
How do you maintain high confidence in the code it generates ?

My current bottleneck is having to review the huge amounts of code that these models spit out. I do TDD, use auto-linting and type-checking.... but the model makes insidious changes that are only visible on deep inspection.

theappsecguy•2h ago
The only way you could be 10x more productive is omit you were doing nothing before.
P24L•6h ago
The improved Opus isn’t about achieving significantly better peak performance for me. It’s not about pushing the high end of the spectrum. Instead, it’s about consistently delivering better average results - structuring outputs more effectively, self-correcting mistakes more reliably, and becoming a trustworthy workhorse for everyday tasks.
djha-skin•5h ago
Opus 4(.1) is so expensive[1]. Even Sonnet[2] costs me $5 per hour (basically) using OpenRouter + Codename Goose[3]. The crazy thing is Sonnet 3.5 costs the same thing[4] right now. Gemini Flash is more reasonable[5], but always seems to make the wrong decisions in the end, spinning in circles. OpenAI is better, but still falls short of Claude's performance. Claude also gives back 400's from its API if you CTRL-C in the middle though, so that's annoying.

Economics is important. Best bang for the buck seems to be OpenAI ChatGPT 4.1 mini[6]. Does a decent job, doesn't flood my context window with useless tokens like Claude does, API works every time. Gets me out of bad spots. Can get confused, but I've been able to muddle through with it.

1: https://openrouter.ai/anthropic/claude-opus-4.1

2: https://openrouter.ai/anthropic/claude-sonnet-4

3: https://block.github.io/goose/

4: https://openrouter.ai/anthropic/claude-3.5-sonnet

5: https://openrouter.ai/google/gemini-2.5-flash

6: https://openrouter.ai/openai/gpt-4.1-mini

generalizations•5h ago
Get a subscription and use claude code - that's how you get actual reasonable economics out of it. I use claude code all day on the max subscription and maybe twice in the last two weeks have I actually hit usage limits.
tgtweak•5h ago
Is it considerably more cost effective than cline+sonnet api calls with caching and diff edits?

Same context length and throughput limits?

Anecdotally I find gpt4.1 (and mini) were pretty good at those agentic programming tasks but the lack of token caching made the costs blow up with long context.

bavell•4h ago
I'm on the basic $20/mo sub and only ran into token cap limitations in the first few days of using Claude Code (now 2-3 weeks in) before I started being more aggressive about clearing the context. Long contexts will eat up tokens caps quickly when you are having extended back-and-forth conversations with the model. Otherwise, it's been effectively "unlimited" for my own use.
bgirard•3h ago
YMMV I'm using the $100/mo max subscription and I hit the limit during a focused coding session where I'm giving it prompts non-stop.

Unfortunately there's no easy tool to inspect usage. I started a project to parse the Claude logs using Claude and generate a Chrome trace with it. It's promising but it was taking my tokens away from my core project.

bartman•2h ago
Check out ccusage, it sounds like the tool you’re describing: https://github.com/ryoppippi/ccusage
bgirard•2h ago
That's neat. According to the tool I'm consuming ~300m tokens per day coding with a (retail?) cost of ~125$/day. The output of the model is definitely worth $100/mo to me.
symbolicAGI•2h ago
ccusage on GitHub.
MarcelOlsz•1h ago
If you use Claude Code with a subscription and run `ccusage` [0] you can get an idea of your "true usage" and cost.

[0] https://github.com/ryoppippi/ccusage

j45•2m ago
Yes, it’s much better.

It uses way less tokens or much more effectively when running locally.

seneca•3h ago
Is there a way to sign up for Claude code that doesn't involve verifying a phone number with Anthropic? They don't even accept Google Voice numbers.

Maybe I'm out of touch, but I'm not handing out my phone number to sign up for random SaaS tools.

tagami•3h ago
use a burner
kroaton•4h ago
GLM 4.5 / Kimi K2 / Qwen Coder 3 / Gemini Pro 2.5
Aeolun•1h ago
In every price comparison I make. Claude (API) always comes out cheapest if you manage to keep most of your context cached. 90% price reduction for input is crazy.
energy123•43m ago
Large models are for querying the model

Small models are for querying the context

Opus is cheap if you use it for its niche

thimabi•2m ago
> Large models are for querying the model > Small models are for querying the context

I respectfully disagree.

My experience is that large models are capable of understanding large contexts much better. Of course they are more expensive and slower, too. But in terms of accuracy, large models are always better at querying the context.

paul7986•5h ago
Claude plus failed me today badly compared to chatGPT plus.

I uploaded a web design of mine (jpeg) and asked Claude to create the html/css. Asked GPT to do the same. GPT's code looked the closet to the design I created and uploaded. Just five to ten small tweaks and I was done vs. Claude it would have taken me almost triple the steps.

I actually subscribed to both today (resubscribed to GPT) and going to keep testing which one is the better front-end developer (i am, but got to embrace AI ).

alvis•4h ago
Funny Open AI and Anthropic seems to be coordinating their releases on the same day
KaoruAoiShiho•4h ago
For me this is the big news of the day. Looks insane.
hartator•4h ago
> 1 min read

What the point of these?

Kind of interesting that we live in an area of AI super advanced, but still make basic UI/UX mistake. The tagline of this blog post shouldn't be "1 min read".

It's not even accurate. I timed myself not reading fast but not slow, took me 3 min 30s. Maybe the images need be OCRed to make the estimation more accurate.

TimMeade•3h ago
This has been the worse Claude day ever. Just fell apart. Not sure if the release is why, but cursing in documents and can not fix a bug after hours of back and forth.
system2•9m ago
Claude lost me after I used it for a day. Their pricing model is bonkers. There is no way any developer in their right mind would go with Claude.