I had tried coding with ChatGPT a year or so ago, and the effort needed to get anything useful out of it greatly exceeded any benefit, so I went into CC with low expectations, but I have been blown away.
Let me illustrate with a specific, simple example: fixing linter or compiler errors. The problems I solve with this method are all verifiable via the command line (this can usually be documented in CLAUDE.md). Claude Code will continuously adjust the code based on the linter's output until all errors are resolved. This process often takes quite some time. I typically do this after completing a feature. If Claude Code mistakenly thinks it has finished the task during one of these checks, it will halt the entire process. I then have to restart it using the same prompt to continue the task.
Therefore, I'm looking for an external tool to manage Claude Code. I haven't found one yet. I've seen some articles suggesting the use of a subagents approach, where tools like Gemini CLI or Codex could launch Claude Code. I haven't thoroughly explored this method yet.
Doesn’t matter if you tell it multiple times in CLAUDE.md to not skip checks, it will eventually just skip them so it can commit. It’s infuriating.
I hope that as CC evolves there is a better way to tell/force the model to do things like that (linters, formatters, unit/e2e tests, etc).
Students don't get to choose whether to take the test, so why do we give AI the choice?
I have a `task build` command that runs linters, tests and builds the project. All the commands have verbosity tuned down to minimum to not waste context on useless crap.
Claude remembers to do it pretty well. I have it in my global CLAUDE.md so I guess it has more weight? Dunno.
Example: read this log file and extract XYZ from it and show me a table of the results. Instead of having the agent read in the whole log file into the context and try to process it with raw LLM attention, you can get it to read in a sample and then write a script to process the whole thing. This works particularly well when you want to do something with math, like compute a mean or a median. LLMs are bad at doing math on their own, and good at writing scripts to do math for them.
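For instance, here is a minimal sketch of the kind of throwaway script an agent might write for that log task; the log format and the "ms" duration field are purely hypothetical:

```python
# Hypothetical throwaway script: pull request durations out of a log with
# lines like "2024-01-01T12:00:00 GET /api/foo 200 123ms" and summarize them.
import re
import statistics
import sys

durations = []
with open(sys.argv[1]) as log:
    for line in log:
        match = re.search(r"(\d+)ms\b", line)
        if match:
            durations.append(int(match.group(1)))

if durations:
    print(f"count:  {len(durations)}")
    print(f"mean:   {statistics.mean(durations):.1f}ms")
    print(f"median: {statistics.median(durations):.1f}ms")
else:
    print("no durations found")
```

The agent reads a few sample lines to learn the format, writes something like this, runs it over the whole file, and only the small table of results ends up back in its context.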
A lot of interesting techniques become possible when you have an agent that can write quick scripts or CLI tools for you, on the fly, and run them as well.
When you tell an LLM to check the code for errors, the LLM could simply "realize" that the problem is complex enough to warrant building [or finding+configuring] an appropriate tool to solve the problem, and so start doing that... but instead, even for the hardest problems, the LLM will try to brute-force a solution just by "staring at the code really hard."
(To quote a certain cartoon squirrel, "that trick never works!" And to paraphrase the LLM's predictable response, "this time for sure!")
That is for tasks where a programmatic script solution is a good idea though. I don't think your example of "check the code for errors" really falls in that category - how would you write a script to do that? "Staring at the code really hard" to catch errors that could never have been caught with any static analysis tool is actually where an LLM really shines! Unless by "check for errors" you just meant "run a static analysis tool", in which case sure, it should run the linter or typechecker or whatever.
After all, when an immediate problem seems like it could come up again, "taking the opportunity" to solve it once and for all by introducing workflow automation is what an experienced human engineer would likely do in such a situation (if they aren't pressed for time).
I used Claude to translate my application, and I asked it to translate each piece of text in the application to the best of its ability.
That worked great for one view, but when I asked it to translate the rest of the application in the same fashion, it got lazy and started to write a script to substitute some words instead of actually translating sentences.
Hmm. My experience of "the average programmer" doesn't look like yours and looks more like the LLM :/
I'm constantly flabbergasted as to how way too many devs fumble through digging into logs or extracting information or what have you because it simply doesn't occur to them that tools can be composed together.
From my experience, only a few rare devs do this. Most will stick with the (broken/wrong) GUI tools they already know, made by others, out of convenience.
For your existing browser session, you'd have to start it with an open socket connection already, as that's not enabled by default; but once you do, the server should be able to find the open local socket, connect to it, and execute controls.
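For a Chromium-based browser that usually means launching it with remote debugging turned on and then discovering targets over the DevTools HTTP endpoint. A rough sketch; port 9222 is just the conventional example, not anything this particular server requires:

```python
# Assumes the browser was started with remote debugging enabled, e.g.:
#   chromium --remote-debugging-port=9222
# The DevTools HTTP endpoint lists open tabs, each with a
# webSocketDebuggerUrl you can connect to and send commands over.
import json
from urllib.request import urlopen

with urlopen("http://localhost:9222/json/list") as resp:
    targets = json.load(resp)

for target in targets:
    print(target.get("title"), "->", target.get("webSocketDebuggerUrl"))
```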
Worth noting that this "control the browser" hype is quite deceiving and it doesn't really work well, IMO, because LLMs still suck at understanding the DOM, so you need various tricks to optimize for that. I would take OP's claims with a giant bag of salt.
Also, these automations are really easy to identify and block, as they are not organic inputs, so the actual use is very limited.
https://github.com/day50-dev/Mansnip
Wrapping this in a stdio MCP is probably a smart move.
I should just api-ify the code and include the server in the pip. How hard could this possibly be...
That's an interesting viewpoint from an AI marketing company.
I think the essential job of marketing is to help people make the connection between their problems and your solutions. Putting all on them in a kind of blamey way doesn't seem like a great approach to me.
That response suggests you aren't interested in discussion or conversation at all.
It suggests that your purpose here is to advertise.
That's fair but it's what I believe.
...see?
Being consistent with stating your beliefs isn't the same as engaging with and about those beliefs.
Advertising isn't conversation. Evangelism isn't discussion.
I agree that's the job of marketing, but I'm not someone who markets AI, I'm someone who helps large marketing organizations use it effectively. I agree that if my goal was to market it that wouldn't be an effective message, but my goal is for folks who work in these companies to take some accountability for their own personal development, so that's my message. Again, all I can do is be honest about how I feel and to be consistent in my beliefs and experiences working with these kinds of organizations.
Online discussion with randos about this topic is almost useless because everybody is quick to dismiss the other side as hopelessly brainwashed by hype, or burying their heads in the sand for fear of the future of their jobs. I've had much better luck talking about it with people I've known and had mutual respect with before all this stuff came out.
(And then the CISO sends some security tips email/slack announcement which is still dumb and useless even after an LLM added a bunch of emojis and fun language to it.)
I've always been an old-fashioned and slow developer. But it still seems to me, if most "regular" "average" developers churn out code that is more of a liability than an asset, if they can do that 10x faster, it doesn't really change the world. Most stuff still ends up waiting, in the end, for some slow work done right, or else gets thrown away soon enough.
I'm personally in the habit of answering even slightly complex questions by first establishing shared context - that is, I very carefully ensure that my conversational partner has exactly the same understanding of the situation that I do. I do this because it's frequently the case that we don't have a lot of overlap in our understanding, or we have very specific gaps or contradictions in our understanding.
If you're like many in this industry, you're working in a language other than what you were raised in, making all of this more difficult.
I think it's the penguin approach to risk management -- they know they need to jump in the water to get where they need to go, but they don't know where the orcas are. So they jostle closer and closer to the edge, some fall in, and the rest see what happens.
BTW, I probably shouldn't have commented only on the small part at the end that annoyed me. I'm fascinated by the idea that LLMs make highly custom software feasible, like your "claudsidian" system... that people will be able to get the software they want by describing it rather than being limited to finding something preexisting and having to adapt to it. As you point out, the unix philosophy is one way -- simple, unopinionated building blocks an LLM can compose out of user-level prompts.
Great way to describe the culture of fear prevalent at large companies.
Also, about a tool being overly complex: something like find, imagemagick, ffmpeg,… are not complex in themselves. They're solving a domain that is itself complex. But the tools are quite good; the evidence is their stability, where they've barely changed across decades.
ffmpeg does all things media conversion. If you don't want to learn how to use it, you find someone who does (or do the LLM gamble), or try to find a wrapper that has a simpler interface and hope the limited feature set encompasses your use cases.
A CLI tool can be extremely versatile. GUIs are full of accidental complexity, so unless your selling point is intuitiveness, it's just extra work.
Basically I have it sitting over the top of my notes and assisting with writing, editing, researching, etc.
I love obsidian for the same basic reason you do: it’s just a bunch of text files, so I can use terminal tools and write programs to do stuff with them.
So far I mostly use LLMs to write the tools themselves, but not actually edit the notes. Maybe I can steal some of your ideas!
FOMO is for fashions and fads, not getting things done.
I probably wouldn’t do it myself either, but that’s not really relevant to whether it works or not.
Filling food with opioids would be great for business, but hopefully you understand how that is not "good business"
I do not care that it is common. I want it to be not common.
I do not care that bad marketing tactics like this can be used to sell "good" products, whatever that means.
You're supposed to start with a use case that is unmet, and research/build technology to enable and solve the use case.
AI companies are instead starting with a specific technology, and then desperately searching for use cases that might somehow motivate people to use that technology. Now these guys are further arguing that it should be the user's problem to find use cases for the technology they seem utterly convinced needs to be developed.
(Well, I recently found there is a reason for it: I'm left-handed, and unlocking my phone with my left hand sometimes touches the icon stupidly put on the lock screen by default. Not that it would work: my phone is usually running with data disabled.)
It started as unshare and ended up being a bit of a yak-shaving endeavor to make things work, but I was able to get some surprisingly good results using gemma3 locally and giving it access to run arbitrary Debian-based utilities.
I'm curious to see what you've come up with. My local LLM experience has been... sub-par in most cases.
I've had much better luck with constrained, structured tools that give me control over exactly how the tools behave and what context is visible to the LLM.
It seems to be all about making doing the correct thing easy, the hard things possible, and the wrong things very difficult.
Not around privacy, mind you. If your notes contain nothing that you wouldn’t mind being subpoenaed or read warrantlessly by the DHS/FBI, then you are wasting your one and only life.
exact opposite of the unix philosophy
Well, no, they aren't, but the orchestration frameworks in which they are embedded sometimes are (though a lot of times a whole lot of that everything is actually done by separate binaries the framework is made aware of via some configuration or discovery mechanism.)
The article is framing LLMs as a kind of fuzzy pipe that can automatically connect lots of tools really well. This ability works particularly well with unix-philosophy, do-one-thing tools, and so being able to access such tools opens a superpower that is unique and secretly shiny about Claude Code that browser-based ChatGPT doesn't have.
This feels a bit like rediscovering stateless programming. Obviously the filesystem contents can actually change, but the idea of running the same AI with the same command(s) and getting the same, idempotent result would be lovely. Even better if the answer is right.
What even is this? Is it all AI slop? All of these articles are borderline nonsensical, in that weird dreamy tone that all AI slop has.
To see this waxing poetic about the Unix philosophy, which couldn't be farther from the modern "AI" workflow, is... something I can't quite articulate, but let's go with "all shades of wrong". Seeing it on the front page of HN is depressing.
Now, due to tools like claude code, CLI is actually clearly the superior interface.
(At least for now)
It's not supposed to be an us vs them flamewar, of course. But it's fun to see a reversal like this from time to time!
The CLI has been dead for end-users since computers became powerful enough for GUIs, but the CLI has always been there behind the scenes. The closest we have been to the "CLI is dead" mentality was maybe in the late 90s, with pre-OSX MacOS and Windows, but then OSX gave us a proper Unix shell, Windows gave us PowerShell, and Linux and its shell came to dominate the server market.
You were obviously not around during the '90s, when the GUI was blowing up thanks to Windows displacing costly commercial Unix machines (Sun, SGI, HP, etc.). By 2000 people were saying Unix was dead and the GUI was the superior interface to a computer. Visual Basic was magic to a lot of people, and so many programs were GUI things even if they didn't need to be. Then the web happened and the tables turned.
Microsoft drank the early OOP Kool-Aid, and thus PowerShell suffered from problems that were well covered by that time, etc…
Ray Noorda being pushed out after WordPerfect bought Novell with their own money and leveraged local religious politics, in addition to typical corporate politics, killed it.
Intel convinced major UNIX companies to drop their CPUs for IA-64 which was never delivered, mainly because the core decision was incompatible with the fundamental limitations of computation etc…
The rise of Linux, VMs and ultimately the cloud all depended on the CLI.
Add in Microsoft's anticompetitive behavior plus everything else, and you ended up with a dominant GUI OS provider with a CLI that most developers found challenging to use.
I worked at some of the larger companies with large Windows server installations, and every one of them installed Cygwin to gain access to tools that allowed for maintainable configuration management.
There are situations like WordPerfect, which had GUI offerings delayed due to the same problems that still plague big projects today, but by the time the web appeared, Microsoft had used both brilliant and extremely unethical practices to gain market dominance.
The rise of technology that helped with graphics in the PC space, like VESA Local Bus and GPUs, which finally killed the remaining workstation vendors, was actually concurrent with the rise of the web.
Even with that, major companies like SGI mainly failed because they dedicated so many resources to low-end offerings that they lost their competitiveness on the high end, especially as they fell into Intel's trap with Itanium too.
But even that is complicated way beyond what I mentioned above.
Meanwhile John Carmack was using an IDE the whole time - Maybe he was just in a different realm.
I tend to agree with the trend of the parent's comment. The CLI came along with the horde, like the English language or JavaScript.
BSD/Mach gave us that, OSX just included it in their operating system.
Maybe in some circles.
You don't remember the period where Linux was considered a joke compared to NT or "real" unices? Maybe I was just around a lot of elitists.
With CLI and TUI tools it's keyboard first and the mouse might work if it wasn't too much of a hassle for the dev.
And another issue with GUI tooling is the lack of composability. With a CLI, I can feed files to one program, grab the output, and give it to another, and another, with ease.
With GUI tools I need to have three of them open at the same time and manually open each one. Or find a single tool that does all three things properly.
- Has anyone found Claude Code able to generate documentation for parts of the code which does not:
(a) Explode in maintenance time exponentially, while still helping Claude understand and iterate without falling over/hallucinating/designing poorly?
(b) Make code reviewers' lives easier? If so, how?
I think the key issue for me is the time the human takes to *verify*/*maintain* plans is not much less than what it might take them to come up with a plan that is detailed enough that many AI models could easily implement.
Especially on bootstrap/setup, AIs are fantastic for cutting out massive amounts of time, which is a huge boon for our profession. But core logic? I think that's where the not-really-saving-time studies are coming from.
I'm surprised there aren't faux academic B-school productivity studies coming out to counter that (sponsored by AI funding of course) already, but then again I don't read B-school journals.
I actually wonder if the halflife decay of the critical mass of vibecode will almost perfectly coincide with the crash/vroosh of labor leaving the profession to clean it up. It might be a mini-y2k event, without such a dramatic single day.
But anything you can do on the CLI, so can an agent. It’s the same thing as chefs preferring to work with sharp knives.
Yet highly preferred over CLI applications by the common end user.
CLI-only would have stunted the growth of computing.
Really, GUIs can be formed of a public API with graphics slapped on top. They usually aren't, but they can be.
- significantly less obsequious (very few "you're absolutely right" that Claude vomits out on every interaction)
- less likely to forget and ignore context and AGENTS.md instructions
- fewer random changes claiming "now everything is fixed" in the first 30-50% of context
- better understanding of usage rules (see link below), one-shotting quite a few things Claude misses
Language + framework: Elixir, Phoenix LiveView, Ash, Oban, Reactor
SLoC: 22k lines
AGENTS.md: some simple instructions, pointing to two MCPs (Tidewave and Stripe), requirement to compile before moving onto next file, usage rules https://hexdocs.pm/usage_rules/readme.html
Before the GPT-5 release it was a poor imitation IMO - in the macOS terminal it somehow even disabled copy and paste!
Codex today is almost unrecognizable in comparison to that version. It's really good. I use both it and Claude Code almost interchangeably at the moment and I'm not really feeling that one is notably better than the other.
Caveat: requires a Linux environment, OSX, or WSL.
In general, I find that it will write smarter code, perform smarter refactors, and introduce less chaos into my codebase.
I'm not talking about toy codebases. I use agents on large codebases with dozens of interconnected tools and projects. Claude can be a bit of a nightmare there because it's quite myopic. People rave about it, but I think that's because they're effectively vibe-coding vastly smaller, tight-scoped things like tools and small websites.
On a larger project, you need a model to take the care to see what existing patterns you're using in your code, whether something's already been done, etc. Claude tends to be very fast but generate redundant code or comical code (let's try this function 7 different ways so that one of those ways will pass). This is junior coder bullshit. GPT-5-Codex isn't perfect but there's far far less of that. It takes maybe 5x longer but generates something that I have more confidence in.
I also see Codex using tools more in smart ways. If it's refactoring, it'll often use tools to copy code rather than re-writing the code. Re-writing code is how so many bugs have been introduced by LLMs.
I've not played with Sonnet 4.5 yet so it may have improved things!
Then you check the result and see what happened. It's pretty good at one-shotting things if it gets the gist, but if it goes off the rails you can't go back three steps and redirect.
On the other hand Claude Code is more like pair programming, it's chatting about while doing things, telling you what it's doing and why "out loud". It's easier to interrupt it when you see it going off track, it'll just stop and ask for more instructions (unlike Copilot where if you don't want it to rm the database.file you need to be really fast and skip the operation AND hit the stop button below the chatbox).
I use both regularly, GPT is when I know what to do and have it typed out. Claude is for experimenting and dialogue like "what would be a good idea here?" type of stuff.
2. Those programs integrate with one another to achieve more complex tasks.
3. Text streams are the universal interface and state is represented as text files on disk.
Sounds like the UNIX philosophy is a great match for LLMs that use text streams as their interface. It's just so normalized that we don't even "see" it anymore. The fact that all your tools work on files, are trivially callable by other programs with a single text-based interface of exec(), and output text makes them usable and consumable by an LLM with nothing else needed. This didn't have to be how we built software.
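A trivial illustration of that single text-based interface: to a program (or an agent), every tool looks the same, exec it, feed it text, read text back. The specific commands below are just an example:

```python
# Any Unix tool is callable the same way: exec it, pass text in, get text out.
import subprocess

# Chain plain grep and wc with no special bindings: count lines mentioning TODO.
grep = subprocess.run(["grep", "-r", "TODO", "."], capture_output=True, text=True)
wc = subprocess.run(["wc", "-l"], input=grep.stdout, capture_output=True, text=True)
print(wc.stdout.strip(), "lines containing TODO")
```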
The fact that the AI interpreter will use small commands makes it very useful.
(As far as I’m aware our brains are opposite of UNIX, starting right from the fact they had evolved and were not designed at all. And the article is about Claude and not me.)
An LLM can do effectively anything that a human can do by typing commands into a shell now.
With LLM Agents you can :D
Information theoretic efficiency seems to be a theme of UNIX architecture: https://benoitessiambre.com/integration.html.
A trick I use often with this pattern is (for example): 'you can run shell commands. Use tmux to find my session named "bingo", and view the pane in there. you can also use tmux to send that pane keystrokes. when you run shell commands, please run them in that tmux pane so i can watch. Right now that pane is logged into my cisco router..."
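Under the hood this only needs a couple of real tmux subcommands (send-keys and capture-pane). Here is a sketch of what those shell calls might look like if wrapped in a script, with the "bingo" session name taken from the example prompt:

```python
# Sketch: drive the watched tmux pane the way the prompt above describes.
import subprocess
import time

TARGET = "bingo"  # session name from the example prompt

def run_in_pane(command: str) -> str:
    # Type the command into the pane and press Enter, so the human can watch.
    subprocess.run(["tmux", "send-keys", "-t", TARGET, command, "Enter"], check=True)
    time.sleep(2)  # crude wait for the command to finish
    # Read the visible pane contents back as plain text.
    captured = subprocess.run(
        ["tmux", "capture-pane", "-p", "-t", TARGET],
        check=True, capture_output=True, text=True,
    )
    return captured.stdout

print(run_in_pane("show ip interface brief"))
```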
It works wonders though, like spelunking your raw thoughts.
Obsidian has a plugin system that can be easily customized. You can run your own JS scripts from a local folder. Claude Code is excellent at creating and modifying them on the fly.
For example, I built a custom program that syncs Obsidian files with a publish flag to my GitHub repo, which triggers a Netlify build. My website updates when I update my vault and run a sync.
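A stripped-down sketch of what that kind of sync could look like; the vault path, repo path, and the `publish: true` frontmatter flag are assumptions for illustration, not the commenter's actual setup:

```python
# Hypothetical sync: copy vault notes marked "publish: true" into a local
# checkout of the GitHub repo and push, letting Netlify rebuild the site.
import shutil
import subprocess
from pathlib import Path

VAULT = Path.home() / "vault"               # assumed Obsidian vault location
SITE_REPO = Path.home() / "site" / "notes"  # assumed checkout of the repo

def is_published(note: Path) -> bool:
    # Crude frontmatter check: look for the publish flag near the top of the file.
    return "publish: true" in note.read_text(encoding="utf-8")[:500]

for note in VAULT.rglob("*.md"):
    if is_published(note):
        shutil.copy(note, SITE_REPO / note.name)

subprocess.run(["git", "-C", str(SITE_REPO), "add", "."], check=True)
# commit exits non-zero when nothing changed, so don't fail on it
subprocess.run(["git", "-C", str(SITE_REPO), "commit", "-m", "sync published notes"])
subprocess.run(["git", "-C", str(SITE_REPO), "push"], check=True)
```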
And I can't help but think: what would a cutting edge "CLI ninja" LLM like Claude be able to do if given access to a diagnostic interface that exposes all the logs and sensor readings, a list of known common issues and faults, and a full technical reference manual?
Cars have the somewhat standardized OBD ports that you could pry the necessary data out from, but industrial robots or vending machines or smartphones? They sure don't.
But what inspires this line of inquiry is exactly the kind of success I had just feeding random error logs to AI and letting it sift through them for clues. It doesn't always work, but it works just often enough to make me wonder about the broader use cases.
eadmund•4mo ago
If only I were retired and had infinite time!
gchamonlive•4mo ago
The average user like me wouldn't be able to run pipelines without serious infrastructure, but it's very important to understand how the data is used and how the models are trained, so that we own the model and can assess its biases openly.
scottyah•4mo ago
I'm not sure how everyone can have access to the data without necessitating another taking on the burden of providing it.
gchamonlive•4mo ago
I'm also not saying anyone should be forced to disclose training data. I'm only saying that an LLM that's only open-weight and not open data/pipeline barely fits the opensource model of the stack mentioned by OP.
tsimionescu•4mo ago
Now, running local models instead of using them as a SaaS has a clear purpose: the price of your local model won't suddenly increase ten fold once you start depending on it, like the SaaS models might. Any level of control beyond that is illusory with LLMs.
gchamonlive•4mo ago
It's fine for models to have open-weights and closed data. It's only barely fitting the opensource model IMHO though.
tsimionescu•4mo ago
An open weight model addresses the second part of THIS, but not the first. However, even an open weight model with all of the training data available doesn't fix the first problem. Even if you somehow got access to enough hardware to train your own GPT-5 based on the published data, you still couldn't meaningfully fix an issue you have with it, not even if you hired Ilya Sutskever and Yann LeCun to do it for you: these are black boxes that no one can actually understand at the level of a program or device.
guy_5676•4mo ago
I have also seen people train "jailbreaks" of popular open source LLMs (e.g. Google Gemma) that remove the condescending ethical guidelines and just let you talk to the thing normally.
So all in all I am skeptical of the claim that there would be no value in having access to the training data. Clearly there is some ability to steer the direction of the output these models produce.
fragmede•4mo ago
https://www.anthropic.com/news/golden-gate-claude
https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-a...
visarga•4mo ago
They probably can't give you the training set as it would amount to publication of infringing content. Where would you store it, and what would you do with it anyway?
buzzy_hacker•4mo ago
I have many programs I use that I wish were a little different, but even if they were open source, it would take a while to acquaint myself with the source code organization to make these changes. LLMs, on the other hand, are pretty good at small self-contained changes like tweaks or new minor features.
This makes it easier to modify open source programs, but also means that if a program isn't open source, I can't make these changes at all. Before, I wasn't going to make the change anyway, but now that I actually can, the ability to make changes (i.e. the program is open source) becomes much more important.