> One large enterprise employee commented that they were deliberately slow with AI tech, keeping about a quarter behind the leading edge. “We’re not in the business of avoiding all risks, but we do need to manage them”.
I’m unclear how this pattern helps with security vis-à-vis LLMs. It makes sense when talking about software versions, in hoping that any critical bugs are patched, but prompt injection springs eternal.
Yes, but some are mitigated when discovered, and some more critical areas need to be isolated from the LLM entirely. So taking their time provisioning LLMs into their lifecycle is important, and they're happy to spend the time doing it right rather than just throwing the latest edge tech into their system.
* Never give an agent any input that is not trusted
* Never give an agent access to anything that would cause a security problem (even read-only access to sensitive data/credentials, or write access to anything dangerous to write to)
* Never give an agent access to the internet (which is full of untrusted input, as well as places that sensitive data could be exfiltrated)
An LLM is effectively an unfixable confused deputy, so the only way to deal with it is to lock it down so it can't both read untrusted input and do anything dangerous.
But it is really hard to do any of the things that folks find agents useful for without relaxing those restrictions. For instance, most people let agents install packages or look at docs online, but any of those could be vectors for prompt injection. Many people allow it to run git, push, and interact with their Git host, all of which allow dangerous operations.
My current experimentation is running my coding agent in a container that only has access to the one source directory I'm working on, as well as the public internet. Still not great as the public internet access means that there's a huge surface area for prompt injection, though for the most part it's not doing anything other than installing packages from known registries where a malicious package would be just as harmful as a prompt injection.
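Concretely, the launch looks something like this - a minimal sketch assuming Docker, where "agent-image" and the "agent" CLI are hypothetical stand-ins for whatever agent you use:

```python
# Minimal sketch: run the agent in a throwaway container that can only see
# one project directory. Assumes Docker is installed; "agent-image" and the
# "agent" CLI are made-up placeholders.
import subprocess
from pathlib import Path

project = Path("~/src/myproject").expanduser()  # the one directory it may touch

subprocess.run([
    "docker", "run", "--rm", "-it",
    "--mount", f"type=bind,src={project},dst=/workspace",
    "--workdir", "/workspace",
    "--cap-drop", "ALL",       # no extra Linux capabilities
    "--pids-limit", "256",     # cap runaway process spawning
    "--memory", "4g",
    # Internet stays reachable, which is the residual risk discussed above;
    # adding "--network", "none" would close it but breaks package installs.
    "agent-image", "agent",
], check=True)
```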
Anyhow, various people have been talking about how we need more sandboxes for agents. I'm sure there will be products around that, though balancing usability with security here is a really hard problem.
We've experimented with running open source models on local hardware, but it's so easy to inject things into them that it's not really going anywhere. It's going to be a massive challenge, because if we don't provide the tools, employees are going to figure out how to do it on their own.
I do like the idea that "all code is tech debt", and we shouldn't want to produce more of it than we need. But it's also worth remembering that debt is not bad per se: buying a house with a mortgage is also debt, and can be a good choice for many reasons.
I suggest something like "Tidbits from the Thoughtworks Future of Software Development Retreat" (taken from the first sentence; it captures the content reasonably well).
The text is actually about the Thoughtworks Future of Software Development retreat.
This is one of the most interesting questions right now I think.
I've been taking on much more significant challenges in areas like frontend development and ops and automation and even UI design now that LLMs mean I can be much more of a generalist.
Assuming this works out for more people, what does this mean for the shape of our profession?
If you want to get/stay good at debugging--again IMO--it's more important to be involved in operations, where shit goes wrong in the real world: real invalid data causing problems like poison pill messages stuck in a message queue, real hardware failures crashing services, real network problems like latency and timeouts making services that work in the happy path crumble under pressure. Not only does this instil a more methodical mentality in you, it also makes you a better developer, because you think about more classes of potential problems and how to handle them.
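To make the poison pill case concrete, here's a toy sketch of the usual fix - a retry cap plus a dead-letter queue; the queue setup and handle() are made up for illustration:

```python
# Toy version of the poison-pill fix: cap retries and park bad messages in a
# dead-letter queue so one unparseable message can't wedge the whole consumer.
import json
import queue

MAX_ATTEMPTS = 3
main_q: queue.Queue = queue.Queue()
dead_letter_q: queue.Queue = queue.Queue()

def handle(msg: dict) -> None:
    json.loads(msg["body"])  # blows up on the kind of invalid data real systems see

def consume() -> None:
    while not main_q.empty():
        msg = main_q.get()
        try:
            handle(msg)
        except Exception:
            msg["attempts"] = msg.get("attempts", 0) + 1
            if msg["attempts"] >= MAX_ATTEMPTS:
                dead_letter_q.put(msg)  # park it for a human to inspect
            else:
                main_q.put(msg)         # retry later

main_q.put({"body": '{"ok": true}'})
main_q.put({"body": "not json"})  # the poison pill
consume()
print(dead_letter_q.qsize())  # 1 -- the bad message is parked, the queue keeps moving
```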
In the past 6 months, all my code has been written by Claude Code and Gemini CLI. I have written backend, frontend, infrastructure, and iOS code. Considering my career trajectory, all of this was impossible a couple of years ago.
But the technical debt has been enormous. And I'll be honest, my understanding of these technologies isn't at 'expert' level. I'm 100% sure any experienced dev going through my code would think it's a load of crap requiring serious re-architecture.
It works (that's great!) but the 'software engineering' side of things is still subpar.
Claude Code is producing working useful GUIs for me using Qt via pyside6. They work well but I have no doubt that a dev with real experience with Qt would shudder. Nonetheless, because it does work, I am content to accept that this code isn't meant to be maintained by people so I don't really care if it's ugly.
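For flavour, the style is roughly this (illustrative, not my actual code):

```python
# The single-file, logic-in-lambdas style of GUI these tools tend to produce.
# It works; a Qt veteran would shudder. No error handling, no separation of concerns.
import sys
from PySide6.QtWidgets import (QApplication, QLabel, QLineEdit,
                               QPushButton, QVBoxLayout, QWidget)

app = QApplication(sys.argv)
win = QWidget()
win.setWindowTitle("one-off tool")
layout = QVBoxLayout(win)

inp = QLineEdit()
inp.setPlaceholderText("path to a text file")
out = QLabel("")
btn = QPushButton("count lines")
# Business logic jammed straight into the signal handler.
btn.clicked.connect(lambda: out.setText(
    f"{sum(1 for _ in open(inp.text()))} lines" if inp.text() else "no file given"))

for w in (inp, btn, out):
    layout.addWidget(w)

win.show()
sys.exit(app.exec())
```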
We’ve been trying to build well engineered, robust, scalable systems because software had to be written to serve other users.
But LLMs change that. I have a bunch of vibe coded command-line tools that exactly solve my problems, but would very likely make terrible software. The thing is, this program only needs to run on my machine, the way I like to use it.
In a growing class of cases bespoke tools are superior to generalized software. This historically was not the case because it took too much time and energy to maintain these things. But today, if my vibe coded solution breaks, I can rebuild it almost instantly (because I understand the architecture). It takes less time today to build a bespoke tool that solves your problem than it does to learn how to use existing software.
There’s still plenty of software that cannot be replaced with bespoke tools, but that list is shrinking.
FOSS meant that the cost of building on reusable components was nearly zero. Large public clouds meant the cost of running code was negligible. And now the model providers (Anthropic, Google, OpenAI) mean that the cost of producing the code is relatively small. When the marginal cost of producing code approaches zero, we start optimizing for all the things around it. Code is now like steel: it's somewhat valuable by itself, but we don't need the town blacksmith to make us things anymore.
What is still valuable is the intuition to know what to build, and when to build it. That's the je ne sais quoi still left in our profession.
> Ideas that surfaced: code as ‘just another projection’ of intended behaviour. Tests as an alternative projection. Domain models as the thing that endures. One group posed the provocative question: what would have to be true for us to ‘check English into the repository’ instead of code?
> The implications are significant. If code is disposable and regenerable, then what we review, what we version-control, and what we protect all need rethinking.
Second, there's still a world of difference between a developer with taste using AI with care and the slop cannons out there churning out garbage for others to suffer through. I'm betting there is value in the former in the long run.
We do have some idea. Kimi K2 is a relatively high performing open source model. People have it running at 24 tokens/second on a pair of Mac Studios, which cost about $20k. This setup draws less than a kW of power, so the $0.08-0.15 an hour being spent on electricity is negligible compared to a developer. This might be the cheapest setup to run locally, but it's almost certain that the cost per token is far lower with specialized hardware at scale.
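Back-of-the-envelope on what that works out to per token - the $20k and 24 tok/s are the numbers above; the electricity rate and three-year amortization are my assumptions:

```python
# Rough cost per token for the Mac Studio setup described above.
TOKENS_PER_SEC = 24
HARDWARE_USD = 20_000
POWER_KW = 1.0
USD_PER_KWH = 0.15      # assumed residential electricity rate
HOURS = 3 * 365 * 24    # assume hardware amortized over three years, run flat out

tokens_per_hour = TOKENS_PER_SEC * 3600                        # 86,400 tokens/hour
cost_per_hour = POWER_KW * USD_PER_KWH + HARDWARE_USD / HOURS  # ~$0.91/hour
usd_per_million_tokens = cost_per_hour / tokens_per_hour * 1e6
print(f"~${usd_per_million_tokens:.2f}/Mtok")  # roughly $10 per million tokens
```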
In other words, a near-frontier model is running at a cost that a (somewhat wealthy) hobbyist can afford. And it's hard to imagine that the hardware costs don't come down quite a bit. I don't doubt that tokens are heavily subsidized but I think this might be overblown [1].
[1] training models is still extraordinarily expensive and that is certainly being subsidized, but you can amortize that cost over a lot of inference, especially once we reach a plateau for ideas and stop running training runs as frequently.
Is Kimi K2 near-frontier though? At least when run in an agent harness, and for general coding questions, it seems pretty far from it. I know what the benchmarks say - they always say it's great and close to frontier models - but is this others' impression in practice? Maybe my prompting style works best with GPT-type models, but I'm just not seeing it for the type of engineering work I do, which is fairly typical stuff.
I've been pretty active in the open model space, and two years ago you would have had to pay $20k to run models that were nowhere near as powerful. It wouldn't surprise me if in two more years we continue to see more powerful open models on even cheaper hardware.
$20,000 is a lot to drop on a hobby. We're probably talking less than 10%, maybe less than 5% of all hobbyists could afford that. [0]
This is marketing, not reality.
Get a few lines of code and it becomes unusable.
Now, these models are a bit weaker, but they're in the realm of Claude Sonnet to Claude Opus 4: 6-12 months behind SOTA, on something that's well within a personal hobby budget.
Token costs are also non-trivial. Claude can exhaust a $20/month plan's session limit on one difficult problem (without even writing code, just planning). Each engineer needs at least the $200/mo plan - I have multiple plans from multiple providers.
Chinese open source models are dirt cheap, you can buy $20 worth of kimi-k2.5 on opencode and spam it all week and barely make a dent.
Assuming we never get bigger models but hardware keeps improving, we'll either be serving current models for pennies, or at insane speeds, or both.
The only actual situation where tokens are being subsidized is free tiers on chat apps, which are largely irrelevant for any sort of useful economic activity.
I think this is often a mental excuse for continuing to avoid engaging with this tech, in the hope that it will all go away.
There's a difference between running inference and running a frontier model company.
https://www.theinformation.com/articles/anthropic-lowers-pro...
Local or self-hosted LLMs will ultimately be the future. Start learning how to build up your own AI stack and use it day to day. Hopefully hardware catches up so that eventually running LLMs on-device is the norm.
This isn't a case where you have specific code/capital you have borrowed and need to pay for its use or give it back. This is flat out putting liabilities into your assets that will have to be discovered and dealt with, someday.
[0] Which is not even enough; these are the ones with truly excess money to burn.
Are you assuming tech debt has no financial cost?