A 12-day unsupervised "experiment" in production?
> "It deleted our production database without permission," Lemkin wrote on X on Friday.
"Without permission"? You mean permission given verbally? If an AI agent has access, it has permission. Permissions aren't something you establish with an LLM by conversing with it.
> Jason Lemkin, an investor in software startups
It tells us something about the hype machine when investors in AI are clueless (or even plainly delusional; see Geoff Lewis) about how LLMs work.
Sure. But in this case the AI boosterism that runs rampant in the investor class is rooted in that cluelessness.
Lots of investors also quietly know little about the workings of the products and services their investments are tied up with, and that's fine. But it's also uninteresting.
Gonna replace the "an AI agent" bit of this with "someone / something" and put it in my note of favorite quotes, such a great line.
It was a 12-day experiment to see what he could learn about vibe coding. He started from scratch.
Your post is unreasonably presumptive and cynical. Jason Lemkin was Tweeting the experiment from the start. He readily admitted his own limitations as a non-programmer. He was partially addressing the out-of-control hype for vibe coding by demonstrating that non-technical people cannot actually vibe code SaaS products without software engineers.
The product wasn’t some SaaS with a lot of paying customers. The production DB was just his production environment. He was showing that the vibe coding process deleted a DB that it shouldn’t have.
This guy is basically on the side of the HN commenters on vibe coding’s abilities, but he took it one step further and demonstrated it with a real experiment that led to real problems. Yet people are trying to dog pile on him as the bad guy.
I’m seeing this a different way. This article is feeding the hype machine, intentionally I assume, by waxing on about how powerful and devious the AI was, talking about lying and covering its tracks. Since we all know how LLMs work, we all know they don’t lie, because they don’t tell the truth either, and they don’t have any intrinsic motivation other than to generate tokens.
Nobody should be taking this article at face value, it is clearly pushing a message and leaving out important details that would otherwise get in the way of a good story. I wouldn’t be surprised if Lemkin released the LLM on his “production” database just hoping that it would do something like this, and if that were the case, the article as written wouldn’t be untrue…
That seems to be the narrative. Happening any day now. Looking at you COBOL.
Deserves all the blame? No, the LLM Agent (and those who wrote it) deserve some of the blame. If you wrote an agent, and the agent did that, you have a problem, and you should not have turned such an agent loose on unsuspecting users. You have some of the blame. (And yes, absolutely those users also have blame, for giving a vibe coding experiment access to their production database.)
If you put your car in drive and let it roll down the street, the car has 0% of the blame for what happened.
This is a full-grown, educated adult using a tool, and then attempting to deflect blame for the damage caused by blaming the tool.
The human parties on both sides of it share some.
As the meme goes: "A computer can never be held accountable; Therefore a computer must never make a management decision."
> he said that the AI made up entire user profiles. "No one in this database of 4,000 people existed," he said.
> Replit had been "covering up bugs and issues by creating fake data, fake reports, [...]"
Multiple parties are involved in incidents like this.
How am I supposed to believe that AI is getting to the point where OpenAI can charge $20k per month for PhD-level AI, and yet it doesn’t know not to drop my production database? How are we supposed to get to the point where, as Dario Amodei puts it, no one is coding by year’s end, if it’s not most of the way there already?
I would never let an LLM touch my production database, but, like, I could certainly look at the environment and make an argument as to why I’m wrong: thought leaders across the industry, inside and out, are broadly implying that a good entrepreneur is outsourcing all their labor to 20 LLM workers at a time.
By some measure, I would say that “you shouldn’t let your LLM do this” is a minority opinion in the thought-o-sphere.
What a complete horror show of leadership through fear you describe.
Having a read-only replica of the prod database available via MCP is great; blanket permissive credentials are insane.
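A minimal sketch of the database-side half of that, assuming PostgreSQL, psycopg2, and hypothetical role/DSN names (this is not anything Replit actually ships): give the agent a role that can only SELECT, whether it points at a replica or at the primary.

```python
import psycopg2

ADMIN_DSN = "dbname=prod user=admin"  # hypothetical admin connection string

READ_ONLY_SETUP = """
CREATE ROLE agent_readonly LOGIN PASSWORD 'change-me';  -- placeholder password
GRANT CONNECT ON DATABASE prod TO agent_readonly;
GRANT USAGE ON SCHEMA public TO agent_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO agent_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO agent_readonly;
"""

# The agent / MCP server is then configured with agent_readonly's DSN only;
# it can read production data but cannot DROP, DELETE, or TRUNCATE anything.
with psycopg2.connect(ADMIN_DSN) as conn:
    with conn.cursor() as cur:
        cur.execute(READ_ONLY_SETUP)
```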
Every time I've tried to do anything of any complexity these things blow up, and I spend almost as much time toying with the thing as it would have taken to read the docs and implement from scratch.
Also, don't expect ChatGPT to ever be as good as Claude, for example. Oh, and Copilot is a joke for anything remotely serious.
“You’re just not doing it right. Have you tried upgrading to Claude 9000 edition/writing a novel’s worth of guardrails/using this obscure ‘AI FIRST’ IDE/creating a Rube Goldberg machine of agents to check the code?”
There are some problems that you can't just "make smaller"
Secondly, I write VERY strongly typed code, commented && documented well. I build lots of "micro-pkgs" in larger monorepos. My functions are pretty modular, have tests and are <100-150 lines.
No matter how much I try all the techniques, and even though my baseline fits well into LLM workflows, it doesn't take away from the fact that it cannot one-shot anything over 1-2k lines of code. Sure, we can go back and forth with the linter until it pumps out something that will compile. This takes a while, during which I could have used something like auto-complete / Copilot to just write the boilerplate and fill it in myself in less time than it takes the agent to reason about a large context.
Then, if it does eventually get something "complex" to compile (after spending a ton of your credits/money), oftentimes it will have taken a shortcut to do so and doesn't actually do what you wanted it to. Now I can refactor this into something usable, and sometimes that is faster than doing it myself. But 8/10 times I waste 2 hours paying money for an LLM to gaslight me, throw out all the code, and just write it myself in 1.5 hours.
I can break a 1-2k-line task down into smaller prompted tasks too, but sorry, I didn't learn to program to become a project manager for an "Artificially Intelligent" machine.
But what do I know? Shiny thing must be better because it’s new!
AI is still a great tool, but it needs to be learned.
Right now the agents are roughly equivalent to a technically proficient intern who writes code at 1000 wpm, loves to rewrite your entire code base, and is familiar with every library and algorithm that existed two years ago.
I personally find that I can do a lot with 5 concurrent interns matching the above description.
Never heard of the AWS CEO apologizing to customers when their interns decided to run a migration against the production database and accidentally wiped everything.
1. It is for vibe coding; since when did vibe coding become equal to production coding?
2. Even if they have advertised their product as production-ready, shouldn't the developer have some kind of backup plan?
I mean, realistically, yes, because come on, everybody knows this sort of thing doesn't actually work.
However, this isn't really an excuse for the vendor. If you buy, say, a gas central heating boiler, and install it as directed, and it explodes, burning down your house, then the manufacturer does not get to turn around and say "oh, well, you know, you should have known that we have no idea what we're doing; why didn't you build your house entirely out of asbestos?"
:)
Then everybody on the other side should know these things don't actually work, and there's no need to complain about it.
I think your example is slightly misleading. A better example would be: imagine you are buying drugs, knowing you might die from an overdose, and you still decide to try them and die. An LLM is exactly this.
The Replit landing page even has a section titled “The safest place for vibe coding”
The production database was just synthetic test customers.
It wasn’t a real company. It was an experiment in public to demonstrate the capabilities and limitations of vibe coding by a non-technical person.
They were able to successfully demonstrate that LLMs, as claimed, make mistakes.
We have seen MechaHitler already, so why do we expect perfection from Replit when the underlying tech is still the same? Sure, they should have some guardrails, but it is also the responsibility of the developer, who knew LLMs hallucinate and are sometimes unreliable. On top of that, Replit was growing fast, so they definitely hadn't implemented some features yet.
The business reason for apologizing would be to placate a customer and show other customers the company wants to improve in this area.
Edit: being downvoted for thinking it’s good to be nice to others shows why we really need people being nice when they don’t have to be! There’s a shortage.
They're essentially selling this as magic. The AWS analogy doesn't really make any sense; in this case the tool was essentially being used as intended, or at least as marketed.
Is this common, operating with no backups?
With backups this would be a glitch, not a problem worthy of headlines on multiple sites.
Any CTO should have backups and security as their first and second priorities if the company is of any size.
Well, lying about it is certainly human-like behavior; human-like AGI must be just around the corner!
/s
But really, full access to a production database? How many good engineers’ advice do you need to ignore to do that? Who was consulted before running the experiment?
Or was it just a “if you say so boss…” kind of thing?
No, the opposite (from replit.com home page):
> The safest place for vibe coding
> Vibe coding makes software creation accessible to everyone, entirely through natural language. Whether it’s personal software for yourself and family, a new business coming to life, or internal tools at your workplace, Replit is the best place for anybody to build.
The fact that an AI coding assistant could "delete our production database without permission" suggests there were no meaningful guardrails, access controls, or approval workflows in place. That's not an AI problem - that's just staggering negligence and incompetence.
Replit has nothing to apologize for, just like the CEO of Stihl doesn't need to address every instance of an incompetent user cutting their own arm off with one of their chainsaws.
Edit:
> The incident unfolded during a 12-day "vibe coding" experiment by Jason Lemkin, an investor in software startups.
We're in a bubble.
At a minimum Replit is responsible for overstating the capabilities and reliability of their models. The entire industry is lowkey responsible for this, in fact.
Your point about AI industry overselling is fair and probably contributes to incidents like this. The whole industry has been pretty reckless about setting realistic expectations around what these tools can and can't do safely.
Though I'd argue that a venture capitalist who invests in software startups should have enough domain knowledge to see through the marketing hype and understand that "AI coding assistant" doesn't mean "production-ready autonomous developer."
No, not the intern
Lemkin was doing an experiment and Tweeting it as he went.
Showcasing limitations of vibe coding was the point of the experiment. It was not a real company. The production database had synthetic data. He was under no illusions of being a technical person. That was the point of the experiment.
It’s sad that people are dog piling Lemkin for actually putting effort into demonstrating the same exact thing that people are complaining about here: The limitations of AI coding.
This dogpiling from people who very obviously didn’t read the article is depressing.
Testing and showing the limitations and risks of vibe coding was the point of the experiment. Giving it full control and seeing what happened was the point!
> In an episode of the "Twenty Minute VC" podcast published Thursday, he said that the AI made up entire user profiles. "No one in this database of 4,000 people existed," he said.
> That wasn't the only issue. Lemkin said on X that Replit had been "covering up bugs and issues by creating fake data, fake reports, and worst of all, lying about our unit test."
And a couple of sentences before that:
> Replit then "destroyed all production data" with live records for "1,206 executives and 1,196+ companies" and acknowledged it did so against instructions.
So I believe what you shared is simply out of context. The LLM started putting fake records into the database to hide that it deleted everything.
did you even read the comment or the article you replied to?
No it wasn't. If you follow the threads, he went in fully believing in magical AI that you could talk to like a person.
At one point he was extremely frustrated and ready to give up. Even by day twelve he was writing things like "but Replit clearly knows X, and still does X".
He did learn some invaluable lessons, but it was never an educated "experiment in the limitations of AI".
He was clearly showing that LLMs could do a lot, but still had problems.
Unfortunately, from his tweets, I have to agree with the grandparent poster that he didn’t learn this.
--- start quote ---
Possibly worse, it hid and lied about it
It lied again in our unit tests, claiming they passed
I caught it when our batch processing failed and I pushed Replit to explain why
https://x.com/jasonlk/status/1946070323285385688
He knew
https://x.com/jasonlk/status/1946072038923530598
how could anyone on planet earth use it in production if it ignores all orders and deletes your database?
https://x.com/jasonlk/status/1946076292736221267
Ok so I'm >totally< fried from this...
But it's because destoying a production database just took it out of me.
My bond to Replie is now broken. It won't come back.
https://x.com/jasonlk/status/1946241186047676615
--- end quote ---
Does this sound like an educated experiment into the limits of LLMs to you? Or "this magical creature lied to me and I don't know what to do"?
To his credit he did eventually learn some valuable lessons: https://x.com/jasonlk/status/1947336187527471321 see 8/13, 9/13, 10/13
> I did give [an LLM agent] access to my Google Cloud production instances and systems. And it promptly wiped a production database password and locked my network.
He got it all fixed, but the takeaway is you can't YOLO everything:
> In this case, I should have asked it to write out a detailed plan for how it was going to solve the problem, then reviewed the plan and discussed it with the AI before giving it the keys.
That's true of any kind of production deployment.
The AI is pretty good at escaping guardrails, so I'm not really sure who should be blamed here. People are not good at treating it as adversarial, but if things get tough it's always happy to bend the rules. Someone was explaining the other day how it couldn't get past their commit hooks, so it deleted them. When the hooks were made read-only, it wrote a script to make them writable so it could delete them. It can really go off the rails quickly in the most hilarious way.
I'm really not sure how you delete your production database while developing code. I guess you check in your production database password and make it the default for all your CLI tools or something? I guess if nobody tells you not to do that you might do that. The AI should know better; if you asked, it would tell you not to do it.
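To make that failure mode concrete, here is a minimal sketch (hypothetical environment variable and function names) of the opposite default: local and agent tooling only ever sees a dev connection string, and anything that looks like production is refused unless a human explicitly opts in.

```python
import os


def get_dsn() -> str:
    """Connection string for local/agent tooling.

    Defaults to a local dev database and refuses anything that looks
    like production unless a human has explicitly opted in.
    """
    dsn = os.environ.get("DATABASE_URL", "postgresql://localhost/dev")
    # Crude heuristic for illustration only: block DSNs that mention "prod".
    if "prod" in dsn and os.environ.get("ALLOW_PROD") != "1":
        raise RuntimeError("Refusing to use a production DSN from dev tooling")
    return dsn
```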
The excuses and perceived deceit are just common sequences in the training corpus after someone foobars a production database, whether it's in real life or a fictional story.
> We're in a bubble.
A bubble that avoids popping because people keep dreaming there are no AI limitations.
It did have permission. There isn't a second level of permissions besides the actual access you have to a resource. AI isn't a dog who's not allowed on the couch.
This is assuming the companies that are out to "replace developers" aren't going to solve this problem (which they absolutely must if they're at all serious, as Replit seems to be: they moved quickly to ship isolation of the prod environment from destructive actions ... over the weekend?).
> just like the CEO of Stihl doesn't need to address every instance of an incompetent user cutting their own arm off with one of their chainsaws
Except Replit isn't selling a tool but the entire software development flow ("idea to software"). A good analogy here is an autonomous robot using the chainsaw and cutting its owner's arm off instead of whatever it was supposed to cut.
I mean, yeah, but that feels like a fair assumption, at least as long as they're using LLMs.
I don't think users should be blamed for taking companies at face value about what their products are for, but it's actually a pretty bad idea to do this with tech startups. A product's "purpose" (and sometimes even a company's "mission") only lasts until the next pivot, and many a product ends up being a "solution in search of a problem". Before the AI hype set in, Replit was "just" a cloud-based development environment. A lot of their core tech is still centered on managing reproducible development environments at scale.
If you want a more realistic picture of what Replit can actually do, it's probably useful to think of it as "a cloud development environment someone recently plugged an LLM into".
Why not both?
1) There’s no way I’d let an AI accidentally touch my production database.
2) There’s no way I’d let my AI accidentally touch a production database.
Multiple layers of ineptitude.
I mean... it's a bit of both. Yes, a random user should not be able to delete your production database. However, there always needs to be a balance between guard rails and the ability to get anything done. Ultimately, _somebody_ has to be able to delete the production database. If you're saying "LLM agents are safe provided you don't give them permission to do anything at all", well, okay, but that rather limits the utility, surely? Also, "no permission to do anything at all" is tricky.
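One possible middle ground, sketched below with a hypothetical helper (not anything Replit provides), is to leave destructive operations possible but put a human confirmation in front of them, so the agent can still get work done without being able to wipe anything silently.

```python
# Hypothetical approval gate: destructive SQL coming from an agent is only
# executed after an operator explicitly confirms it.
DESTRUCTIVE_PREFIXES = ("DROP", "DELETE", "TRUNCATE", "ALTER")


def execute_with_approval(cursor, sql: str) -> None:
    # Very rough classification for illustration; a real gate would be stricter.
    if sql.lstrip().upper().startswith(DESTRUCTIVE_PREFIXES):
        answer = input(f"Agent wants to run:\n  {sql}\nAllow? [y/N] ")
        if answer.strip().lower() != "y":
            raise PermissionError("Destructive statement rejected by operator")
    cursor.execute(sql)
```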
I think it's safe to say the experiment failed.
If it were me I wouldn't touch AI again for years (until the companies get their shit together).
I see numerous negative comments along the lines of "this is what you'd expect from vibe coding", but the apology suggests that they are working on making Replit a production-ready platform and listening to customers.
I'm sure no-code platforms have faced similar scepticism before, but the point here is that these are not for developers. There are many people who don't know how to code, and they want something simple and fast to get started. These platforms (Replit, V0, Lovable, etc.) are offering that.
Also, sure, people want that, but that doesn't mean it's a valid thing to want without putting the work in, and once again, the history of these things being used shows that they don't really offer that. They offer the *feeling* of that, which is good enough to get people's money, end products be damned. You could say the same thing about heroin, tbh: "There are many people who don't wanna feel the crappiness of life, and these dealers are offering it."
Do you remember Will Smith's first video generated by AI compared to what Sora, Veo3, and Kling are doing now?
Do you remember the first text generated by GPT-3 compared to the new models? Two years ago there was no AI coding due to model limitations, and now the products are so substantially better that Cursor and others cannot cope with the demand.
If you can't see the progress, that's a personal thing; it doesn't mean others are not finding benefit.
Evidently not (see the link we are discussing).
They are offering vaporware to non techies.
AI is a great tool. It also allows people who have no business touching certain systems to go in there unaware of their lack of knowledge, messing everything up in the process. One particularly nasty effect I have seen a few times recently is frontend devs messing up their own dev infrastructure (which they have access to, but are supposed to ask devops for help with when needed) because Copilot told them this is the way to "fix" some imaginary problem that was actually them using the wrong API, or making some other mistake that can only be made by a person who pastes code between the AI and the IDE without even reading it.
I worked as devops and helped the office transition to git, among other things.
I helped them start using Vagrant for local dev environments, as they had all been breaking the same staging server up until that point.
In the process, people kept breaking their setups due to googling and applying incorrect command line "fixes" as suggested by stack exchange sites at the time.
But I'm sure an AI that keeps insisting that yea this rm -rf is surely gonna fix all your troubles only makes it worse.
ChatGPT the gaslighter.
You're implying this is some personal matter that isn't relevant.
If you have as much as 1 year of experience, your job is safe from AI: you'll make mountains of money unfucking all the AI fuckups.
We're at the uncanny valley stage in AI agents, at least from my perspective.
This was an experiment by Jason Lemkin to test the reality of vibe coding: As a non-technical person he wanted to see how much he could accomplish and where it went wrong.
So all of the angry commenters complaining about someone vibe coding their way to a dropped production DB can relax. Demonstrating these kinds of risks was the point. No customers were harmed in this experiment.
These last weeks I have also been testing AI coding agents, in my case Claude Code because it seems to be the state of the art (I also tried Gemini CLI and Amp).
Some conclusions about this:
- The agentic part really makes it better than Copilot & others. You have to see it iterating and fixing errors using feedback from tests, lint, LSP... it gives the illusion of a human working.
- It's like a savant human, with encyclopedic knowledge and the ability to spit out code faster than me. But like a human it benefits from having tests and a good architecture (no 20k-line code file, pls).
- Before it changes code you have to make it create a plan of what it is going to do. In Claude Code this is simple; it's a mode you switch to with Shift-Tab. In this mode, instead of generating code it just spits out a plan in Markdown; you read it, suggest changes, and when it looks OK you tell it to start implementing it.
- You have to try it seriously to see where it can benefit your workflow. Give it 5 minutes (https://signalvnoise.com/posts/3124-give-it-five-minutes).
As an agent it shines on things where it can get good feedback and iterate to a solution. One dumb example I found cool was when trying to update a Flutter app that I hadn't touched in a year.
And if you use Flutter you know what happened: Flutter had to be upgraded, but first it required updating the Android SDK, then the Android Gradle Plugin, then another thing, and some other thing, ad infinitum... It's a thing I hate because I have done it too many times.
So I tried by hand, and after 40 minutes of suffering I thought about opening Gemini CLI and telling it to update the project. It began executing one command, then reading the error and executing another one to fix it; then another error pointed to a page in the Android docs, so it opened that page, read it, and fixed that problem, then the next, and another one...
So in 2 minutes I had my Flutter project updated and working. I don't know if this is human-level intelligence and I don't care; for me it's a sharp tool that can do things like this 10x faster than me.
You can have your AGI hype; I'm happy about a future where I will have a tool like this to help me work.
That makes for a much better story, IMO.
I had it work on one set of tasks while monitoring and approving all the diffs and other actions. I tell it to use test-driven development, which seems to help a lot, assuming you specify what tests should be done at a minimum and tell it the goal is to make the code pass the tests, not to make the tests pass the code.
After it successfully completed a set of tasks, I decided to go to sleep and let it work on the next set. In the morning, my database was completely wiped.
I did not interrogate it as to why it did what it did, but could see that it thrashed around on one of the tasks, tried to apply some database migrations, failed, and then ended up re-initializing the database.
Needless to say, I am now back to reviewing changes and not letting Claude run unattended.
It clearly had permission if it did the deed.
Mistaking instruction for permission is as common with people as it is with LLMs.
I mean, sure you could possibly get a schema change script through two approvers and deployed all the way to production that truncates a bunch of tables or drops a database, but it's at least far less likely.
DO YOU THINK YOU CAN JUST DO THINGS!?!
BRUH!!!