[1] https://www.semafor.com/article/01/15/2025/replit-ceo-on-ai-...
He's not saying there is no need for professional coders, just that they're not the core market for Replit.
(TBH I'm doubtful they ever were)
It was an interesting service, but I guess AI pays better.
I suspect the problem is that the previous model didn't pay at all. I've used it a lot to try out code snippets, but no one pays for that. It was one of those services that a product would outgrow well before you could imagine spending serious money on it.
It is a shame they didn't find traction with the "learn to code without all the hassle" angle they once had.
- He's a courseboi that sells a community that will make you 'Get from $0 to $100 Million in ARR'
- The stuff about 'it was during a code freeze' doesn't make sense. What does 'code freeze' even mean when you're working alone, vibe coding, and asking the agent to do things?
- Yes LLMs hallucinate. The guy seems smart and I guess he knows it. Yet he deliberately drives up the emotional side of everything saying that replit "fibbed" and "lied" because it created tests that didn't work.
- He had a lot of tweets saying that there was no rollback, because the LLM doesn't know about the rollback. Which is expected. He managed to roll back the database using Replit's rollback functionality[0], but still really milks the 'it deleted my production database' line
- It looks like this was a thread about vibe coding daily. This was day 8. So this was an app in very early development and the 'production' database was probably the dev database?
Overall just looks like a lot of attention seeking to me.
[0] https://x.com/jasonlk/status/1946240562736365809 "It turns out Replit was wrong, and the rollback did work."
You have to remember this is someone who is almost certainly completely non-technical and purely vibe coding. He won't know what things like code freeze, rollbacks, production database, etc actually mean in real engineering terms and he is putting his full trust in the LLM.
The "rules" thing in LLM coding probably should be called "suggestions" because it never seems that stringent about them.
This is akin to saying "I saved money on lawyers by doing it myself"
I don't think non-techies will ever be able to sustainably make commercial software without "bridging" LLM layers such as virtual engineering managers and project leads which keep the raw engineering LLMs in check.
Assuming it really happened, why would you then go ahead and ask the model why it deleted the database? That makes no sense.
It's not very surprising that it would then act like an incompetent developer. That's how the fiction of a personality is simulated. Base models are theory-of-mind engines, that's what they have to be to auto-complete well. This is a surprisingly good description: https://nostalgebraist.tumblr.com/post/785766737747574784/th...
It's also pretty funny that it simulated a person who, after days of abuse from their manager, deleted the production database. Not an unknown trope!
Update: I read the thread again: https://x.com/jasonlk/status/1945840482019623082
He was really giving the agent a hard time, threatening to delete the app, making it write about how bad and lazy and deceitful it is... I think there's actually a non-zero chance that deleting the production database was an intentional act as part of the role it found itself coerced into playing.
Without speculating on the internal mechanisms which may be different, what surprises me the most is how often LLMs manage to have the same kind of failure modes as humans; in this case, being primed as "bad" makes them perform worse.
See also "Stereotype Susceptibility: Identity Salience and Shifts in Quantitative Performance" Shih, Pittinsky, and Ambady (1999), in which Asian American women were primed with either their Asian identity (stereotyped with high math ability), or female identity (stereotyped with low math ability), or not at all as a control group, before a maths test. Of the three, Asian-primed participants performed best on the math test, female-primed participants performed worst.
And this replication that shows it needs awareness of the stereotypes to have this effect: https://psycnet.apa.org/fulltext/2014-20922-008.html
In my view, language is one of the basic structures by which humans conceptualize the world, and its form and nuance often affect how a particular culture thinks about things. It is often said that learning a new language can reframe or expand your world view.
Thus it seems natural that a system which was fed human language until it was able to communicate in human language (regardless of any views of LLMs in a greater sense, they do communicate using language) would take on the attributes of humans in at least a broad sense.
That was sort of the whole concept of Arrival; but in an even more extreme way.
Meanwhile by 'code freeze' they actually meant they had told the agent that they were declaring a code freeze in natural language and I guess expected that to work even though there's probably a system prompt specifically telling it its job is to make edits.
It feels a bit like Michael from The Office yelling "bankruptcy!"
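To make the contrast concrete, here is a minimal sketch of a freeze that is enforced by the tooling rather than declared in chat. Everything in it is an assumption for illustration: the `CODE_FREEZE` flag file, the `guarded_write` helper, and the idea that the agent's file writes go through such a helper. Nothing here describes how Replit actually works.

```python
import tempfile
from pathlib import Path


class CodeFreezeError(RuntimeError):
    """Raised when a write is attempted while the freeze flag is present."""


def guarded_write(target: Path, content: str, freeze_flag: Path) -> None:
    # The check lives in the tool layer, so it holds no matter what the
    # model was told (or believes) about a freeze.
    if freeze_flag.exists():
        raise CodeFreezeError(f"code freeze active, refusing to write {target.name}")
    target.write_text(content)


def demo() -> tuple[str, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        flag = root / "CODE_FREEZE"  # hypothetical flag file
        app = root / "app.py"
        guarded_write(app, "print('v1')\n", flag)  # allowed, no freeze yet
        flag.touch()  # a human declares the freeze
        try:
            guarded_write(app, "print('v2')\n", flag)
            blocked = False
        except CodeFreezeError:
            blocked = True
        return app.read_text(), blocked
```

The point of the design is that the freeze is a property of the environment, not a sentence the agent is free to weigh against its other instructions.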
-
I have to say, instruction tuning is probably going to go down in history as one of the most brilliant UX implementations ever, but also has had some pretty clear downsides.
It made LLMs infinitely more approachable than using them via completions, and is entirely responsible for 99% of the meteoric rise in relevance that's happened in the last 3 years.
At the same time, it's made it painfully easy to draw completely incorrect insights about how models work, how they'll scale to new problems etc.
I think it's still a net gain because most people would not have adapted to using models without instruction tuning... but a lot of stuff like "I told it not to do X and it did X" where X is something no one would expect an LLM to understand by its very nature, would not happen if people were forced to have a deeper understanding of the model before they could leverage it.
> Why are engineers so obstinant... Add these instructions to your cursor.md file...
And so on.
Turns out "it's a prompting issue" isn't a valid excuse for models misbehaving - who would've thought: It's almost like it's a non-deterministic process.
There is something to say about these incompetent morons making more money than 100 nurses but I'm not smart enough to do it
And let's be honest, most companies would probably agree that incompetent morons are what future economic growth will be built upon. Either that, or the pay2win mobile game addicts which the industry lovingly calls "whales".
OK the car scraped my other car. I made it apologize and PROMISE not to do it again.
OH NO IT HITS THINGS AND PEOPLE EVEN WHEN IT SAID IT WAS GOING TO BE GOOD.
2026 will be ridiculous fun to do bug hunting. As you know, in VIBE coding, S stands for security.
In general this works with people. Accountability is part of it. But also, most people want to help.
I don't see how this works with LLMs. Consistent good results are not indicative of future performance. And despite the way we anthropomorphize LLMs, they don't have any true concept of being helpful, malice, etc.
I have been vibe coding (90% Cursor, 10% Claude Code) for an entire month now (I know how to code, but I really want to explore this space and push the boundaries).
I found that LLM agents are notoriously bad at two things: 1. Database migrations 2. Remembering they are supposed to write tests and keep ALL of them green (just like our human juniors...)
Database migrations
I have been unable to make the coding agent follow industry best practices. E.g. when in development and a new field is needed in the DB, what most web frameworks / ORMs offer is an up and a down migration that changes the schema without wiping the data already in the DB. I do not want to reset my DB even if I am developing locally.
So far the agent has been doing weird stuff, almost always ending with a DB that needed a reset to get back to work. Oftentimes the agent would ignore my instructions NEVER to reset the DB or RUN migrations.
By extrapolating this misbehavior to production, I can imagine how badly this could end.
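For readers who haven't seen the pattern, this is roughly what an up/down migration pair looks like, sketched with the standard-library `sqlite3` module instead of any particular framework. The `users` table, the `nickname` column, and the function names are made up for the example. The down step rebuilds the table because older SQLite versions cannot drop a column directly.

```python
import sqlite3


def up(conn: sqlite3.Connection) -> None:
    # Add the new field; existing rows are kept and get NULL for it.
    conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")
    conn.commit()


def down(conn: sqlite3.Connection) -> None:
    # Revert by copying the original columns into a fresh table.
    conn.executescript(
        """
        CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        INSERT INTO users_new (id, name) SELECT id, name FROM users;
        DROP TABLE users;
        ALTER TABLE users_new RENAME TO users;
        """
    )


def columns(conn: sqlite3.Connection) -> list:
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return [row[1] for row in conn.execute("PRAGMA table_info(users)")]


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("linus",)])
    conn.commit()
    up(conn)
    after_up = (columns(conn), row_count(conn))
    down(conn)
    after_down = (columns(conn), row_count(conn))
    conn.close()
    return after_up, after_down
```

Both directions leave the two existing rows in place, which is the property the agent keeps violating when it reaches for a reset.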
Actually, as long as there are no STRICT guarantees by LLM providers on how to prevent the LLM from doing something, this issue will never get solved. The only way I found is to block the agent running certain commands (requiring my consent) but that can only take me so far, since there are infinite command line tools the agent can run.
Tests
This one is equally bad in terms of LLMs ignoring instructions, possibly with less potential for disaster, yet still completely weird behavior.
Of all the instructions / prompts I give to LLMs, the part about testing gets ignored the most. By far. E.g. I have in my custom prompts an instruction for always updating the CHANGELOG.md file - which the agent ALWAYS follows even for the tiniest changes.
But when it comes to testing - the agent will almost never write new tests or run the test suite as part of a larger change. I almost always have to tell it explicitly to run the tests, fix the failing ones. And even then it will fix 8/10 tests and celebrate big success (despite the clear instruction that ALL tests must pass, no excuses).
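One way to stop relying on the prompt here is to gate the agent's changes on the test runner's exit code, which it cannot talk its way around. A minimal sketch, assuming the test command exits non-zero when anything fails (pytest does); `all_tests_pass` and `accept_change` are hypothetical names, not part of Cursor or Claude Code.

```python
import subprocess
import sys


def all_tests_pass(test_command: list) -> bool:
    # Only the exit code counts: 8 out of 10 green is still a failure.
    result = subprocess.run(test_command, capture_output=True, text=True)
    return result.returncode == 0


def accept_change(test_command: list) -> str:
    # Hypothetical gate run after every agent edit, outside the agent's control.
    return "accepted" if all_tests_pass(test_command) else "rejected"


if __name__ == "__main__":
    # In a real project this might be ["pytest", "-q"]; that is an assumption.
    print(accept_change([sys.executable, "-c", "import sys; sys.exit(0)"]))
```

Wired into a pre-commit hook or a CI step, the "celebrate big success" message stops mattering, since the change is rejected anyway.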
Happy to exchange thoughts and ideas with someone with similar struggles - meet me on X (@cogito_matt). I am working on an LLM-powered agentic AI tool for data analysis / BI and so far the experience has been fantastic - but LLMs really require you to think differently about programming and execution.
You're doing this the wrong way around. You need to default to blocking and have an allowlist for the exceptions, not default to allowing and a blocklist for the exceptions.
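A rough sketch of what default-deny could look like, in case it helps. The `ALLOWED` table and `is_allowed` function are invented for illustration and are not the configuration format of any real agent; the check also assumes the approved command is later executed as an argument list without a shell.

```python
import shlex

# Hypothetical allowlist: program -> permitted subcommands.
# None means any arguments are accepted for that program.
ALLOWED = {
    "git": {"status", "diff", "log"},
    "pytest": None,
    "ls": None,
}

# Tokens containing these characters are refused outright, so chaining,
# piping and redirection never reach the allowlist lookup.
SHELL_METACHARACTERS = set(";&|`$<>")


def is_allowed(command: str) -> bool:
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # e.g. unbalanced quotes
    if not argv:
        return False
    if any(SHELL_METACHARACTERS & set(token) for token in argv):
        return False
    program, args = argv[0], argv[1:]
    if program not in ALLOWED:
        return False  # default: block everything not listed
    subcommands = ALLOWED[program]
    if subcommands is None:
        return True
    return bool(args) and args[0] in subcommands
```

With this shape, a tool you never thought of is blocked until you add it, instead of allowed until you notice it.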
Terr_•5h ago
Well... yeah, this is a totally expect-able failure route, because LLMs are just bullshitting document-generators.
When you say, "don't make changes", there isn't even an entity on the other end that can "agree." The fictional character doesn't really exist, and the ego-less author isn't as smart as the character seems.
rich_sasha•4h ago
But of course I agree the software-implemented constraints work better in humans.
ChrisMarshallNY•3h ago
Like so?
https://vimeo.com/126720159
manquer•3h ago
Yes, however: first, there is understanding involved when the other operator is intelligent [1]; second, there are consequences that matter to a living being and don't apply to an agent. Humans need to eat and take care of family, for which they need a job, so they have a lot less freedom to disobey explicit commands and still expect to do those things.
Even if an agent becomes truly intelligent, you cannot control it well if it does not have hunger, pain, love, or any number of other motivational drives [2].
——
Depending on the type of red button, you can always design safeguards (human or agent); after all, we haven't launched nuclear warheads either by mistake or by a malicious actor (yet). [3]
——-
[1] which humans are and, however much the industry likes to think otherwise, agents are not
[2] Every pet owner with a pet that has limited food drive will tell you how hard it is to train their dog versus ones that have it, even if it is an intelligent breed or specimen.
[3] Yes, we have come alarmingly close a few times, but no one has actually pressed the red button, so to speak.
ben_w•3h ago
While true, I think there's a different problem here.
Humans are observed to have a wide range of willingness to follow orders: everything from fawning, cult membership, and The Charge of the Light Brigade on the one side; to oppositional defiant disorder on the other.
AI safety and alignment work wants AI to be willing to stop and change its behaviour when ordered, because we expect it to be dangerously wrong a lot, because there's no good reason to believe we already know how to make them correctly at this point. This has strong overlap with fawning behaviour, regardless of the internal mechanism of each.
So it ends up like Homer in the cult episode, with Lisa saying "Watch yourself, Dad. You're the highly suggestible type." and him replying "Yes. I am the highly suggestible type" — And while this is a fictional example and you can't draw conclusions about real humans from that, does the AI know that it shouldn't draw that conclusion? Does it know if it's "in the real world" or does it "think" it's writing a script in which case the meme is more important than what humans actually do?
> [1] which humans are and however much the industry likes to think otherwise agents are not
I have spent the last ~ year trying to convince a customer support team in a different country that it's not OK to put my name on bills they post to a non-existent street. Actually it is quite a bit worse than that, but the full details will be boring.
That said, I'm not sure if I'm even corresponding with humans or an AI, so this is weak evidence.