Ask HN: What was your "oh shit" moment with GenAI?

29•andrehacker•20h ago

Most of us were amused when DALL-E and its peers went mainstream, and we were quick to point out the obvious flaws.

Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.

Using LLMs for coding was initially only a small step up from basic code completion, and a welcome farewell to Stack Overflow.

I am curious: what was the specific moment that you went from those quaint, dismissive observations to a slightly panicked, "Uh Oh" realization of what these models can do?

Comments

bigyabai•20h ago

BERT, then GPT-J/GPT-Neo and FLAN-T5

damnitbuilds•20h ago

My "Oh shit" moment was when my boss got the bill for me trying to vibe code a bugfix.

simsation•19h ago

When I saw a very basic mockup of a website and realized AI could generate the entire page from it (this was shortly before ChatGPT came out)

zhoBEENG•19h ago

It was when I first saw an LLM reliably make tool calls to bash.

LargoLasskhyfv•15h ago

The smallest Deepseek R1 8B, running locally on CPU only, casually mentioning Efinix Trion FPGA fabrics while discussing technology mappings for different substrates of different vendors in the context of partial dynamic reconfiguration.

WTF?!

SpecStudioHN•14h ago

When ChatGPT was released. LLMs went from being a toy to a serious creative tool overnight.

dang•1h ago

(1) Watching it do log file analysis in seconds that would have taken me hours (edit: days, in fact), and which I would therefore never have done in the first place.

(2) Helping me with optimizations that I had been putting off for years because they involved learning curves that I never had time to take on.

(3) Tracking down bugs in code, especially race conditions and other concurrency issues, that were otherwise baffling.

(4) Finding information that I had been unable to find using Google searches (e.g. https://news.ycombinator.com/item?id=42653136).

There have been others, but those are what come to mind - perhaps because, in each of these cases, it made something happen that would otherwise never have happened - not because it was impossible, but because the level of effort required was prohibitive.

refulgentis•1h ago

Using GPT-3 to translate the color science code I wrote for Google's design system from Dart to ~any language so I could get it deployed cross platform quickly, and it all worked.

spwa4•1h ago

When I wrote a captcha cracking convnet in 2000 and tested it ...

And in 1 out of 5 runs it beat me.

utopiah•1h ago

When none of the models, SOTA or not, could answer any genuinely interesting question. All the models could regurgitate was what had been expressed before; nothing actually new was there until explicitly asked for, and even then it required filtering through so much noise that it was practically not interesting anymore, since it required all the knowledge needed to validate or invalidate the claims. That's when, a few years ago, I realized "Oh shit... despite all the tremendous effort and resources, it's still not that useful." Honestly this was NOT what I expected. Yet, it was an important realization.

aappleby•58m ago

Are you sure you're asking the right questions?

utopiah•50m ago

To me they were important questions. Maybe totally uninteresting to you.

aspenmartin•57m ago

Curious what your interesting questions were, you should be able to find them in your chat history.

utopiah•52m ago

That was more than a decade ago so unfortunately not. I should have kept those questions though. I even mentioned in a comment on HN a while ago that unanswered or wrongly answered questions should be precisely the batch test to run when new models are released.

dyauspitr•58m ago

I was trying to replace my koi pond pump last weekend and the model numbers on it had washed away. I took a picture of it and it immediately narrowed it down to two models but wasn’t sure if it was the 4500 model or the 2500 model. I asked it how I can determine which one it was. It then asked me to measure the length and that the 4500 was 11 inches and the 2500 was 9 inches. Mine was 11. It was cool it was able to reason that out and give me something actionable.

It’s kind of a trivial example but there are multiple instances of this per week with the wide variety of things I do around my property.

hannahstrawbrry•57m ago

Had an issue in a project where multiple media files with the same/similar names were colliding. After spending hours with ChatGPT wrangling Python scripts to try and sort it out programmatically, I shifted gears and, in about 5 minutes, built a web tool that would allow me to manually review the content and select the correct media file to associate with it. That let me comb through, finally fix the issue, and verify the content was correct in about an hour. It made me realize I needed to completely re-think how I set about solving problems now that I have an entirely different set of tools to develop with. That has been the biggest "Oh shit" moment for me: looking into the mirror and recognizing how AI will re-shape me as a developer.

mikewarot•56m ago

I tried to get it to generate code to program one of my BitGrid simulators, and it kept producing code that failed, over and over. It was then that I figured out that it can only do CRUD apps and the like, things it's seen over and over in its training data.

It's useless for most of what I want to code.

cheevly•5m ago

GPT literally generates perfect code for me in languages that do not exist anywhere in its training set, so I’m not sure how you’ve achieved this level of failure.

hansvm•56m ago

A coworker had me work through a particular problem (some no-importance web demo) with Cursor and Sonnet 4.6. It still sucked, but there was a qualitative shift in suckiness, one that I realized could finally be used to solve some real problems I had if I wrote an appropriate harness and used good enough models.

I still find it mandatory to write a lot of kinds of code by hand, but I write a lot of code with agents too now, and I previously literally didn't think that'd happen in <5yrs.

saadn92•55m ago

I use Claude Code on a daily basis, but honestly it becomes more annoying the more I use it. Why? I think because I ask it to do something and, unless I'm extremely specific, either the code is verbose or the feature I'm designing is done in a poor way. For me, the productivity gains aren't that great and I'm even considering whether to go back to doing things by hand to save myself the frustration. Sure, if you don't care about code quality or scalability, it's a great thing to generate code. And yes, there are times when I don't, but for real projects, I actually do because I know as an engineer those things do matter in the long run. So, to be honest, I still haven't had that moment.

pythonaut_16•51m ago

It has seemed to me that with each step from Opus 4.6 to 4.7 to 4.8, Claude has gotten worse at building good solutions. Like perhaps it is more "capable" in the small scale than 4.5 was, but it's much worse at knowing what to do.

tripledry•34m ago

From a technology perspective LLMs are absolutely bonkers; it blows my mind that they work as well as they do.

From a programmer perspective, I'm starting to like it less and less. It's useful for sure, but doesn't really live up to the hype. In many ways it's the opposite, my bet is still that programmers will be in high demand in the not so distant future after all of this settles.

Might be wrong, time will tell.

Fomite•54m ago

When we had to have a frank discussion about whether to fail someone who obviously used an LLM for parts of their dissertation.

sevennull•15m ago

well?

bag_boy•54m ago

I had ChatGPT write up a Zillow description for my house in the style of Carrie Bradshaw from “Sex and the City” to impress my wife.

It was unlike anything I had ever experienced.

My wife was unimpressed lol.

This was 2022.

moconnor•50m ago

Literally the very first time I used ChatGPT. I had already been experimenting with GPT-3 for various jokes and games via the API, but the naturalness of it as a chat interface that understood you changed everything.

The first time I used a terminal agent was another one.

boredhedgehog•50m ago

"Translate this poem. Maintain meter and rhyme."

steren•50m ago

The moment when I ran Llama on my old gaming PC (using something called GPT4All) was my "oh shit" moment: I was now talking... to my PC.

overgard•50m ago

I feel like with the hype cycle and constant publishing of sketchy claims that I pretty much daily have an "oh shit" moment followed by a "nope, everything is about the same" moment. It's frankly exhausting. It's hard for me to recall a subject that has irritated me as much over a period of years, and it's barely even about AI itself but instead just feeling harassed with the constant anxiety and rage baiting.

tripledry•43m ago

I felt the same way, then I started with "I'll believe it when I see it". Now I'm a bit happier.

skyberrys•35m ago

Pretty good take. I don't really get the feelings of anxiety, but sometimes I'm working and I'm like I'm flying this is so fast! And then everything comes crashing down when I can't figure out one last bug.

sct202•45m ago

One of our SaaS providers launched an AI-agent-enabled version, and it can follow directions, do tasks, and manipulate data/settings in the software roughly on par with a below-average person. When I used it I had a sinking feeling: tons of teams and people will be redundant as these agents improve and roll out to other software.

knuckleheads•44m ago

I remember a couple months after ChatGPT came out I was in a 1-1 with a coworker who hadn’t really played around with it much. I was very much toying around with it and was surprised at how good at stuff it was. I wanted to show him it was for real, he was skeptical, so over a half hour we had it make a bee and a flower buzz around in d3, copying and pasting between jsfiddle and ChatGPT. By the end of it, we had a nice animation and were both thoroughly surprised that the computers could code so well now.

bluejay2387•43m ago

I had a locally hosted model write its own semantic search system that indexed 250,000 documentation and code files and then write a fully functioning mod for one of the games I play based on that documentation that I couldn't get to work after 2 weeks of my own effort, all in under 4 hours (and that included a 25 minute long indexing process). This freaked me out enough that I then had it write a CLI based activity and TODO tracker and then integrate that tool into its coding process to track all of its activities in about another 2 hours. I am still emotionally recovering from this day. I have since replaced the semantic search system with an open source option (though I used it for a few months) but I still use the activity tracker for both coding projects and myself.

gravypod•5m ago

What mod did you build?

jkraybill•43m ago

So many. First was when I saw GPT-2 create jokes that were original and kinda funny.

Most recent: I use Claude Code and have a convention where I grant various levels of autonomy during a session. I got bored recently and just let it keep running with an empty issues queue, essentially telling it to do whatever it wanted.

It did a bunch of repo cleanup, then it kept suggesting to end the session, but I just kept giving it autonomy prompts.

It started a creative writing public repo and wrote a bunch of stories, essays, and poems. I did not prompt it, at all, to do that. Some of what it wrote is quite good (IMHO).

goldenarm•43m ago

The first Sora release truly scared me. The uncanny valley of simulating life like this still creeps me out to this day.

kgwxd•43m ago

When it started being forced on me in tools I was already using begrudgingly.

jmkni•41m ago

Not coding, but reading logs.

I was trying to figure out a nightmare bug that only happened in production, and Claude Code was able to connect to Google Cloud and read the logs in real time.

I recreated the bug in the UI and it was instantly able to see in the logs what the problem was. Then, because it had the context of my whole codebase, it was able to point me to the exact line of code causing the problem.

That was certainly an "oh shit" moment.

shreddude•40m ago

I could go on and on, but Claude recently decompiled the firmware of my camper van, documented all the CAN interfaces, then programmed an ESP32 module to talk to the van’s integrated systems (power, HVAC, lighting, tanks). That sort of embedded systems integration is completely out of my wheelhouse.

I honestly don’t understand AI naysayers. I use Claude every day both professionally as a Solution Architect and personally in a variety of projects I simply could not have ever approached alone.

rvnx•14m ago

I don't understand it either.

"This is just a stochastic parrot".

I suppose these people are lying so that they can justify their well-paid job, or they just don't know how to use LLMs or to prompt GenAI tools.

orzig•40m ago

"Write a bible verse ... explaining how to remove a sandwich from a VCR" https://x.com/tqbf/status/1598513757805858820

EliRivers•39m ago

Code reviews. Code reviews in theory done by humans, but containing copy-pasted inane statements of the obvious. Questions that really did no more than demonstrate a lack of context. Code reviews no longer an educational opportunity for the reviewer, a way they learn and stress their own understanding to create a better product and become a better person, destroyed by the siren song of GenAI producing comments that on the surface seem so helpful and sensible.

"Uh Oh" realization of what these models can do?

The code reviews were just how I first saw it, but the rot goes deeper. The "uh oh" was my realisation of how much these can damage people's professional development. These people will never get better at their job than they are right now.

A lot of what else GenAI does is great, but this is an "Uh oh" indeed.

twooclock•39m ago

I programmed a data export to some XML over a couple of days, sending the XML results via email to an accounting firm for verification. A day after I finished, my disk crashed and I lost all my code. Fed Claude with the XML from my mail and... oh shit! ... got "my" code back. (And immediately paid for a Claude subscription) :-)

oidar•39m ago

Opus 4.6. My standard battery of questions included solving an ASCII maze (20x20 grid) without using a script, using only "thinking" as a tool. It was the first model to be able to solve it. It was the first model that really appeared to be able to reason spatially.

briga•37m ago

Maybe when I found out you can use it to run terminal commands, spin up and take down dev environments, and even run other LLMs. Suddenly 90% of the difficulty of onboarding to new repos disappeared overnight and a lot of heavily CLI-based workflows became trivial to automate. Never again do I want to spend hours manually sorting out Python dependencies.

evdubs•35m ago

I tried to see if an LLM service provider could rewrite some legal docs into a consistent format, without hallucinating anything, so I could see what might be missing from the documents. It could do that.

Next, I wanted to see if this could be done with a local LLM. Gemma-4 handles this fine with an 8GB video card and a large context (128k).

Next, I wanted to see if the model could also OCR these docs and translate them. The same model can handle that quite well.

This was when I realized LLMs should be great for handling work where:

- I already know what I want to do

- I already know how to do it

- I don't think this task will help develop skills I find to be valuable

- If I have to do it manually myself, I will probably cut corners

So now I view LLMs through the lens of, "what work can I send to an LLM that I otherwise would not really care about doing."

SoftTalker•17m ago

Yes, the best results I've had using LLMs are for tasks where simply reading and reformatting/translating/summarizing are the goals. They are much faster and less prone to boredom doing these things than humans are. For now.

hypendev•30m ago

Back in the times of GPT-3 text completion, right before the API came out, a contemporary art museum asked me to collaborate on a project. The project was supposed to include a chatbot, and I was like okay I can probably hook something up.

Then I remembered the "text completion LLM thingy" I saw on HN, and tried it out in the playground. Once I gave it an IRC style example of a conversation to complete, I was like hm, this could work. Then I figured out I could "sort" people into different groups based on personality using the same text completion engine and some answers they provided. Then I noticed I could have it provide me with JSON directly.

That's when I realized how big this could be for code and data analysis - even tried to convince an at the time cofounder to pivot into AI coding, but to no avail.

Once the API was released and the art project chatbot got launched (and the theater show associated with it, which even won some awards), people who used it loved the chatbot, got into heated arguments with it, tried to teach it things, talked about their lives and were sad when it didn't remember something.

That was when I understood the social impact this could have on people - they really behave like it's a person on the other side. They show interest, think it displays emotion, try to entertain it, be polite, ask about its thoughts and hopes and dreams. And even when they knew they were talking to a machine, they were still trying to be friends and make it happy, which was quite beautiful to see.

Later on, I had a third oh shit moment - once the 3.5 API was out and about, I prototyped a Rust code generation harness for a client, akin to a primitive Claude Code. That was the "I'm getting a bit worried" oh shit moment, and it caused a lot of reflection and thinking about the future. And I happily welcome it.

nsikorr•23m ago

Definitely the first NotebookLM podcast I generated.

jmclnx•19m ago

Non-technical people I know are starting to take AI responses to their questions as 100% true fact.

SoftTalker•15m ago

They did the same with Google search results that were just SEO garbage content, too.

1qaboutecs•18m ago

Was trying to explain convolution (of functions) to a friend and I wanted to build a little picture. I typed more or less nothing into Claude and it gave me a fine web-app for demo'ing examples to my friend within minutes.

Three years ago this would have taken a minimum of three college graduates a couple days -- one to know the math, one to know the backend, and one to know the front-end. Maybe two of those could be the same person on a good day -- none of the topics is individually that hard -- but it's a lot together.

simonw•17m ago

ChatGPT Code Interpreter back in ~March 2023. I uploaded a CSV file (of police incidents in San Francisco) and watched it load that into Pandas, show me some charts, then export the data to a SQLite database file for me to download.

I write software for data journalists and this new thing appeared to be able to do everything I wanted my software to do just as an unplanned side effect of having the ability to run Python against a folder with some uploaded files in it.

With hindsight it was my first exposure to a coding agent, but we hadn't named the category at that point.
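For anyone who hasn't seen that workflow, here is a minimal sketch of the CSV-to-SQLite step. It is not what Code Interpreter actually ran: it assumes made-up sample rows, and it uses Python's standard `csv` and `sqlite3` modules instead of Pandas. The function names and the `incidents` table are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows; the real upload was a CSV of San Francisco police incidents.
SAMPLE_CSV = """category,district
Theft,Mission
Assault,Mission
Theft,Richmond
"""


def load_csv_into_sqlite(csv_text, conn, table="incidents"):
    """Create a table from the CSV header and insert every row as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


def count_by(conn, column, table="incidents"):
    """Return {value: row count} for one column, the numbers a bar chart would plot."""
    rows = conn.execute(
        f'SELECT "{column}", COUNT(*) FROM "{table}" GROUP BY "{column}"'
    )
    return dict(rows.fetchall())


# Pass a file path instead of ":memory:" to get a downloadable database file.
conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(SAMPLE_CSV, conn)
print(count_by(conn, "category"))
```

The point of the anecdote is that none of this was special-purpose: a sandbox that can run a few lines of Python against uploaded files is enough.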

mbo•16m ago

Look, not to brag but DALL-E's "armchair in the shape of an avocado" was mine (https://openai.com/index/dall-e/). I remember trying to convey the gravity of this capability to my friends at the time, who I guess were not as impressed as me.

KaiserPro•13m ago

I've had a few.

The biggest technical one was when we were making an all-day wearable AI assistant thing. It basically had really precise office location (think cm-level accurate), a shitty VLM to describe what the wide-angle lens was looking at, speech to text, OCR, and a gaze recorder that described what you were looking at.

This was all streamed to SQLite. The thing that was really "oh shit" was the thing that made the whole system usable: a 4 paragraph prompt that turned natural language into SQL and reported back to the (non-technical) user what they wanted to know.

The most recent one is being caught out by a GenAI video of a gymnast. I worked in VFX so I am normally able to spot dodgy shit, but this one was close to being real, scarily real.

abstractanimal•12m ago

When I realized that an LLM can process all the traffic in Slack that overwhelms me daily and give me a manageable digest. How long until they intermediate most of our social interactions? Sooner than we can possibly adapt, I think.

jazzyjackson•7m ago

If your social interactions can be mediated by a chatbot, I implore you to find better social interaction.

cheevly•4m ago

If yours can't, then I implore you to find better AI mediation tools.

cheevly•7m ago

Ever since the first Davinci model of GPT-3 I've literally been using LLMs daily. It was an indispensable tool for me from the very beginning, and despite 10,000+ hours of usage and research, I still feel like I've barely scratched the surface of what's possible with current GenAI tech.

irthomasthomas•6m ago

My most recent one: taking a bricked iPad and plugging it into my Linux laptop, then telling DeepSeek to fix it. A couple of hours and twenty sudo passwords later it was working again.
