Making a misleading title in order to bait responses would be clickbait. People responding to a title that assertively states something is not some amazing gotcha.
Although in this case it must be what the sibling comment said about HN stripping the smiley.
To be clear, bullish has a few meanings. But they all boil down to assertively communicating. From your comment I gather you might think its etymology traces back to "bullying", but it actually relates back to "bull".
My personal take is that the whole chat-based interface is a mistake. I have no better solution, but for anything beyond a hallucinating search engine, it's not really the way we need to interact with AI.
In my mind we're 6 months away from one of the biggest crashes in tech history.
It's not just chat. But I really enjoy the Claude or chatgpt chat UI.
I'm not sure what your problem with that UI is tbh?
I don't think there will be any crash soon. I'm hoping for lower prices but I can't imagine not having these tools available anymore.
I'm also waiting for better and cheaper support for local development and inferencing.
I'm waiting left and right for a lot of smaller things. Like cheaper Claude, hardware LLM chips, secure integration of my bank account and other personal data. More capacity for the beta projects of coding agents.
We don't need AGI to have increased productivity and benefits.
If it were _that_ useless, half a billion people wouldn't be using such tools every week (and this is not even counting Chinese users).
I think many hate LLMs because they realize it is starting to eat into their bread and butter, is already taking their jobs, and they find that difficult to accept.
Investors are over-paying because they are betting on companies that can be outpaced by others, but the technology itself is here to stay and grow, until it can do more and more of the tasks that are done by humans.
It doesn't mean that LLMs and other AI tools are going to run completely alone anytime soon (maybe with cars first?), but eventually they could.
What will the world be like 10 or 20 years from now?
It's not a particularly efficient way to interact with the LLMs. Despite how people hate AI being integrated into everything, I do think that's the only way it can be done effectively. My take is that the AI/LLM isn't the product, it's just a feature of a product. Like when there's "AI" in the camera apps on your phone, or some sort of machine learning in medical imaging. To get the full benefit it needs to be a technology built into your product, not a product in itself.
Voice-to-text and text-to-voice are good examples. Voice-to-text is a feature of a transcription or video conferencing system. Text-to-voice is an accessibility feature. It doesn't matter that it's powered by an LLM.
If Claude, ChatGPT, or whatever do become cheaper, the market will crash. My thinking is: prices go down (they will eventually); if that happens too fast, OpenAI will tank, because their valuation is too high to support a $20-a-month AI offering, or worse, a world where ChatGPT isn't a product but a feature. If OpenAI goes down, the entire industry will have to re-align to a world where we now know there's a limit to the value an LLM can command. Most of the current AI companies will be bought up at steep discounts (compared to their current valuations) and people are going to be laid off.
The personal assistant agent too.
They might overlap for me but not for everyone.
Nothing will crash. The big rich companies like Google, Microsoft already need it as part of their product strategy.
As for AI companies, I don't think their fate matters at all to the end consumer, and I'm pretty sure companies like character.ai aren't running in the red.
But this hype-storm just reminds me of the fever-dream blogs about the brave new world of the Internet back when hypertext became widely used in '93 or so (direct democracy, infinite commerce, etc, etc). Yes, of course, the brave new world came along, but it needed 3G and multi-touch screens as well and that was 15 years later and a whole different set of companies made money and ruled the world than those that bet on hypertext.
Millions of people use LLMs daily.
Veo 3 just came out.
People get fired due to LLMs and GenAI replacing them.
Google has been using AI-generated code in the background for years.
Playing around with Cursor for a few days is way too naive a basis for judging the current situation with coding and AI.
Serious question: Are they? Or are they being fired and AI is used as the excuse?
We saw Klarna layoff customer services staff and that didn't work. LLMs couldn't do their job. Some programmers are being fired, but my feeling is that generative AI is more of a convenient excuse.
Is anyone actually being fired because an LLM did their job better, or were their jobs already in danger and AI just gave the companies an easy way out?
This is very common. We need a name for this phenomenon. Any analogies/metaphors?
I generated (huh!) some suggestions and the funniest one seems to be:
> Just One More Prompt Syndrome (JOMPS)
But I assume there's a better one that describes or hints at the irony that describing precisely what a program should be doing _is_ programming.
The LLMs are sometimes frustrating, but over time you learn how to drive them properly, exactly like when you learn how to use the right keywords on Google.
Also, considering he is a free user, the author might be using one of the default models ("auto" -> o4-mini or gpt-4o, rarely Claude Sonnet/Opus 3) in Cursor
-> Both of them are rather weak, especially with the automatic context splitting that Cursor does to save money
Claude Code + Opus 4 would likely yield better results here.
I wonder where you got the free user info.
In addition to that we've also seen that the way you prompt (amount of use of expert language), what context you provide through instructions, and tool use make a huge difference on the outcome.
At this point, if your coding experience with LLMs sucks I'd say there is an 80% chance that you're just doing it wrong.
Basically you are just adding to it with "You just need to use the latest model, anything before that is trash". Ignoring that the same was said just a few months prior when those models were cutting edge.
I'm not talking about other people's incorrect promises, _and_ I mentioned a number of things where proper usage today is different from what people were doing before.
> Basically you are just adding to it with "You just need to use the latest model, anything before that is trash".
That's not what I said. Don't put words into my mouth. I said that the older models are trash in comparison, not that they are trash. The older models require more work in prompting to get decent results.
> Ignoring that the same was said just a few months prior when those models were cutting edge.
You are conveniently ignoring the other parts I mentioned to make my claim. Here: "In addition to that we've also seen that the way you prompt (amount of use of expert language), what context you provide through instructions, and tool use make a huge difference on the outcome."
Alright? What you replied to and the context of this entire thread is about promises that have been made for a while now. In fact, we are approaching the point where we can safely talk about years of hype now. For reference, I am using the gpt-4 release as a significant ramp-up in the hype around LLMs.
> That's not what I said. Don't put words into my mouth.
That might not have been your intention. But, again, given the context of where you are replying, it does read like that, even with the qualifier of "in comparison".
Like it or not, your comment does add extra qualifiers to the list.
Look, I am not saying that there isn't any progress being made here. I also agree that LLMs can be useful tools as part of a developer toolkit. What I personally don't agree with is that they can do the same job, even less so in real world scenarios. Even the latest models, including Gemini 2.5 and o3, struggle with moderately complex code bases. And yes, the argument is always to let them work on small isolated bits of code. Or that if your requirements are tight enough they produce very good code. Which is entirely true, but I also envy the developers who work in structured environments where their code base is that clean and requirements that well-defined.
So, in my experience, coding with these models still sucks. Using them as interactive rubber duckies, replacements for some of the things I used to spend hours googling, debugging small snippets of code, etc. Sure, there they are very useful tools to me. But, to me, that is not coding with LLMs. That is having LLMs available as a tool whenever I need them.
You're getting there. The most valuable change is using software (like Cursor) that runs the models in agentic mode so they:
1. find the context themselves (given a basic context document to point them in the right general direction for the current task).
2. can run commands and specifically tests.
3. iterate on their own output and changes to ensure you don't need to point out what they did wrong manually each time. Just let it find out itself, correct and keep moving until it is a good solution.
Think about how any human developer would approach an issue. For a lot of the 'moderately complex code bases', a developer new to it would also need a lot of pointing in the right direction, a lot of trying stuff out and then correcting themselves. Forget treating LLMs like one-shot magic solution givers, but instead as junior devs that you have to provide with all kinds of things to be successful.
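Roughly, the loop looks something like this (a rough sketch of the general idea, not Cursor's actual implementation; call_llm and apply_patch are placeholders for whatever model API and file-writing mechanism you use):

    # Sketch of the agentic loop: gather context, let the model propose a
    # change, run the tests, feed failures back, repeat until green.
    import subprocess

    def call_llm(prompt: str) -> str:
        """Placeholder for any chat-completion API call; returns a proposed change."""
        raise NotImplementedError

    def apply_patch(patch: str) -> None:
        """Placeholder that writes the proposed change to the working tree."""
        raise NotImplementedError

    def run_tests() -> tuple[bool, str]:
        # Assumes a pytest project; any test command works the same way.
        proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr

    def agent_loop(task: str, context: str, max_iterations: int = 5) -> bool:
        prompt = f"Project context:\n{context}\n\nTask:\n{task}"
        for _ in range(max_iterations):
            apply_patch(call_llm(prompt))
            ok, output = run_tests()
            if ok:
                return True  # tests pass, stop iterating
            # Feed the failures back so the model can correct its own output.
            prompt += f"\n\nYour last change failed the tests:\n{output}\nPlease fix it."
        return False

The point is just that the model sees its own test failures instead of you pasting them back in by hand.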
Not to mention that with each iteration the amount of tokens needed goes up substantially. Having worked with LLM APIs and their pricing there simply is no way that Cursor is breaking even on that $20 or $40 per month if everyone uses it fully. Not even close. This very much hints at the costs being hidden right now, subsidized if you will, by VC money.
Also, once you have brought junior developers up to speed and guided them, they are now slightly more capable developers who can more easily onboard onto future projects. With LLMs you need to effectively babysit them on each project again.
And there are a lot more caveats, prerequisites and moving targets involved, which for many people and companies means the promise can't actually be met.
And again, I am not discounting that there are specific areas where people see benefits from using LLMs in agentic form. But those areas are not as ubiquitous as the hype train leads us to believe. And to start using them you need to set up a lot more in the way of infrastructure and due process as well.
There's a classical one: Sunk cost fallacy
One effective use of these tools is to get past the blank page dread. Let it write _something_ to get going, then turn it off and start working.
Recognizing where this cutoff should be can prevent "doom prompting"!
A few of my favorite tech authors and OSS contributors have gone full AI-parrot mode lately. Everything they post is either "how they use AI," "how amazing it is," or "how much it sucks." I think a bunch of us have had enough and just want to move on. But try saying anything even slightly critical about AI, or about how the big companies are forcing it into everything in the name of “progress,” and suddenly you're a Luddite.
I'm just glad more folks in the trenches are tired of the noise too and are starting to call it out.
I don't think anyone expected it to make all books obsolete, but you'd struggle to buy an encyclopaedia that isn't for kids these days.
I suppose I could upload the image to a table to json website and provide copilot with the json but the point was to make things easier. In my mind there is nothing complicated about the structure of a table but if I ask copilot to merely extract the text from a row starting with some text it goes insane. Optical character recognition is an idea from 1870 and they had working implementations 50 years ago. I read a bunch of comments online about models getting progressively worse but my experience was over the span of a few weeks.
Could they be doctoring the quality to make newer models look even better?
We have long-term memory and short-term memory.
Context is the short-term memory.
The training phase, which is still long and expensive, embeds the long-term memory.
But do I really have to say colloquial?
One of the problems programmers have is loading a problem into working memory. It can take an hour. An interruption, a phone call, or a meeting can mean that you have to start over (or, if not completely over, you still have to redo part of it). This is a standard programmer complaint about interruptions.
It's interesting that LLMs may have a similar issue.
Right now inference doesn't cascade into training.
In biology, inference and training are not so decoupled.