In about two years we've gone from "AI just generates rubbish where the text should be" to "AI spells things a bit wrong." This is largely down to generating the whole image, text included, in one pass. Using a model like SDXL with a tool like Fooocus to do inpainting on an input image containing a very rough approximation of the right text (added via MS Paint), you can get a pretty much perfect result. Give it another couple of years and the text generation will be spot on.
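If anyone wants to try that workflow without Fooocus, here's a minimal sketch using the Hugging Face diffusers SDXL inpainting pipeline; the checkpoint name, file names, and parameters are plausible defaults I'm assuming, not a tested recipe:

    # pip install diffusers transformers accelerate torch
    import torch
    from diffusers import AutoPipelineForInpainting
    from diffusers.utils import load_image

    # SDXL inpainting checkpoint; any SDXL-based inpainting model should work
    pipe = AutoPipelineForInpainting.from_pretrained(
        "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
        torch_dtype=torch.float16,
    ).to("cuda")

    image = load_image("rough_text.png")   # picture with crude MS Paint text
    mask = load_image("text_mask.png")     # white over the region to repaint

    result = pipe(
        prompt="a storefront sign reading 'OPEN 24 HOURS', clean typography",
        image=image,
        mask_image=mask,
        strength=0.85,              # keep the rough text as guidance, repaint most of it
        num_inference_steps=25,
    ).images[0]
    result.save("fixed_text.png")

The trick is the rough painted text: at moderate strength the model keeps the layout you sketched and mostly just cleans up the letterforms.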
So yes, right now we need a human to either use the AI well, or to fix it afterwards. That's how technology always goes - something is invented, it's not perfect, humans need to fix the outputs, but eventually the human input diminishes to nothing.
This is not how AI has ever gone. Every approach so far has either been a total dead end, or the underlying concept got pivoted into a simplified, not-AI tech.
This new approach of machine-learning content generation will either keep developing, or it will join everything else in the history of AI by hitting the point where returns diminish to zero.
I agree we probably won't magically scale current techniques to AGI, but I also think the local maximum for creative output is going to be high enough that it changes how we approach creative work the way computers changed how we approach knowledge work.
That's why I focus on it at least.
You're talking about the progress of technology. I'm talking about how humans use technology in its early stages. They're not mutually exclusive.
And most SOTA models (Imagen, Qwen 20b, etc.) at this point can actually already handle a fair amount of text in a single T2I generation. Flux Dev, provided you're willing to roll a couple of gens, can do it as well.
AI (at least this form of AI) is not going to take our jobs away and leave us all idle and poor, just like the milling machine or the plough didn't take people's jobs away and make everyone poor. It will enable us to do even greater things.
The plough didn't make everyone poor, but people working in agriculture these days are a tiny percentage of the population compared to the majority 150 years ago.
(I don't think LLMs are like that, tho).
Touching on this topic, I cannot recommend enough "The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger" which (among other things) illustrates the story of dockworkers: there were entire towns dedicated to loading ships.
But the people employed in that area have declined by 90% in the last 60 years, while shipping has grown by orders of magnitude. New port cities arose, and old ones died. One needs to accept inevitable change sometimes.
Sometimes demand scales; maybe demand for food is just less elastic. Programming has been automating itself with each new language and library for 70 years, and here we are, with so many software devs. Demand scaled up as a result of automation.
Just as gunpowder enabled greater things. I agree with you; it's just that humans have shown, time after time, an ability to first use innovation to make life miserable for their fellow humans.
It doesn't do this.
This is highly dependent on which model is being used and what hardware it's running on. In particular, an older article claimed that the energy used to generate an image was equivalent to charging a mobile phone, but the actual energy required for a single image generation (SDXL, 25 steps) is about 35 seconds of running an 80 W GPU.
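Back of the envelope, assuming those figures and a typical ~15 Wh phone battery (that capacity is my assumption, not from the thread):

    # energy for one SDXL generation vs. a full phone charge (rough sketch)
    gpu_watts = 80                       # assumed GPU draw while generating
    seconds = 35                         # assumed time for 25 steps
    image_wh = gpu_watts * seconds / 3600          # ~0.78 Wh per image

    phone_battery_wh = 15                # typical smartphone battery (assumption)
    print(f"{image_wh:.2f} Wh per image, "
          f"{image_wh / phone_battery_wh:.0%} of a full charge")   # ~0.78 Wh, ~5%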
I’m not sure it will be as high as a full charge of a phone, but it’s incomplete without the resources needed for collecting data and training the model.
Am I missing something? Does the CPU/GPU/APU doing this calculation on servers/PCs run at the same wattage as mobile devices?
The proper unit is watt hours.
Their napkin math went something like: human artists charge $50 or so per piece, which is, let's say, $200/hr skill, which means each piece can't take longer than about 15 minutes, therefore the resources used by AI must add up to less than 15 workstation-minutes, or something like that.
And that math is equally broken for both sides: SDXL users easily spend hours rolling the dice a hundred times without getting usable images, and likewise, artists just as easily spend a day or two on an interesting request that may or may not come with free chocolates.
So those estimates are not only biased, but basically entirely useless.
If you compare datacenter energy usage to everything else, it amounts to about 5%. Economizing on LLMs won't save the planet.
This can't be correct, I'd like to see how this was measured.
Running a GPU at full throttle for one hour uses less power than serving data for one hour?
I'm very sceptical.
[1] https://www.iea.org/commentaries/the-carbon-footprint-of-str...
[2] https://epoch.ai/gradient-updates/how-much-energy-does-chatg...
The Netflix consumption takes into account everything[1]; the numbers for AI are only the GPU power consumption, not including the user's phone/laptop.
IOW, you are comparing the power cost of using a datacenter + global network + 55" TV to the cost of a single one-shot query (i.e. a tiny prompt) on the GPU only.
Once again, I am going to say that the power cost of serving up a stored chunk of data is going to be less than the power cost of first running a GPU and then serving up that chunk.
==================
[1] Which (in addition to the consumption by Netflix data centers) includes the network equipment in between and the computer/TV on the user's end. Consider that the user is watching Netflix on a TV (min 100 W, but more for a 60" large screen).
And just how many people manage to one-shot the image?
There are maybe 5 to 20 images generated before the user is happy.
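To put rough numbers on that multiplier, here's a sketch using figures from elsewhere in the thread (80 W for 35 s per image, a 100 W TV). What actually gets counted on each side is exactly the point of contention above, so treat both results as lower bounds:

    # a pessimistic batch of generations vs. one hour of streaming (device only)
    wh_per_image = 80 * 35 / 3600        # ~0.78 Wh, GPU only (assumed figures)
    images_per_result = 20               # upper end of the 5-to-20 estimate

    generation_wh = wh_per_image * images_per_result      # ~16 Wh

    tv_watts = 100                       # the footnote's minimum TV draw
    streaming_wh = tv_watts * 1          # ~100 Wh for one hour, TV only,
                                         # before network and datacenter overhead

    print(f"{generation_wh:.0f} Wh generating, {streaming_wh:.0f} Wh streaming")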
When the big lawsuits hit, they'll roll back.
They'll hire those people back at half their total compensation, with no stock and far fewer benefits, to clean up AI slop. And/or just contract it overseas at ~1/3 of the former total cost.
Another ten years from now the AI systems will have improved drastically, reducing the slop factor. There's no scenario where it goes back to how it was, that era is over. And the cost will decline substantially versus the peak for US developers.
Because I think it won't just be a linear relationship. If you let 1 vibe coder replace a team of 10, you'll need a lot more than 10 people to clean it up and maintain it going forward when they hit the wall.
Personally I'm looking forward to the news stories about major companies collapsing under the weight of their LLM-induced tech debt.
Why does that fact stop being true when the code is created by AI?
There are really two observations here: 1. AI hasn't commoditized skilled labor. 2. AI is diluting/degrading media culture.
For the first, I'm waiting for more data, e.g. from the BLS. For the second, I think a new category of media has emerged. It lands somewhere near chiptune and deep-fried memes.
The problem is, actually skilled labor - think of translators, designers, copywriters - is obviously still needed, but at an intermediate/senior level. These people won't be replaced for a few years to come, and thus won't show up in labor board statistics.
What is getting replaced (or rather, positions not being refilled as the existing people move up the career ladder) is the bottom of the barrel: interns and juniors, because that level of workmanship can actually be done by AI in quite a few cases, despite it also being skilled work. But this kind of replacement doesn't show up in any kind of statistics, except maybe the number of open positions - and a change in that number can also credibly be attributed to economic uncertainty thanks to tariffs, the Russian invasion, people holding on to their money and foregoing spending, yadda yadda.
Obviously this is going to completely wreck the entire media/creative economy in a few years: when the entry side of the career funnel has dried up "thanks" to AI, suddenly there will not be any interns that evolve into juniors, no juniors that evolve into intermediates, no intermediates that evolve into seniors... and all that will be left for many an ad/media agency will be sales teams and a bunch of ghouls in suits who last touched Photoshop a decade and a half ago.
LLMs are not AI. Machine learning is more useful. Perhaps they will evolve or perhaps they will prove a dead end.
LLMs are a particular application of machine learning, and as such LLMs both benefit from and contribute to general machine learning techniques.
I agree that LLMs are not the AI we all imagine, but the fact that it broke a huge milestone is a big deal - natural language used to be one of the metrics of AGI!
I believe it is only a matter of time until we get to multi-sensory, self-modifying large models which can both understand and learn from all five human senses, and maybe even some of the senses we have no access to.
What if we have chosen the wrong metric there?
What does AI do, at its heart? It is literally trained to make things that can pass for what's ordinary. What's the best way to do that, normally? Make a bland thing that is straight down the middle of the road. Boring music, boring pictures, boring writing.
Now there are still some issues with common sense, due to the models lacking certain qualities that I'm sure experts are working on. Things like people with 8 fingers, lack of a model of physics, and so on. But we're already at a place where you could easily not spot a fake, especially while not paying attention.
So where does that leave us? AI is great at producing scaffolding. Lorem Ipsum, but for everything.
Humans come in to add a bit of agency. You have to take some risk when you're producing something, decisions have to be made. Taste, one might call it. And someone needs to be responsible for the decisions. Part of that is cleaning up obvious errors, but also part of it is customizing the skeleton so that it does what you want.