frontpage.

Binfmtc – binfmt_misc C scripting interface

https://www.netfort.gr.jp/~dancer/software/binfmtc.html.en
36•todsacerdoti•3h ago•9 comments

Gaussian Integration Is Cool

https://rohangautam.github.io/blog/chebyshev_gauss/
70•beansbeansbeans•7h ago•11 comments

The last six months in LLMs, illustrated by pelicans on bicycles

https://simonwillison.net/2025/Jun/6/six-months-in-llms/
409•swyx•8h ago•117 comments

Joining Apple Computer (2018)

https://www.folklore.org/Joining_Apple_Computer.html
334•tosh•19h ago•84 comments

<Blink> and <Marquee> (2020)

https://danq.me/2020/11/11/blink-and-marquee/
143•ghssds•12h ago•129 comments

Ask HN: How to learn CUDA to professional level

104•upmind•5h ago•41 comments

Bill Atkinson has died

https://daringfireball.net/linked/2025/06/07/bill-atkinson-rip
1462•romanhn•1d ago•248 comments

Why not use DNS over HTTPS (DoH)?

https://www.bsdhowto.ch/doh.html
117•Bogdanp•7h ago•145 comments

Self-Host and Tech Independence: The Joy of Building Your Own

https://www.ssp.sh/blog/self-host-self-independence/
330•articsputnik•22h ago•160 comments

Convert photos to Atkinson dithering

https://gazs.github.io/canvas-atkinson-dither/
378•nvahalik•19h ago•41 comments

My experiment living in a tent in Hong Kong's jungle

https://corentin.trebaol.com/Blog/8.+The+Homelessness+Experiment
397•5mv2•23h ago•169 comments

Focus and Context and LLMs

https://taras.glek.net/posts/focus-and-context-and-llms/
27•tarasglek•7h ago•9 comments

Knowledge Management in the Age of AI

https://ericgardner.info/notes/knowledge-management-june-2025
65•katabasis•12h ago•33 comments

Coventry Very Light Rail

https://www.coventry.gov.uk/coventry-light-rail
142•Kaibeezy•18h ago•197 comments

Fray: A Controlled Concurrency Testing Framework for the JVM

https://github.com/cmu-pasta/fray
49•0x54MUR41•9h ago•2 comments

BorgBackup 2 has no server-side append-only anymore

https://github.com/borgbackup/borg/pull/8798
167•jaegerma•21h ago•98 comments

Field Notes from Shipping Real Code with Claude

https://diwank.space/field-notes-from-shipping-real-code-with-claude
164•diwank•22h ago•57 comments

Researchers develop ‘transparent paper’ as alternative to plastics

https://japannews.yomiuri.co.jp/science-nature/technology/20250605-259501/
419•anigbrowl•1d ago•266 comments

What was Radiant AI, anyway?

https://blog.paavo.me/radiant-ai/
196•paavohtl•1d ago•108 comments

Why We're Moving on from Nix

https://blog.railway.com/p/introducing-railpack
247•mooreds•1d ago•111 comments

A look at Cloudflare's AI-coded OAuth library

https://neilmadden.blog/2025/06/06/a-look-at-cloudflares-ai-coded-oauth-library/
201•itsadok•7h ago•123 comments

Low-Level Optimization with Zig

https://alloc.dev/2025/06/07/zig_optimization
278•Retro_Dev•1d ago•170 comments

Getting Past Procrastination

https://spectrum.ieee.org/getting-past-procastination
386•WaitWaitWha•1d ago•158 comments

How we decreased GitLab repo backup times from 48 hours to 41 minutes

https://about.gitlab.com/blog/2025/06/05/how-we-decreased-gitlab-repo-backup-times-from-48-hours-to-41-minutes/
558•immortaljoe•2d ago•229 comments

A tool for burning visible pictures on a compact disc surface (2022)

https://github.com/arduinocelentano/cdimage
175•carlesfe•1d ago•50 comments

Discovering a JDK Race Condition, and Debugging It in 30 Minutes with Fray

https://aoli.al/blogs/jdk-bug/
123•aoli-al•21h ago•23 comments

Why Understanding Software Cycle Time Is Messy, Not Magic

https://arxiv.org/abs/2503.05040
56•SiempreViernes•19h ago•15 comments

I read all of Cloudflare's Claude-generated commits

https://www.maxemitchell.com/writings/i-read-all-of-cloudflares-claude-generated-commits/
274•maxemitchell•1d ago•240 comments

Washington Post's Privacy Tip: Stop Using Chrome, Delete Meta Apps (and Yandex)

https://tech.slashdot.org/story/25/06/07/035249/washington-posts-privacy-tip-stop-using-chrome-delete-metas-apps-and-yandex
422•miles•23h ago•270 comments

The time bomb in the tax code that's fueling mass tech layoffs

https://qz.com/tech-layoffs-tax-code-trump-section-174-microsoft-meta-1851783502
1415•booleanbetrayal•4d ago•882 comments

The last six months in LLMs, illustrated by pelicans on bicycles

https://simonwillison.net/2025/Jun/6/six-months-in-llms/
400•swyx•8h ago

Comments

neepi•6h ago
My only take home is they are all terrible and I should hire a professional.
dist-epoch•5h ago
Most of them are text-only models. Like asking a person born blind to draw a pelican, based on what they heard it looks like.
neepi•5h ago
That seems to be a completely inappropriate use case?

I would not hire a blind artist or a deaf musician.

dist-epoch•5h ago
The point is about exploring the capabilities of the model.

Like asking you to draw a 2D projection of a 4D sphere intersected with a 4D torus or something.

kevindamm•4h ago
Yeah, I suppose it is similar… I don't know their diameters, rotations, or the distance between their centers, nor which two dimensions, so I would have to guess a lot about what you meant.
namibj•5h ago
It's a proxy for abstract designing, like writing software or designing in a parametric CAD.

Most of the non-math design work of applied engineering AFAIK falls under the umbrella that's tested with the pelican riding the bicycle. You have to make a mental model and then turn it into applicable instructions.

Program code/SVG markup/parametric CAD instructions don't really differ in that aspect.

neepi•4h ago
I would not assume that this methodology applies to applied engineering, as a former actual real tangible meat-space engineer. Things are a little nuanced, and the nuances come from a combination of communication and experience, neither of which any LLM has any insight into at all. It's not out there on the internet to train on, and it's not even easy to put into abstract terms that could be used as training data. And engineering itself in isolation doesn't exist - there is a whole world around it.

Ergo, no, you can't just throw a bicycle into an LLM and have a parametric model drop out into SolidWorks, then a machine makes it and everyone buys it. That is the hope, really, isn't it? You end up with a useless shitty bike with a shit pelican on it.

The biggest problem we have in the LLM space is that no one really understands any of the proposed use cases well enough, and neither do the people being told that it works for those use cases.

dist-epoch•4h ago
https://www.solidworks.com/lp/evolve-your-design-workflows-a...
neepi•3h ago
Yeah good luck with that. Seriously.
rjsw•4h ago
I don't think any of that matters; CEOs will decide to use it anyway.
neepi•3h ago
This is sad but true.
__alexs•5h ago
I guess the idea is that by asking the model to do something that is inherently hard for it, we might learn something about the baseline smartness of each model, which could be considered a predictor of performance at other tasks too.
dmd•5h ago
Sorry, Beethoven, you just don’t seem to be a match for our org. Best of luck on your search!

You too, Monet. Scram.

simonw•4h ago
Yeah, that's part of the point of this. Getting a state-of-the-art text-generating LLM to generate SVG illustrations is an inappropriate application of them.

It's a fun way to deflate the hype. Sure, your new LLM may have cost XX million to train and beat all the others on the benchmarks, but when you ask it to draw a pelican on a bicycle it still outputs total junk.

dist-epoch•3h ago
tried starting from an image:

https://chatgpt.com/share/684582a0-03cc-8006-b5b5-de51e5cd89...

lol: https://gemini.google.com/share/4d1746a234a8

wongogue•3h ago
Even Beethoven?
matkoniecz•5h ago
It depends on the quality you need and your budget.
neepi•5h ago
Ah yes, the race-to-the-bottom argument.
ben_w•4h ago
When I was at university, they got some people from industry to talk to us all about our CVs and how to do interviews.

My CV had a stupid cliché, "committed to quality", which they correctly picked up on — "What do you mean?" one of them asked me, directly.

I thought this meant I was focussed on being the best. He didn't like this answer.

His example, blurred by 20 years of my imperfect human memory, was to ask me which is better: a Porsche, or a go-kart. Now, obviously (or I wouldn't be saying this), Porsche was a trick answer. Less obviously, both were trick answers, because their point was that the question was under-specified: quality is the match between the product and what the user actually wants. So if the user is a 10-year-old who physically isn't big enough to sit in a real car's driver's seat and just wants to rush down a hill or along a track, none of the "quality" stuff that makes a Porsche a Porsche is of any relevance at all; what does matter is the stuff that makes a go-kart into a go-kart… one of which is affordability.

LLMs are go-karts of the mind. Sometimes that's all you need.

neepi•3h ago
I disagree. Quality depends on your market position and what you are bringing to the market. Thus I would start with market conditions and work back to quality. If you can't reach your standards in the market then you shouldn't enter it. And if your standards are poor, you should be ashamed.

Go-kart or Porsche is irrelevant.

ben_w•3h ago
> Quality depends on your market position and what you are bringing to the market.

That's the point.

The market for go-karts does not support Porsche.

If you bring a Porsche sales team to a go-kart race, nobody will be interested.

Porsche doesn't care about this market. It goes both ways: this market doesn't care about Porsche, either.

keiferski•5h ago
As the other guy said, these are text models. If you want to make images, use something like Midjourney.

Prompting a pelican riding a bicycle makes a decent image there.

GaggiX•3h ago
An expert at writing SVGs?
jug•2h ago
Before that, you might ask ChatGPT to create a vector image of a pelican riding a bicycle and then run the output through a PNG-to-SVG converter...

Result: https://www.dropbox.com/scl/fi/8b03yu5v58w0o5he1zayh/pelican...

This is a tough benchmark precisely because it trials reasoning: the model has to _write_ an SVG file by hand and understand how the markup must be composed to achieve the image. Even a professional would struggle with that! It's _not_ a benchmark about giving an AI the best tools to actually do the job.
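
A minimal sketch of that convert-then-trace pipeline, assuming Pillow and the potrace CLI are installed (filenames are placeholders):

  # Threshold the generated PNG to 1-bit black and white,
  # since potrace traces bitmaps rather than full-color images.
  import subprocess
  from PIL import Image

  Image.open("pelican.png").convert("1").save("pelican.pbm")

  # Trace the bitmap into an SVG (-s selects SVG output).
  subprocess.run(["potrace", "pelican.pbm", "-s", "-o", "pelican.svg"], check=True)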

spaceman_2020•2h ago
My only take home is that a spanner can work as a hammer, but you probably should just get a hammer
joshstrange•5h ago
I really enjoy Simon’s work in this space. I’ve read almost every blog post he’s posted on this and I love seeing him poke and prod the models to see what pops out. The CLI tools are all very easy to use and complement each other nicely, all without trying to do too much by themselves.

And at the end of the day, it’s just so much fun to see someone else having so much fun. He’s like a kid in a candy store and that excitement is contagious. After reading every one of his blog posts, I’m inspired to go play with LLMs in some new and interesting way.

Thank you Simon!

blackhaj7•2h ago
Same sentiment!
nathan_phoenix•5h ago
My biggest gripe is that he's comparing probabilistic models (LLMs) by a single sample.

You wouldn't compare different random number generators by taking one sample from each and then concluding that generator 5 generates the highest numbers...

Would be nicer to run the comparison with 10 images (or more) for each LLM and then average.
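
A rough sketch of what that could look like with Simon's llm Python library (the model IDs and scoring step are assumptions, not anyone's actual harness):

  import llm

  MODELS = ["gpt-4.1-mini", "claude-3.5-haiku"]  # example model IDs
  PROMPT = "Generate an SVG of a pelican riding a bicycle"
  N = 10  # samples per model, per the suggestion above

  # Collect N attempts per model instead of judging a single draw.
  samples = {
      m: [llm.get_model(m).prompt(PROMPT).text() for _ in range(N)]
      for m in MODELS
  }
  # Score the N attempts (judge model, human votes, ...) and compare
  # the distributions rather than one lucky or unlucky sample.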

puttycat•4h ago
You are right, but the companies making these models invest a lot of effort in marketing them as anything but probabilistic, i.e. making people think that these models work discretely like humans.

In that case we'd expect a human with perfect drawing skills and perfect knowledge about bikes and birds to output such a simple drawing correctly 100% of the time.

In any case, even if a model is probabilistic, if it had correctly learned the relevant knowledge you'd expect the output to be perfect because it would serve to lower the model's loss. These outputs clearly indicate flawed knowledge.

ben_w•4h ago
> In that case we'd expect a human with perfect drawing skills and perfect knowledge about bikes and birds to output such a simple drawing correctly 100% of the time.

Look upon these works, ye mighty, and despair: https://www.gianlucagimini.it/portfolio-item/velocipedia/

jodrellblank•2h ago
You claim those are drawn by people with "perfect knowledge about bikes" and "perfect drawing skills"?
ben_w•1h ago
More that "these models work … like humans" (discretely or otherwise) does not imply the quotation.

Most humans do not have perfect drawing skills and perfect knowledge about bikes and birds; they do not output such a simple drawing correctly 100% of the time.

"Average human" is a much lower bar than most people want to believe, mainly because most of us are average on most skills, and also overestimate our own competence — the modal human has just a handful of things they're good at, and one of those is the language they use, another is their day job.

Most of us can't draw, and demonstrably can't remember (or figure out from first principles) how a bike works. But this also applies to "smart" subsets of the population: physicists have https://xkcd.com/793/, and there's that famous rocket scientist who weighed in on rescuing kids from a flooded cave and came up with some nonsense about a submarine.

Retric•24m ago
It’s not that humans have perfect drawing skills, it’s that humans can judge their performance and get better over time.

Ask 100 random people to draw a bike in 10 minutes and they’ll on average suck while still beating the LLMs here. Give ’em an incentive and 10 months and the average person is going to be able to make at least one quite decent drawing of a bike.

The cost and speed advantage of LLMs is real as long as you’re fine with extremely low quality. Ask a model for 10,000 drawings so you can pick the best and you get marginal improvements based on random chance, at a steep price.

rightbyte•1h ago
That blog post is a 10/10. Oh dear I miss the old internet.
cyanydeez•4h ago
Humans absolutely do not work discretely.
loloquwowndueo•3h ago
They probably meant deterministically as opposed to probabilistically. Not that humans work like that either :)
aspenmayer•2h ago
I thought they meant discreetly.
planb•4h ago
And by a sample that has become increasingly well known as a benchmark. Newer training data will contain more articles like this one, which naturally improves an LLM’s ability to estimate what’s considered a good “pelican on a bike”.
cyanydeez•4h ago
So what you really need to do is clone this blog post, find and replace pelican with any other noun, run all the tests, and publish that.

Call it wikipediaslop.org

criddell•3h ago
And that’s why he says he’s going to have to find a new benchmark.
viraptor•2h ago
Would it though? There really aren't that many valid answers to that question online. When this is talked about, we get more broken samples than reasonable ones. I feel like any talk about this actually sabotages future training a bit.

I actually don't think I've seen a single correct svg drawing for that prompt.

simonw•4h ago
It might not be 100% clear from the writing but this benchmark is mainly intended as a joke - I built a talk around it because it's a great way to make the last six months of model releases a lot more entertaining.

I've been considering an expanded version of this where each model outputs ten images, then a vision model helps pick the "best" of those to represent that model in a further competition with other models.

(Then I would also expand the judging panel to three vision LLMs from different model families which vote on each round... partly because it will be interesting to track cases where the judges disagree.)

I'm not sure if it's worth me doing that though since the whole "benchmark" is pretty silly. I'm on the fence.
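
For concreteness, the expanded tournament could look something like this sketch (the judge model IDs and judging prompt are assumptions, and it judges raw SVG text rather than rendered images, for brevity):

  import llm

  JUDGES = ["gpt-4o", "gemini-2.0-flash", "claude-3.5-sonnet"]  # assumed judge IDs

  def pick_best(svgs, judge_id):
      # Ask one judge model which candidate is best; expects a bare number back.
      listing = "\n\n".join(f"Candidate {i}:\n{s}" for i, s in enumerate(svgs))
      response = llm.get_model(judge_id).prompt(
          "Which candidate SVG best depicts a pelican riding a bicycle? "
          "Answer with the candidate number only.\n\n" + listing
      )
      return int(response.text().strip())

  def vote(svgs):
      # Majority vote across the three-judge panel; ties resolved arbitrarily.
      votes = [pick_best(svgs, j) for j in JUDGES]
      return max(set(votes), key=votes.count)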

ontouchstart•3h ago
Very nice talk, acceptable to the general public and to AI agents as well.

Any concerns that open source “AI celebrity talks” like yours could be used in contexts that would allow LLM models to optimize their market share in ways that we can’t imagine yet?

Your talk might influence the funding of AI startups.

#butterflyEffect

threecheese•2h ago
I welcome a VC funded pelican … anything! Clippy 2.0 maybe?

Simon, hope you are comfortable in your new role of AI Celebrity.

demosthanos•3h ago
I'd say definitely do not do that. That would make the benchmark look more serious while still being problematic for knowledge cutoff reasons. Your prompt has become popular even outside your blog, so the odds of some SVG pelicans on bicycles making it into the training data have been going up and up.

Karpathy used it as an example in a recent interview: https://www.msn.com/en-in/health/other/ai-expert-asks-grok-3...

diggan•2h ago
Yeah, this is the problem with benchmarks where the questions/problems are public. They're valuable for some months, until it bleeds into the training set. I'm certain a lot of the "improvements" we're seeing are just benchmarks leaking into the training set.
travisgriggs•1h ago
That’s ok, once bicycle “riding” pelicans become normative, we can ask it for images of pelicans humping bicycles.

The number of subject-verb-object combinations is near infinite. All are imaginable, but most are not plausible. A plausibility machine (an LLM) will struggle with the implausible until it can abstract well.

diggan•27m ago
> The number of subject-verb-object combinations is near infinite. All are imaginable, but most are not plausible

Until there are enough unique/new subject-verb-object examples/benchmarks that the trained model actually generalizes, just like you did. (Public) benchmarks need to constantly evolve, otherwise they stop being useful.

6LLvveMx2koXfwn•2h ago
I would definitely say he had no intention of doing that and was doubling down on the original joke.
colecut•1h ago
The road to hell is paved with the best intentions

clarification: I enjoyed the pelican on a bike and don't think it's that bad =p

throwaway31131•20m ago
I’d say it doesn’t really matter. There is no universally good benchmark and really they should only be used to answer very specific questions which may or may not be relevant to you.

Also, as the old saying goes, the only thing worse than using benchmarks is not using benchmarks.

fzzzy•3h ago
Even if it is a joke, having a consistent methodology is useful. I did it for about a year with my own private benchmark of reasoning type questions that I always applied to each new open model that came out. Run it once and you get a random sample of performance. Got unlucky, or got lucky? So what. That's the experimental protocol. Running things a bunch of times and cherry picking the best ones adds human bias, and complicates the steps.
simonw•3h ago
It wasn't until I put these slides together that I realized quite how well my joke benchmark correlates with actual model performance - the "better" models genuinely do appear to draw better pelicans and I don't really understand why!
pama•3h ago
How did the pelicans of point releases of V3 and of R1 (R1-0528) do compared to the original versions of the models?
MichaelZuo•2h ago
I imagine the straightforward reason is that the “better” models are in fact significantly smarter in some tangible way, somehow.
more-nitor•2h ago
I just don't get the fuss from the pro-LLM people who don't want anyone to shame their LLMs...

people expect LLMs to say "correct" stuff on the first attempt, not 10000 attempts.

Yet, these people are perfectly OK with cherry-picked success stories on youtube + advertisements, while being extremely vehement about this simple experiment...

...well maybe these people rode the LLM hype-train too early, and are desperate to defend LLMs lest their investment go poof?

obligatory hype-graph classic: https://upload.wikimedia.org/wikipedia/commons/thumb/9/94/Ga...

tuananh•2h ago
until they start targeting this benchmark
simonw•2h ago
Right, that was the closing joke for the talk.
johnrob•40m ago
Well, the most likely single random sample would be a “representative” one :)
dilap•1h ago
Joke or not, it still correlates much better with my own subjective experiences of the models than LM Arena!
cyanydeez•4h ago
I get my pelicans from google and my raw dogs from openAI, while the best fundamental fascist ideologies are best sourced from GrokAI.
qeternity•3h ago
I think you mean non-deterministic, instead of probabilistic.

And there is no reason that these models need to be non-deterministic.

skybrian•2h ago
A deterministic algorithm can still be unpredictable in a sense. In the extreme case, a procedural generator (like in Minecraft) is deterministic given a seed, but you will still have trouble predicting what you get if you change the seed, because internally it uses a (pseudo-)random number generator.

So there’s still the question of how controllable the LLM really is. If you change a prompt slightly, how unpredictable is the change? That can’t be tested with one prompt.
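
The Minecraft point fits in a few lines of Python: fully deterministic, yet you can't guess one seed's output from its neighbor's.

  import random

  for seed in (41, 42, 43):
      print(seed, random.Random(seed).random())
  # Re-running prints identical values (deterministic), but nothing about
  # seed 42's output helps you predict seed 43's (unpredictable).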

rvz•2h ago
> I think you mean non-deterministic, instead of probabilistic.

My thoughts too. It's more accurate to label LLMs as non-deterministic instead of "probabilistic".

mooreds•2h ago
My biggest gripe is that he outsourced evaluation of the pelicans to another LLM.

I get that it was way easier to do and that doing it took pennies and no time. But I would have loved it if he'd tried alternate methods of judging and seen what the results were.

Other ways:

* wisdom of the crowds (have people vote on it)

* wisdom of the experts (send the pelican images to a few dozen artists or ornithologists)

* wisdom of the LLMs (use more than one LLM)

Would have been neat to see what the human consensus was and if it differed from the LLM consensus

Anyway, great talk!

anon373839•5h ago
Enjoyable write-up, but why is Qwen 3 conspicuously absent? It was a really strong release, especially the fine-grained MoE which is unlike anything that’s come before (in terms of capability and speed on consumer hardware).
Maxious•4h ago
Cut for time - Qwen 3 was pelican-tested too: https://simonwillison.net/2025/Apr/29/qwen-3/
simonw•4h ago
Omitting Qwen 3 is my great regret about this talk. Honestly I only realized I had missed it after I had delivered the talk!

It's one of my favorite local models right now, I'm not sure how I missed it when I was reviewing my highlights of the last six months.

qwertytyyuu•4h ago
https://imgur.com/a/mzZ77xI Here are a few I tried with the models; looks like the newer version of Gemini is another improvement?
puttycat•4h ago
The bicycles are still very far from actual ones.
simonw•4h ago
I think the most recent Gemini Pro bicycle may be the best yet - the red frame is genuinely the right shape.
layer8•3h ago
The pelican, on the other hand...
pjs_•21m ago
https://www.gianlucagimini.it/portfolio-item/velocipedia/
JimDabell•4h ago
See also: The recent history of AI in 32 otters

https://www.oneusefulthing.org/p/the-recent-history-of-ai-in...

pbhjpbhj•4h ago
That is otterly fantastic. The post there shows the breadth too - otters generated both via text representations (in TikZ) and by image generators. The video at the end, wow (and funny too).

Thanks for sharing.

bravesoul2•4h ago
Is there a good model (any architecture) for vector graphics out of interest?
simonw•4h ago
I was impressed by Recraft v3, which gave me an editable vector illustration with different layers - https://simonwillison.net/2024/Nov/15/recraft-v3/ - but as I understand it that one is actually still a raster image generator with a separate step to convert to vector at the end.
bravesoul2•4h ago
Now that is a pelican on a bicycle! Thanks
dirtyhippiefree•4h ago
Here’s the spot where we see who’s TL;DR…

> Claude 4 will rat you out to the feds!

>If you expose it to evidence of malfeasance in your company, and you tell it it should act ethically, and you give it the ability to send email, it’ll rat you out.

ben_w•4h ago
I'd say that's too short.

> But it’s not just Claude. Theo Browne put together a new benchmark called SnitchBench, inspired by the Claude 4 System Card.

> It turns out nearly all of the models do the same thing.

dirtyhippiefree•3h ago
I totally agree, but I needed you to post the other half because of TL;DR…
yubblegum•3h ago
I was looking at that and wondering about swatting via LLMs by malicious users.
atxtechbro•3h ago
Thank you, Simon! I really enjoyed your PyBay 2023 talk on embeddings and this is great too! I like the personalized benchmark. Hopefully the big LLM providers don't start gaming the pelican index!
deadbabe•3h ago
As a control, he should go on Fiverr and have a human generate a pelican riding a bicycle, just to see what the eventual goal is.
gus_massa•2h ago
Someone did this. Look at this sibling comment by ben_w https://news.ycombinator.com/item?id=44216284 about an old similar project.
franze•3h ago
Here Claude Opus Extended Thinking https://claude.ai/public/artifacts/707c2459-05a1-4a32-b393-c...
ramesh31•2h ago
Single shot?
franze•1h ago
Two-shot: the first one just generated the SVG, not the shareable HTML page around it. In the second go it also worked on the SVG, as I did not forbid it.
big_hacker•3h ago
Honestly, the metric which increased the most is the marketing and astroturfing budgets of the major players (OpenAI, Anthropic, Google and DeepSeek).

Say what you want about Facebook, but at least they released their flagship model fully open.

spaceman_2020•2h ago
I don’t know what secret sauce Anthropic has, but in real world use, Sonnet is somehow still the best model around. Better than Opus and Gemini Pro
diggan•2h ago
Statements like these are useless without sharing exactly which models you've tried. Sonnet beats O1 Pro Mode, for example? Not in my experience, but I haven't tried the latest Sonnet versions, only the one before, so I wouldn't claim O1 Pro Mode beats everything out there.

Besides, it's so heavily context-dependent that you really need your own private benchmarks to make head or tails out of this whole thing.

wohoef•2h ago
Quite a detailed image using claude sonnet 4: https://ibb.co/39RbRm5W
landgenoot•2h ago
If you gave a human the SVG documentation and asked them to write an SVG, I think the results would be quite similar.
diggan•2h ago
Let's give it a try, if you're willing to be the experiment subject :)

The prompt is "Generate an SVG of a pelican riding a bicycle" and you're supposed to write it by hand, so no graphical editor. The specification is here: https://www.w3.org/TR/SVG2/

I'm fairly certain I'd lose interest in getting it right before I got something better than most of those.
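
To make the task concrete: everything has to be emitted as raw markup with explicit coordinates, and even a crude two-wheels-and-a-frame fragment takes deliberate spatial bookkeeping. A hypothetical minimal example, written here as a Python string:

  # Hand-write a crude "bicycle": two wheel circles and a zigzag frame.
  svg = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120">
    <circle cx="50" cy="90" r="25" fill="none" stroke="black"/>
    <circle cx="150" cy="90" r="25" fill="none" stroke="black"/>
    <path d="M50 90 L90 50 L150 90 L110 50 Z" fill="none" stroke="black"/>
  </svg>"""

  with open("bicycle.svg", "w") as f:
      f.write(svg)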

mormegil•11m ago
Did the testing prompt for LLMs include a clause forbidding the use of any tools? If not, why are you adding it here?
diggan•8m ago
The models being put under the "Pelican" testing don't use a GUI to create SVGs (via "tools" or anything else); they're all text-generation models, so they exclusively use text for creating the graphics.

There are 31 posts listed under "pelican-riding-a-bicycle" in case you wanna inspect the methodology even closer: https://simonwillison.net/tags/pelican-riding-a-bicycle/

simonw•6m ago
The way I run the pelican on a bicycle benchmark is to use this exact prompt:

  Generate an SVG of a pelican riding a bicycle
And execute it via the model's API with all default settings, not via their user-facing interface.

Currently none of the model APIs enable tools by default unless you ask them to, so this method excludes the use of additional tools.
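
In script form, that methodology amounts to something like the following sketch using the llm library (the model ID is an example; this is not Simon's actual harness):

  import llm

  # Exact prompt, default API settings, no tools enabled.
  response = llm.get_model("gpt-4.1").prompt(
      "Generate an SVG of a pelican riding a bicycle"
  )
  with open("pelican.svg", "w") as f:
      f.write(response.text())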

ramesh31•2h ago
>If you gave a human the SVG documentation and asked them to write an SVG, I think the results would be quite similar.

It certainly would, and it would cost at minimum an hour of the human programmer's time at $50+/hr. Claude does it in seconds for pennies.

adrian17•2h ago
> This was one of the most successful product launches of all time. They signed up 100 million new user accounts in a week! They had a single hour where they signed up a million new accounts, as this thing kept on going viral again and again and again.

Awkwardly, I never heard of it until now. I was aware that at some point they added the ability to generate images to the app, but I never realized it was a major thing (plus I already had an offline Stable Diffusion app on my phone, so it felt like less of an upgrade to me personally). With so much AI news each week, it feels like unless you're really invested in the space, it's almost impossible not to accidentally miss or dismiss some big release.

azinman2•2h ago
Except this went very mainstream. Lots of "turn myself into a muppet", "what is the human equivalent for my dog", etc. TikTok is all over this.

It really is incredible.

thierrydamiba•1h ago
The big trend was around the ghiblification of images. Those images were everywhere for a period of time.
Jedd•1h ago
Yeah, but so were the Bored Ape NFTs - none of these ephemeral fads are any indication of quality, longevity, legitimacy, or interest.
mrkurt•40m ago
If we try really hard, I think we can make an exhaustive list of what viral fads on the internet are not. You made a small start.

none of these ephemeral fads are any indication of quality, longevity, legitimacy, interest, substance, endurance, prestige, relevance, credibility, allure, staying-power, refinement, or depth.

baq•17m ago
It’s hard to think of a worse analogy TBH. My wife is using ChatGPT to change photos (still is to this day); she didn’t use it or any other LLM until that feature hit. It is a fad, but it’s also a very useful tool.

Ape NFTs are… ape NFTs. Useless. Pointless. Negative value for most people.

herval•38m ago
They still are. Instagram is full of accounts posting gpt-generated cartoons (and now veo3 videos). I’ve been tracking the image generation space from day one, and it never stuck like this before
simonw•32m ago
Anecdotally, I've had several conversations with people way outside the hyper-online demographic who have been really enjoying the new ChatGPT image generation - using it for cartoon photos of their kids, to create custom birthday cards etc.

I think it's broken out into mainstream adoption and is going to stay there.

It reminds me a little of Napster. The Napster UI was terrible, but it let people do something they had never been able to do before: listen to any piece of music ever released, on-demand. As a result people with almost no interest in technology at all were learning how to use it.

Most people have never had the ability to turn a photo of their kids into a cute cartoon before, and it turns out that's something they really want to be able to do.

herval•27m ago
Definitely. It’s not just online either - half the billboards I see now are AI. The posters at school. The “we’re hiring!” ad at the local McDonald’s. It’s 100x cheaper and faster than any alternative (stock images, hiring an editor or illustrator, etc.), and most non-technical people can get exactly what they want in a single shot these days.
haiku2077•25m ago
Congratulations, you are almost fully unplugged from social media. This product launch was a huge mainstream event; for a few days GPT generated images completely dominated mainstream social media.
nowayno583•2h ago
That was a very fun recap, thanks for sharing. It's easy to forget how much better these things have gotten. And this was in just six months! Crazy!
mromanuk•2h ago
The last animation is hilarious; it represents the AI hype cycle vs. reality very well.
bredren•2h ago
Great writeup.

This measure of LLM capability could be extended by taking it into the 3D domain.

That is, having the model write Python code for Blender, then running Blender in headless mode behind an API.

The talk hints at this, but one-shot prompting likely won’t be a broad enough measurement of capability by this time next year. (Or perhaps even now.)

So the test could also include an agentic portion that includes consultation of the latest blender documentation or even use of a search engine for blog entries detailing syntax and technique.

For multimodal input processing, it could take into account a particular photo of a pelican as the test subject.

For usability, the objects can be converted to iOS’s native 3d format that can be viewed in mobile safari.

I built this workflow, including a service for Blender, as an initial test of what was possible, in October of 2022. It took post-processing for common syntax errors back then, but I’d imagine the newer LLMs would make those mistakes less often now.
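
A sketch of the headless-Blender step of that workflow (--background and --python are real Blender CLI flags; the script contents and export step are assumptions):

  import subprocess

  llm_generated_code = "..."  # Blender Python produced by the model

  with open("scene.py", "w") as f:
      f.write(llm_generated_code)

  # Run Blender with no UI, executing the generated script, which is
  # assumed to build the scene and export the mesh itself.
  subprocess.run(["blender", "--background", "--python", "scene.py"], check=True)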

jfengel•1h ago
It's not so great at bicycles, either. None of those are close to rideable.

But bicycles are famously hard for artists as well. Cyclists can identify all of the parts, but if you don't ride a lot it can be surprisingly difficult to get all of the major bits of geometry right.

pier25•58m ago
Definitely getting better but even the best result is not very impressive.
nine_k•30m ago
Am I the only one who can't help but see these attempts as being much like a kid learning to draw?
Ygg2•10m ago
Yes. Kids don't draw that good of a line at the start.

Here is a better example of a start: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTfTfAA...

djherbis•9m ago
Kaggle recently ran a competition to do just this (draw SVGs from prompts, using fairly small models under the hood).

The top results (click on the top Solutions) were pretty impressive: https://www.kaggle.com/competitions/drawing-with-llms/leader...