
Gemini 3 Deep Think drew me a good SVG of a pelican riding a bicycle

https://simonwillison.net/2026/Feb/12/gemini-3-deep-think/
105•stared•2h ago

Comments

rustyhancock•2h ago
Competition between models is so intense right now that they are definitely benchmaxxing pelican-on-a-bike SVGs and Will Smith spaghetti-dinner videos.
bayindirh•2h ago
So, again, when the indicator becomes a target, it stops being a good indicator.
rcbdev•2h ago
Goodhart's law in action.
kakugawa•2h ago
That's how you know you've made it: when your pet benchmark becomes a target.
JumpCrisscross•1h ago
> when the indicator becomes a target, it stops being a good indicator

But it's still a fair target. Unless it's hard coded into Gemini 3 DT, for which we have no evidence and decent evidence against, I'd say it's still informative.

yieldcrv•2h ago
Note that, this benchmark aside, they've gotten really good at SVGs. I used to rely on the nounproject for icons, and sometimes on various libraries, but now coding agents just synthesize an SVG tag in the code and draw all the icons.
stared•1h ago
There was Lenna for digital image compression (https://en.wikipedia.org/wiki/Lenna).

A pelican on a bike is SFW, inclusive, yet cool.

It is not a full benchmark - rather a litmus test.

oplav•1h ago
I first encountered Lenna as part of an MP in the data structures class all CS undergrads take at UIUC, and didn't realize it was a Playboy centerfold until years later.

There's also the Foreman test sequence for video: https://youtube.com/watch?v=0cdM-7_xUXM

thatguysaguy•1h ago
You can just try other SVGs; I got some pretty good ones.

(Disclaimer: I work for Google, but I have zero idea what they trained Deep Think on.)

bonesss•1h ago
Parallel hypothesis: competition between models is so intense that any high-engagement, high-relevance web discussion of any LLM/AI generation is going to end up in the self-guided, self-reinforcing model training and result in de facto benchmaxxing.

Which is only to say: if we HN-front-page it, they will come (generate).

throwaway333444•2h ago
Since it’s a* FAQ… Also that pelican is pretty fly
bstsb•2h ago
read it aloud. “since it’s an FAQ”, where FAQ is pronounced “eff-ay-queue”
rcarmo•2h ago
I don't think this is a good "benchmark" anymore. It's probably in everyone's training set by now.
staticassertion•1h ago
I think it could still be an interesting benchmark. Like, assuming AI companies are genuinely trying to solve this pelican problem, how well do they solve it? That seems valid, and the assumption here is that the approach they take could generalize, which seems plausible.
bulletsvshumans•2h ago
They rigged it.
ranger_danger•1h ago
Source:
aidos•2h ago
The bicycles are getting pretty cyclable now. I’m enjoying this pelican that’s already sliced and ready to bbq.
vessenes•2h ago
Simon notes this benchmark is win-win, since he loves pictures of pelicans riding bicycles — if they spend time benchmaxxing it’s like free pelicans for him.

He originally promised to generate a bunch more animals when we got a “good” pelican. This is not a good pelican. This is an OUTSTANDING pelican, a great bicycle, and it even has a little sun ray over the ocean marked out. I’d like to see more animals please Simon!

alterom•1h ago
> a great bicycle

It's not. Sorry.

Go look at some real bicycles for reference.

losthubble•1h ago
What part is missing? It appears to have all the core parts of a bicycle to me?
romanhn•1h ago
A better comparison would be the monstrosities generated by older models.
sdenton4•1h ago
This is a very reasonable drawing of a bicycle. It has a solid rear triangle, and forward swept front fork, which is an important detail for actually being able to steer the bike. The drivetrain is single speed, but that's fine, and the wheels are radially laced, which is also fine: both of those simplified details are things which occur in real bicycles.
hnuser123456•1h ago
It is visually outstanding. The only thing that sticks out to me is that the steering column bends forward toward the ground (negative trail), which would make it oversteer rather than self-stabilize. Interestingly, there's a slight positive-trail bend in the second one, though.
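
For anyone who wants to sanity-check a generated bike's geometry, here's a rough trail calculation (the standard head-angle/rake formula; the numbers below are typical road-bike values, not measured from the SVG):

    import math

    def trail_mm(wheel_radius_m, head_angle_deg, rake_m):
        # trail = (R*cos(head) - rake) / sin(head), head angle measured from horizontal
        h = math.radians(head_angle_deg)
        return 1000 * (wheel_radius_m * math.cos(h) - rake_m) / math.sin(h)

    print(trail_mm(0.335, 72, 0.045))  # ~62 mm: positive trail, self-centering
    print(trail_mm(0.335, 72, 0.120))  # ~-17 mm: negative trail, the oversteering case
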
romanhn•1h ago
Agreed, good is quite an understatement. Every item is drawn superbly, and the basket with the fish is just great. Feels like a big jump over the other models (though granted, this is such a known "benchmark" by now, it's likely gamed to some extent).
manojlds•1h ago
It's funny how I can know where the post is from just by looking at the title (and it's not just about pelicans)
kittbuilds•1h ago
SVG generation is a surprisingly good benchmark for spatial reasoning because it forces the model to work in a coordinate system with no visual feedback loop. You have to hold a mental model of what the output looks like while emitting raw path data and transforms. It's closer to how a blind sculptor works than how an image diffusion model works.

What I find interesting is that Deep Think's chain-of-thought approach helps here — you can actually watch it reason about where the pedals should be relative to the wheels, which is something that trips up models that try to emit the SVG in one shot. The deliberative process maps well to compositional visual tasks.
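
A tiny hand-written illustration of what that feels like (not model output): even a stick-figure bike means picking one coordinate frame, remembering that SVG's y-axis points down, and placing every part relative to the others by arithmetic alone.

    # Hand-rolled miniature of the task: every shape has to land in one shared
    # coordinate space, with no visual feedback along the way.
    wheel_r, wheelbase, ground_y = 40, 110, 150
    rear = (60, ground_y)                                    # rear wheel center
    front = (rear[0] + wheelbase, ground_y)                  # front wheel center on the same axle line
    crank = (rear[0] + int(wheelbase * 0.4), ground_y + 10)  # bottom bracket sits below the axles (y grows downward)

    svg = f'''<svg xmlns="http://www.w3.org/2000/svg" width="280" height="220">
      <circle cx="{rear[0]}" cy="{rear[1]}" r="{wheel_r}" fill="none" stroke="black"/>
      <circle cx="{front[0]}" cy="{front[1]}" r="{wheel_r}" fill="none" stroke="black"/>
      <circle cx="{crank[0]}" cy="{crank[1]}" r="8" fill="none" stroke="black"/>
      <path d="M {rear[0]} {rear[1]} L {crank[0]} {crank[1]} L {front[0]} {front[1]}" stroke="black" fill="none"/>
    </svg>'''
    print(svg)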

segmondy•1h ago
For those claiming they rigged it: do you have any concrete evidence? What if the models have just gotten really good?

I just asked Gemini Pro to generate an SVG of an octopus dunking a basketball and it did a great job, and that's not even the Deep Think model. Then I tried "generate an SVG of a raccoon at a beach drinking a beer". You can go try this out yourself: ask it to generate anything you want in SVG, use your imagination.

Rant: This is why AI is going to take over. Folks are not even trying in the least.
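
If you'd rather script it than use the app, something like this works (minimal sketch, assuming the google-generativeai Python SDK; the model name below is just a placeholder for whichever Gemini model you have access to):

    # Minimal sketch; requires an API key, and the model name is a placeholder.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-pro")  # swap in the model you want to test
    prompt = "Generate an SVG of a raccoon at a beach drinking a beer. Return only the <svg> markup."
    response = model.generate_content(prompt)
    print(response.text)  # save as raccoon.svg and open it in a browser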

JumpCrisscross•1h ago
> What if the models have just gotten really good?

Kagi Assistant remains my main way of interacting with AI. One of its benefits is you're encouraged to try different models.

The heterogeneity in competence, particularly per unit of time, is growing rapidly. If I'm extrapolating image-creation capabilities from Claude, I'm going to underestimate what Gemini can do without fuckery. Likewise, if I'm using Grok all day, Gemini and Claude will seem unbelievably competent when it comes to deep research.

colecut•1h ago
and it will be folks using AI taking over for at least a while...

Some people try, most people don't.

AI makes doing almost anything easier for the people who do.

Despite the prophesied near-term obliteration of white collar work, I've never felt luckier to work in software.

WarmWash•1h ago
Simon has a private set of SVG tests he uses as well. He said that the private ones were just as impressive.
raincole•59m ago
Every bit of improvement in AI ability will have its corresponding denial phrase. Some people still think AI can't generate the correct number of fingers today.
irthomasthomas•48m ago
Why frame it as rigging? I assume they would teach the models to improve on tasks the public find interesting. Then we just have to come up with more challenges for it.
krackers•4m ago
It's not rigging—it's just RL.
bayindirh•41m ago
> For those claiming they rigged it.

I don't think they "rigged" it, but it may have been given a bit more of a push on that front, since this has been going on for a very long time now.

Another benchmark is running at [0]. It's pretty interesting: a perfect-scoring model "borks" in the next iteration, for example.

> Rant: This is why AI is going to take over, folks are not even trying the least.

It might be drawing things all right, at least in some cases. I only rarely use it, when hours of my own research don't take me where I want to go, and guess what? AI can't get there either. It hallucinates things, makes up stuff, etc. For a couple of the things I asked, it did manage to find a single reference, and it was exactly what I was looking for, so it works only rarely for my cases.

Rant: This is why people are delusional. They test the happy path and claim it knows all the paths, and then some.

[0]: https://clocks.brianmoore.com/

dw_arthur•37m ago
Everyone should have their own private evals for models. If I ask a question and a model flat-out gets it wrong, I'll sometimes put it in my test-question bank.
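
Mine is nothing fancier than a JSON file of questions a model has gotten wrong, plus a loop over it (rough sketch; ask_model is a stand-in for whatever client you already use):

    # Personal eval bank sketch; ask_model(prompt) -> str is whatever client you use.
    import json

    def run_evals(ask_model, path="my_evals.json"):
        # File format: [{"prompt": "...", "must_contain": "..."}, ...]
        with open(path) as f:
            cases = json.load(f)
        failures = [c["prompt"] for c in cases
                    if c["must_contain"].lower() not in ask_model(c["prompt"]).lower()]
        print(f"{len(cases) - len(failures)}/{len(cases)} passed")
        return failures
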
bfung•1h ago
In the spirit of the Winter Olympics, I vote “Lion on a bobsled” for the next benchmark. :)
stephc_int13•1h ago
Many tests are asymmetrical. They can reliably show an issue/abnormality but they are a lot less reliable on the other side of the curve.
Springtime•1h ago
I have wondered whether, with these tests, it'll reach a point where online models cheat by generating a line-art raster reference and then, behind the scenes, deciding how to vectorize it in the most minimalist way (e.g. using strokes and shape elements rather than naively using path outlines for all forms).
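
The vectorizing half of that is already trivial to do offline, something like this (rough sketch, assuming OpenCV and a placeholder input file; a real pipeline would simplify and smooth the traced contours rather than dump raw points):

    # Trace line art from a raster and emit the contours as SVG path outlines.
    import cv2

    img = cv2.imread("line_art.png", cv2.IMREAD_GRAYSCALE)  # placeholder input
    _, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    paths = []
    for c in contours:
        pts = c.reshape(-1, 2)
        d = "M " + " L ".join(f"{x} {y}" for x, y in pts) + " Z"
        paths.append(f'<path d="{d}" fill="none" stroke="black"/>')

    h, w = img.shape
    print(f'<svg xmlns="http://www.w3.org/2000/svg" width="{w}" height="{h}">{"".join(paths)}</svg>')
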
taberiand•1h ago
Is that cheating, or is that just working smarter not harder?
tylervigen•1h ago
That’s among the most artistic SVGs I’ve ever seen, period.
alestainer•1h ago
Interesting thing: I have an internal prompt of my own that is similar to this pelican, and there has been zero progress on it in the past ~2 years. That might have at least a couple of explanations. 1. Spillage into the pre-training: some real artist has drawn a pelican riding a bicycle. 2. Seeing this challenge treated in the training data as an important proxy for model intelligence might affect how compute is allocated to solving it, either through engineers or through the model itself finding the texts about the challenge.
WarmWash•1h ago
Are AI labs training on the bike Pelican?

From the blog:

>The strongest argument is that they would get caught. If a model finally comes out that produces an excellent SVG of a pelican riding a bicycle you can bet I’m going to test it on all manner of creatures riding all sorts of transportation devices. If those are notably worse it’s going to be pretty obvious what happened.

He mentioned in the Deep Think thread the other day that the results on his secret test set were also impressive.

News publishers limit Internet Archive access due to AI scraping concerns

https://www.niemanlab.org/2026/01/news-publishers-limit-internet-archive-access-due-to-ai-scrapin...
259•ninjagoo•3h ago•153 comments

uBlock filter list to hide all YouTube Shorts

https://github.com/i5heu/ublock-hide-yt-shorts/
292•i5heu•4h ago•92 comments

My smart sleep mask broadcasts users' brainwaves to an open MQTT broker

https://aimilios.bearblog.dev/reverse-engineering-sleep-mask/
267•minimalthinker•6h ago•125 comments

Ooh.directory: a place to find good blogs that interest you

https://ooh.directory/
358•hisamafahri•8h ago•107 comments

How often do full-body MRIs find cancer?

https://www.usatoday.com/story/life/health-wellness/2026/02/11/full-body-mris-cancer-aneurysm/883...
35•brandonb•23h ago•21 comments

Breaking the spell of vibe coding

https://www.fast.ai/posts/2026-01-28-dark-flow/
42•arjunbanker•1d ago•9 comments

OpenAI should build Slack

https://www.latent.space/p/ainews-why-openai-should-build-slack
58•swyx•14h ago•60 comments

15× vs. ~1.37×: Recalculating GPT-5.3-Codex-Spark on SWE-Bench Pro

https://twitter.com/nvanlandschoot/status/2022385829596078100
22•nvanlandschoot•1d ago•8 comments

Amsterdam Compiler Kit

https://github.com/davidgiven/ack
72•andsoitis•5h ago•14 comments

Zvec: A lightweight, fast, in-process vector database

https://github.com/alibaba/zvec
7•dvrp•1d ago•1 comment

Launching Interop 2026

https://hacks.mozilla.org/2026/02/launching-interop-2026/
21•linolevan•23h ago•2 comments

IBM tripling entry-level jobs after finding the limits of AI adoption

https://fortune.com/2026/02/13/tech-giant-ibm-tripling-gen-z-entry-level-hiring-according-to-chro...
82•WhatsTheBigIdea•22h ago•31 comments

Discord: A case study in performance optimization

https://newsletter.fullstack.zip/p/discord-a-case-study-in-performance
17•tylerdane•21h ago•6 comments

A header-only C vector database library

https://github.com/abdimoallim/vdb
39•abdimoalim•4h ago•10 comments

Descent, ported to the web

https://mrdoob.github.io/three-descent/
92•memalign•2h ago•13 comments

Show HN: Sameshi – a ~1200 Elo chess engine that fits within 2KB

https://github.com/datavorous/sameshi
164•datavorous_•8h ago•49 comments

5,300-year-old 'bow drill' rewrites story of ancient Egyptian tools

https://www.ncl.ac.uk/press/articles/latest/2026/02/ancientegyptiandrillbit/
14•geox•3d ago•0 comments

Unicorn Jelly

https://unicornjelly.com/
14•avaer•9h ago•4 comments

You can't trust the internet anymore

https://nicole.express/2026/not-my-casual-hobby.html
123•panic•2h ago•87 comments

Ask HN: How to get started with robotics as a hobbyist?

115•StefanBatory•6d ago•54 comments

Windows NT/OS2 Design Workbook

https://computernewb.com/~lily/files/Documents/NTDesignWorkbook/
47•markus_zhang•3d ago•14 comments

A review of M Disc archival capability with long term testing results (2016)

http://www.microscopy-uk.org.uk/mag/artsep16/mol-mdisc-review.html
44•1970-01-01•6h ago•47 comments

Instagram's URL Blackhole

https://medium.com/@shredlife/instagrams-url-blackhole-c1733e081664
5•tkp-415•1d ago•0 comments

Fun with Algebraic Effects – From Toy Examples to Hardcaml Simulations

https://blog.janestreet.com/fun-with-algebraic-effects-hardcaml/
46•weinzierl•4d ago•1 comment

The consequences of task switching in supervisory programming

https://martinfowler.com/fragments/2026-02-13.html
9•bigwheels•1d ago•0 comments

Vim 9.2

https://www.vim.org/vim-9.2-released.php
283•tapanjk•6h ago•126 comments

Zig – io_uring and Grand Central Dispatch std.Io implementations landed

https://ziglang.org/devlog/2026/#2026-02-13
332•Retro_Dev•13h ago•235 comments

An AI agent published a hit piece on me – more things have happened

https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me-part-2/
581•scottshambaugh•21h ago•509 comments

A method and calculator for building foamcore drawer organisers

https://capnfabs.net/posts/foamcore-would-be-a-sick-name-for-a-music-genre/
53•evakhoury•5d ago•12 comments

Show HN: A reputation index from mitchellh's Vouch trust files

https://vouchbook.dev/
7•rosslazer•1d ago•0 comments