Gemini 3.0 Pro – early tests

https://twitter.com/chetaslua/status/1973694615518880236
76•ukuina•1h ago

Comments

simonw•1h ago
I've seen a bunch of tweets like this recently. As far as I can tell they're all from people using https://aistudio.google.com/ who got served an A/B test.

A few more in this genre:

https://x.com/cannn064/status/1973818263168852146 - "Make a SVG of a PlayStation 4 controller"

https://x.com/cannn064/status/1973415142302830878 "Create a single, self-contained HTML5 file that mimics a macOS Sonoma-style desktop: translucent menu bar with live clock, magnifying dock, draggable/resizable windows, and a dynamic wallpaper. No external assets; use inline SVG for icons."

https://x.com/synthwavedd/status/1973405539708056022 "Write full HTML, CSS and Javascript for a very realistic page on Apple's website for the new iPhone 18"

I've not seen it myself so I'm not sure how confident they are that it's Gemini 3.0.

ceejayoz•55m ago
> a very realistic page on Apple's website…

Is this supposed to be a good example?

It looks like something I'd put together, and you don't want me doing design work.

ajcp•37m ago
At this point until I see one run through the Pelican Benchmark I can't really take a new model seriously.
diggan•32m ago
Unfortunately, as with every public benchmark, once it ends up in the training sets and the developers are aware of it, it stops being effective, and I think we've started to reach that point.

The only thing I've found that gives me some sort of quantitative idea of how good a new model is, is my own private benchmarks. They don't cover everything I want to use LLMs for, and only have 20-30 tests per "category", but at least I'm 99% sure they aren't in the training datasets.

simonw•29m ago
I have a few "SVG of an X riding a Y" tests that I don't publish online which I run occasionally to see if a model is suspiciously better at drawing a pelican riding a bicycle than some other creature on some other form of transport.

I would be so entertained if I found out an AI lab had wasted their time cheating on my dumb benchmark!

ajcp•23m ago
> I would be so entertained if I found out an AI lab had wasted their time cheating on my dumb benchmark!

Cue intro: "The gang wastes their time cheating on a dumb benchmark"

Imustaskforhelp•8m ago
Please do let us know through your blog if you ever find AI labs cheating on your benchmark.

But now I am worried that since you have shared that you do the "SVG of an X riding a Y" thing, maybe these models will try to cheat on the whole "SVG of X riding Y" category instead of hyper-focusing on the pelican.

So now I suppose you might need to come up with an entirely new thing :)

ajcp•21m ago
That's the move right there.
latemedium•4m ago
We need to know if big AI labs are explicitly training models to generate SVGs of pelicans on bicycles. I wouldn't put it past them. But it would be pretty wild if they did!
esafak•1h ago
Nothing to see here.
Oras•1h ago
These tests mean nothing; I have yet to see a model that is better than Sonnet 4 for coding. I tried many, and all of them are sub-par, even with a small code base.
nnevatie•45m ago
Well, Codex with GPT5 High beats Claude Sonnet 4.5 - this is anecdotal, but I've used both extensively.
Bolwin•11m ago
Well yeah no surprise. You should try glm 4.6
strongpigeon•50m ago
Google's biggest problem in my opinion (and I'm saying that as an ex-googler) is that Google doesn't have a product culture. Google had the tech for something like ChatGPT for a long time, but couldn't come up with that product. Instead it had to rely on another company showing it the way and then copy them and try to out-engineer them...

I still think ultimately (and somewhat sadly) Google will win the AI race due to its engineering talent and the sheer amount of data it has (and Android integration potential).

sho_hn•43m ago
To be fair, according to OpenAI they started ChatGPT as a demo/experiment and were taken by surprise when it went viral.

It may well be that they also didn't have a product culture as an organization, but were willing to experiment or let small teams do so.

It's still a lesson, but maybe a different one.

With organizational scale it becomes harder and harder to launch experiments under the brand. Red tape increases, outside scrutiny increases. Retaining the ability to do that is difficult.

Google does experiment a fair bit (including in AI, e.g. NotebookLM and its podcast feature are, I think, a standout example of trying to see what sticks) but they also tend to hide their experiments in developer portals nowadays, which makes it difficult to get a signal from a general consumer audience.

strongpigeon•32m ago
Google is definitely good at experimenting (and yeah, NotebookLM is really cool), which is a product of the bottom-up culture. The lack of a consistent story with regard to AI products, however, is a testament to the lack of product vision from the top.
ajcp•26m ago
NotebookLM came out of Google Labs though, and in collaboration with outside stakeholders. I'm not sure I would call it a success of "bottom-up" culture, but a well realized idea from a dedicated incubator. That doesn't necessarily mean the rest of the company is so empowered or product oriented.
ajcp•31m ago
> With organizational scale it becomes harder and harder to launch experiments under the brand

I feel like Google tried to solve for this with their `withgoogle.com` domain and it just ends up being confusing or, worse still, frustrating when you see something awesome and then nothing ever comes of it.

byefruit•43m ago
And even when it does copy other products, it seems to be doing a terrible job of them.

Google's AI offering is a complete nightmare to use. Three different APIs, at least two different subscriptions, documentation that uses them interchangeably.

For Gemini's API it's often much simpler to actually pay OpenRouter the 5% surcharge to BYOK than deal with it all.

I still can't use my Google AI Pro account with gemini-cli.

xnx•41m ago
> Google doesn't have a product culture

Fair criticism that it took someone else to make something of the tech that Google initially invented, but Google is furiously experimenting with all their active products since Sundar's "code red" memo.

adventured•38m ago
Along with its engineering talent and resource scale, I think their in-house chips are one of their core advantages. They can scale in a way that their peers are going to struggle to match, and at much lower cost. Nvidia's extreme margins are Google's opportunity.
renewiltord•33m ago
Well, they had an internal ethics team that told them that their technology was garbage. That can't help. The other guys' ethics teams are all like "Our stuff is too awesome for people to use. No one should have this kind of unbridled power. We must muzzle the beast before a tourist rides him" and Google's ethics team was like "our shit sucks lol this is just a Markov chain parrot doesn't do shit it's garbage".
thewebguyd•23m ago
> is that Google doesn't have a product culture.

This is evident in Android and the Pixel lineup, which could be my favorite phone if not for some of the most baffling and frustrating decisions that lead to a very weirdly disjointed app experience (compared to something like iOS's first-party tools).

Like removing location-based reminders from Google Tasks, for some reason? There is still no built-in automation like Apple Shortcuts. Keep can still do location-based reminders, but it's a notes app, so which am I supposed to use: Google Tasks or Keep? And Gemini adds reminders to Google Tasks, not Keep, even if I wanted to use Keep primarily.

If they just spent some time polishing and integrating these tools, and added some of their ML magic to them, they'd blow Apple out of the water.

All of Google's tech is cool and interesting from a tech standpoint, but it's not well integrated for a full consumer experience.

killerstorm•17m ago
ChatGPT-3.5 was more of a novelty than a product.

It would be weird to release that as a serious company. They tried making a deliberately-wacky chatbot but it was not fun.

Letting OpenAI release it first was the right move.

Imustaskforhelp•2m ago
I want OpenAI to release ChatGPT 3 and ChatGPT 3.5; to me they were the phenomenal leap in intelligence. I appreciated ChatGPT 3 a lot, even more so than the current models. It had its quirks, but it was such a good model, man.

I remember building a dead simple SvelteKit website back in the ChatGPT 3 days. It was good, it was mind-blowing, and I was proud of it.

The only interactivity was a button which would go from one color to another and would then lead to a PDF.

If I am being honest, the UI was genuinely good. It was great, and it still gives me more nostalgia and good vibes than current models. Em-dashes weren't that common in ChatGPT 3 iirc, but I have genuinely forgotten what it was like to talk to it.

wmf•12m ago
Didn't Google have Bard internally around the same time as ChatGPT?
maerch•41m ago
I still have a bad taste in my mouth after all those GPT-5 hype articles that claimed the model was just one step away from AGI.
vunderba•37m ago
Outside of the aesthetic, the very first example on that twitter post is "balls bouncing around a constrained rotating rigid physics environment" which has been trivially one-shottable since Claude Code was first announced.

It was one of the first things I tried when Claude Code went GA:

https://gondolaprime.pw/hex-balls

ACCount37•36m ago
I hope this is the one that unfucks the multi-turn instruction following.

One of the biggest issues holding Gemini back, IMO, compared to the competitors.

Many LLMs are still plagued by "it's easier to reset the conversation than to unfuck the conversation", but Gemini 2.5 is among the worst.

renewiltord•32m ago
Every three months there's some mind-blowing hype around a Google product, lots of people talk about it, and then when I use it, it's not nearly as good.
robots0only•8m ago
In all of these posts there is someone claiming Claude is the best, then somebody else claiming they have tried a bunch of times and for them Gemini is the best, while others find GPT-5 is supreme. Obviously, all of these are subjective, narrow experiences. My conclusion is that all frontier models are both good and bad with no clear winner, and that making good evals is really hard.
