Baidu releases open-source multimodal AI that it claims beats GPT-5 and Gemini

https://venturebeat.com/ai/baidu-just-dropped-an-open-source-multimodal-ai-that-it-claims-beats-gpt-5

9•teleforce•2mo ago

Comments

bn-l•2mo ago

> The model, dubbed ERNIE-4.5-VL-28B-A3B-Thinking

No way at so few parameters

verdverm•2mo ago

Recent research results from many groups suggest otherwise. The lag between private models and competitive open models has been shrinking, and the same goes for the resources required to train and run them.

The people who are spending billions on AI infra build-outs want you to believe it's necessary, because frontier mega models are supposedly so much better. China has been showing us otherwise, especially while handicapped by export controls, demonstrating how you can do more with less.

NitpickLawyer•2mo ago

> The lag between private models and competitive open models has been shrinking

It really hasn't. It's the opposite, actually. The latest breakthroughs in RL by the big4 labs haven't been replicated yet in any open model (including the latest k2-thinking). Even gemini-2.5 still delivers on generalisation in a way that no open models do, today (almost a year later). The general consensus was that "open" models were 6-8 months behind SotA, but with the RL stuff we can see they've moved further away.

I don't know what exactly it is, if it's simply RL scale, or data + scale, or better secret sauce (rewards, masking, something else) but the way these new models generalise is leagues ahead of open models, sadly.

Don't be fooled by benchmarks alone. You have to test them on problems that you own and that you can be fairly sure no one is targeting for benchmark scores. Recently there was a Python golfing competition on Kaggle, and I tested some models on that task. While the top-4 models were chugging along, in both agentic and zero-shot regimes, the open models (coding-specific ones or older "thinking" models) were really bad at the task. Coding-specific 480B models would go in circles, get lost on one example, and so on. It was night and day between the open models and gpt5/claude/gemini2.5. Even grok fast solved a lot of tasks in agentic mode.

verdverm•2mo ago

While I agree with your comments here, I will note that the big 4 models were released this year (summer-ish), so we are still not at a point where you can claim the open models are more than a year behind something that is not a year old yet.

verdverm•2mo ago

HF link: https://huggingface.co/baidu/ERNIE-4.5-VL-28B-A3B-Thinking

JSR_FDED•2mo ago

I know it’s popular to hate on China right now, but can we acknowledge that Chinese companies and research groups have done more for us hackers in terms of making amazing models available with open weights for free, than US companies and research groups?

