Rio de Janeiro's city government model Rio3.5 beats Qwen3.7 in recent benchmarks

https://twitter.com/zenmagnets/status/2065796012820848699

83•lucasfcosta•1h ago

Comments

mettamage•1h ago

https://xcancel.com/ZenMagnets/status/2065796012820848699

Correct me if I'm wrong, but reading through the comments in the thread, this seems to be post-training/fine-tuning.

Kelteseth•1h ago

Thanks. Firefox and uBlock don't let me watch any X content (I guess this is a good thing).

drnick1•54m ago

Same here: X content and trackers are blocked by my Firefox settings. The occasional inconvenience is a small price to pay for not being profiled by X, Google, FB, Amazon, and countless other Internet parasites.

oceansky•1h ago

Yes. It's post-training on Qwen using the novel SwiReasoning framework.

hedgehog•45m ago

I hadn't seen SwiReasoning before (https://swireasoning.github.io, paper and code). It looks like it works at generation time without any requirements on the model. It increases token efficiency and accuracy, but at first skim it seems like it would be incompatible with multi-token prediction. For large reductions in token budget, it could be worth it.

rafaquintanilha•4m ago

Doesn't look like it's incompatible. Someone already released a quantization using MTP: https://huggingface.co/foxipanda/Rio-3.5-Open-397B-GGUF

cuzezzzbbfofai•1h ago

We should trust a government to produce great quality AI models.

atoav•1h ago

A government ideally is a representation of the democratically chosen will of the people. If it is not, work towards making it so. IMO wherever someone says "the government" we should mentally substitute "we all, collectively".

But a specific type of person appears to labour under the illusion that somehow we can get by without we all collectively steering our direction and choosing people who do what needs to be done without commercial interest. Their idea is that instead of choosing people who do it, we just make them compete for who can squeeze the most profit out of dealing with a problem and "somehow" that leads to a better result. When you press them for the details on that part of the mechanism, you will usually get crickets.

cassianoleal•1h ago

Thank you, that's one of my pet peeves too.

Interestingly, the people who try to separate themselves from "the government" also seem to be the kind of people who want to "spread our model of democracy to the rest of the world".

How they can even reconcile being such a great democracy that the world needs to ~copy~ be force-fed it with treating their own government as an adversary, I don't know. The cognitive dissonance is so great that it's hard to fathom.

hgoel•57m ago

It's all such a self-defeating ideology: they think the government isn't doing a good enough job, so they lobby to make it impossible for it to do a good job and then pretend that this proves their point.

HeliumHydride•1h ago

https://www.reddit.com/r/LocalLLaMA/comments/1u4fzg1/new_mod... https://x.com/SemiAnalysis_/status/2065894494935933191

ramon156•1h ago

Every day I'm reminded why I don't spend time on Twitter. What is the point of claiming "X is better than Y on benchmark Z, and disagreeing with that means disagreeing with me"?

Information is power, dick measurements are not.

reed1234•1h ago

No, I love Twitter, and you are wrong.

itsthecourier•55m ago

my length is a valid data point for the sake of science

adrian_b•1h ago

> Post-trained from Qwen 3.5 397B

Model Card:

https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B

hmokiguess•1h ago

Never let them know your next move

Aurornis•47m ago

A city government funding a fine-tune of a model is interesting.

As for the benchmarks: if you spend any time playing with fine-tunes of published models, you know that benchmarks are gamed so much that they're a useless indicator of performance for models from small teams. It's too easy to fine-tune a model to perform well on the benchmarks, release it, put a line on your resume saying you released a model that beat the major labs on benchmarks, and then use that to jump to a new job. The temptation is high.

There are a lot of fringe models and fine-tunes that claim better performance on some benchmark. Then you try to use them and find they're often worse at general tasks than the base model.

I would wait and see whether these results hold across other benchmarks. It's cool that the city is doing something with AI, but this is a case where extraordinary claims require extraordinary evidence. I doubt a small, previously unknown team has unlocked some secret that the team that made Qwen couldn't figure out. It's more likely the model was fine-tuned for a specific outcome (possibly these benchmarks) and performance in other areas was reduced as a consequence.

embedding-shape•8m ago

Indeed, this is all very true, and I'd say it's true for the larger teams too. The entire ecosystem is so gamed by now that unless you have your own private benchmarks with test cases you haven't shared publicly, it's almost impossible to get a fair picture of how well a model works without actually sitting down and using it.
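A private benchmark in this sense can be as small as a local list of prompts with expected answers that never gets published. The sketch below is a minimal, hypothetical harness, not any lab's actual tooling: the test cases, the `score`/`compare` helpers, and the two stand-in model functions are made up for illustration, and real use would wrap actual inference calls.

```python
from typing import Callable, Dict, List, Tuple

# A private test case is a (prompt, expected answer) pair. Keeping these cases
# out of public repos is what stops them from leaking into training data.
TestCase = Tuple[str, str]
Model = Callable[[str], str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise isn't scored as an error."""
    return " ".join(text.lower().split())


def score(model: Model, cases: List[TestCase]) -> float:
    """Fraction of cases where the normalized output exactly matches the expected answer."""
    if not cases:
        raise ValueError("need at least one test case")
    hits = sum(normalize(model(prompt)) == normalize(expected) for prompt, expected in cases)
    return hits / len(cases)


def compare(models: Dict[str, Model], cases: List[TestCase]) -> Dict[str, float]:
    """Score every model on the same private cases so the numbers are comparable."""
    return {name: score(model, cases) for name, model in models.items()}


# Hypothetical private cases and stand-in "models" for illustration only;
# real use would call an inference endpoint instead of returning canned answers.
PRIVATE_CASES: List[TestCase] = [
    ("What is 17 * 3?", "51"),
    ("What is the capital of Brazil?", "Brasília"),
]


def model_a(prompt: str) -> str:
    answers = {"What is 17 * 3?": "51", "What is the capital of Brazil?": "Brasilia"}
    return answers.get(prompt, "")


def model_b(prompt: str) -> str:
    answers = {"What is 17 * 3?": "51", "What is the capital of Brazil?": "  brasília "}
    return answers.get(prompt, "")


if __name__ == "__main__":
    print(compare({"model_a": model_a, "model_b": model_b}, PRIVATE_CASES))
    # prints {'model_a': 0.5, 'model_b': 1.0}
```

Exact match is deliberately strict here (`model_a` loses a point for dropping the accent in "Brasília"); a real harness would probably need a more forgiving grader, but the key property is that the cases stay private.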

marcosdumay•5m ago

> A city government funding a fine-tune of a model is interesting.

It looks like it's a government-owned IT services company.

Most likely, they saw a business opportunity in selling it to other cities.

arjie•41m ago

Benchmaxxing is the new "have a crypto trading strategy". No one is impressed by it except non-practitioners.
