Rio de Janeiro's city government model Rio3.5 beats Qwen3.7 in recent benchmarks

https://twitter.com/zenmagnets/status/2065796012820848699
78•lucasfcosta•1h ago

Comments

mettamage•1h ago
https://xcancel.com/ZenMagnets/status/2065796012820848699

Correct me if I'm wrong but reading through the comments of the thread this seems to be post training/fine tuning.

Kelteseth•1h ago
Thanks, Firefox and uBlock do not let me watch any X content (I guess this is a good thing)
drnick1•46m ago
Same thing here, X content and trackers are blocked by my Firefox settings. The occasional inconvenience is a small price to pay not to be profiled by X, Google, FB, Amazon, and countless other Internet parasites.
oceansky•1h ago
Yes. It's post training in qwen using the novel SwiReasoning framework.
hedgehog•37m ago
I hadn't seen SwiReasoning (https://swireasoning.github.io, paper and code). It looks like it works at generation time without any requirements on the model. It increases token efficiency and accuracy, but at first skim it seems like this would be incompatible with multi-token prediction. For large reductions in token budget it could be worth it.
cuzezzzbbfofai•1h ago
We should trust a government to produce great quality AI models.
atoav•1h ago
A government ideally is a representation of the democratically chosen will of the people. If it is not, work towards making it so. IMO wherever someone says "the government" we should mentally substitute "we all, collectively".

But a specific type of person appears to labour under the illusion that somehow we can get by without all of us collectively steering our direction and choosing people who do what needs to be done without commercial interest. Their idea is that instead of choosing people who do it, we just make them compete for who can squeeze the most profit out of dealing with a problem and "somehow" that leads to a better result. When you press them for the details on that part of the mechanism, you will usually get crickets.

cassianoleal•55m ago
Thank you, that's also one of my peeves.

Interestingly, the people who try to separate themselves from "the government" also seem to be the kind of people who want to "spread our model of democracy to the rest of the world".

How they can even reconcile being such a great democracy that the world needs to ~copy~ be force-fed with having an adversarial government, I don't know. The cognitive dissonance is so great that it's hard to fathom.

hgoel•49m ago
It's all such a self-defeating ideology: they think the government isn't doing a good enough job, so they lobby to make it impossible for it to do a good job and then pretend that it proves their point.
naasking•
HeliumHydride•1h ago
https://www.reddit.com/r/LocalLLaMA/comments/1u4fzg1/new_mod... https://x.com/SemiAnalysis_/status/2065894494935933191
ramon156•1h ago
Every day I'm reminded why I don't spend time on twitter. What use is it to claim "X is better than Y in benchmark Z, disagreeing with that means disagreeing with me"?

Information is power, dick measurements are not.

reed1234•1h ago
No, I love twitter— and you are wrong.
itsthecourier•47m ago
my length is a valid data point for the sake of science
adrian_b•1h ago
> Post-trained from Qwen 3.5 397B

Model Card:

https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B

hmokiguess•1h ago
Never let them know your next move
Aurornis•39m ago
A city government funding a fine-tune of a model is interesting.

As for the benchmarks: If you spend any time playing with fine tunes of published models you know that benchmarks are gamed so much that they're a useless indicator of performance for models from small teams. It's too easy to fine tune a model to perform well on the benchmarks, release it, put a line on your resume saying you released a model that beat the major labs on benchmarks, and then try to use that to jump into a new job. The temptation is high.

There are a lot of fringe models and fine tunes that claim to have better performance on some benchmark. Then you try to use them and find they're often worse at general tasks than the base model.

I would wait and see if these results hold across other benchmarks. It's cool that the city is doing something with AI, but this is something where extraordinary claims require extraordinary evidence. I doubt a small, previously unknown team has unlocked something secret that the team who made Qwen couldn't figure out. It's more likely it was fine tuned for a specific outcome (possibly these benchmarks) and performance in other areas was reduced as a consequence.

arjie•34m ago
Benchmaxxing is the new “have a crypto trading strategy”. No one is impressed by it except non practitioners.
33m ago
> IMO wherever someone says "the government" we should mentally substitute "we all, collectively".

No, we should substitute "unaccountable bureaucrats". The people who enter and leave power from elections are not the source of the daily frustrations people have with government, it's the rest.

atoav•22m ago
If this is in fact an issue where you live, then you should consider no longer electing politicians who allow bureaucrats to be unaccountable. Or stop believing politicians who rave on about how bureaucrats are unaccountable while they themselves have the power to shape systems where that would not be the case.
airstrike•19m ago
how do you think that alleged amorphous mass of unaccountable bureaucrats got their jobs?
blahblaher•17m ago
yes, let's instead trust a bunch of billionaires, that "for sure" have your and all of our interests at heart. And no, the "invisible hand" does not exist, it's the Epstein class hand, you just don't see it

AI is code and can't be prompted into being smarter

https://www.theregister.com/ai-and-ml/2026/06/14/ai-is-code-and-cant-be-prompted-into-being-smarter/
1•adam_rida•32s ago•0 comments

Show HN: I made a small helper for checking model-graded answers

https://github.com/MatteoLeonesi/claim-memory-graph-sdk
1•ML0037•1m ago•0 comments

DeepSeek's 10T USD grand strategy

https://twitter.com/bookwormengr/status/2057909493250539891
1•gmays•1m ago•0 comments

Efficacy of dopamine agonist pramipexole for anhedonic depression

https://www.nature.com/articles/s41591-026-04465-9
1•bookofjoe•1m ago•0 comments

GitHub Pages alternative with native Python

https://blog.klemek.fr/articles/2026-06-14/
1•klemek•3m ago•0 comments

Career Update – Life After Stepping Down

https://kevquirk.com/career-update
1•speckx•4m ago•0 comments

Massive Attack Is Taking Aim at Palantir – Novara Media

https://novaramedia.com/2026/06/01/massive-attack-is-taking-aim-at-palantir/
3•abdelhousni•5m ago•0 comments

Mlx-optiq: per-layer mixed-precision LLM quantization for Apple Silicon

https://mlx-optiq.com/
2•codelion•7m ago•0 comments

Journal–A Tale of Two Browsers

https://adactio.com/journal/22609
2•speckx•10m ago•0 comments

Parsing JSON at compile time with C++26 static reflection

https://twitter.com/lemire/status/2066174269839519796
2•tosh•10m ago•0 comments

Why Agents Don't Scale: It's an Engineering Problem, Not an AI Problem

https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
3•dovelome•10m ago•0 comments

How We Built AIventure, an AI-Powered Retro Dungeon

https://bebechien.github.io/cozy-corner-future/posts/how-we-built-aiventure/
3•simonpure•10m ago•0 comments

VRChat says somebody faked a breach notice with the Maine AG's office

https://www.theregister.com/security/2026/06/11/24m-vrchat-users-data-accessed-following-cloud-br...
3•Bender•11m ago•0 comments

Fallback Font Generator

https://screenspan.net/fallback
3•microflash•13m ago•0 comments

Why Japan's Rail Workers Can't Stop Pointing at Things (2017)

https://www.atlasobscura.com/articles/pointing-and-calling-japan-trains
2•downbad_•14m ago•0 comments

Is the peptide craze backed by science? The promise behind the hype

https://www.nature.com/articles/d41586-026-01816-x
3•Bender•15m ago•0 comments

Drones seized, pilots cited near SoFi Stadium during World Cup security operation

https://ktla.com/news/local-news/drones-seized-pilots-cited-near-sofi-stadium-during-world-cup-se...
2•Bender•16m ago•0 comments

Swiss voters reject 10M population cap, early projections say

https://www.bbc.com/news/articles/c20ygjem17zo
3•7777777phil•17m ago•0 comments

Easy Open Source AI

https://github.com/Light-Heart-Labs/DreamServer
2•dreamserver•18m ago•0 comments

Mantic Think – Private bring-your-own-key Ollama UI with AI debates

https://manticthink.com/d/tq00dkq
2•Colewilliamz•21m ago•1 comments

Exchanges promised users in to SpaceX IPO. The tokenized shares never arrived

https://thenextweb.com/news/crypto-platforms-spacex-ipo-tokenized-stock-failed
2•JumpinJack_Cash•21m ago•0 comments

Ask HN: What are you working on? (June 2026)

5•david927•22m ago•3 comments

Tiny Solar Planner: plan small scale solar power systems

https://tiny-solar.space/
4•LNSY•24m ago•0 comments

Linux 7.1

https://lore.kernel.org/lkml/CAHk-=wi4BF4bMhZNZ1tqs+FFV4OuZRe3ZqdWB+LxRLmRweUzQw@mail.gmail.com/T/#u
8•berlianta•26m ago•0 comments

Scientists Discover Ancient 'Necropolis' Teeming with New Creatures

https://www.404media.co/scientists-discover-vast-ancient-necropolis-teeming-with-strange-new-crea...
1•Brajeshwar•26m ago•0 comments

Show HN: ComplyEdge – Runtime EU AI Act Enforcement for Python

https://github.com/ComplyEdge/complyedge
1•lc-complyedge•30m ago•0 comments

Ask HN: Is anyone building real software with AI agents?

2•variety8675•30m ago•3 comments

Show HN: Smartass – TypeScript test assertions with type-narrowing signatures

https://github.com/KensioSoftware/smartass/
1•xiuyuan•31m ago•0 comments

Ports and Adapters for Prose

https://blog.tacoda.dev/ports-and-adapters-for-prose-e53ec421925b
1•tacoda•31m ago•0 comments

Argus: Open-source AI coding assistant with built-in code review

https://github.com/ArgusTek/Argus
1•argustek•31m ago•0 comments