frontpage.

Kimi K2.6 just beat Claude, GPT-5.5, and Gemini in a coding challenge

https://thinkpol.ca/2026/04/30/an-open-weights-chinese-model-just-beat-claude-gpt-5-5-and-gemini-in-a-programming-challenge/
120•bazlightyear•1h ago

Comments

magicalhippo•1h ago
In a single challenge, measured by how performant the solution was.

Kimi K2.6 is definitely a frontier-sized model, so on the one hand it's not that surprising it's up there with the closed frontier models.

Being open is nice though, even though it doesn't matter that much for folks like me with a single consumer GPU.

echelon•1h ago
This is the future though. Open weights models that run on H200s provide far more opportunity to build products and real infrastructure around.

You can always distill this for your little RTX at home. But models shaped for consumer hardware will never win wide adoption or remain competitive with frontier labs.

This is something that _can_ compete. And it will both necessitate and inspire a new generation of open cloud infra to run inference. "Push button, deploy" or "push button, fine-tune" shaped products at the start, then far more advanced products that are only possible with open weights not locked behind an API.

Now we just need open weights Nano Banana Pro / GPT Image 2, and Seedance 2.0 equivalents.

The battle and focus should be on open weights for the data center.

bitmasher9•49m ago
I don’t fully understand what open weights unlocks that cannot be accomplished via API from a product standpoint.

Open weights is great if you want to do additional training, or if you need on-prem for security.

stldev•39m ago
Or try to beat Anthropic's uptime.
mkl•37m ago
Multiple providers of the same model. That means competition for price, reliability, latency, etc. It also means you can use the same model as long as you want, instead of having it silently change behaviour.
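The provider-competition point can be sketched in a few lines: because several hosts serve the identical open model, a client can route each request by price and latency. The provider names and numbers below are made up for illustration, not real offerings.

```python
# Hypothetical providers all serving the same open-weights model.
# Prices are $ per million tokens; latencies are illustrative p50 values.
PROVIDERS = {
    "host-a": {"price_per_mtok": 0.60, "p50_latency_ms": 900},
    "host-b": {"price_per_mtok": 0.45, "p50_latency_ms": 1400},
    "host-c": {"price_per_mtok": 0.80, "p50_latency_ms": 600},
}

def pick_provider(max_latency_ms: float) -> str:
    """Cheapest provider that meets the latency budget."""
    ok = {name: info for name, info in PROVIDERS.items()
          if info["p50_latency_ms"] <= max_latency_ms}
    return min(ok, key=lambda name: ok[name]["price_per_mtok"])

print(pick_provider(1000))  # "host-a": cheapest of the hosts under 1000 ms
print(pick_provider(2000))  # "host-b": with a looser budget, price wins
```

With a closed model there is exactly one entry in that table, so there is nothing to route between; that is the competition argument in miniature.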
echelon•11m ago
> Open weights is great if you want to do additional training, or if you need on-prem for security.

The power of giving universities, companies, and hackers "full" models should not be understated.

Here are just a few ideas for image, video, and creative media models:

- Suddenly you're not "blocked" for entirely innocuous prompts. This is a huge issue.

- You can fine tune the model to learn/do new things. A lighting adjustment model, a pose adjustment model. You can hook up the model to mocap, train it to generate plates, etc.

- You can fine tune it on your brand aesthetic and not have it washed out.

keyle•56m ago
It absolutely does matter.

The enshittification will go unnoticed at first, but I'm already finding my favourite frontier models severely nerfed, doing incredibly dumb stuff they weren't doing in the past.

We need open weight models to have a stable "platform" when we rely on them, which we do more and more.

magicalhippo•47m ago
Most people won't roll out their own K2 deployment across rented GPUs, so in that sense it doesn't matter that much; they'll be using a paid service which is just as much of a black box as Claude or ChatGPT. For example, on OpenRouter you can select a provider which states it uses a given open model, but you have no idea what actually goes on behind the curtain, which quantization level they use and so on.

That said, I do fully agree that it is valuable to have open near-frontier models, as a balance to the closed ones.

slopinthebag•25m ago
It's not really a black box. Useful models becoming fungible is crucial for disincentivizing bad behaviour by model providers. I can't really overstate how different it is from relying on closed models. If you don't like or trust any of the providers on OpenRouter, you can rent the GPUs yourself and host it, although this is probably unnecessary.
DeathArrow•52m ago
>Being open is nice though, even though it doesn't matter that much for folks like me with a single consumer GPU.

Of course it matters because that makes coding plans much cheaper than those from Anthropic and OpenAI.

For personal use I have coding plans with GLM 5.1, Kimi K2.6, MiniMax M2.7 and Xiaomi MiMo V2.5 Pro and I am getting a lot of bang for the buck.

magicalhippo•44m ago
Currently it's not a huge difference given the subsidies of closed model subscriptions. Once that stops then yea it will be really nice to have open models as price competitors.
PedroBatista•1h ago
Great to know, but what was the cost both in terms of $$ and tokens used?

Not to invalidate these benchmark results, because they are useful, but the real measure is what these models are capable of doing when real people interact with them at scale.

Regardless, this is good news, because now that Microsoft is basically giving up on its all-in strategy with GitHub's Copilot and Anthropic is playing the "I'm too good for you" game, it's about time they got pressed into not turning this AI world into a divide between the haves and the have-nots.

keyle•54m ago
Re pricing. Never as high as frontier commercial models.
beering•1h ago
I’m a little confused as to the setup. It was asking each model to one-shot a script and then the scripts faced off? Were the models given a computer environment? Or a test server to iterate against?
rpmisms•1h ago
Sounds incredibly simple to me. One-shot.
beering•56m ago
So nothing like real-world coding, where you’d be able to run and test the script before submitting?
procinct•20m ago
One shot just means the user doesn't have to iterate on it via the agent. The agent does whatever it needs to deliver the best outcome, including running and iterating on its own until it's happy with the result. This could be a short or long process depending on the task.
Frannky•1h ago
I have to try Kimi. I was looking for an alternative. If you have any experience or advice, please share. I saw Kimi is at the top of the OpenRouter ranking.
zorked•51m ago
I use Kimi at home via a kimi.com subscription and Kimi CLI (sometimes running inside Zed, sometimes not). My favorite model by far. And it's just $20.

I have to use a supposedly frontier model at work and I hate it.

Frannky•43m ago
Nice, thanks for sharing!
DeathArrow•49m ago
Kimi K2.6 is great, but I advise you to get a coding plan from Kimi.com, as that is much cheaper than paying for API calls via OpenRouter.
Frannky•44m ago
Thanks, I am trying it right now. I had an opencode plan at $5/month, so I will play with that. I use Zed and I added Pi ACP, so I can try both pi and Kimi. I will also try it in opencode and via Kimi code.
prvnsmpth•34m ago
Use kimi 2.6 for planning and a cheap model (preferably local) for execution, and then kimi once again for reviewing it. Then finally I review the code. Saves a lot on tokens.
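As a rough sketch of that split, the routing can be thought of as a table from stage to model; the model names and per-million-token prices below are illustrative assumptions, not real pricing.

```python
# Route the expensive model to the stages where judgment matters, and a
# cheap (possibly local) model to the stage that burns the most tokens.
ROUTES = {
    "plan":    "kimi-k2.6",    # high-level design: worth the premium
    "execute": "local-small",  # bulk code generation: cheap tokens
    "review":  "kimi-k2.6",    # final pass before the human review
}
PRICE_PER_MTOK = {"kimi-k2.6": 2.00, "local-small": 0.05}  # assumed prices

def estimate_cost(mtok_per_stage: dict[str, float]) -> float:
    """Total $ cost given millions of tokens spent at each stage."""
    return sum(PRICE_PER_MTOK[ROUTES[stage]] * mtok
               for stage, mtok in mtok_per_stage.items())

# Execution dominates token count, so the cheap model dominates the bill:
print(estimate_cost({"plan": 1, "execute": 10, "review": 1}))
```

Under these made-up numbers, routing all 12M tokens to the premium model would cost $24; the split brings it to $4.50, which is the "saves a lot on tokens" effect.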
Frannky•5m ago
Very interesting, thanks for sharing. I am testing it with Pi in Zed and it seems pretty good.
elromulous•1h ago
Is the site just slashdotted rn? Can anyone get to it?
brettgo1•18m ago
Slashdot... Now that's a name I haven't heard in a long time. A long time.
jakemanger•55m ago
What are the GPU VRAM requirements for this thing?

Awesome to have an open model that can compete, but damn, it would be so much better if you could run it locally. Otherwise it's so difficult to run (e.g. self-host) that it's just way more convenient to pay OpenAI, Claude, etc.

DeathArrow•47m ago
>Otherwise, it's almost so difficult to run (e.g. self host) that it's just way more convenient to pay OpenAI, Claude, etc

Getting a coding plan from Kimi.com will make coding 20x cheaper than using Anthropic.

BTW, I am using it with Claude Code.

slashdave•52m ago
I was surprised by the ranking, until I read what the test was. Not horribly relevant for coding.

The current ranking of all tests makes more sense (well, except for how well Gemini does)

https://aicc.rayonnant.ai

pbreit•46m ago
All my co-workers say Claude blows away Gemini. Is it really that good? How can I try Kimi?
prvnsmpth•37m ago
You can sign up for a plan on the kimi code platform and use it via the pi.dev coding agent, or opencode. In planning, I’d say it’s almost on par with Claude Opus.
justech•43m ago
I’ve been maining Kimi k2.6 through opencode go and openrouter for a week and I can say it’s the same experience as when I was maining Sonnet 3.5/4 late last year.

Not as good or as fast as Claude Code on Opus now but definitely enough for casual/hobby use. The best part is multiple choices for providers, if opencode gimps their service, I’ll switch

jrecyclebin•41m ago
I absolutely love Kimi's personality - some of the things it says are so out there! And it's been great for very focused, iterative work.

Its weakness is that it seems to yak on and on when it needs to plan out something big or read through and make sense of how to use a niche piece of a complex library, to the point where it can fill up its 256k window and rack up a bill. (No cache.) I have had better experience with GLM 5.1 in those cases.

Anyone out there relate?

anderber•33m ago
Absolutely. I use caveman to help with that: https://github.com/JuliusBrussee/caveman
jrecyclebin•28m ago
Not a bad idea - however

> Caveman only affects output tokens — thinking/reasoning tokens are untouched.

The problem is the thinking. But it could help to tune my system prompt for Kimi.

LeoPanthera•26m ago
You can just add "be brief" to the prompt to replace the entire plugin. Same results.

https://www.maxtaylor.me/articles/i-benchmarked-caveman-agai...

gertlabs•34m ago
I'm glad we're seeing a shift towards objectively scored tests.

We've been doing this at scale at https://gertlabs.com/rankings, and although the author looks to be running unique one-off samples, it's not surprising to see how well Kimi K2.6 performed. Based on our testing, for coding especially, Kimi is within statistical uncertainty of MiMo V2.5 Pro for top open weights model, and performs much better with tools than DeepSeek V4 Pro.

GPT 5.5 has a comfortable lead, but Kimi is on par with or better than Opus 4.6. The problem with Kimi 2.6 is that it's one of the slower models we've tested.

veber-alex•15m ago
In my experience benchmarks are pretty meaningless.

Not only is performance dependent on the language and tasks given, but also on the prompts used and the expected results.

In my own internal tests it was really hard to judge whether GPT 5.5 or Opus 4.7 is the better model.

They have different styles and it's basically up to preference. There were even times when I gave the win to one model, only to think about it more and change my mind.

At the end of the day I think I slightly prefer Opus 4.7.

bazlightyear•13m ago
Are your tests and results open source?
refulgentis•11m ago
Any thoughts on using it on Fireworks? It's extremely fast there.
SomaticPirate•32m ago
This seems to be testing the models on leetcode-style prompts that also require the model to implement TCP calls to send the results. Interesting, but probably not an apples-to-apples comparison. The fact that only Grok qualified for the first one seems suspect.
aykutseker•30m ago
This seems less like Kimi is better at coding than Claude and more like Kimi found the right strategy for this particular game.

Still interesting though. The fact that an open weight model is close enough for that to matter is probably the real story.

rvz•23m ago
So we are now at the point where open weight models are rapidly catching up to the frontier models.

They are at best 30 days behind, and at worst 2 months behind. The last issue is being able to run the best one on conventional hardware without a rack of GPUs.

MacBooks and Mac minis are behind on hardware, but within the next 2 years at worst, the advancements of the M-series machines will make it possible.

All of this is why companies like Anthropic feel like they have to use "safety" to stop you from running local models on your machine and get you hooked on their casino wasting tokens with a slot machine named Claude.

qakajjqj•22m ago
Yes, Gemini is a programming application.
0xbadcafebee•21m ago
These posts are going to be a constant for the next year, because there's no objective way to compare models (past low-level numbers like token generation speed, average reasoning token amount, # of parameters, active experts, etc). They're all quite different in a lot of ways, they're used for many different things by different people, and they're not deterministic. So you're constantly gonna see benchmarks and tests and proclamations of "THIS model beat THAT model!", with people racing around trying to find the best one.

But there is no best one. There's just the best one for you, based on whatever your criteria are. It's likely we'll end up in a "Windows vs. macOS vs. Linux" style world, where people stick to their camps that do a particular thing a particular way.

slopinthebag•19m ago
Amazing. To me it feels like GLM 5.1, Kimi 2.6, DeepSeek 4 are all competitive both with each other and with the American models. Truly a great time to be alive.

I would like to see more effort making the flash variants work for coding. They are super economical to use to brute force boilerplate and drudgery, and I wonder just how good they can be with the right harness, if it provides the right UX for the steering they require.

As much as vibe coding has captured the zeitgeist, I think long term using them as tools to generate code at the hands of skilled developers makes more sense. Companies can only go so long spending obscene amounts of money for subpar unmaintainable code.

walrus01•13m ago
People thinking to self-host Kimi K2.6 had better be prepared for how big it is.

Q8 K XL quantization for instance is around 600GB on disk. I would bet about 700GB of VRAM needed.

Quantizations lower than Q8 are probably worthless for quality.

Or 2.05TB on disk for the full precision GGUF.

https://huggingface.co/unsloth/Kimi-K2.6-GGUF

If you can afford the hardware to run Kimi K2.6 at any decent speed for more than 1 simultaneous user, you probably have a whole team of people on staff who are already very familiar with how to benchmark it vs Claude, GPT-5.5, etc.
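As a back-of-envelope check on those numbers: the weights on disk set a floor for VRAM, plus overhead for KV cache, activations, and runtime buffers. The 15% overhead factor below is an assumption for illustration, not a measured value.

```python
def vram_estimate_gb(weights_on_disk_gb: float, overhead: float = 0.15) -> int:
    """Rough VRAM floor: weight size plus an assumed fractional overhead
    for KV cache, activations, and runtime buffers."""
    return round(weights_on_disk_gb * (1 + overhead))

# ~600 GB of Q8 weights lands in the same ballpark as the ~700 GB guess above.
print(vram_estimate_gb(600))
```

The real overhead depends heavily on context length and batch size (KV cache grows with both), so treat this as a floor, not a quote.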

zozbot234•7m ago
Kimi is a natively quantized model; the lossless full-precision release is 595GB. Your own link mentions that.
sieve•12m ago
Kimi is really good.

I have been using Sonnet and others (DeepSeek, ChatGPT, MiniMax, Qwen) for my compiler/vm project and the Claude Pro plan is mostly unusable for any serious coding effort. So I use it in chat mode in the browser where it cannot needlessly read your entire project, and use Kimi on the OpenCode Go plan with pi.

Kimi consistently exceeded Sonnet on the C+Python project. Never had to worry about it doing anything other than what I asked it to do. GLM crapped the bed once or twice. Kimi never did.

plexescor•9m ago
I always thought Claude was the GOAT, but I guess it's time to change that notion and try Kimi K2.6.
koala-news•4m ago
In my opinion, this kind of comparison is not very meaningful.

Clandestine network smuggling Starlink tech into Iran to beat internet blackout

https://www.bbc.com/news/articles/cvgzk91leweo
122•1659447091•4h ago•56 comments

A Couple Million Lines of Haskell: Production Engineering at Mercury

https://blog.haskell.org/a-couple-million-lines-of-haskell/
125•unignorant•5h ago•45 comments

This Month in Ladybird - April 2026

https://ladybird.org/newsletter/2026-04-30/
239•richardboegli•9h ago•37 comments

Six Years Perfecting Maps on WatchOS

https://www.david-smith.org/blog/2026/04/29/maps-on-watchos/
236•valzevul•8h ago•50 comments

Dav2d

https://code.videolan.org/videolan/dav2d
411•dabinat•12h ago•119 comments

The IBM Granite 4.1 family of models

https://research.ibm.com/blog/granite-4-1-ai-foundation-models
30•wglb•2d ago•3 comments

Neanderthals ran 'fat factories' 125,000 years ago (2025)

https://www.universiteitleiden.nl/en/news/2025/07/neanderthals-ran-fat-factories-125000-years-ago
147•andsoitis•9h ago•52 comments

Do_not_track

https://donottrack.sh/
258•RubyGuy•12h ago•85 comments

Windows API Is Successful Cross-Platform API

https://retrocoding.net/windows-api-is-successful-cross-platform-api
48•phendrenad2•2h ago•28 comments

San Francisco streets with confusingly similar names

https://j-nelson.net/san-francisco-streets-with-similar-names/
8•SeenNotHeard•2d ago•7 comments

VS Code inserting 'Co-Authored-by Copilot' into commits regardless of usage

https://github.com/microsoft/vscode/pull/310226
1010•indrora•9h ago•485 comments

Inventions for battery reuse and recycling increase seven-fold in last decade

https://www.epo.org/en/news-events/news/inventions-battery-reuse-and-recycling-increase-more-seve...
179•JeanKage•2d ago•11 comments

The agent harness belongs outside the sandbox

https://www.mendral.com/blog/agent-harness-belongs-outside-sandbox
87•shad42•8h ago•69 comments

Maryland Is First to Ban A.I.-Driven Price Increases in Grocery Stores

https://www.nytimes.com/2026/05/01/business/surveillance-pricing-groceries-maryland.html
106•doener•4h ago•51 comments

Clojurists Together – Q2 2026 Open Source Funding Announcement

https://www.clojuriststogether.org/news/q2-2026-funding-announcement/
81•dragandj•8h ago•8 comments

A more efficient implementation of Shor's algorithm

https://lwn.net/Articles/1066156/
54•signa11•1d ago•5 comments

Care Homes and Hotels in Japan Shut as Expansion Strategy Unravels

https://www.newsonjapan.com/article/149075.php
15•mikhael•4h ago•2 comments

Show HN: State of the Art of Coding Models, According to Hacker News Commenters

https://hnup.date/hn-sota
81•yunusabd•8h ago•41 comments

A Physics Engine with Incremental Rollback for Multiplayer Games

https://easel.games/blog/2026-rollback-physics
69•BSTRhino•1d ago•22 comments

How fast is a macOS VM, and how small could it be?

https://eclecticlight.co/2026/05/02/how-fast-is-a-macos-vm-and-how-small-could-it-be/
237•moosia•20h ago•86 comments

When Dawkins met Claude – Could this AI be conscious?

https://unherd.com/2026/04/is-ai-the-next-phase-of-evolution/
16•pentestercrab•1d ago•87 comments

Simple and Correct Snapshot Isolation

https://remy.wang/blog/si.html
13•remywang•2d ago•1 comments

Open source does not imply open community

https://blog.feld.me/posts/2026/04/open-source-does-not-imply-open-community/
112•RohanAdwankar•3h ago•24 comments

Voice-AI-for-Beginners – A curated learning path for developers

https://github.com/mahimairaja/voiceai
61•mahimai•7h ago•4 comments

Dabbling in Erlang, part 2: A minimal introduction (2013)

https://agis.io/post/dabbling-in-erlang-a-minimal-introduction/
23•pasxizeis•21h ago•2 comments

NetHack 5.0.0

https://nethack.org/v500/release.html
420•rsaarelm•11h ago•130 comments

Barman – Backup and Recovery Manager for PostgreSQL

https://github.com/EnterpriseDB/barman
148•nateb2022•3d ago•23 comments

Little Magazines Are Back

https://wsjfreeexpression.substack.com/p/little-magazines-are-back
80•prismatic•2d ago•28 comments

The USB Situation

https://randsinrepose.com/archives/the-usb-situation/
113•herbertl•3d ago•125 comments