
Kimi K2.6: Advancing Open-Source Coding

https://www.kimi.com/blog/kimi-k2-6
167•meetpateltech•1h ago

Comments

irthomasthomas•1h ago
Beats opus 4.6! They missed claiming the frontier by a few days.
NitpickLawyer•1h ago
While I'm skeptical of any "beats Opus" claims (many have been made, and none turned out to be true), I still think it's insane that we can now run close-to-SotA models locally on ~$100k worth of hardware, for a small team, and be 100% sure that the data stays local. Should be a no-brainer for teams that work in areas where privacy matters.
cedws•55m ago
Even the smaller quantized models which can run on consumer hardware pack in an almost unfathomable amount of knowledge. I don't think I expected to be able to run a 'local Google' in my lifetime before the LLM boom.
osti•33m ago
I think this one is only about 600GB of VRAM usage, so it could fit on two Mac Studios with 512GB of VRAM each. That would have cost (though they're no longer available) something like $20k or less.
NitpickLawyer•28m ago
Yeah, but that's personal use at best; not much agentic anything is happening on that hardware. Macs are great for small models at small-to-medium context lengths, but at >64k context (very common with agentic usage) they struggle and slow down a lot.

The ~$100k hardware is suitable for multi-user, small-team usage. That's what you'd use for actual work in reasonable timeframes. For personal use, sure, Macs could work.

zozbot234•6m ago
You could run it with SSD offload; earlier experiments with Kimi 2.5 on M5 hardware had it running at 2 tok/s. K2.6 has a similar number of total and active parameters.
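A quick back-of-the-envelope sketch of the VRAM figures in this subthread, assuming ~1100B total parameters (the figure mentioned elsewhere in the thread) and counting weight storage only, with no KV cache or runtime overhead:

```python
# Rough memory estimate for a ~1100B-parameter model at common
# quantization levels. Weight storage only; KV cache, activations,
# and framework overhead are ignored.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a given quantization."""
    return n_params * bits_per_weight / 8 / 1e9

N = 1.1e12  # ~1100B total parameters (assumption from the thread)

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{weight_memory_gb(N, bits):.0f} GB")
```

At roughly 4 bits per weight this lands near the ~600 GB figure above; fp16 would need about 2.2 TB.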
BoorishBears•1h ago
Opus is clearly a sidegrade meant to help Anthropic manage cost, so I'd say they may have claimed the frontier if this actually beats 4.6.
irthomasthomas•55m ago
Could be right. I just noticed my feed is absent the usual flood of posts demoing the new hotness on 3D modeling, game design and SVG drawings of animals on vehicles.
pixel_popping•24m ago
It doesn't beat Opus 4.6, no way, don't be fooled by benchmarks.
nickandbro•1h ago
Wow, if the benchmarks check out with the vibes, this could almost be like a DeepSeek moment, with Chinese AI now being neck and neck with SOTA models from US labs.
motoboi•59m ago
With the previous generation? Yes. With 10T Mythos-level models? Not even close.
bestouff•56m ago
There's no public data about Mythos.
maplethorpe•52m ago
That's because it would be too dangerous to release.
nisegami•43m ago
So is my P=NP proof.
cedws•43m ago
My girlfriend goes to a different school, you wouldn't know her.
squarefoot•35m ago
Same for teleport, time travel and warp drive.
amazingamazing•52m ago
The psyop continues. Mythos, until it's released, is vaporware. Notice how you can try Kimi 2.6. Where is the same for Mythos?
ChrisLTD•51m ago
Mythos isn't the current generation, it's literally vaporware.
jollymonATX•47m ago
According to the benchmarks, you are wrong. It is on track and slightly above some SOTA models. That's just the benchmarks speaking, though; they can be (and are) gamed by all big model labs, including domestic ones.
irthomasthomas•44m ago
10T? Impossible! They told us the training run was under 10^26 flops.
lbreakjai•9m ago
I've got a 12T model on my machine, built it myself. It's called Mytho. Too dangerous to even release a fact sheet about it. It can hack into the mainframe, enhance ultra-compressed images, grow your hair back, and make people fall in love with you.
ai_fry_ur_brain•10m ago
It's not anywhere close, and if it were, nobody in the USA would be spending 7 figures on infrastructure for it.

You LLM people here all have serious cases of Dunning-Kruger.

otabdeveloper4•3m ago
> Its not anywhere close

Close to what, and how are you measuring?

> nobody in the USA would be spending 7 figures on infrastructure for it

Au contraire, if AI had a moat it would pay for itself. They're funneling capital into infrastructure because they know it can't.

swingboy•1h ago
Exciting benchmarks if true. What kind of hardware do they typically run these benchmarks on? Apologies if my terminology is off, but I assume they're using an unquantized version that wouldn't run on even the beefiest MacBook?
esafak•59m ago
K2.5 was already pretty decent so I would try this. Starting at $15/month: https://www.kimi.com/membership/pricing

edit: Note that you can run it yourself or access it from other providers too: https://openrouter.ai/moonshotai/kimi-k2.6/providers
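For anyone who wants to poke at it programmatically, a minimal sketch of calling it through OpenRouter's OpenAI-compatible chat completions endpoint. The model slug is taken from the OpenRouter link above; the prompt and environment-variable handling are illustrative:

```python
# Sketch: single-turn chat completion against OpenRouter.
# Assumes an API key in the OPENROUTER_API_KEY environment variable.
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "moonshotai/kimi-k2.6"  # slug from the OpenRouter link above

def build_request(prompt: str, model: str = MODEL) -> dict:
    """Assemble the JSON body for a single-turn chat completion."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}]}

def send(prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        })
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

The same body works against any of the other providers on that page, since they all speak the OpenAI-compatible schema.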

wg0•47m ago
How are the usage limits compared to Anthropic?
greenavocado•39m ago
Anthropic has the worst usage limits in the industry
pbowyer•25m ago
What's the privacy/data security like? I can't find that on that page.

Edit: found it.

> We may use your Content to operate, maintain, improve, and develop the Services, to comply with legal obligations, to enforce our policies, and to ensure security. You may opt out of allowing your Content to be used for model improvement and research purposes by contacting us at membership@moonshot.ai. We will honor your choice in accordance with applicable law.

Section 3 of https://www.kimi.com/user/agreement/modelUse?version=v2

pixel_popping•16m ago
You really rely on ToS from Anthropic/OpenAI to know if they use your prompts or not? It's on their servers, why wouldn't they use our data?
lbreakjai•51m ago
I have a subscription through work and I've been trialing it; so far it looks on par with, if not better than, Opus.
verdverm•49m ago
https://huggingface.co/moonshotai/Kimi-K2.6

Is this the same model?

Unsloth quants: https://huggingface.co/unsloth/Kimi-K2.6-GGUF

(work in progress, no gguf files yet, header message saying as much)

Balinares•33m ago
Quite curious how well real usage will back the benchmarks, because even if it's only Opus ballpark, open weights Opus ballpark is seismic.
pt9567•49m ago
wow - $0.95 input / $4 output. If it's anywhere near Opus 4.6, that's incredible.
corlinp•40m ago
This should erase any doubt that AI Labs are making $$$ on API inference.

Kimi 2.5 (which this is based on) is served at $0.44 input / $2 output by a ton of different providers on OpenRouter, 2.6 will certainly be similar.

That's about 11X less than Opus for similar smarts.
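For concreteness, a small sketch of that arithmetic. The Kimi prices are the $0.44 / $2 per-million-token figures quoted above; the Opus prices are an assumption picked to illustrate the claimed ~11x gap, not published pricing:

```python
# Cost comparison sketch. Prices are per 1M tokens. The Opus figures
# below are assumed for illustration, not taken from a price sheet.

def blended_cost(price_in: float, price_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of a workload given per-1M-token prices."""
    return price_in * tokens_in / 1e6 + price_out * tokens_out / 1e6

# Hypothetical workload: 800k input tokens, 200k output tokens.
kimi = blended_cost(0.44, 2.0, 800_000, 200_000)
opus = blended_cost(5.0, 25.0, 800_000, 200_000)  # assumed Opus prices
print(f"Kimi: ${kimi:.2f}  Opus: ${opus:.2f}  ratio: {opus / kimi:.1f}x")
```

Under those assumptions the gap comes out around an order of magnitude, in line with the comment above.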

Lalabadie•30m ago
Famously, OpenAI and Anthropic are devoted to increasing efficiency before scaling up resource usage.
amazingamazing•26m ago
How does it erase any doubt? You're implying Chinese models can't actually be cheaper to produce than American ones, which is laughable.
greenavocado•48m ago
I pray the benchmark figures are true so I can stop paying Anthropic, which screwed me over this quarter by dumbing down their models, making usage quotas ridiculously small, and demanding KYC paperwork.
jollymonATX•46m ago
Anthropic has done horrible PR and investors should be livid.
greenavocado•45m ago
My theory is they pushed retail off their systems to make room for their new corporate fat cat clients. In which case, they'll do just fine.
elfbargpt•43m ago
I've always been surprised Kimi doesn't get more attention than it does. It's always stood out to me in terms of creativity and quality... it has been my favorite model for a while.
regularfry•39m ago
Dirt cheap on openrouter for how good it is, too. Really hoping that 2.6 carries on that tradition.
varispeed•22m ago
Maybe because it's a bit like unleashing a chaos monkey on your codebase? I tried it locally (K2.5 72B) and couldn't get anything useful.
KaoruAoiShiho•21m ago
Huh, that's not a thing?
johndough•14m ago
The parent poster is probably referring to Kimi-Dev-72B¹, which is a much smaller and older model, while people are probably more familiar with the big and fairly powerful 1100B Kimi-K2.5².

[1] https://huggingface.co/moonshotai/Kimi-Dev-72B

[2] https://huggingface.co/moonshotai/Kimi-K2.5

culi•21m ago
It's also one of the few models that seem capable of drawing an SVG clock

https://clocks.brianmoore.com/

sigmoid10•2m ago
Is it? In your link it definitely failed to draw the clock.
twotwotwo•4m ago
Kagi has it as an option in its Assistant thing, where there is naturally a lot of searching and summarizing results. I've liked its output there and in general when asked for prose that isn't in the list/Markdown-heavy "LLM style." It's hard to do a confident comparison, but it's seemed bold in arranging the output to flow well, even when that took surgery on the original doc(s). Sometimes the surgery's needed e.g. to connect related ideas the inputs treated as separate, or to ensure it really replies to the request instead of just dumping info that's somehow related to it.
game_the0ry•43m ago
There is some humor in the fact that China (of all countries) is pioneering possibly the world's most important tech via open source, while we (the US) are doing the exact opposite.
osti•35m ago
Maybe open source == communism
darkwater•28m ago
Good ol' Steve "Developers! Developers! Developers!" Ballmer said so a long time ago. What a visionary!
konart•8m ago
But China is not communist even though the ruling party has the word in its name.
culi•13m ago
All great technological advancements have come through opening up technology. Just look at your iPhone. GPS, the internet, AI voice assistants, touchscreens, microprocessors, lithium-ion batteries, etc all came from gov't research (I'm counting Bell Labs' gov't mandated monopoly + research funding as gov't) that was opened up for free instead of being locked behind a patent.

Private companies will never open up a technological breakthrough to their competitors. It just doesn't make sense. If you want an entire field to advance, you have to open it up.

nashadelic•9m ago
Additional humor: the "Open" in OpenAI.
nisegami•42m ago
The choice of example task for Long-Horizon Coding is a bit spooky if you squint, since it's nearing the territory of LLMs improving themselves.
Banditoz•41m ago
If the benchmarks are private, how do we reproduce the results? I looked up Humanity's Last Exam (https://agi.safe.ai/), which this model uses, and I can't seem to access it.
mariopt•36m ago
Really excited to try this one. I've been using Kimi 2.5 for design and it's really good, but borderline useless on backend/advanced tasks.

Also discovered that using OpenCode instead of the Kimi CLI really hurts the model's performance (2.5).

oliver236•35m ago
Isn't this better than Qwen?
simonw•13m ago
Accessed via OpenRouter, this one decided to wrap the SVG pelican in HTML with controls for the animation speed: https://gisthost.github.io/?ecaad98efe0f747e27bc0e0ebc669e94...

Transcript and HTML here: https://gist.github.com/simonw/ecaad98efe0f747e27bc0e0ebc669...

SwellJoe•4m ago
We got an overachiever here. Kimi sounds like a teacher's-pet kind of name.
dmix•11m ago
I'm pretty sure Kimi is what Cursor uses for their "composer 2" model. Works pretty well as a fallback when Claude runs out, but it's definitely a downgrade.
fintechie•11m ago
Gonna give this one a go... the previous 2.5 model is used for Cursor's Composer 2 Fast. After a few weeks of real-world tasks, I've seen that it can be very dumb or very good (better than Opus 4.7) depending on the problem you throw at it.

Sometimes a single prompt/response can unblock you on issues where Opus ate $100+ in API credits and circled for hours. Other times the response is useless, but it's your responsibility as an engineer to discern this.

Verdict (at least for me): use both.

cassianoleal•11m ago
If only their API wasn't tied to a Google or phone login...
cmrdporcupine•10m ago
Running it through opencode against their API and... it definitely seems to be "overthinking". Watching the thought process, it's been going for pages and pages, diagnosing and "thinking" things through... without doing anything. It's sitting at 50k+ output tokens now, just going in thought circles: complete analysis paralysis.

Might be a configuration or prompt issue. I guess I'll wait and see, but I can't get use out of this now.

m4rkuskk•8m ago
I have been testing it in my app all morning, and the results line up with Sonnet 4.6. This is just a "vibe" feeling with no real testing. I'm glad we have some real competition to the "frontier" models.

Hack Monty, Win $5k: Inside PydanticAI's Challenge

https://pydantic.dev/articles/hack-monty
1•v-mdev•46s ago•0 comments

Laz's Wolfenstein 3D Page

http://lazrojas.com/wolf3d/
1•justsomehnguy•1m ago•0 comments

Colorado River disappeared from the record for 5M years: now we know where it was

https://phys.org/news/2026-04-colorado-river-geological-million-years.html
1•wglb•1m ago•1 comments

Code Is the New Assembly

https://abhyrama.com/code-is-the-new-assembly/
1•flyaway123•1m ago•0 comments

The Download: murderous 'mirror' bacteria, and Chinese workers fighting AI doub

https://www.technologyreview.com/2026/04/20/1136154/the-download-murderous-mirror-bacteria-chines...
1•joozio•2m ago•0 comments

OpenData Timeseries: Prometheus-compatible metrics on object storage

https://www.opendata.dev/blog/introducing-timeseries/
1•hachikuji•2m ago•0 comments

The AI engineering stack we built internally – on the platform we ship

https://blog.cloudflare.com/internal-ai-engineering-stack/
1•mavelikara•2m ago•0 comments

Show HN: My Hyperliquid Terminal

https://www.aulico.com
1•kiosktryer•2m ago•0 comments

H.R.8250 – Parents Decide Act

https://www.congress.gov/bill/119th-congress/house-bill/8250/text
1•philips•3m ago•1 comments

CuTe Matrix Transpose

https://leimao.github.io/article/CuTe-Matrix-Transpose/
1•eigenBasis•3m ago•0 comments

Show HN: Noise.widgita.xyz – a zero-backend noise map for anywhere in the world

https://noise.widgita.xyz/
1•fairlight1337•6m ago•0 comments

Show HN: Hora – A Native SwiftUI Google Calendar Client for macOS

https://horacal.app/
1•szamski•9m ago•0 comments

Contact Lens Uses Microfluidics to Monitor and Treat Glaucoma

https://spectrum.ieee.org/smart-contact-lens-glaucoma-microfluidics
2•zdw•9m ago•0 comments

The Theory of Interstellar Trade [pdf]

https://www.princeton.edu/~pkrugman/interstellar.pdf
1•AFF87•9m ago•0 comments

Show HN: Reproducible benchmark – OpenAI charges 1.5x-3.3x more for non-English

https://github.com/vfalbor/llm-language-token-tax
1•vfalbor•11m ago•0 comments

I Like the Web They Want

https://vasilis.nl/nerd/2026/i-like-the-web-they-want/
3•speckx•12m ago•0 comments

Labor Automation Forecasting Hub on Metaculus Measures Impact of AI on Labor

https://www.metaculus.com/labor-hub/
1•postreal•14m ago•1 comments

DeWitt Clauses

https://danluu.com/anon-benchmark/
3•thomasahle•15m ago•1 comments

Show HN: Enlist AI: A tool that turns any job description into a study plan

https://enlistai.vercel.app
1•lilprince1218•17m ago•0 comments

Efficiently Transfer Files to LibreOffice Calc: A Step-by-Step Guide

https://shunspirit.com/article/how-to-transfer-files-to-libre-office-calc
1•rolph•21m ago•0 comments

Transitioning from Corporate to Open Source at 23 y.o

https://www.tharropoulos.dev/blog/transitioning-from-corporate-to-open-source/
2•tharropoulos•22m ago•0 comments

What we once had (at the height of the XMPP era of the Internet) (2023)

https://www.kirsle.net/what-we-once-had-at-the-height-of-the-xmpp-era-of-the-internet
3•birdculture•23m ago•0 comments

Agent-consistency – a Python consistency layer for multi-agent workflows

https://github.com/karimbaidar/agent-consistency-refund-demo
1•baidarkarim•23m ago•0 comments

Modern Board Games: and why you should play them (2022)

https://boardgamegeek.com/blog/10755/blogpost/124992/modern-board-games-and-why-you-should-play-them
1•maayank•25m ago•0 comments

Scaling Claude beyond individual workflows – lessons from our team

https://ninkovic.dev/blog/2026/scaling-claude-beyond-individual-workflows
1•nemwiz•25m ago•0 comments

Language Modeling Without Neural Networks

https://nathan.rs/posts/unbounded-n-gram/
2•nathan-barry•25m ago•0 comments

Power tools got worse on purpose

https://www.worseonpurpose.com/p/your-power-tools-got-worse-on-purpose
1•longhaul•27m ago•0 comments

A New Chapter for Ruby Central

https://rubycentral.org/news/a-new-chapter-for-ruby-central/
1•campuscodi•29m ago•0 comments

Quantum Computers Are Not a Threat to 128-Bit Symmetric Keys

https://words.filippo.io/128-bits/
3•hasheddan•29m ago•0 comments

Show HN: Open-source alternative HN front page with point highlights and search

https://github.com/pretzelai/hackernewsx
1•ramonga•31m ago•1 comments