frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

Preserving Traditions: Unveiling the Timeless History of Lacto-Fermentation

https://www.lazyscientistsauces.co.uk/post/preserving-traditions-unveiling-the-timeless-history-of-lacto-fermentation
1•thunderbong•17s ago•0 comments

Global Measles Outbreaks

https://www.cdc.gov/global-measles-vaccination/data-research/global-measles-outbreaks/index.html
2•andsoitis•1m ago•1 comments

Show HN: SaaS Template Optimized for AI

https://github.com/TeemuSo/saas-template-for-ai-lite
1•TeemuSo•3m ago•0 comments

Flux Kontext Image editing tests

https://www.flickspeed.ai/canvas/public/6871319e239a5c68830ee64f
1•taherchhabra•5m ago•1 comments

How to Interview AI Engineers

https://blog.promptlayer.com/the-agentic-system-design-interview-how-to-evaluate-ai-engineers/
1•jzone3•6m ago•2 comments

Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs

https://arxiv.org/abs/2504.06219
1•layer8•7m ago•0 comments

Creating a Website from Obsidian

https://lwgrs.bearblog.dev/creating-a-website-from-obsidian/
2•speckx•7m ago•0 comments

Talking Postgres with Shireesh Thota, Microsoft CVP

https://talkingpostgres.com/episodes/how-i-got-started-leading-database-teams-with-shireesh-thota/transcript
1•clairegiordano•9m ago•0 comments

Pasilalinic-Sympathetic Compass

https://en.wikipedia.org/wiki/Pasilalinic-sympathetic_compass
1•frabert•9m ago•0 comments

Ask HN: Advice for someone choosing a college path

2•spacebuffer•11m ago•1 comments

Chinese TV uses AI to translate broadcasts to sign language. It's not going well

https://www.theregister.com/2025/07/10/china_ai_sign_language_translation/
1•xbmcuser•11m ago•0 comments

Do Longevity Drugs Work?

https://www.economist.com/science-and-technology/2025/06/20/do-longevity-drugs-work
1•bookofjoe•14m ago•1 comments

I created an open source AI first Kanban tool

https://vibecodementor.net/kanban
1•wavh•17m ago•1 comments

Bela Gem Brings Ultra-Low Latency Audio to PocketBeagle 2

https://www.beagleboard.org/blog/2025-07-10-bela-gem-brings-ultra-low-latency-audio-to-pocketbeagle-2
1•ofalkaed•17m ago•0 comments

Hunting Russian Spies in Norway's 'Spy Town' [video]

https://www.youtube.com/watch?v=KcVxl08XYzQ
2•mgl•18m ago•0 comments

I'm more proud of these 128 kilobytes than anything I've built since

https://medium.com/@mikehall314/im-more-proud-of-these-128-kilobytes-than-anything-i-ve-built-since-53706cfbdc18
2•mikehall314•19m ago•0 comments

Once-in-a-Generation Copper Trade Upends a $250B Market

https://www.bloomberg.com/news/features/2025-07-11/trump-s-copper-tariffs-deadline-marks-end-of-once-in-a-generation-trade
1•mgl•20m ago•1 comments

SSPL is BAD

https://ssplisbad.com/
2•lr0•23m ago•1 comments

Krafton slams ex-Subnautica 2 execs – who now say they're suing

https://www.theverge.com/news/704606/subnautica-2-delay-krafton-unknown-worlds-bonus
2•mrkeen•24m ago•0 comments

Show HN: Prepin just launched 15 interview categories for mock interviews

1•OlehSavchuk•26m ago•0 comments

Stages of Adoption

https://www.robertotonino.com/adoption
1•RobTonino•26m ago•0 comments

A New Kind of AI Model Lets Data Owners Take Control

https://www.wired.com/story/flexolmo-ai-model-lets-data-owners-take-control/
1•CharlesW•27m ago•0 comments

xAI seeks up to $200B valuation in next fundraising

https://www.ft.com/content/25aab987-c2a1-4fca-8883-38a617269b68
2•mfiguiere•35m ago•0 comments

Synthetic renewable methane production via reactive CO2 capture and conversion

https://www.sciencedirect.com/science/article/pii/S2949790625001041
1•PaulHoule•40m ago•0 comments

Solar became EU's largest source of electricity in June 2025

https://ember-energy.org/latest-insights/solar-is-eus-biggest-power-source-for-the-first-time-ever/
1•dotcoma•40m ago•0 comments

New AWS Free Tier Launching July 15

https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/free-tier.html
1•firstSpeaker•41m ago•0 comments

Bujo.nvim – bullet journal accessible from anywhere

https://github.com/timhugh/bujo.nvim
1•timhugh•44m ago•1 comments

Placing Functions

https://blog.yoshuawuyts.com/placing-functions/
2•todsacerdoti•44m ago•0 comments

Moonshotai/Kimi-K2-Instruct

https://simonwillison.net/2025/Jul/11/kimi-k2/
2•nickthegreek•46m ago•0 comments

A Match Made in the Heavens: The Surveillance State and the "New Space" Economy

https://www.techpolicy.press/a-match-made-in-the-heavens-the-surveillance-state-and-the-new-space-economy/
1•gnabgib•47m ago•0 comments
Open in hackernews

Kimi K2

https://twitter.com/Kimi_Moonshot/status/1943687594560332025
95•c4pt0r•4h ago

Comments

gs17•4h ago
> 1T total / 32B active MoE model

Is this the largest open-weight model?

bigeagle•4h ago
I believe so.

Grok-1 is 341B, DeepSeek-v3 is 671B, and recent new open weights models are around 70B~300B.

simonw•4h ago
Big release - https://huggingface.co/moonshotai/Kimi-K2-Instruct model weights are 958.52 GB
c4pt0r•4h ago
Paired with programming tools like Claude Code, it could be a low-cost/open-source replacement for Sonnet
kkzz99•3h ago
According to the bench its closer to Opus, but I venture primarily for English and Chinese.
martin_•3h ago
how do you low cost run a 1T param model?
maven29•3h ago
32B active parameters with a single shared expert.
JustFinishedBSG•3h ago
This doesn’t change the VRAM usage, only the compute requirements.
maven29•3h ago
You can probably run this on CPU if you have a 4090D for prompt processing, since 1TB of DDR4 only comes out to around $600.

For GPU inference at scale, I think token-level batching is used.

t1amat•3h ago
With 32B active parameters it would be ridiculously slow at generation.
selfhoster11•59m ago
DDR3 workstation here - R1 generates at 1 token per second. In practice, this means that for complex queries, the speed of replying is closer to an email response than a chat message, but this is acceptable to me for confidential queries or queries where I need the model to be steerable. I can always hit the R1 API from a provider instead, if I want to.

Given that R1 uses 37B active parameters (compared to 32B for K2), K2 should be slightly faster than that - around 1.15 tokens/second.

zackangelo•2h ago
Typically a combination of expert level parallelism and tensor level parallelism is used.

For the big MLP tensors they would be split across GPUs in a cluster. Then for the MoE parts you would spread the experts across the GPUs and route to them based on which experts are active (there would likely be more than one if the batch size is > 1).

selfhoster11•1h ago
It does not have to be VRAM, it could be system RAM, or weights streamed from SSD storage. Reportedly, the latter method achieves around 1 token per second on computers with 64 GB of system RAM.

R1 (and K2) is MoE, whereas Llama 3 is a dense model family. MoE actually makes these models practical to run on cheaper hardware. DeepSeek R1 is more comfortable for me than Llama 3 70B for exactly that reason - if it spills out of the GPU, you take a large performance hit.

If you need to spill into CPU inference, you really want to be multiplying a different set of 32B weights for every token compared to the same 70B (or more) instead, simply because the computation takes so long.

refulgentis•43m ago
The amount of people who will be using it at 1 token/sec because there's no better option, and have 64 GB of RAM, is vanishingly small.

IMHO it sets the local LLM community when we lean on extreme quantization & streaming weights from disk to say something is possible*, because when people try it out, it turns out it's an awful experience.

* the implication being, anything is possible in that scenario

cyanf•3h ago
This is both the largest oss model release thus far, and the largest Muon training run.
wiradikusuma•2h ago
I've only started using Claude, Gemini, etc in the last few months (I guess it comes with age, I'm no longer interested in trying the latest "tech"). I assume those are "non-agentic" models.

From reading articles online, "agentic" means like you have a "virtual" Virtual Assistant with "hands" that can google, open apps, etc, on their own.

Why not use existing "non-agentic" model and "orchestrate" them using LangChain, MCP etc? Why create a new breed of model?

I'm sorry if my questions sound silly. Following AI world is like following JavaScript world.

ozten•1h ago
It is not a silly question. The various flavors of LLM have issues with reliability. In software we expect five 9s, LLMs aren't even a one 9. Early on it was reliability of them writing JSON output. Then instruction following. Then tool use. Now it's "computer use" and orchestration.

Creating models for this specific problem domain will have a better chance at reliability, which is not a solved problem.

Jules is the gemini coder that links to github. Half the time it doesn't create a pull request and forgets and assumes I'll do some testing or something. It's wild.

simonw•1h ago
"Agentic" and "agent" can mean pretty much anything, there are a ton of different definitions out there.

When an LLM says it's "agentic" it usually means that it's been optimized for tool use. Pretty much all the big models (and most of the small ones) are designed for tool use these days, it's an incredibly valuable feature for a model to offer.

I don't think this new model is any more "agentic" than o3, o4-mini, Gemini 2.5 or Claude 4. All of those models are trained for tools, all of them are very competent at running tool calls in a loop to try to achieve a goal they have been given.

dcre•1h ago
Reasonable question, simple answer: "New breed of model" is overstating it — all these models for years have been fine-tuned using reinforcement learning on a variety of tasks, it's just that the set of tasks (and maybe the amount of RL) has changed over time to include more tool use tasks, and this has made them much, much better at the latter. The explosion of tools like Claude Code this year is driven by the models just being more effective at it. The orchestration external to the model you mention is what people did before this year and it did not work as well.
selfhoster11•55m ago
> I'm sorry if my questions sound silly. Following AI world is like following JavaScript world.

You are more right than you could possibly imagine.

TL;DR: "agentic" just means "can call tools it's been given access to, autonomously, and then access the output" combined with an infinite loop in which the model runs over and over (compared to a one-off interaction like you'd see in ChatGPT). MCP is essentially one of the methods to expose the tools to the model.

Is this something the models could do for a long while with a wrapper? Yup. "Agentic" is the current term for it, that's all. There's some hype around "agentic AI" that's unwarranted, but part of the reason for the hype is that models have become better at tool calling and using data in their context since the early days.

simonw•1h ago
Pelican on a bicycle result: https://simonwillison.net/2025/Jul/11/kimi-k2/
_alex_•1h ago
wow!
MaxPock•1h ago
Would be hilarious if Zuck with his billion dollar poaching failed to beat budget Chinese models.
aliljet•1h ago
If the SWE Bench results are to be believed... this looks best in class right now for a local LLM. To be fair, show me the guy who is running this locally...
selfhoster11•1h ago
It's challenging, but not impossible. With 2-bit quantisation, only about 250-ish gigabytes of RAM is required. It doesn't have to be VRAM either, and you can mix and match GPU+CPU inference.

In addition, some people on /r/localLlama are having success with streaming the weights off SSD storage at 1 token/second, which is about the rate I get for DeepSeek R1.

helloericsf•40m ago
How does it stack up against the new Grok 4 model?