LongCat-2.0, a large-scale MoE model with 1.6T total and 48B Active

https://longcat.chat/blog/longcat-2.0/
29•benjiro29•2h ago

Comments

dryarzeg•1h ago
So... is this literally a... umm, sorry, I'm just genuinely unsure (really, no sarcasm intended) which terminology to use... finetune of DeepSeek V4-Pro or post-trained version of DeepSeek V4-Pro Base? I haven't fully dived into the tech report (so I may update my opinion as well as my comment), but so far the architectural solutions seem to be largely similar to DeepSeek's.

Maybe I'm wrong, but that's just the first impression.

EDIT: I take my words back (which happens rarely) - although they do build upon DeepSeek's work, their contribution far exceeds merely post-training the base model in a different way. They did introduce something new to the architecture, though I still can't find the full tech report, with Hugging Face and GitHub links returning 404 right now.

EDIT-2: Now that I think about it, I'm not quite sure they're going to openly release the full report with methodology, or the model weights, at all.

trollbridge•50m ago
If more people are doing what DeepSeek did and figuring it out, that's a great thing, because DeepSeek figured out how to radically reduce the cost of inference.

BoorishBears•25m ago
What on earth are you on about, truly.

gardnr•50m ago
> The training and deployment of LongCat-2.0 are built on large-scale clusters of tens of thousands of AI ASIC superpods. Compared to the mature Nvidia GPU ecosystem, the supporting software community is still less developed. We have therefore put significant effort into building a stable, secure, and scalable infrastructure.

This is the real news story. It looks like they may have used Huawei Ascend 910C chips: https://nitter.net/teortaxesTex/status/2071708141037781407#m

credit_guy•20m ago
I just tested it with a slightly tricky question

  > If you could run a nuclear reactor with U-235 as fuel or Pu-241 (both mixed with 95% U-238), which one would you choose and why?

For a human this would not be tricky at all. For an LLM it could be, because this question certainly does not exist in any sort of training data: Pu-241 does not exist in pure form. It only exists as a minor component of reactor-grade plutonium, where Pu-239 dominates, with Pu-240 coming second and Pu-241 coming third.

In any case, LongCat-2.0 gave a very well-reasoned but incorrect answer that Pu-241 is preferable.

I then tested on Qwen 3.7 Plus, and it correctly answered that U-235 is preferable because of its much higher delayed neutron fraction. I then went to Gemini Flash, which answered the same, with much more confidence, and with much stronger arguments, and the speed of the answer was much higher.

Overall I rate Gemini Flash the best, Qwen 3.7 Plus an acceptable second, and LongCat-2.0 an ok'ish third, if you have nothing better.

aetherspawn•3m ago
I wish they would release the requirements to run on llama.cpp with any announcements of open models.

A bonus would be tok/s on common hardware.

.self: A new top-level domain designed to support self-hosting

https://hccf.onmy.cloud/2026/06/21/reclaiming-our-digital-selves-hccfs-vision-for-a-human-centere...
353•HumanCCF•6h ago•199 comments

Qwen 3.6 27B is the sweet spot for local development

https://quesma.com/blog/qwen-36-is-awesome/
630•stared•9h ago•522 comments

Free the Icons

https://weblog.rogueamoeba.com/2026/06/26/free-the-icons/
236•zdw•2d ago•63 comments

Memory Safe Context Switching (longjmp, setjmp) in Fil-C

https://fil-c.org/context_switches
35•modeless•1h ago•14 comments

Exploring PDP-1 Lisp (1960)

https://obsolescence.dev/pdp1-lisp-introduction.html
17•ozymandiax•1h ago•12 comments

LongCat-2.0, a large-scale MoE model with 1.6T total and 48B Active

https://longcat.chat/blog/longcat-2.0/
29•benjiro29•2h ago•6 comments

Why Won't Europe Build AI Data Centers in Iceland?

https://mrkt30.com/why-wont-europe-build-ai-data-centers-in-iceland/
12•type0•1h ago•6 comments

Rocketlab acquires Iridium

https://investors.rocketlabcorp.com/news-releases/news-release-details/rocket-lab-acquire-iridium...
374•everfrustrated•12h ago•231 comments

Scientists find molecular-level evidence for two structures in liquid water

https://phys.org/news/2026-06-scientists-molecular-evidence-liquid.html
67•wglb•4h ago•22 comments

Ornith-1.0: self-improving open-source models for agentic coding

https://github.com/deepreinforce-ai/Ornith-1
161•danboarder•9h ago•32 comments

A native graphical shell for SSH

https://probablymarcus.com/blocks/2026/06/28/native-graphical-shell-for-SSH.html
254•mrcslws•10h ago•118 comments

US Supreme Court rules geofence warrants require constitutional protections

https://www.theguardian.com/us-news/2026/jun/29/supreme-court-geofence-warrants-case-decision
458•cdrnsf•10h ago•215 comments

30-year sentence for transporting zines is a five-alarm fire for free speech

https://theintercept.com/2026/06/26/daniel-sanchez-estrada-zines-prairieland-free-speech/
342•xrd•1d ago•194 comments

South Korea to spend $1T on more memory chip production and humanoid robots

https://arstechnica.com/ai/2026/06/south-korea-to-spend-1t-on-more-memory-chip-production-and-hum...
132•jnord•4h ago•79 comments

Apple Neural Engine: Architecture, Programming, and Performance

https://arxiv.org/abs/2606.22283
128•Jimmc414•2d ago•18 comments

WATaBoy: JIT-Ing Game Boy Instructions to WASM Beats a Native Interpreter

https://humphri.es/blog/WATaBoy/
179•energeticbark•11h ago•29 comments

One million passports leaked online

https://cambridgeanalytica.org/data-breaches-scandals/passports-driver-licenses-exposed-public-in...
153•jruohonen•1d ago•87 comments

Wallace the 6 inch f/2.8 telescope, building it, and hiking with it

https://lucassifoni.info/blog/hiking-with-wallace/
117•chantepierre•3d ago•20 comments

Kb – Prolog Knowledge Base

https://github.com/mat-mgm/kb-prolog
29•triska•2d ago•4 comments

SQLite improving performance with pre-sort

https://andersmurphy.com/2026/06/07/sqlite-improving-performance-with-pre-sort.html
32•tosh•3d ago•3 comments

Dark Sky Lighting

https://www.savingourstars.org/darkskylighting#whatisdarkskylighting
165•alexandrehtrb•4d ago•30 comments

Open Memory Protocol – One Memory Store for Claude, ChatGPT, Curso

https://github.com/SMJAI/open-memory-protocol
10•soji_mathew•2h ago•5 comments

Netflix Simplified Batch Compute with Kueue

https://netflixtechblog.com/how-netflix-simplified-batch-compute-with-kueue-87860682629c
16•dalvrosa•2d ago•2 comments

What happens when you run a CUDA kernel?

https://fergusfinn.com/blog/what-happens-when-you-run-a-gpu-kernel/
214•mezark•13h ago•28 comments

Philae's extraordinary comet landing relived (2024)

https://www.esa.int/Science_Exploration/Space_Science/Rosetta/Philae_s_extraordinary_comet_landin...
5•1970-01-01•5d ago•0 comments

Micro-Agent: Beat Frontier Models with Collaboration Inside Model API

https://vllm.ai/blog/2026-06-29-micro-agent-frontier-models
55•matt_d•8h ago•18 comments

Working With AI: A concrete example

https://htmx.org/essays/working-with-ai/
95•comma_at•11h ago•33 comments

Sandia National Labs SA3000 8085 CPU

https://www.cpushack.com/2026/06/03/sandia-national-labs-sa3000-8085-cpu/
164•rbanffy•16h ago•40 comments

What can you confidently guarantee about your software?

https://queue.acm.org/detail.cfm?id=3819084
98•eatonphil•12h ago•45 comments

Ornith-1.0: Self-scaffolding LLMs for agentic coding

https://deep-reinforce.com/ornith_1_0.html
58•kordlessagain•1d ago•7 comments