Tongyi DeepResearch – open-source 30B MoE Model that rivals OpenAI DeepResearch

https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/
187•meander_water•9h ago

Comments

jychang•7h ago
This is over a month old, they released the weights a long time ago.
earthnail•6h ago
And for those not so tightly in the loop: how does it compare?
jwr•2h ago
That's OK — not all of us follow all the progress on a daily basis, and a model that is a month old doesn't become useless just by being a month old!
embedding-shape•6h ago
Isn't OpenAI "Deep research" (not "DeepResearch") a methodology/tooling thing, where you'll get different responses depending on which specific model you use with it? As far as the UI allows, you could use Deep research with GPT-5, GPT-4o, o3 and so on, and that will have an impact on the responses. Skimming the paper and searching for some simple terms, it seems like they never expand on what exact models they've used, just that they've used a specific feature from ChatGPT?
simonw•4h ago
At this point "deep research" is more of a pattern - OpenAI and Perplexity and Google Gemini all offer products with that name which work essentially the same way, and Anthropic and Grok have similar products with a slightly different name attached.

The pattern is effectively long-running research tasks that drive a search tool. You give them a prompt, they churn away for 5-10 minutes running searches and they output a report (with "citations") at the end.

This Tongyi model has been fine-tuned to be really good at using its search tool in a loop to produce a report.
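
A minimal sketch of that loop, with llm() and web_search() left as stand-in placeholders rather than any particular vendor's API:

# Minimal sketch of the "deep research" pattern: a model repeatedly decides
# what to search for, accumulates notes, and finally writes a cited report.
# llm() and web_search() are placeholders, not any specific product's API.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def web_search(query: str) -> list[str]:
    raise NotImplementedError("plug in your search tool here")

def deep_research(question: str, max_rounds: int = 10) -> str:
    notes: list[str] = []
    for _ in range(max_rounds):
        # Ask the model what to look up next, given what it has gathered so far.
        query = llm(
            f"Question: {question}\nNotes so far: {notes}\n"
            "Reply with the single most useful search query, or DONE if you "
            "have enough to answer."
        )
        if query.strip() == "DONE":
            break
        notes.extend(web_search(query))
    # Final pass: turn the accumulated snippets into a report with citations.
    return llm(f"Write a cited report answering: {question}\nSources: {notes}")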

embedding-shape•3h ago
Yes, but I think my previous point still stands, namely that exactly which model is being used greatly affects the results.

So without specifying which model is being used, it's really hard to know whether one thing is better than another, because we don't know what the underlying model is, or whether it's better because of the model itself or because of the tooling, which feels like an important distinction.

aliljet•5h ago
Sunday morning, and I find myself wondering how the engineering tinkerer is supposed to best self-host these models? I'd love to load this up on the old 2080ti with 128gb of vram and play, even slowly. I'm curious what the current recommendation on that path looks like.

Constraints are the fun part here. I know this isn't the 8x Blackwell Lamborghini, that's the point. :)

homarp•5h ago
llama.cpp gives you the most control to tune it for your machine.
giobox•4h ago
If you just want to get something running locally as fast as possible to play with (the 2080ti typically had 11gb of VRAM which will be one of the main limiting factors), the ollama app will run most of these models locally with minimum user effort:

https://ollama.com/

If you really do have a 2080ti with 128gb of VRAM, we'd love to hear more about how you did it!

CuriousSkeptic•4h ago
I'm sure this guy has some helpful hints on that: https://youtube.com/@azisk
exe34•4h ago
llama.cpp + quantized: https://huggingface.co/bartowski/Alibaba-NLP_Tongyi-DeepRese...

get the biggest one that will fit in your vram.

davidsainez•1h ago
This is the way. I managed to run (super) tiny models on CPU only with this approach.
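
A rough sketch of that approach using the llama-cpp-python bindings; the GGUF filename is a placeholder for whichever quant from the linked bartowski repo fits your VRAM:

from llama_cpp import Llama

# Load a quantized GGUF; n_gpu_layers=-1 offloads everything that fits,
# n_gpu_layers=0 runs CPU-only (slow, but works for small quants).
llm = Llama(
    model_path="tongyi-deepresearch-30b-q4_k_m.gguf",  # placeholder filename
    n_gpu_layers=-1,
    n_ctx=8192,  # context window; larger contexts cost more memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize recent work on MoE models."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])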
btbuildem•3h ago
I've recently put together a setup that seemed reasonable for my limited budget. Mind you, most of the components were second-hand, open box deals, or deep discount of the moment.

This comfortably fits FP8 quantized 30B models that seem to be "top of the line for hobbyists" grade across the board.

- Ryzen 9 9950X

- MSI MPG X670E Carbon

- 96GB RAM

- 2x RTX 3090 (24GB VRAM each)

- 1600W PSU

pstuart•2h ago
That's basically what I imagined would be my rig if I were to pull the trigger. Do you have an NVLink adapter as well?
btbuildem•1h ago
No NVLink; it took me a long time to compose the exact hardware specs, because I wanted to optimize performance. Both cards are on x8 PCIe direct CPU channels, close to their max throughput anyway. It runs hot with the CPU engaged, but it runs fast.
nine_k•1h ago
Does it offer more performance than a Macbook Pro that could be had for a comparable sum? Your build can be had for under $3k; a used MBP M3 with 64 GB RAM can be had for approximately $3.5k.
btbuildem•1h ago
I'm not sure, I did not run any benchmarks. As a ballpark figure -- with both cards throttled down to 250W, running a Qwen-30B FP8 model (variant depending on task), I get upwards of 60 tok/sec. It feels on par with the premium models, tbh.

Of course this is in a single-user environment, with vLLM keeping the model warm.
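
For reference, a vLLM setup in that spirit (a ~30B MoE sharded across two 24GB cards) can be sketched like this; the model ID is an assumption, not necessarily the exact checkpoint used above:

from vllm import LLM, SamplingParams

# Split the model across both 3090s with tensor parallelism.
llm = LLM(
    model="Qwen/Qwen3-30B-A3B",   # assumed repo id; use the quant you prefer
    tensor_parallel_size=2,
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
result = llm.generate(["Explain tensor parallelism in one paragraph."], params)
print(result[0].outputs[0].text)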

jlokier•3h ago
I use a Macbook Pro with 128GB RAM "unified memory" that's available to both CPU and GPU.

It's slower than a rented Nvidia GPU, but usable for all the models I've tried (even gpt-oss-120b), and works well in a coffee shop on battery and with no internet connection.

I use Ollama to run the models, so can't run the latest until they are ported to the Ollama library. But I don't have much time for tinkering anyway, so I don't mind the publishing delay.

MaxMatti•1h ago
How's the battery holding up during vibe coding sessions or occasional LLM usage? I've been thinking about getting a MacBook or a laptop with a similar Ryzen chip specifically for that reason.
anon373839•26m ago
I’d strongly advise ditching Ollama for LM Studio, and using MLX versions of the models. They run quite a bit faster on Apple Silicon. Also, LM Studio is much more polished and feature rich than Ollama.
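
For anyone going the MLX route, a minimal sketch with the mlx-lm package; the mlx-community repo id is an assumption, so substitute whatever MLX conversion you actually use:

from mlx_lm import load, generate

# Load an MLX-converted, 4-bit quantized model on Apple Silicon.
model, tokenizer = load("mlx-community/Qwen3-30B-A3B-4bit")  # assumed repo id

text = generate(
    model,
    tokenizer,
    prompt="Give a three-bullet summary of mixture-of-experts models.",
    max_tokens=300,
)
print(text)
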
jwr•2h ago
I just use my laptop. A modern MacBook Pro will run ~30B models very well. I normally stick to "Max" CPUs (initially for more performance cores, recently also for the GPU power) with 64GB of RAM. My next update will probably be to 128GB of RAM, because 64GB doesn't quite cut it if you want to run large Docker containers and LLMs.
sumo43•2h ago
Try running this using their harness https://huggingface.co/flashresearch/FlashResearch-4B-Thinki...
mehdibl•5h ago
It's a Qwen 3 MoE fine tune...
zurfer•5h ago
It makes me wonder if we'll see an explosion of purpose-trained LLMs because we've hit diminishing returns on investment in pre-training, or if it takes a couple of months to fold these advantages back into the frontier models.

Given the size of frontier models I would assume that they can incorporate many specializations and the most lasting thing here is the training environment.

But there is probably already some tradeoff, as GPT 3.5 was awesome at chess and current models don't seem trained extensively on chess anymore.

deepanwadhwa•4h ago
> GPT 3.5 was awesome at chess

I don't agree with this. I did try to play chess with GPT-3.5 and it was horrible, full of hallucinations.
miki123211•3h ago
It was GPT-3 I think.

As far as I remember, it's post-training that kills chess ability for some reason (GPT-3 wasn't post-trained).

alephnerd•4h ago
> if we'll see an explosion of purpose trained LLMs...

Domain-specific models have been on the roadmap for most companies for years now, from both a competitive (why give up your moat to OpenAI or Anthropic) and a financial (why finance OpenAI's margins) perspective.

onlyrealcuzzo•1h ago
Isn't the whole point of the MoE architecture exactly this?

That you can individually train and improve smaller segments as necessary?

idiotsecant•1h ago
I think it's the exact opposite - you don't specifically train each 'expert' to be a SME at something. Each of the experts is a generalist but becomes better at portions of tasks in a distributed way. There is no 'best baker', but things evolve toward 'best applier of flour', 'best kneader', etc. I think explicitly domain-trained experts are pretty uncommon in modern schemes.
viraptor•17m ago
That's not entirely correct. Most MoE models right now are fully balanced, but there is an idea of a domain-expert MoE where the training benefits from fewer switches. https://arxiv.org/abs/2410.07490
ainch•34m ago
Generally you train each expert simultaneously. The benefit of MoEs is that you get cheap inference because you only use the active expert parameters, which constitute a small fraction of the total parameter count. For example Deepseek R1 (which is especially sparse) only uses 1/18th of the total parameters per-query.
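
A quick back-of-envelope check of that point; the parameter counts are approximate public figures:

# Active vs. total parameters for two well-known MoE models.
models = {
    "DeepSeek-R1": {"total_b": 671, "active_b": 37},      # roughly 1/18 active
    "Qwen3-30B-A3B": {"total_b": 30.5, "active_b": 3.3},  # roughly 1/9 active
}

for name, p in models.items():
    frac = p["active_b"] / p["total_b"]
    print(f"{name}: {p['active_b']}B of {p['total_b']}B parameters active "
          f"(about 1/{round(1 / frac)})")
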
rokob•5h ago
This whole series of work is quite cool. The use of `word-break: break-word;` makes this really hard to read though.
soared•5h ago
I actually can’t read it for some reason? My brain just can’t connect the words
don-bright•4h ago
So it appears the entire text has been translated with the non-breaking space unicode U+00A0 instead of normal spaces U+0020, so the web layout treats all the paragraph text as a super-long single word ('the\u00a0quick\u00a0brown\u00a0fox' instead of 'the quick brown fox'). The non-breaking space renders identically to a regular breaking space, but it breaks the layout's concept of "break at end of word" because there is no end: U+00A0 literally means "non-breaking". Per Copilot spending a half hour explaining this to me, apparently this can be fixed by opening the web browser's developer view and copy/pasting this code into the console.

function replaceInTextNodes(node) {
  if (node.nodeType === Node.TEXT_NODE) {
    node.nodeValue = node.nodeValue.replace(/\u00A0/g, ' ');
  } else {
    node.childNodes.forEach(replaceInTextNodes);
  }
}

replaceInTextNodes(document.body);

dlisboa•18m ago
That’s why typography matters. You can’t read it because a very basic convention has been broken here and that throws everything off.
theflyestpilot•5h ago
I hope the translation of this is actually "Agree" DeepResearch, as a dig at "You are absolutely right!" sycophancy.
numpad0•4h ago
TIL the "full" name of Alibaba Qwen is 通義千問(romanized as "Tongyi Qianwen", something along "knows all thousand questions"), of which the first half without the Chinese accent flags is romanized identically to "同意", meaning "same intents" or "agreed".

The Chinese version of the link says "通义 DeepResearch" in the title, so it doesn't look like the "agree" reading is the case. Completely agree that it would be hilarious.

1: https://www.alibabacloud.com/en/solutions/generative-ai/qwen...

rahimnathwani•3h ago
For people who don't read Chinese: the two 'yi' characters numpad0 mentioned (义 and 義) are the same, but written in different variants of Chinese script (Simplified/Traditional).
Traubenfuchs•4h ago
It still feels to me like OpenAI has zero moat. There are like 5 paid competitors + open source models.

I switch between Gemini and ChatGPT whenever I feel one fails to fully grasp what I want, and I do coding in Claude.

How are they supposed to become the 1 trillion dollar company they want to be, with strong competition and open source disruptions every few months?

rokob•3h ago
I don't know if they can pull it off, but a lot of companies are built on strong enterprise sales: being able to sell free stuff with a bow on it to someone who doesn't know better or doesn't care.
isoprophlex•3h ago
Premium grade deals with Oracle. They will bullshit their way into government and enterprise environments where all the key decision makers are clueless and/or easily manipulated.
nickpinkston•3h ago
Yea, I agree.

Arguably, (1) it's far easier to switch between LLMs than it is today to switch between AWS / GCP / Azure systems, and (2) LLMs will rapidly decrease the switching costs of porting your legacy systems to new ones, i.e. Oracle's (and others') whole business model.

Meanwhile, the whole world is building more chip fabs, data centers, AI software/hardware architectures, etc.

Feels more like we're headed toward commodification of the compute layer than toward a few giant AI monopolies.

And if true, that's actually even more exciting for our industry and "letting 100 flowers bloom".

whiplash451•1h ago
Isn't the moat in the product/UI/UX? I use Claude daily and love the "scratch notebook" feel of it. The barebones model does not get you any of this.
hamandcheese•39m ago
I agree that the scaffolding around the model contributes greatly to the experience. But it doesn't take billions of dollars in GPUs to do that part.
steveny3456•3h ago
Juju
krystofee•2h ago
Isn't it a huge deal that this 30B model can match and even surpass huge closed models?
tbruckner•2h ago
Has anyone found these deep research tools useful? In my experience, they generate really bland reports that don't go much further than a summary of what a search engine would return.
ainch•28m ago
The reports are definitely bland, but I find them very helpful for discovering sources. For example, if I'm trying to ask an academic question like "has X been done before," sending something to scour the internet and find me examples to dig into is really helpful - especially since LLMs have some base knowledge which can help with finding the right search terms. It's not doing all the thinking, but those kind of broad overviews are quite helpful, especially since they can just run in the background.
andy99•3m ago
My experience is the same as yours. It feels to me (similar to most LLM writing) like they write for someone who’s not going to read it or use it but is going to glance at it and judge the quality that way and assume it’s good.

Not too different from a lot of consulting reports, in fact, and pretty much of no value if you're actually trying to learn something.

DataDaemon•2h ago
Unfortunately, China will soon take the lead in AI.
aeve890•2h ago
Unfortunately? May I ask why? What country would you like to be the lead in AI?
ninetyninenine•1h ago
The USA of course. Isn't it obvious? What other country is more Free and great? None. Why does this even need to be asked?

China is full of people who want communism to dominate the world with totalitarian control so no one wants China to dominate anything at all because they are bad...

Krasnol•5m ago
The USA is being led by a criminal pedo atm. There is military in the streets, and SA-like masked thugs are kidnapping people. Billionaires sit behind the wheel to profit from all of these developments. Many of them are somehow related to AI. You can imagine what that will be/is used for (see Palantir).

The whole country is going down the drain right now. There is nothing about it that sane people outside the Republican bubble would consider "freedom".

victorbjorklund•2m ago
The USA is threatening to invade Europe, so I'm not sure it can be considered great.
davidsainez•1h ago
I have been very impressed with the Qwen3 series. I'm still evaluating them, and I generally take LLM benchmarks with a huge grain of salt, but their MoE models in particular seem to offer a lot of bang for the compute. But what makes you so sure they will take the lead?
ninetyninenine•1h ago
Isn't this an indication they are already in the lead? They currently have the best model that beats everyone on all quantitative metrics? Are you implying that the US has a better model somewhere?
sumo43•2h ago
I made a 4B Qwen3 distill of this model (and a synthetic dataset created with it) a while back. Both can be found here: https://huggingface.co/flashresearch
brutus1213•2h ago
I recently got a 5090 with 64 GB of RAM (Intel CPU). Was just looking for a strong model I can host locally. If I had the performance of GPT-4o, I'd be content. Are there any suggestions, or cases where people got disappointed?
p1esk•2h ago
The 5090 has 32GB of VRAM. Not sure if that's enough to fit this model.
svnt•2h ago
It should fit enough of the layers to make it reasonably performant.
IceWreck•1h ago
LlamaCPP supports offloading some experts in a MoE model to CPU. The results are very good and even weaker GPUs can run larger models at reasonable speeds.

n-cpu-moe in https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...

bogtog•2h ago
GPT-OSS-20B at 4- or 8-bits is probably your best bet? Qwen3-30b-a3b probably the next best option. Maybe there exists some 1.7 or 2 bit version of GPT-OSS-120B
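
As a rough rule of thumb, weight memory is about parameter count times bits per weight divided by 8, before KV cache and runtime overhead, so treat these as lower bounds:

# Back-of-envelope weight sizes for the models mentioned above.
def weight_gb(params_billions: float, bits: float) -> float:
    return params_billions * bits / 8  # GB of weights, overhead not included

for name, params_b, bits in [
    ("GPT-OSS-20B @ 8-bit", 20, 8),
    ("GPT-OSS-20B @ 4-bit", 20, 4),
    ("Qwen3-30B-A3B @ 4-bit", 30.5, 4),
    ("GPT-OSS-120B @ 2-bit", 120, 2),
]:
    print(f"{name}: ~{weight_gb(params_b, bits):.0f} GB of weights")
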
yalogin•2h ago
In my experience using these supposed expert models, they are all more or less the same, given that they are all trained on the same internet data. The differentiation and value are in the context window management and how relevant info from your session is pulled in. So it's the interface to the model that makes all the difference. Even there, the differences are quite minimal. That is because all these companies want to walk the line between providing enough functionality to keep users engaged and pushing them to sign up for the subscription.

All this to ask the question: if I host these open source models locally, how is the user interface layer implemented, the part that remembers and picks the right data from my previous sessions, the agentic automation, and so on? Do I have to build it myself, or are there free options for that?

viksit•1h ago
This is a great question. What are the main use cases that you have for this? I've been working on a library for something similar and exposing it via an MCP interface. Would love to pick your brain on this (@viksit on twitter).
ninetyninenine•1h ago
Is China dominating the US in terms of AI? Given that they currently have a model that beats the best models at all formal quantitative benchmarks?

What is the state of AI in China? My personal feeling is that it doesn't dominate the zeitgeist in China the way it does in the US, and despite this, because of the massive amount of intellectual capital they have, just a small portion of their software engineering talent working on this is enough to go head to head with us, even though it only takes a fraction of their attention.

idiotsecant•53m ago
I think the lesson of the Chinese catchup in AI is that there is a massive disadvantage in being first, in this domain. You can do all the hard work and your competitors can distill that work out of your model for pennies on the dollar. Why should anyone want to do the work?
whiplash451•1h ago
Has anyone tried running this on a 5090 or 6000 pro? What throughput do you see?

Lisp: Notes on its Past and Future (1980)

https://www-formal.stanford.edu/jmc/lisp20th/lisp20th.html
37•birdculture•1h ago•19 comments

'This is the big one' – tech firms bet on electrifying rail

https://www.bbc.com/news/articles/czdjg92y00no
24•mikhael•49m ago•4 comments

Using FreeBSD to make self-hosting fun again

https://jsteuernagel.de/posts/using-freebsd-to-make-self-hosting-fun-again/
74•todsacerdoti•9h ago•9 comments

Reproducing the AWS Outage Race Condition with a Model Checker

https://wyounas.github.io/aws/concurrency/2025/10/30/reproducing-the-aws-outage-race-condition-wi...
42•simplegeek•2h ago•2 comments

Linux gamers on Steam cross over the 3% mark

https://www.gamingonlinux.com/2025/11/linux-gamers-on-steam-finally-cross-over-the-3-mark/
193•haunter•1h ago•100 comments

Why don't you use dependent types?

https://lawrencecpaulson.github.io//2025/11/02/Why-not-dependent.html
131•baruchel•5h ago•39 comments

Anti-cybercrime laws are being weaponized to repress journalism

https://www.cjr.org/analysis/nigeria-pakistan-jordan-cybercrime-laws-journalism.php
126•giuliomagnifico•2h ago•32 comments

Is Your Bluetooth Chip Leaking Secrets via RF Signals?

https://www.semanticscholar.org/paper/Is-Your-Bluetooth-Chip-Leaking-Secrets-via-RF-Ji-Dubrova/c1...
28•transpute•2h ago•4 comments

Printed circuit board substrates derived from lignocellulose nanofibrils

https://www.nature.com/articles/s41598-025-91653-1
15•PaulHoule•6d ago•5 comments

URLs are state containers

https://alfy.blog/2025/10/31/your-url-is-your-state.html
267•thm•9h ago•125 comments

X.org Security Advisory: multiple security issues X.Org X server and Xwayland

https://lists.x.org/archives/xorg-announce/2025-October/003635.html
94•birdculture•7h ago•45 comments

Solar-powered QR reading postboxes being rolled out across UK

https://www.bbc.co.uk/news/articles/cgln72rgrero
5•thinkingemote•4d ago•2 comments

Autodesk's John Walker Explained HP and IBM in 1991 (2015)

https://www.cringely.com/2015/06/03/autodesks-john-walker-explained-hp-and-ibm-in-1991/
90•suioir•4d ago•52 comments

Notes by djb on using Fil-C

https://cr.yp.to/2025/fil-c.html
252•transpute•15h ago•142 comments

Writing FreeDOS Programs in C

https://www.freedos.org/books/cprogramming/
64•AlexeyBrin•7h ago•23 comments

At the end you use Git bisect

https://kevin3010.github.io/git/2025/11/02/At-the-end-you-use-git-bisect.html
111•_spaceatom•3h ago•98 comments

Backpropagation is a leaky abstraction (2016)

https://karpathy.medium.com/yes-you-should-understand-backprop-e2f06eab496b
266•swatson741•15h ago•115 comments

Mock – An API creation and testing utility: Examples

https://dhuan.github.io/mock/latest/examples.html
102•dhuan_•9h ago•17 comments

Rats filmed snatching bats from air

https://www.science.org/content/article/rats-filmed-snatching-bats-air-first-time
92•XzetaU8•5d ago•50 comments

New South Korean national law will turn large parking lots into solar farms

https://electrek.co/2025/11/02/new-national-law-will-turn-large-parking-lots-into-solar-power-farms/
118•thelastgallon•5h ago•100 comments

MTurk is 20 years old today – what did you create with it?

11•csmoak•50m ago•2 comments

Visopsys: OS maintained by a single developer since 1997

https://visopsys.org/
438•kome•22h ago•114 comments

Go Primitive in Java, or Go in a Box

https://donraab.medium.com/go-primitive-in-java-or-go-in-a-box-c26f5c6d7574
61•ingve•1w ago•29 comments

OpenBSD 7.8 Highlights

https://rsadowski.de/posts/2025/openbsd-78/
52•zdw•1w ago•6 comments

Claude Code can debug low-level cryptography

https://words.filippo.io/claude-debugging/
422•Bogdanp•1d ago•194 comments

Welcome to hell; please drive carefully

https://2earth.github.io/website/20251026.html
74•2earth•5d ago•24 comments

React-Native-Godot

https://github.com/borndotcom/react-native-godot
8•Noghartt•2h ago•1 comments

When O3 is 2x slower than O2

https://cat-solstice.github.io/test-pqueue/
89•keyle•4d ago•83 comments

Updated practice for review articles and position papers in ArXiv CS category

https://blog.arxiv.org/2025/10/31/attention-authors-updated-practice-for-review-articles-and-posi...
481•dw64•1d ago•228 comments