SmolLM3: Smol, multilingual, long-context reasoner LLM

https://huggingface.co/blog/smollm3
100•kashifr•2h ago

Comments

gardnr•1h ago
It's small (3B) and does great on benchmarks. This is a model for edge/mobile deployments, so the gains over gemma3-4b are meaningful. It has dual-mode reasoning/non-reasoning, AND they released the full training method:

> We're releasing SmolLM3 with our engineering blueprint. It includes architecture details, exact data mixtures showing how we progressively boost performance across domains in a three-stage pretraining approach, and the methodology for building a hybrid reasoning model. Usually, achieving these results would require months of reverse engineering. Instead, we're providing the full methodology.
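
For illustration, a minimal sketch of the dual-mode usage with transformers (the checkpoint name and the enable_thinking chat-template flag are taken from the blog post; treat the exact kwargs as assumptions rather than a verified recipe):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "HuggingFaceTB/SmolLM3-3B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": "Summarize the SmolLM3 release in one sentence."}]

    # Same checkpoint, reasoning mode toggled on and off:
    for thinking in (True, False):
        inputs = tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            enable_thinking=thinking,   # assumed template flag for reasoning / non-reasoning
            return_tensors="pt",
        ).to(model.device)
        out = model.generate(inputs, max_new_tokens=256)
        print(f"--- thinking={thinking} ---")
        print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))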

tiahura•1h ago
Can anyone estimate how much of the 3B is necessitated by multi-language support?
rockinghigh•1h ago
The vocabulary size is fairly small (128,256) for a multilingual model. I would guess it doesn't require many additional parameters to support these 5 languages, as many tokens can be shared.
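
Back-of-the-envelope on what that vocabulary costs in parameters (the hidden size and tied embeddings below are assumptions about the architecture, not from the post):

    vocab_size = 128_256
    hidden_size = 2048          # assumed; tied input/output embeddings
    total_params = 3.08e9       # ~3B

    embedding_params = vocab_size * hidden_size   # ~263M parameters
    print(f"embedding share: {embedding_params / total_params:.1%}")  # ~8.5%
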
nateb2022•1h ago
https://web.archive.org/web/20250708164705/https://huggingfa...
_1•1h ago
Which small model is good for fine-tuning on various enterprise data sets? Our business units want to run small models in the browser and on mobile devices, without dealing with RAG and cloud resources.
mhitza•1h ago
You really need to try them all out yourself and make sure you have proper benchmarks.

While machine learning is not my field, I've tried to finetune Mistral 7B (following their official guide and toolset) and the results did not satisfy. I had a few very specific questions from the dataset that, no matter how much I finetuned and tweaked the process, it was not able to answer with correct information.

A mix of vector search + keyword search is still better at building the right question context than expecting the model to learn all the information.
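
As a hedged sketch of that hybrid idea (the libraries, toy corpus, and 0.5 blend weight are illustrative choices, not anything from this thread):

    import numpy as np
    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer

    docs = [
        "Invoices are archived for seven years in the finance portal.",
        "VPN access requires a hardware token issued by IT.",
        "Expense reports above $500 need director approval.",
    ]

    bm25 = BM25Okapi([d.lower().split() for d in docs])
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_vecs = embedder.encode(docs, normalize_embeddings=True)

    def hybrid_search(query: str, alpha: float = 0.5, k: int = 2):
        kw = bm25.get_scores(query.lower().split())
        kw = kw / max(kw.max(), 1e-9)                 # normalize keyword scores
        qv = embedder.encode([query], normalize_embeddings=True)[0]
        dense = doc_vecs @ qv                         # cosine similarity (unit vectors)
        blended = alpha * kw + (1 - alpha) * dense
        return [docs[i] for i in np.argsort(-blended)[:k]]

    print(hybrid_search("who approves large expenses?"))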

I've used the pretrained dataset approach. Maybe building synthetic questions and answers around the dataset yields better results, but I didn't have time to experiment with that approach.

gardnr•54m ago
Small models are bad at knowing things. Trying to train knowledge into small models is probably not the way you want to go. You could try building an offline embedded RAG system that is deployable as Wasm. Some folks have had success with this.
_1•41m ago
We do use WebLLM and a hosted Weaviate database, but there are complaints about speed (both retrieval and time to first token, as the context gets big). The Gemma 3n "nesting doll" approach sounds like it could be useful... but I haven't found anyone specifically using it to add domain-specific knowledge.
simonw•29m ago
What are you hoping to achieve by fine-tuning a model in this way?
WhitneyLand•1h ago
Mostly SOTA performance at the 3B level. A notable addition to the small but truly open club of models that provide full disclosure, code, and recipes to reproduce their work.

Looks like ballpark a million dollars of GPU time if you want to train one up yourself (4,000 GPUs / 24 days).

Very nice write up that’s generous in sharing their learnings.

This is a solid and positive contribution.

YetAnotherNick•6m ago
It's 384 H100s for 24 days, costing less than half a million dollars.
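
Rough math behind that figure (the hourly rate is an assumed rental price, not something from the post):

    gpus = 384              # H100s
    days = 24
    usd_per_gpu_hour = 2.0  # assumed cloud rental rate; varies widely

    gpu_hours = gpus * days * 24          # 221,184 GPU-hours
    print(f"~${gpu_hours * usd_per_gpu_hour:,.0f}")  # ~$442,000
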
bitwize•1h ago
There's a British comedy skit lurking in here.

"So it's a small large language model?"

"Oh yes, very small."

"How can it be small and large at the same time?"

"Well, it's small by the standards of a large language model."

"So it's large."

"Oh yes, very large."

"Large compared to what?"

"Small language models."

"And so something like ChatGPT, what would that be exactly? A large large language model?"

"Yes, precisely. An LLLM."

msgodel•47m ago
Wow. Close to a Qwen3 distill at 75% of the size. That's great!

I've been using the smollm base models for my own finetunes just because they're so high quality; it looks like I might be using them to drive local agents/code completion in the near future too.

Their RL algorithm looks interesting. I'm still using OpenAI's algorithm for my stuff; I've been meaning to check on the SotA since I know my code is pretty outdated. (It's crazy how fast that happens with this stuff.)

gdiamos•47m ago
Nice work, Anton et al.

I hope you continue the 50-100M parameter models.

I think there is a case for models that finish fast on CPUs in solve-by-LLM test cases.

eachro•19m ago
From what I've heard, the llama3 models are fairly easy to fine-tune (please correct me if I'm wrong or if there are more amenable models here). How easy is it to fine-tune SmolLM3? I know a lot of the MoE LLMs have been quite fickle in this regard.
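
For anyone who wants to try: a minimal LoRA SFT sketch with trl + peft (the checkpoint name, dataset, and hyperparameters are placeholders/assumptions, not a recipe from the release):

    from datasets import load_dataset
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("trl-lib/Capybara", split="train[:1%]")  # placeholder dataset

    trainer = SFTTrainer(
        model="HuggingFaceTB/SmolLM3-3B",
        train_dataset=dataset,
        args=SFTConfig(
            output_dir="smollm3-lora",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=8,
        ),
        peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
    )
    trainer.train()

Since it's a dense 3B model, the usual LoRA setup should apply without the MoE-specific quirks.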
BarakWidawsky•12m ago
It’s interesting that it looks like they didn’t apply their own RL to the model, and instead fine-tuned on reasoning traces from existing large datasets and on traces generated by larger models.

How to use AI in video games without losing your soul

https://atelico.studio/blog/game-ai-art-without-losing-your-soul
1•maiybe•18s ago•1 comments

Manage email subscriptions from a single location in Gmail

https://workspaceupdates.googleblog.com/2025/07/manage-email-subscriptions-in-gmail.html
1•pentagrama•1m ago•0 comments

Unless users take action, Android will let Gemini access third-party apps

https://arstechnica.com/security/2025/07/unless-users-take-action-android-will-let-gemini-access-third-party-apps/
2•kmod•2m ago•0 comments

Devstral Finetuned for SIMD Porting

https://simd.info/blog/simdai_a_specialist_llm_for_simd_porting/
1•momojo•3m ago•0 comments

Brainwash '72 [video]

https://archive.org/details/Brainwash72
1•petethomas•3m ago•0 comments

Replit Collaborates with Microsoft to Bring Vibe Coding to Enterprise Customers

https://replit.com/news/microsoft-partnership
1•virtuosarmo•3m ago•1 comments

The Small Data Showdown '25: Is It Time to Ditch Spark Yet?

https://milescole.dev/data-engineering/2025/06/30/Spark-v-DuckDb-v-Polars-v-Daft-Revisited.html
1•RobinL•3m ago•0 comments

The Cost of Technical Debt

https://medium.com/@bernardgranstrom/the-hidden-cost-of-technical-debt-lessons-from-mariadb-columnstores-struggle-in-the-analytics-f3ee1b5c3080
1•jgale•5m ago•0 comments

Georgia court vacates order citing AI-invented caselaw

https://www.theregister.com/2025/07/08/georgia_appeals_court_ai_caselaw/
1•speckx•6m ago•0 comments

Why Mushrooms Are Starting to Replace Everything [video]

https://www.youtube.com/watch?v=jI2LC3WTryw
1•simonebrunozzi•6m ago•0 comments

TapTrap: Animation-Driven Tapjacking on Android

https://taptrap.click/
1•throawayonthe•6m ago•0 comments

Semiconductor industry could short out as copper runs dry

https://www.theregister.com/2025/07/08/copper_supplies_climate_change/
1•rntn•7m ago•0 comments

The Timeless Way of Learning

https://secondvoice.substack.com/p/the-timeless-way-of-learning
2•bobbyjgeorge•7m ago•1 comments

"We Accept of Course That It Is Draconian: and Deliberately So"

https://www.craigmurray.org.uk/archives/2025/07/we-accept-of-course-that-it-is-draconian-and-deliberately-so/
1•k1m•9m ago•0 comments

Embrace your ignorance – How to get the most out of customer interviews

https://russellpollari.substack.com/p/embrace-your-ignorance
1•russ_poll•10m ago•0 comments

CZI Announces New Center for Pediatric CRISPR Cures

https://chanzuckerberg.com/newsroom/center-pediatric-crispr-cures-launch/
1•krgkg•12m ago•0 comments

LLM Hallucination Detection Leaderboard

https://huggingface.co/spaces/kluster-ai/LLM-Hallucination-Detection-Leaderboard
1•rymc•13m ago•0 comments

Floor Traders Want Their Seats Back

https://www.bloomberg.com/opinion/newsletters/2025-07-08/floor-traders-want-their-seats-back
1•ioblomov•17m ago•1 comments

Tech Lead Manager: it's a trap (and I'm still in it)

https://grahamgilbert.com/blog/2025/07/07/tlm-its-a-trap-and-im-still-in-it/
1•dipierro•19m ago•0 comments

Peter Jackson Tries to Resurrect a Giant Bird That Went Extinct 600 Years Ago

https://www.ign.com/articles/its-more-jurassic-park-than-lord-of-the-rings-but-peter-jackson-is-trying-to-resurrect-a-giant-bird-that-went-extinct-600-years-ago-the-celebrated-director-tells-us-why
2•HelloUsername•20m ago•0 comments

IBM Power11 Raises the Bar for Enterprise IT

https://newsroom.ibm.com/2025-07-08-ibm-power11-raises-the-bar-for-enterprise-it
1•ksec•20m ago•0 comments

Peter Jackson backs long shot de-extinction plan starring New Zealand's lost moa

https://apnews.com/article/peter-jackson-moa-de-extinction-colossal-biosciences-04260e26cbe04e787640c9502df94dda
4•petethomas•22m ago•0 comments

Synthetic Chromatophores for Color and Pattern Morphing Skins

https://advanced.onlinelibrary.wiley.com/doi/10.1002/adma.202505104
1•PaulHoule•24m ago•0 comments

Cluely filed a DMCA takedown for tweet about their system prompt

https://twitter.com/jackhcable/status/1942636823525679182
6•taytus•24m ago•0 comments

Words Don't Compile

https://blog.surkar.in/words-dont-compile
1•manthan1674•25m ago•0 comments

Facial recognition cameras could be introduced to tackle fare dodging on Tube

https://www.standard.co.uk/news/transport/facial-recognition-cameras-fare-dodging-tube-london-underground-tfl-b1237049.html
2•pseudolus•25m ago•0 comments

Dynamical origin of Theia, the last giant impactor on Earth

https://arxiv.org/abs/2507.01826
6•bikenaga•26m ago•0 comments

Judge rules that VMware must support crucial Dutch government agency migration

https://www.theregister.com/2025/06/30/dutch_agency_wins_right_to/
1•Logans_Run•27m ago•0 comments

Skia Graphite: Chrome's rasterization back end for the future

https://blog.chromium.org/2025/07/introducing-skia-graphite-chromes.html
3•ingve•28m ago•0 comments

Google's Moonshot Project Gears Up for Human Trail of AI-Designed Drugs

https://in.mashable.com/science/96798/googles-secret-moonshot-project-gears-up-for-human-trail-of-ai-designed-drugs
1•Bluestein•28m ago•0 comments