Small language models are the future of agentic AI

https://arxiv.org/abs/2506.02153
48•favoboa•5h ago

Comments

eric-burel•2h ago
Slightly related, on the cooperation between large models and small models (traditional ML): https://arxiv.org/abs/2409.06857
bryant•2h ago
A few weeks ago, I processed a product refund with Amazon via agent. It was simple, straightforward, and surprisingly obvious that it was backed by a language model based on how it responded to my frustration about it asking tons of questions. But in the end, it processed my refund without ever connecting me with a human being.

I don't know whether Amazon relies on LLMs or SLMs for this and for similar interactions, but it makes tons of financial sense to use SLMs for narrowly scoped agents. In use cases like customer service, the intelligence behind LLMs is all wasted on the task the agents are trained for.

Wouldn't surprise me if down the road we start suggesting role-specific SLMs rather than general LLMs as both an ethics- and security-risk mitigation too.

automatic6131•33m ago
You can (used to?) get a refund on Amazon with a normal CRUD app flow. Putting an SLM and a conversational interface over it is a backwards step.
torginus•27m ago
I just had my first experience with a customer service LLM. I needed to get my account details changed, and for that I needed to use the customer support chat.

The LLM told me what information they needed and what the process was, and I followed it through to the end.

Once I had finished, it reassured me that everything was in order and that my request was being processed.

For two weeks, nothing happened. I emailed the (human) support staff, and they replied that they could see no such request in their system. It turned out the LLM had hallucinated the entire customer flow and was just spewing BS at me.

exe34•24m ago
That's why I take screenshots of anything that I don't get an email confirmation for.
ttctciyf•13m ago
There really should be some comeback for this type of enshAItification.

We're supposed to think "oh it's an LLM, well, that's ok then"? A question we'll be asking more frequently as time goes on, I suspect.

janpmz•2h ago
One could start with a large model for exploration during development, and then distill it down to a small model that covers the variety of the task and fits on a USB drive. E.g. when I use a model for gardening purposes, I could prune knowledge about other topics.
loktarogar•2h ago
Pruning is exactly what you're looking for in a gardening SLM
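
One common way to do the "distill it down to a small model" step is to train the small (student) model to match the large (teacher) model's softened output distribution. The sketch below shows only that loss, in plain Python, on raw logit lists. The function names, the default temperature, and the example logits are illustrative assumptions, not something the paper or the commenters specify.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature**2 factor is the usual rescaling that keeps gradient
    magnitudes comparable across temperatures. The default of 2.0 is an
    arbitrary choice for this sketch.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits over three classes, for illustration only.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 1.0]
print(distillation_loss(teacher, student))
```

A student that reproduces the teacher's logits exactly gets a loss of zero; in a real setup this term would be minimized over the narrow task data (gardening questions, in the example above), usually alongside an ordinary supervised loss.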
moqizhengz•30m ago
How are SLMs the future of AI when we are not even sure whether LMs are the future of AI?
boxed•19m ago
"Future" maybe means "next two months"? :P
flowerthoughts•17m ago
No mention of mixture-of-experts. Seems related. They do list a DeepSeek R1 distillate as an SLM. The introduction starts with a sales pitch. And there's a call-to-action at the end. This seems like marketing with source references sprinkled in.

That said, I also think the "Unix" approach to ML is right. We should see more splits. However, currently all these tools rely on great language comprehension. Sure, we might be able to train a model on only English and delegate translation to another model, but that will certainly lose (much needed) color. So if all of these agents will need comprehensive language understanding anyway, to be able to communicate with each other, is SLM really better than MoE?

What I'd love to "distill" out of these models is domain knowledge that is stale anyway. It's great that I can ask Claude to implement a React component, but why does the model that can do taxes so-so also try to write a React component so-so? Perhaps what's needed is a search engine to find agents. Now we're into expensive marketplace subscription territory, but that's probably viable for companies. It'll create a larger us-them chasm, though, and the winner takes it all.
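
A "search engine to find agents" could, in its most naive form, be a registry of specialist models that is searched by description, with a general model as the fallback. The sketch below uses simple word overlap as the score. Every name in it (`Agent`, `find_agent`, the agent names and descriptions) is hypothetical, and a real system would more likely use embeddings or a learned router.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str          # hypothetical identifier of a specialist model
    description: str   # free-text summary of what the agent is good at


def tokens(text):
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()} - {""}


def find_agent(query, registry, fallback):
    """Return the agent whose description shares the most words with the query.

    Falls back to the general model when no description overlaps at all.
    """
    query_words = tokens(query)
    best, best_score = fallback, 0
    for agent in registry:
        score = len(query_words & tokens(agent.description))
        if score > best_score:
            best, best_score = agent, score
    return best


# Illustrative registry; none of these agents exist.
registry = [
    Agent("tax-slm", "income tax filing deductions"),
    Agent("react-slm", "react component frontend javascript"),
]
general = Agent("general-llm", "general purpose assistant")

print(find_agent("Implement a React component", registry, general).name)
print(find_agent("What is the weather?", registry, general).name)
```

The fallback reflects the trade-off raised above: the router itself needs enough language understanding to pick the right specialist, which is exactly where a purely keyword-based lookup like this one would break down.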

mg•12m ago
I wonder how the math turns out when we compare the energy use of local vs remote models from first principles.

A server needs energy to build, house, power and maintain it. It is optimized for throughput and can be used 100% of the time. To use the server, additional energy is needed to send packets through the internet.

A local machine needs energy to build and power it. If it lives inside a person's phone or laptop, one could say housing and maintenance are free. It is optimized to have a nice form factor for personal use. It is used maybe 10% of the time or so. No energy for internet packets is needed when using the local machine.

My initial gut feeling is that the server will have way better energy efficiency when it comes to the amount of calculations it can do over its lifetime and how much energy it needs over its lifetime. But I would love to see the actual math.
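
The comparison above can be framed as lifetime energy divided by lifetime useful work, plus a per-request network cost for the server. The sketch below only sets up that arithmetic. The 100% and 10% utilization figures come from the comment; every other number is a made-up placeholder, so the printed values say nothing about real hardware.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """All fields are placeholder assumptions, not measurements."""
    embodied_kwh: float      # energy to build (and, for a server, house and maintain) it
    power_kw: float          # average draw while computing
    idle_power_kw: float     # average draw attributed to inference while idle
    utilization: float       # fraction of lifetime spent computing, 0 < u <= 1
    lifetime_hours: float
    ops_per_hour: float      # useful work per busy hour, in an arbitrary unit
    network_kwh_per_op: float = 0.0  # energy to move one request over the internet


def energy_per_op(m: Machine) -> float:
    """Lifetime energy divided by lifetime work, plus network overhead per op."""
    if not 0 < m.utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    busy_hours = m.utilization * m.lifetime_hours
    idle_hours = m.lifetime_hours - busy_hours
    total_ops = busy_hours * m.ops_per_hour
    lifetime_kwh = (
        m.embodied_kwh
        + busy_hours * m.power_kw
        + idle_hours * m.idle_power_kw
    )
    return lifetime_kwh / total_ops + m.network_kwh_per_op


# Placeholder inputs: replace with measured values before drawing conclusions.
server = Machine(embodied_kwh=10_000, power_kw=1.0, idle_power_kw=0.0,
                 utilization=1.0, lifetime_hours=40_000,
                 ops_per_hour=1_000, network_kwh_per_op=1e-5)
local = Machine(embodied_kwh=300, power_kw=0.03, idle_power_kw=0.0,
                utilization=0.1, lifetime_hours=40_000,
                ops_per_hour=10)

print("server kWh/op:", energy_per_op(server))
print("local  kWh/op:", energy_per_op(local))
```

One design choice worth questioning is how much of a phone's or laptop's embodied energy to charge to inference at all, since the device would exist anyway; lowering `embodied_kwh` for the local machine is the simplest way to model that.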

Fermi America, Texas Tech unveil $11B HyperGrid AI campus proposed

https://www.amarillo.com/story/news/2025/06/27/fermi-america-and-texas-tech-university-unveil-plans-for-11-billion-hypergrid-ai-campus/84377449007/
1•mpweiher•15m ago•0 comments

Trump suggests DOGE look at subsidies for Musk's companies to save money

https://www.reuters.com/business/autos-transportation/elon-musk-renews-criticism-trump-spending-bill-calls-new-political-party-2025-06-30/
2•jmsflknr•16m ago•0 comments

Deutsche Bahn train hits 405 km/h without falling to bits

https://www.theregister.com/2025/06/30/deutsche_bahn_test/
1•nixass•16m ago•0 comments

Wikipedia: WikiProject AI Cleanup/AI Catchphrases

https://en.wikipedia.org/wiki/Wikipedia:WikiProject_AI_Cleanup/AI_catchphrases
1•exegete•19m ago•0 comments

Show HN: I Built a Cluely Alternative

https://interm.ai/
1•devilzhong•20m ago•0 comments

XBOW – CVE-2025-49493: XML External Entity (XXE) Injection in Akamai CloudTest

https://xbow.com/blog/xbow-akamai-cloudtest-xxe/
2•isaacfrond•23m ago•0 comments

Show HN: I Built a Paul Graham AI Advisor for Founders and Hackers

https://paulgraham.resurrect.space
2•vednig•23m ago•0 comments

GodFather malware hijacks banking apps on Android devices

https://www.americanbanker.com/news/godfather-malware-poses-new-threat-to-android-banking-apps
2•Ozarkian•25m ago•0 comments

Left-Handed Creativity Myth Debunked

https://neurosciencenews.com/left-handed-creativity-29375/
1•isaacfrond•31m ago•0 comments

Show HN: Voice assistant software for ESP32 in C

https://github.com/JLW-7/wally-c
1•neon443•32m ago•0 comments

A modern GUI application for Gemini CLI

https://github.com/the-vc101/gemiui
1•ngaut•33m ago•0 comments

Apple reportedly considers letting Anthropic and OpenAI power Siri

https://techcrunch.com/2025/06/30/apple-reportedly-considers-letting-anthropic-and-openai-power-siri/
1•codexy•33m ago•0 comments

Netbird: Connect your devices into a WireGuard overlay network with SSO, MFA and

https://github.com/netbirdio/netbird
1•fanf2•34m ago•0 comments

Researchers Uncover Hidden Ingredients Behind AI Creativity

https://www.quantamagazine.org/researchers-uncover-hidden-ingredients-behind-ai-creativity-20250630/
1•isaacfrond•36m ago•0 comments

Coordinated, Nationwide Action Combats North Korean IT Worker Illicit Scheme

https://www.justice.gov/opa/pr/justice-department-announces-coordinated-nationwide-actions-combat-north-korean-remote
1•defrost•39m ago•1 comments

Portal, for the C64

https://www.jamiefuller.com/portal/
2•rbanffy•40m ago•0 comments

Google announces their first national credential partner for EU age assurance

https://blog.google/around-the-globe/google-europe/we-are-announcing-sparkasse-as-our-first-national-credential-partner-for-eu-age-assurance/
2•cromka•42m ago•0 comments

Google strikes deal to buy fusion power from MIT spinoff Commonwealth

https://www.reuters.com/sustainability/climate-energy/google-strikes-deal-buy-fusion-power-mit-spinoff-commonwealth-2025-06-30/
1•mpweiher•43m ago•0 comments

Ubuntu 25.10 Raises RISC-V Profile Requirements

https://www.omgubuntu.co.uk/2025/06/ubuntu-riscv-rva23-support
3•bundie•44m ago•0 comments

``Free as Air, Free as Water, Free as Knowledge''

http://bactra.org/Sterling/Free_as_the_Air_Free_as_Water_Free_as_Knowledge.html
1•whoopdedo•48m ago•0 comments

Test names should be sentences

https://bitfieldconsulting.com/posts/test-names
2•chautumn•48m ago•0 comments

World #2 in chess Hikaru Nakamura plays against best current bot [video]

https://www.youtube.com/watch?v=m7N4qC1znDc
1•_tk_•49m ago•0 comments

Automatically Rewrite Container Image References in Kubernetes

https://github.com/flemzord/mutating-registry-webhook
2•flemzord•50m ago•1 comments

The CEO who is building the "IKEA of factories"

https://www.businessinsider.com/nanotronics-ceo-interview-semiconductor-fab-factories-local-2025-6
1•01-_-•52m ago•1 comments

Show HN: InvoiceFast – Generate clean invoices in seconds for freelancers

1•skyzouw•55m ago•1 comments

Gmail ads are not just annoying: Google is now facing a record fine in France

https://tuta.com/blog/gmail-ads-annoying
7•01-_-•59m ago•0 comments

Thinking Machines Lab's $2B Seed Round Is Biggest by a Long Shot

https://news.crunchbase.com/venture/biggest-seed-round-ai-thinking-machines-mira-murati/
1•rbanffy•1h ago•1 comments

You can't always retry a 5xx

https://sophiabits.com/blog/you-cant-always-retry-a-5xx
1•furkansahin•1h ago•0 comments

A French region has banned tap water. Is it a warning for the rest of Europe?

https://www.theguardian.com/environment/2025/jul/01/pfas-forever-chemicals-water-contamination-saint-louis-france-aoe
2•akbarnama•1h ago•0 comments

Distributed Sorting at Scale

https://www.systemdesignacademy.com/blog/design-a-system-for-sorting-large-datasets-distributed-sorting-at-scale
3•pankajtanwar•1h ago•0 comments