I don't know whether Amazon relies on LLMs or SLMs for this and for similar interactions, but it makes tons of financial sense to use SLMs for narrowly scoped agents. In use cases like customer service, the intelligence behind LLMs is all wasted on the task the agents are trained for.
Wouldn't surprise me if down the road we start suggesting role-specific SLMs rather than general LLMs as both an ethics- and security-risk mitigation too.
The LLM told me what information they needed and what the process was, and I followed it through to the end.
Once I had finished, it reassured me that everything was in order and that my request was being processed.
For two weeks, nothing happened. I emailed the (human) support staff, and they responded that they could see no such request in their system. Turns out the LLM hallucinated the entire customer flow and was just spewing BS at me.
We're supposed to think "oh it's an LLM, well, that's ok then"? A question we'll be asking more frequently as time goes on, I suspect.
That said, I also think the "Unix" approach to ML is right. We should see more splits; however, currently all these tools rely on strong language comprehension. Sure, we might be able to train a model on only English and delegate translation to another model, but that will certainly lose (much needed) color. So if all of these agents will need comprehensive language understanding anyway, to be able to communicate with each other, is an SLM really better than an MoE?
What I'd love to "distill" out of these models is domain knowledge that is stale anyway. It's great that I can ask Claude to implement a React component, but why does the model that can do taxes so-so also try to write a React component so-so? Perhaps what's needed is a search engine to find agents. Now we're into expensive marketplace subscription territory, but that's probably viable for companies. It'll create a larger us-them chasm, though, and the winner takes it all.
A server needs energy to build, house, power, and maintain it. It is optimized for throughput and can be used 100% of the time. To use the server, additional energy is needed to send packets over the internet.
A local machine needs energy to build and power it. If it lives inside a person's phone or laptop, one could say housing and maintenance are free. It is optimized to have a nice form factor for personal use. It is used maybe 10% of the time or so. No energy for internet packets is needed when using the local machine.
My initial gut feeling is that the server will have way better energy efficiency in terms of the number of calculations it can do over its lifetime relative to the energy it needs over that lifetime. But I would love to see the actual math.
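For what it's worth, the shape of that math can be sketched in a few lines of Python. This is only a rough model built from the factors listed above (energy to build, energy to run, utilization, network overhead). The class name, field names, and every number except the 100% and 10% utilization figures are placeholders I made up to show the structure, not measurements, so the printed values say nothing about which side actually wins.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical lifetime energy model; all fields are assumptions."""
    embodied_kwh: float        # energy to build (plus house/maintain, if counted)
    power_kw: float            # average draw while doing useful work
    utilization: float         # fraction of lifetime spent on useful work (0..1)
    ops_per_second: float      # useful throughput while working
    lifetime_hours: float      # how long the machine stays in service
    idle_power_kw: float = 0.0        # draw while idle, if attributed to this workload
    network_kwh_per_op: float = 0.0   # energy to move requests over the internet


def lifetime_ops(m: Machine) -> float:
    """Total useful operations performed over the machine's lifetime."""
    busy_seconds = m.lifetime_hours * 3600 * m.utilization
    return m.ops_per_second * busy_seconds


def lifetime_energy_kwh(m: Machine) -> float:
    """Embodied energy plus operating, idle, and network energy."""
    busy_hours = m.lifetime_hours * m.utilization
    idle_hours = m.lifetime_hours - busy_hours
    return (
        m.embodied_kwh
        + busy_hours * m.power_kw
        + idle_hours * m.idle_power_kw
        + lifetime_ops(m) * m.network_kwh_per_op
    )


def ops_per_kwh(m: Machine) -> float:
    """Efficiency metric: useful operations per kWh over the whole lifetime."""
    return lifetime_ops(m) / lifetime_energy_kwh(m)


# Placeholder inputs only. Replace them with real measurements before
# drawing any conclusion; only the utilization values come from the thread.
server = Machine(embodied_kwh=1.0, power_kw=1.0, utilization=1.0,
                 ops_per_second=1.0, lifetime_hours=1.0,
                 network_kwh_per_op=0.0)
local = Machine(embodied_kwh=1.0, power_kw=1.0, utilization=0.1,
                ops_per_second=1.0, lifetime_hours=1.0)

print("server ops/kWh:", ops_per_kwh(server))
print("local ops/kWh:", ops_per_kwh(local))
```

The interesting part is how the embodied energy gets amortized: at low utilization the build cost is spread over far fewer operations, while the server pays extra for housing and network transfer. Whether that nets out in the server's favor depends entirely on the real numbers plugged in.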
eric-burel•2h ago