The author is correct that intelligence is compounding. That's why domain-specific models are usually general models converted to domain-specific ones by continued pretraining. Even general models, like H2O's, have been improved by constraining a second phase of pretraining to domain-supporting general knowledge. But they end up domain-specific eventually.
Outside LLMs, I think most models are domain-specific: genetics, stock prices, ECG/EKG scans, transmission shifting, seismic, climate, etc. LLMs trying to do everything are an exception to the rule that most ML is domain-specific.
This looks like an "ethical" LLM but not domain specific. What is the domain here?
> That's why domain-specific models are usually general models converted to domain-specific models by continued pretraining
I've also wondered about this, e.g. in the case of the Codex models. My hunch is that a good general model plus an appropriate system prompt trumps a domain-pretrained model, which is why even OpenAI sort of recommends using GPT-5.4 over any Codex model.
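The "system prompt instead of continued pretraining" approach above can be sketched as follows. This is a minimal illustration of the idea, not any particular vendor's API; the prompt text and helper name are made up for the example.

```python
# Sketch: specializing a general model with a system prompt instead of
# continued pretraining. The domain instructions steer the model at
# inference time; the weights stay generic.

def build_messages(domain_instructions: str, user_query: str) -> list[dict]:
    """Prepend a domain-specific system prompt to an otherwise generic chat."""
    return [
        {"role": "system", "content": domain_instructions},
        {"role": "user", "content": user_query},
    ]

coding_prompt = (
    "You are a senior software engineer. Answer with idiomatic, "
    "well-tested code and briefly explain trade-offs."
)
messages = build_messages(coding_prompt, "Reverse a linked list in C.")
```

The same general model becomes a "coding model" or a "legal model" purely by swapping `domain_instructions`.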
>We would have a healthcare model, economics model, mathematics model, coding model and so on.
It's not a question of whether there will ever be specialized models, but a matter of when.
This will democratize almost all work and professions, including those of programmers, architects, lawyers, engineers, medical doctors, etc.
Glass-half-empty people will call this a catastrophe of machines replacing humans. Glass-half-full people, on the other hand, will say it's good for society and humanity, making the work more efficient, faster, and much cheaper.
Imagine: instead of waiting a few months for your CVD diagnostic procedure because of the worldwide shortage of cardiologists (a fact), AI/LLM-assisted diagnosis will probably take only a few days, with an expert cardiologist in the loop, provided the sensitivity is high enough.
It's a win-win for patients, medical doctors, and hospitals. This will lead to earlier detection of CVD, and hence fewer complications and less suffering, whether acute or chronic.
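"Provided the sensitivity is high enough" refers to a standard screening metric: the fraction of actual positives the test catches. As a quick reference (the counts below are illustrative, not from any study):

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity (recall): TP / (TP + FN).
    For a CVD screening tool, a false negative is a missed case,
    which is why this is the metric that has to be high."""
    return true_positives / (true_positives + false_negatives)

# Illustrative numbers only: 95 cases detected, 5 missed.
sensitivity(95, 5)  # 0.95
```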
Foundation models are generic by nature, trained on HPC clusters with GPUs/TPUs inside AI data centers.
The other extreme is RAG, with vector databases and file systems for context prompting, as the sibling comments mention.
The best trade-off, the Goldilocks option, is model fine-tuning. Specifically, the promising self-distillation fine-tuning (SDFT) recently proposed by MIT and ETH Zurich [1],[2]. Unlike conventional supervised fine-tuning (SFT), which is prone to forgetting, SDFT is not forgetful, which makes fine-tuning practical rather than wasteful. SDFT needed only 4 x H200 GPUs for the fine-tuning process.
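The general mechanism behind distillation-based fine-tuning is a loss term that keeps the student's token distribution close to a frozen teacher's, which is what counteracts forgetting. The toy computation below illustrates that KL-divergence penalty in isolation; it is a schematic of the general idea, not the SDFT algorithm from [1].

```python
# Toy illustration of a distillation penalty: the fine-tuned student is
# discouraged from drifting away from the frozen teacher's distribution.
import math

def kl_divergence(teacher: list[float], student: list[float]) -> float:
    """KL(teacher || student) over one token's probability distribution."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)

teacher_probs = [0.7, 0.2, 0.1]   # frozen copy of the model before tuning
student_probs = [0.6, 0.3, 0.1]   # model being fine-tuned
penalty = kl_divergence(teacher_probs, student_probs)  # small when close
```

In training, a term like `penalty` is added to the task loss, so the model learns the new domain while staying anchored to its general knowledge.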
Apple is reporting the same with their simple self-distillation (SSD) for LLM coding specialization [3],[4]. They used 8 x B200 GPUs for model fine-tuning, which any company can afford for local fine-tuning of the open-weight LLMs available from Google, Meta, Nvidia, OpenAI, DeepSeek, etc.
[1] Self-Distillation Enables Continual Learning:
https://arxiv.org/abs/2601.19897
[2] Self-Distillation Enables Continual Learning:
https://self-distillation.github.io/SDFT.html
[3] Embarrassingly simple self-distillation improves code generation:
https://arxiv.org/abs/2604.01193
[4] Embarrassingly simple self-distillation improves code generation (185 comments):
scrpgil•3h ago
I built an MCP server that feeds a user's real schedule, tasks, and goals into Claude/ChatGPT. The model isn't specialized — but the output is, because the context is. No fine-tuning, no domain-specific training. Just structured data at inference time.
Domain-specific LLMs won't exist, not because specialization is useless, but because it's cheaper to specialize the input than the model.
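The "specialize the input, not the model" pattern described above can be sketched as serializing live user state into the context at inference time. The field names and shapes below are hypothetical; in the setup the commenter describes, an MCP server would supply this data to the chat client.

```python
# Sketch: a generic model made to produce specialized output purely by
# structuring the context with the user's real data.
import json

def build_context(schedule: list[dict], goals: list[str]) -> str:
    """Serialize user schedule and goals into a prompt preamble."""
    return (
        "User schedule:\n" + json.dumps(schedule, indent=2) +
        "\nUser goals:\n" + "\n".join(f"- {g}" for g in goals)
    )

ctx = build_context(
    [{"time": "09:00", "event": "standup"}],
    ["ship the MCP server", "run 5k"],
)
```

No weights change anywhere in this pipeline; the specialization lives entirely in `ctx`.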