AMÁLIA and the future of European Portuguese LLMs

https://duarteocarmo.com/blog/amalia-and-the-future-of-european-portuguese-llms
41•johnbarron•3d ago

Comments

hartator•1h ago
What a waste of time and money.

Trying to force a LLM into a specific language makes you missed out on most of the world knowledge.

mistrial9•1h ago
> makes you missed out on most of the world knowledge

and, who knows what will happen to grammar ?

embedding-shape•1h ago
What LLM isn't forced into a specific language? That'd be a weird language model no one could understand. You need to choose at least one language, ideally the one the creators speak.

Besides, there is knowledge locked behind languages: there are things known in Portuguese that aren't known in other languages, and the same goes for other languages too. More accessibility to those ideas wouldn't hurt.

Miraste•34m ago
To my knowledge, all major LLMs are multilingual. This article could really have used an evaluation of existing models' European Portuguese capabilities.
cess11•32m ago
E.g. gemma3:4b can fake simple conversations in several European languages, including Portuguese, Swedish and Finnish.

It's just a database. If you push text in one language into it, it'll likely crap out stuff in that same language, unless the system prompt that also goes in with your query causes it not to.

KK7NIL•13m ago
This is how Europe thinks they can catch up on tech, by having the government fund vanity projects which will be made obsolete by more general techniques in 6 months.
swiftcoder•44m ago
It is definitely an interesting problem, because Portugal is a small enough country that the actual total corpus of available texts in (non-Brazilian) Portuguese is potentially problematic.
embedding-shape•32m ago
I don't think so. Portugal the country might be small, with a small population, but there are ~250 million "Lusophones" (native Portuguese speakers), making it the fifth-most spoken native language in the world. I'd hardly call that small :) And before everyone screams: yes, European Portuguese is different from Brazilian Portuguese, but they're still both Portuguese and their speakers understand each other, so it's not like the text from one cannot be used to train a model for the other, or vice versa.

All in all, I don't think that's a major issue here.

KK7NIL•23m ago
The whole point of this project is to have an LLM that speaks European Portuguese, not Brazilian Portuguese.
embedding-shape•21m ago
Right, and my point is that if you use 80% Brazilian Portuguese during base model training + 20% European Portuguese as post-training, you pretty much get exactly that, except with a ton more available training data.
KK7NIL•15m ago
What's your evidence for that?

And if the first 80% doesn't bias the language after post-training (which I think is what you're claiming), why not go for English or a mixture of languages, which is essentially what they did by starting with EuroLLM?

swiftcoder•21m ago
The authors are pretty clearly trying to draw only from European Portuguese sources - I feel like there's a fairly widespread attitude here that the language is being overwhelmed by the sheer number of Brazilian speakers (which there is obviously at least some truth to).

I don't personally feel like preserving European Portuguese in amber is a worthwhile goal (any more than it is productive for Brits to be prickly about the meteoric rise of US English).

madaxe_again•10m ago
Man, there’s an attitude up here in Trás-os-Montes that the rest of Portugal has spoken unrecognisable trash for a century. It took me years to realise I’d learned hilariously antique Portuguese by moving there.

Then again, if you go to Miranda do Douro, they’ll say the rest of Portugal has been talking nonsense for the last 700 years, so the purists at least always have their concents to retreat to if they so choose.

madaxe_again•21m ago
Mutually intelligible, yes, but far from perfectly so. I speak both, as a native anglophone, and the difference is not so much “US vs British English” as “Guyanese English vs British English”. Fundamental points of grammar differ, and the spoken rhythm and syllabic stress differ (poetry does not translate well between them), never mind just vocabulary. Continental Portuguese people tend to find it easier to understand brasileiros than vice versa, largely due to mostly one-way cultural exports, but to try to roll both into a single model would create a creole at best.
embedding-shape•19m ago
I agree, they're not the same. But they're far closer to each other than languages that don't come from the same family.
pu_pe•39m ago
I'm not sure the direction should be to fine-tune a small local model for each country or language. These models are already not particularly great at information retrieval, so I doubt anyone would use them for questions like the ones the author suggests (i.e., who was the president between X and Y). Similarly, they are a little too lightweight to be used for translation.

If the budget is indeed so modest (5.5 million euros!), I would focus completely on preparing datasets and making sure all open cultural artifacts that we can find are well documented in them. That way every model, private or open, that gets trained in the future could better represent the culture and language of your country.

algoth1•22m ago
Wouldn't it be easier to fine-tune a model to convert the Brazilian Portuguese corpus into European Portuguese and then use that corpus?
