
AMÁLIA and the future of European Portuguese LLMs

Why This Matters

AMÁLIA represents a significant advancement in European Portuguese NLP, showcasing a dedicated, open-source large language model tailored specifically for the language. Its development highlights the importance of linguistic diversity in AI and the potential for regional language models to enhance local digital ecosystems and accessibility.

Key Takeaways

In December 2024, the Portuguese government announced AMÁLIA: a €5.5 million investment in a large-scale LLM for European Portuguese.

The other day, while building an overview of the different Portuguese NLP efforts, I stumbled upon the technical report! I couldn't believe my eyes. Much to talk about! Let's get straight to it!

Actually, before we do, a quick disclaimer: AMÁLIA is an impressive piece of work, and the researchers should be very proud. But when the investment from the state is this significant, the entire country is the recipient of the work - and so I think it's only fair to ask some hard questions. If you participated in the project and are reading this: thank you for your work!

Alright - now let's get to it.

AMÁLIA in a nutshell

AMÁLIA is "a fully open source Large Language Model (LLM) for European Portuguese". The goal is simple: to create an LLM that treats European Portuguese as a first-class citizen. Italy, for example - did something similar with Minerva. AMÁLIA is a result of a collaboration between several top tier Portuguese Universities and Research Labs (NOVA, IST, IT, and FCT).

Contrary to what I would have expected, AMÁLIA is not trained from scratch. It continues the pre-training of EuroLLM: an earlier effort (with a lot of Portuguese manpower!). To my understanding, the architecture is the same as EuroLLM's, with some slight modifications to the context length and RoPE scaling.
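
To make this concrete, here is a minimal sketch of what continued pre-training with an extended context window could look like in Hugging Face transformers. The scaling factor and config tweaks below are my assumptions for illustration, not the actual AMÁLIA recipe; EuroLLM follows a Llama-style architecture, so the standard RoPE-scaling config fields apply:

```python
# Minimal sketch (not AMÁLIA's actual code): continue pre-training from an
# existing checkpoint while extending the context window via RoPE scaling.
from transformers import AutoConfig, AutoModelForCausalLM

BASE = "utter-project/EuroLLM-1.7B"  # an existing EuroLLM checkpoint on the Hub

config = AutoConfig.from_pretrained(BASE)
config.max_position_embeddings *= 2  # hypothetical 2x context extension
config.rope_scaling = {"type": "linear", "factor": 2.0}  # stretch RoPE positions to match

# Load the pre-trained weights with the modified config, then resume
# pre-training on the new data mixture (training loop omitted).
model = AutoModelForCausalLM.from_pretrained(BASE, config=config)
```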

Now, how does AMÁLIA focus on Portuguese? One word: data. Across every training stage, the team tried to increase the share of European Portuguese data the model was trained on: during pre-training they used Arquivo.pt data, during supervised fine-tuning (SFT) they synthetically generated Portuguese data, and during preference training they sub-sampled some of the data from the SFT phase. A rough sketch of what that mixture weighting can look like follows the figure below.

[Figure: training-stage overview, reconstructed from the AMÁLIA and ALBA papers.]
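
As a rough illustration of what "increasing the share" means in practice, here is a minimal sketch of weighted mixture sampling. The source names and weights are hypothetical placeholders, not numbers from the report:

```python
# Minimal sketch (my assumption, not the published recipe): up-weight the
# share of European Portuguese in a per-stage training mixture by sampling
# each data source with an explicit probability.
import random

# Illustrative sources and weights; Arquivo.pt is the pt-PT web archive the
# report mentions for pre-training, the rest are hypothetical placeholders.
PRETRAIN_MIX = {
    "arquivo_pt": 0.40,    # European Portuguese web archive data
    "multilingual": 0.45,  # original EuroLLM-style multilingual mix
    "code_math": 0.15,
}

def sample_source(mix: dict[str, float]) -> str:
    """Pick the next training document's source according to the mixture weights."""
    sources, weights = zip(*mix.items())
    return random.choices(sources, weights=weights, k=1)[0]

# Each stage would use its own mixture: pre-training leans on Arquivo.pt,
# SFT on synthetic pt-PT instructions, preference training on an SFT sub-sample.
print(sample_source(PRETRAIN_MIX))
```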

Training is interesting and all, but even more interesting is measuring whether what was trained is any good - which, in this particular case, can be especially challenging. The team created four new benchmarks specific to European Portuguese. The most prominent of these is ALBA.
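
For context on how such benchmarks are often run: multiple-choice tasks are commonly scored by comparing the log-likelihood the model assigns to each candidate answer. The sketch below shows that generic technique; ALBA's actual task format is not reproduced here, the model name is a stand-in, and the question is a toy example:

```python
# Generic log-likelihood multiple-choice scoring - a common way to evaluate
# LLMs on benchmarks. ALBA's real format and this model choice are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "utter-project/EuroLLM-1.7B"  # stand-in checkpoint, not AMÁLIA's weights
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def option_logprob(question: str, option: str) -> float:
    """Sum of log-probabilities the model assigns to the option's tokens."""
    prompt_ids = tok(question, return_tensors="pt").input_ids
    full_ids = tok(question + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Score only the option tokens, each predicted from its preceding context.
    # (Assumes the prompt tokenization is a prefix of the full tokenization,
    # which holds approximately for most tokenizers.)
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    option_positions = range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1)
    return sum(logprobs[i, full_ids[0, i + 1]].item() for i in option_positions)

question = "Qual é a capital de Portugal?"  # toy example, not an ALBA item
options = ["Lisboa", "Porto", "Madrid"]
print(max(options, key=lambda o: option_logprob(question, o)))
```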
