Krikri: A New State-of-the-Art for Greek LLMs

The performance gap between high-resource languages (such as English or Chinese) and medium-resource languages such as Greek has long been a bottleneck for adopting generative AI in Greece. Multilingual models exist, but because they spread their capacity across many languages, they often underperform in any single one.

Published in the Findings of EMNLP 2025, Llama-Krikri-8B was developed by the Institute for Language and Speech Processing (ILSP), Athena RC. It addresses the weaknesses of existing models in Greek not just through more data, but through an extended tokenizer, large-scale continual pretraining, and an advanced post-training pipeline.

Note: Layer42Labs was founded by the core research team behind this paper to provide commercial integration and support for these models.

The Tokenizer Breakthrough

Tokenization is one of the most critical yet overlooked aspects of LLM efficiency. Tokenizers built mainly for high-resource languages split Greek words into many sub-word tokens, which increases inference cost and latency.

The researchers at ILSP extended the Llama 3 tokenizer with 20,992 new Greek-specific tokens. The impact on efficiency is drastic:

Metric	Llama 3.1	Llama-Krikri	Change
Fertility (Greek tokens per word)	2.73	1.65	~40% fewer tokens (lower cost)
Vocabulary size	128k	~149k	+20,992 Greek tokens

Data sourced from Table 2 of the paper.
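To see why adding Greek-specific tokens lowers fertility (average tokens per word), here is a toy sketch. It assumes a greedy longest-match segmenter, which is a deliberate simplification: the real Llama 3 tokenizer uses byte-level BPE. The vocabularies, the helper names `greedy_tokenize` and `fertility`, and the example phrase are all hypothetical; only the 20,992 added tokens come from the text, and 128,256 is the size of the original Llama 3 vocabulary.

```python
# Toy illustration only: greedy longest-match segmentation, NOT the Llama 3 BPE tokenizer.

LLAMA3_VOCAB_SIZE = 128_256  # size of the original Llama 3 vocabulary
NEW_GREEK_TOKENS = 20_992    # Greek tokens added for Krikri (from the text)


def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split `word` by repeatedly taking the longest matching vocabulary entry.

    A character not covered by the vocabulary becomes its own token,
    so segmentation always succeeds.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try candidate pieces from longest to shortest.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


def fertility(text: str, vocab: set[str]) -> float:
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    return sum(len(greedy_tokenize(w, vocab)) for w in words) / len(words)


# Hypothetical "English-centric" vocabulary that only has short Greek fragments.
base_vocab = {"the", "language", "η", "ελ", "λη", "νι", "κή", "γλ", "ώσ", "σα"}
# Hypothetical extension with longer, Greek-specific pieces.
extended_vocab = base_vocab | {"ελληνικ", "γλώσσα"}

text = "η ελληνική γλώσσα"  # "the Greek language"
print(greedy_tokenize("ελληνική", base_vocab))      # ['ελ', 'λη', 'νι', 'κή']
print(greedy_tokenize("ελληνική", extended_vocab))  # ['ελληνικ', 'ή']
print(f"{fertility(text, base_vocab):.2f} -> {fertility(text, extended_vocab):.2f}")  # 2.67 -> 1.33
print(LLAMA3_VOCAB_SIZE + NEW_GREEK_TOKENS)  # 149248, the "~149k" in the table
```

Longer Greek pieces let whole words or stems map to a single token, which is the effect the table reports at scale.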

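Fertility is the average number of tokens produced per word, so lower is better. The drop from 2.73 to 1.65 means that the same Greek text needs roughly 40% fewer tokens, which makes Krikri faster and cheaper to run in production and lets more Greek text fit into the same context window.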
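The cost argument is simple arithmetic on the two fertility values from the table. A minimal sketch, where the 1,000,000-word corpus and the 8,192-token budget are hypothetical figures chosen for illustration (the budget is not Krikri's actual context length):

```python
import math

LLAMA31_FERTILITY = 2.73  # Greek tokens per word, Llama 3.1 (from the table)
KRIKRI_FERTILITY = 1.65   # Greek tokens per word, Llama-Krikri (from the table)


def tokens_for(words: int, fertility: float) -> int:
    """Expected number of tokens for a Greek text of `words` words."""
    return round(words * fertility)


def words_in_budget(token_budget: int, fertility: float) -> int:
    """Number of whole Greek words expected to fit in a token budget."""
    return math.floor(token_budget / fertility)


reduction = 1 - KRIKRI_FERTILITY / LLAMA31_FERTILITY
print(f"token reduction: {reduction:.1%}")  # token reduction: 39.6%

words = 1_000_000  # hypothetical corpus size
print(tokens_for(words, LLAMA31_FERTILITY), tokens_for(words, KRIKRI_FERTILITY))  # 2730000 1650000

budget = 8_192  # hypothetical token budget, not Krikri's context length
print(words_in_budget(budget, LLAMA31_FERTILITY), words_in_budget(budget, KRIKRI_FERTILITY))  # 3000 4964
```

Per-token pricing and KV-cache memory scale roughly linearly with token count, while attention compute grows faster than linearly with sequence length, so real-world savings may differ somewhat from the 39.6% figure.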

Training Methodology: Beyond Just Text

Krikri was not merely fine-tuned: it first underwent continual pretraining on a curated corpus of 110 billion tokens.

1. The Data Mix

To prevent catastrophic forgetting (losing the English and coding abilities of the base model), the team used a mixed-curriculum strategy. The main components of the training mix were:

62.3% High-quality Greek text
23.1% English text (to maintain reasoning)
8.6% Math & Code
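The listed shares can be turned into approximate token counts and into a simple mixing sampler. This is a sketch under stated assumptions: it assumes the percentages are shares of the 110-billion-token corpus, and it uses plain fixed-proportion sampling, which may not match the actual curriculum or schedule the team used. The source names and helper functions are hypothetical. The three listed shares sum to 94%; the sketch reports the remainder but does not guess what it contains.

```python
import random

TOTAL_TOKENS = 110_000_000_000  # corpus size stated in the text
# Shares listed in the text (assumed to be fractions of TOTAL_TOKENS).
MIX = {"greek": 0.623, "english": 0.231, "math_code": 0.086}


def tokens_per_source(total: int, mix: dict[str, float]) -> dict[str, float]:
    """Approximate number of tokens contributed by each listed source."""
    return {name: total * share for name, share in mix.items()}


def sample_sources(mix: dict[str, float], n: int, seed: int = 0) -> list[str]:
    """Pick the source of each of `n` training documents in proportion to `mix`.

    random.choices normalizes the weights, so the unlisted remainder is
    simply left out of this sketch.
    """
    rng = random.Random(seed)
    names = list(mix)
    return rng.choices(names, weights=[mix[k] for k in names], k=n)


for name, count in tokens_per_source(TOTAL_TOKENS, MIX).items():
    print(f"{name}: {count / 1e9:.2f}B tokens")  # greek 68.53B, english 25.41B, math_code 9.46B
print(f"not itemized: {1 - sum(MIX.values()):.1%}")  # not itemized: 6.0%

draws = sample_sources(MIX, 10_000)
print({name: draws.count(name) for name in MIX})  # counts proportional to the listed shares
```

Keeping a steady fraction of English and code in every batch is the intuition behind using a mix against forgetting: the model keeps seeing the kinds of data it must not lose.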

2. Synthetic Instruction Tuning (MAGPIE)

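For the instruction-tuned (chat) version, the team used MAGPIE, a technique for generating synthetic instruction-response pairs: an aligned LLM is given only the opening of its chat template's user turn, so it writes a plausible user instruction itself, and its answer to that instruction completes the pair. This synthetic data was combined with Direct Preference Optimization (DPO), which aligns the model with human preferences for helpfulness and safety by training directly on pairs of preferred and rejected responses.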
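The DPO step can be summarized by its per-pair loss. Below is a minimal sketch of the standard DPO objective in plain Python, assuming the summed log-probabilities of the preferred ("chosen") and rejected responses under the trained policy and under a frozen reference model are already computed. `beta=0.1` is a common default, not a value reported for Krikri, and the function name is hypothetical. MAGPIE data generation itself is not shown.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one preference pair, from summed sequence log-probs.

    loss = -log sigmoid(beta * [(log pi(y_w|x) - log ref(y_w|x))
                                - (log pi(y_l|x) - log ref(y_l|x))])
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == softplus(-m), computed in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: loss is log(2).
print(dpo_loss(-12.0, -12.0, -12.0, -12.0))  # ≈ 0.6931
# Policy shifted probability toward the chosen answer: loss drops.
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))  # ≈ 0.513
```

Minimizing this loss raises the policy's likelihood of preferred responses relative to rejected ones, while the reference-model terms keep it from drifting too far from the starting model.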

Benchmarking Results

The researchers introduced novel benchmarks for the Greek language, including Medical MCQA, IFEval Greek, MT-Bench Greek, and Arena-Hard-Auto Greek.

Krikri-8B outperforms previous state-of-the-art Greek models such as Meltemi-7B, as well as Llama 3.1, the model it was built on.

Base Model Performance: On a Greek evaluation suite developed by the team, Krikri-8B-Base achieved an average accuracy of 59.5%, compared to 48.7% for Llama-3.1 and 47.9% for Meltemi-7B.
Instruction Following: On IFEval Greek, the Instruct model scored 67.5%, more than double Meltemi's 32.7%, and surpassed larger models such as Gemma-2-27B-IT.
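Benchmark scores are usually easiest to compare as absolute percentage-point gaps. A small sketch using only the scores quoted above (the dictionary labels and the helper `point_gaps` are hypothetical names):

```python
# Scores (%) as quoted in the text.
BASE_SUITE = {"Krikri-8B-Base": 59.5, "Llama-3.1": 48.7, "Meltemi-7B": 47.9}
IFEVAL_GREEK = {"Krikri Instruct": 67.5, "Meltemi": 32.7}


def point_gaps(scores: dict[str, float], model: str) -> dict[str, float]:
    """Absolute gap, in percentage points, between `model` and every other entry."""
    return {
        other: round(scores[model] - score, 1)
        for other, score in scores.items()
        if other != model
    }


print(point_gaps(BASE_SUITE, "Krikri-8B-Base"))    # {'Llama-3.1': 10.8, 'Meltemi-7B': 11.6}
print(point_gaps(IFEVAL_GREEK, "Krikri Instruct"))  # {'Meltemi': 34.8}
print(f"{IFEVAL_GREEK['Krikri Instruct'] / IFEVAL_GREEK['Meltemi']:.2f}x")  # 2.06x
```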

Comprehensive Language Support

Krikri supports several varieties of Greek, as well as English:

Modern Greek: The primary focus, enabling contemporary business and conversational applications.
English: Maintaining high performance for multilingual reasoning and code generation.
Polytonic & Ancient Greek: Additional support for digital humanities, historical archives, and academic research.

Publication Details

Title: Krikri: Advancing Open Large Language Models for Greek
Authors: Dimitris Roussis, Leon Voukoutis, Georgios Paraskevopoulos, Sokratis Sofianopoulos, Prokopis Prokopidis, Vassilis Papavassiliou, Athanasios Katsamanis, Stelios Piperidis, Vassilis Katsouros.
Conference: Findings of EMNLP 2025
URL: https://aclanthology.org/2025.findings-emnlp.268.pdf