GPT-NL-bouwen_aan_een_soeverein_en_eerlijk_taalmodel_voor_Nederland.pdf

Data Expo 11-9-2025 Saskia Lensink

There is growing interest in European alternatives to big tech’s products

What is GPT-NL?
• Sovereign, lawful, aligned with European values
• A Dutch-English Large Language Model
• Built from scratch on data that we are allowed to use
Purpose-built for compliance. Trusted for high-stakes environments. A foundation for innovation.

Innovativeness at our core
• Forward-thinking development of sovereign GenAI technology
• Creative problem-solving in legally and ethically complex domains
• Leadership in European AI through novel approaches to transparency and trust
• Trailblazer in responsible AI
Sovereign, trustworthy, reciprocal, transparent

GPT-NL
• Data: GPT-NL is trained on >1T tokens of Dutch, English and code, all lawfully acquired
• Pre-trained model: GPT-NL aims for at least Llama2 7B or GPT-3 175B levels of performance
• Fine-tuned model: GPT-NL will be fine-tuned to perform summarization, simplification, and overall text generation
• Infrastructure: trained on the National Supercomputer Snellius

Building the model, not the application
• We are developing the Large Language Model (LLM) itself.
• Getting the best performance in RAG, summarization, and simplification
• We focus on maximizing the model's capabilities, safety, and generalization, so you can focus on your own specialized applications
• Possible to run fully on-prem, with full control over your own data, environment, and choices

Current phase

Data Guidelines
[Figure: GPT-NL position (Yes / No) per copyright type and per data source. Labels as they appear on the slide:]
• Copyright types: CC-BY-SA, CC-0, CC-BY(-4.0)*, Public Domain, Agreement, No License, CC-NC, CC-ND, GPL-2.0, GPL-3.0, robots.txt opt-out, LLM-distilled, MIT*, Apache 2.0
• Data sources: Newspapers, OS Code, Publications, Governmental, Forums*, Social Media*, Non-professional (medical) advice, Travel blogs, Archive data, Wikipedia, Newspapers, Edu resources, Papers, Rechtspraak, Common-Crawl*
* without thorough curation

Towards a fair data value chain
• Data is the sole ingredient of all large language models. Yet everybody takes for granted that it can be used free of charge, without the consent of those who created it.
• We believe data artists, journalists and other creators should be paid for their work.
• We cannot afford to license those datasets in advance, which would require millions of euros. Instead, we pay 50% of the revenue from the commercial license to the owners of the data.
• The other 50% will solely be used for continuation of GPT-NL, not for profit.
• Data providers can choose to receive their compensation as a kickback or use it to gain (commercial) access to the model.
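
As a rough illustration of the 50/50 arrangement described above, the arithmetic can be sketched in a few lines of Python. The slides only state the two halves; how the owners' half is divided among individual data providers is not specified, so the pro-rata weighting below is purely an assumption, and the function and provider names are hypothetical.

```python
from decimal import Decimal


def split_license_revenue(revenue, contributions):
    """Split commercial-license revenue 50/50.

    revenue:       total license revenue (string or Decimal, in euros).
    contributions: mapping of data owner -> weight. ASSUMPTION: the slides
                   do not say how the owners' half is divided; a pro-rata
                   split by some contribution weight is used for illustration.
    Returns (payouts per owner, amount reserved for continuing GPT-NL).
    """
    revenue = Decimal(revenue)
    owner_pool = revenue / 2              # 50% goes to the owners of the data
    continuation = revenue - owner_pool   # the other 50% funds GPT-NL itself

    total_weight = sum(contributions.values())
    if total_weight <= 0:
        raise ValueError("contributions must have a positive total weight")

    payouts = {
        owner: owner_pool * Decimal(weight) / Decimal(total_weight)
        for owner, weight in contributions.items()
    }
    return payouts, continuation


# Hypothetical providers and weights, for illustration only.
payouts, continuation = split_license_revenue(
    "1000.00", {"publisher_a": 3, "archive_b": 1}
)
```

A provider who prefers model access over a kickback would simply have the same amount credited towards a license instead of paid out.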

Current phase Q4 2025 Launch GPT-NL 2026

Answers to the prompt "A potato is..."
After seeing approximately 350 billion tokens / after seeing approximately 210 billion tokens / after seeing approximately 3 billion tokens:
• “…a plant that roots in the ground and whose foliage serves as feed for animals. The potato is therefore a crop. Growing crops is an agricultural activity. In the Noordoostpolder there are two…”
• “…a worthy replacement for the beet, and it is a good choice to combine potato cultivation with other crops. The potato can be grown as a base crop on all soil types, but has a preference for…”
• “…an important part of the food chain. It is about preserving biodiversity and increasing resilience to climate change. The protein transition should lead to less use of crop protection products, more plant-based alternatives to animal manure…”

Sovereign communication, sovereign flight… and now on the way to sovereign AI!

Want to know more? Check our website www.gpt-nl.nl or follow us on LinkedIn!
