Observatory of Examples of How Open Data and Generative AI Intersect

A growing observatory of examples of how open data from official sources and generative artificial intelligence (AI) are intersecting across domains and geographies.

Share your project for inclusion. We seek to learn from generative AI initiatives that use open government and research data across a Spectrum of Scenarios. More information on each scenario can be found in our report: A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI.

ALIA-40b

ALIA-40b is a generative AI model developed by the Barcelona Supercomputing Center (BSC). It was trained on official open data sources, including the Norwegian Colossal Corpus, the Estonian National Corpus, the Danish Parliament Corpus, and additional multilingual open datasets. Built on large language model (LLM) technology, it performs tasks such as content generation, text summarization, conversational interaction, and translation across multiple languages.

Region: EMEA

Sector: Academia

Scenario: Pre-training

Start Date: 2025

Location: Spain