Observatory of Examples of How Open Data and Generative AI Intersect

A growing observatory of examples of how open data from official sources and generative artificial intelligence (AI) are intersecting across domains and geographies.

Share your project for inclusion. We seek to learn from generative AI initiatives that use open government and research data across a Spectrum of Scenarios. More information on each scenario can be found in our report, "A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI."

DataGemma

DataGemma is an initiative by Google and Data Commons that seeks to improve the quality of AI output by grounding it in statistical data. The team augments the Gemma model with RIG (Retrieval-Interleaved Generation) and RAG (Retrieval-Augmented Generation), drawing on data from Google's Data Commons initiative, and makes the resulting model openly accessible. Through these processes, the team aims to provide LLMs that researchers and developers can use.
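The difference between the two retrieval approaches can be illustrated with a minimal Python sketch. It is not DataGemma's implementation: the `STATS` table, its placeholder value, the `[STAT(...) -> ...]` marker format, and the function names are all hypothetical and stand in for a real statistical source and a real model. Under those assumptions, RAG retrieves figures first and places them in the prompt, while RIG lets the model draft an answer with embedded lookup requests and then swaps each drafted figure for the retrieved one.

```python
import re

# Hypothetical stand-in for a statistical data source.
# The value is a placeholder, not a real statistic.
STATS = {
    "population of exampleland": "1,000",
}


def lookup(query):
    """Return the stored value for a query, or None if nothing is found."""
    return STATS.get(query.strip().lower())


def rag_prompt(question, queries):
    """RAG-style: retrieve first, then put the retrieved facts in the prompt."""
    facts = []
    for q in queries:
        value = lookup(q)
        if value is not None:
            facts.append(f"{q}: {value}")
    return "Context:\n" + "\n".join(facts) + "\n\nQuestion: " + question


# Hypothetical marker a model might emit: [STAT(<query>) -> <its own guess>]
MARKER = re.compile(r"\[STAT\((?P<query>[^)]+)\)\s*->\s*(?P<guess>[^\]]+)\]")


def rig_resolve(draft):
    """RIG-style: replace each drafted guess with the retrieved value.

    If the lookup finds nothing, the model's own guess is kept.
    """
    def swap(match):
        value = lookup(match.group("query"))
        return value if value is not None else match.group("guess").strip()

    return MARKER.sub(swap, draft)


if __name__ == "__main__":
    print(rag_prompt("How many people live in Exampleland?",
                     ["population of Exampleland"]))
    print(rig_resolve(
        "Exampleland has [STAT(population of Exampleland) -> 900] people."))
```

In this sketch the RAG path changes what the model sees before it answers, whereas the RIG path corrects figures inside an answer the model has already drafted.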

Region: international

Sector: private_sector

Scenario: pre-training

Start Date: 2024

Location: Global