Observatory of Examples of How Open Data and Generative AI Intersect

A growing observatory of examples of how open data from official sources and generative artificial intelligence (AI) are intersecting across domains and geographies.

Share your project for inclusion. We seek to learn from generative AI initiatives that use open government and research data across a Spectrum of Scenarios. More information on each scenario can be found in our report, "A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI."

Dolma

Dolma is an open dataset created by the Allen Institute for AI (AI2). It is made up of academic research along with other data sources such as books, website content, and code. The dataset currently contains 3 trillion tokens and is accompanied by a toolkit for sourcing and curating datasets for training purposes.

Region

international

Sector

civic_tech, non-profit

Scenario

pre-training

Start Date

2023

Location: Global