OLMo 2 is an open-source generative AI language model developed by the Allen Institute for AI (AI2). It was trained primarily on data sources such as Wikipedia and Wikibooks via Dolma 1.7, academic papers from arXiv, and additional datasets like OpenWebMath and Algebraic Stack from ProofPile II. These sources, which are available to the general public and researchers worldwide, include web pages, code repositories, and academic content. OLMo 2 aims to support tasks such as instruction-following, conversational AI, and text generation.
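To make the text-generation use case concrete, here is a minimal sketch, assuming the model is loaded from the Hugging Face Hub with the Transformers library; the checkpoint name allenai/OLMo-2-1124-7B is an assumption and should be checked against AI2's published model identifiers.

```python
# A minimal sketch of text generation with OLMo 2 via the Hugging Face
# Transformers library. The checkpoint identifier below is an assumption;
# confirm the exact model IDs on AI2's Hugging Face organization page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Prompt the model and generate a short continuation.
prompt = "Open government data can help researchers by"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```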
A growing observatory of examples of how open data from official sources and generative artificial intelligence (AI) are intersecting across domains and geographies.
We seek to learn from generative AI initiatives that use open government and research data across a Spectrum of Scenarios. More information on each scenario can be found in our report: A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI.