A growing observatory of examples of how open data from official sources and generative artificial intelligence (AI) are intersecting across domains and geographies.
We seek to learn from generative AI initiatives that use open government and research data across a Spectrum of Scenarios. More information on each scenario can be found in our report: A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI.
AgroLLM is a conversational tool developed by researchers from Pittsburg State University, Oxford Brookes University, and other institutions to help farmers make better-informed decisions about farming practices. It uses generative AI to answer farmers' questions on topics such as crop management, climate impact, and pest control. To do so, it searches a collection of agricultural resources, including textbooks, research articles, and other open agricultural datasets, and generates responses grounded in these sources to help farmers improve their practices.
Region: International
Sector: Academia
Scenario: Inference and insight generation
Start Date: 2025
Location: United States, United Kingdom
ALIA-40b is a generative AI model developed by the Barcelona Supercomputing Center (BSC). It was trained on official open data sources, including the Norwegian Colossal Corpus, the Estonian National Corpus, the Danish Parliament Corpus, and additional multilingual open datasets. As a large language model (LLM), it performs tasks such as content generation, text summarization, conversational interaction, and translation in multiple languages.
Region: EMEA
Sector: Academia
Scenario: Pre-training
Start Date: 2025
Location: Spain
Inlook.AI is a conversational tool designed to help users access and visualize statistical data by querying statistical datasets in natural language. It uses generative AI to retrieve information and generate responses from a wide range of official statistical datasets, and it supports queries in multiple languages. Its sources include open government data such as datasets from the Swiss Federal Statistical Office (FSO) and OFS-City Statistics. The tool is intended for both statistical offices and private companies, to make statistical data easier to access and analyze.
Region: EMEA
Sector: Private sector
Scenario: Inference and insight generation
Start Date: 2025
Location: Switzerland
This project, developed by researchers from Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and Cerebras Systems, creates a large-scale instruction-following dataset for the Kazakh language. It uses open data from government and cultural sources, including Kazakhstan's e-Government portal (gov.kz) and cultural data from Kazakh Wikidata. The dataset covers key aspects of Kazakhstan’s governmental structure, legal frameworks, and cultural heritage. The project uses generative AI, specifically GPT-4o, to help create instructional data from these government and cultural texts. This data is then used to help language models better understand and follow instructions in Kazakh. The goal is to improve language models' understanding of local governance and culture in Kazakhstan.
Region: APAC
Sector: Private sector, Academia
Scenario: Pre-training
Start Date: 2025
Location: United States, United Arab Emirates
LawPal is a generative AI chatbot developed by researchers from Vidyavardhini’s College of Engineering and Technology in India. It aims to make legal information more accessible by answering users' legal questions and providing insights on topics such as case law, statutory provisions, and legal principles. LawPal generates contextually relevant responses grounded in official legal texts and other legal data, including government legal databases, Supreme Court judgments, statutes, and academic legal literature.
Region: APAC
Sector: Academia
Scenario: Inference and insight generation
Start Date: 2025
Location: India
DataGemma is an initiative by Google and Data Commons that seeks to improve the quality of AI outputs by grounding them in statistical data. The team augments the Gemma model with Retrieval-Interleaved Generation (RIG) and Retrieval-Augmented Generation (RAG). Both approaches draw on data from its Data Commons initiative, and the team makes the resulting models openly accessible. Through these processes, the team aims to create LLMs for researchers and developers to use.
Region: International
Sector: Private sector
Scenario: Pre-training
Start Date: 2024
Location: Global
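The idea behind RIG can be illustrated with a minimal Python sketch. It assumes a model that emits a placeholder query instead of stating a statistic from memory; the system then resolves that query against a trusted data store. The `[STAT: ...]` marker syntax, the `fill_statistics` helper, and the lookup table (with a fictitious place and figures) are hypothetical. They do not reflect DataGemma's actual output format or the Data Commons API.

```python
import re

# Hypothetical marker syntax: the model writes [STAT: <natural-language query>]
# wherever it would otherwise state a number from memory.
MARKER = re.compile(r"\[STAT:\s*(.+?)\]")

# Toy lookup table standing in for a trusted statistical store.
# The place name and figures are fictitious.
STAT_STORE = {
    "population of Exampleland in 2020": "1,234,567",
    "unemployment rate of Exampleland in 2020": "4.2%",
}


def fill_statistics(draft: str, store: dict[str, str]) -> str:
    """Replace each [STAT: query] marker with the value retrieved for that query,
    or with an explicit note when the store has no answer."""

    def resolve(match: re.Match) -> str:
        query = match.group(1).strip()
        value = store.get(query)
        return value if value is not None else f"(no data found for '{query}')"

    return MARKER.sub(resolve, draft)


draft = (
    "Exampleland had [STAT: population of Exampleland in 2020] residents "
    "and an unemployment rate of [STAT: unemployment rate of Exampleland in 2020]."
)
print(fill_statistics(draft, STAT_STORE))
```

In this sketch, an explicit "no data found" note replaces any marker the store cannot answer, so unsupported figures stay visible to the reader instead of being silently guessed.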
Common Corpus is one of the largest public-domain datasets for LLM training, coordinated by Pleias (a technology company) in collaboration with Hugging Face, Occiglot, EleutherAI, and Nomic AI. The dataset includes public-domain books and newspapers in several languages from national libraries and archives, along with other sources. It also includes language data in English, French, Dutch, Spanish, German, and Italian.
Region: International
Sector: Private sector, Civic tech
Scenario: Pre-training
Start Date: 2024
Location: Global
Alva is a generative AI chatbot based on GPT-4o mini that uses RAG to answer queries about Basel-Stadt. A key feature of the chatbot is its ability to provide attributed responses, citing the webpage or information source each response came from. Currently, the chatbot draws on publicly available information on the Basel-Stadt website (www.bs.ch).
Region: EMEA
Sector: Public sector
Scenario: Adaptation
Start Date: 2024
Location: Switzerland
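Attributed responses like Alva's are commonly produced by numbering the retrieved passages in the prompt and asking the model to cite them. The sketch below covers only the retrieval and prompt-building steps. It ranks pages by simple word overlap, a stand-in for the vector search a production RAG system would typically use. The `Page` type, the helper names, and the example.org URLs are hypothetical and do not describe Alva's implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lower-cased word set; a crude stand-in for embedding a passage."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, pages: list[Page], k: int = 2) -> list[Page]:
    """Return up to k pages that share the most words with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p.text)), p) for p in pages]
    scored = [(s, p) for s, p in scored if s > 0]  # drop pages with no overlap
    scored.sort(key=lambda sp: sp[0], reverse=True)  # stable: ties keep input order
    return [p for _, p in scored[:k]]


def build_prompt(query: str, sources: list[Page]) -> str:
    """Number each source so the model can cite it as [1], [2], ... with its URL."""
    context = "\n".join(f"[{i}] ({p.url}) {p.text}" for i, p in enumerate(sources, start=1))
    return (
        "Answer using only the numbered sources below and cite them by number.\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical pages; a real deployment would index the site's actual content.
pages = [
    Page("https://example.org/waste", "Household waste is collected weekly in official bags."),
    Page("https://example.org/parking", "Residents can apply online for a parking permit."),
    Page("https://example.org/library", "The city library opens at 9 on weekdays."),
]
query = "How do I apply for a parking permit?"
print(build_prompt(query, retrieve(query, pages)))
```

In a full pipeline, the prompt would be sent to the chat model, and the URLs of the sources it cites would be shown alongside the answer.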
The AI Hub is a platform developed by the government of South Korea that aims to accelerate private-sector AI innovation using open government data. The platform houses South Korea's AI infrastructure and open government datasets for AI development, and it offers services such as data quality evaluations. To complement these efforts, the government of Seoul is experimenting with creating synthetic data from open government data. One product developed using the AI Hub is TTCare, an AI-driven mobile application for pets, which was trained on data from the AI Hub along with other sources.
Region: APAC
Sector: Public sector
Scenario: Pre-training, Data augmentation
Start Date: 2024
Location: South Korea
Bayaan is a conversational tool developed by the Statistics Centre Abu Dhabi that aims to improve access to data from the Statistical Department. The tool uses generative AI to rapidly provide decision-makers with data analytics, visualizations, and information to inform their decisions. The data covers seven areas and indicators: "Economy, Population, Industry, Social Statistics, Labour Force, Agriculture, and Environment."
Region: EMEA
Sector: Public sector
Scenario: Open-ended exploration
Start Date: 2024
Location: United Arab Emirates
The United States Federal Government is developing a generative AI chatbot that allows users to query federal statistical data. The chatbot sources data from government agencies such as the National Center for Science and Engineering Statistics. It uses natural language processing (NLP) to interpret user queries about federal statistical data and provide relevant information from the available data. The project aims to improve public access to statistical data and support evidence-based policymaking and research.
Region: North America
Sector: Public sector
Scenario: Inference and insight generation
Start Date: 2024
Location: United States
Region: APAC
Sector: Academia
Scenario: Data augmentation
Start Date: 2024
Location: South Korea
Ask ReliefWeb is a generative AI-powered tool developed by ReliefWeb, a humanitarian information service managed by the United Nations Office for the Coordination of Humanitarian Affairs (OCHA). The tool is powered by Amazon's Bedrock generative AI service and Amazon's Titan foundation model. Ask ReliefWeb lets users interact with ReliefWeb’s repository of humanitarian reports through a chatbot-like interface. It is designed to help users retrieve relevant information from specific reports by generating responses to queries, so that humanitarian workers can access the data they need in real time. Ask ReliefWeb relies exclusively on ReliefWeb’s reports and content.
Region: International
Sector: Multilateral sector
Scenario: Inference and insight generation
Start Date: 2024
Location: Global
AuroraGPT is an AI model developed by Argonne National Laboratory to support scientific research in areas such as biology, cancer studies, and climate science. It was trained on scientific papers and computational data using the Aurora supercomputer. The model aims to help researchers analyze information and generate insights more efficiently. Intel and other partners support the project's development of AI tools for scientific use.
Region: International
Sector: Public sector
Scenario: Pre-training
Start Date: 2024
Location: United States
Berufsinfomat is a generative AI-driven tool (relying on ChatGPT) introduced by the Austrian Public Employment Service for career coaching. Trained on the Austrian Public Employment Service's knowledge database on professions, training, and education, it is intended to offer users information on these topics. The Berufsinfomat received 160,000 prompts in January 2024 and around 20,000 additional inquiries per month. It was criticized for responses that conformed to stereotypes about men and women, for other biases, and for various problematic answers. It has been revised several times in response to these problems.
Region: EMEA
Sector: Public sector
Scenario: Inference and insight generation
Start Date: 2024
Location: Austria
Bielik 7B v0.1 is a generative AI model developed collaboratively by SpeakLeash and the ACK Cyfronet AGH computing center in Poland. It was trained on publicly available, official open datasets. The main source is the SpeakLeash dataset, an open repository of verified Polish texts that includes Wikipedia, Polish parliamentary records, and Polish literature. Training also used other publicly accessible multilingual repositories such as SlimPajama. The model performs natural language processing (NLP) tasks, including text generation, sentiment analysis, and question answering, specifically in Polish and English.
Region: EMEA
Sector: Academia, Non-profit
Scenario: Pre-training
Start Date: 2024
Location: Poland
Region: North America
Sector: Academia
Scenario: Inference and insight generation
Start Date: 2024
Location: United States
In February 2023, Brazil's Federal Court of Accounts (TCU) launched ChatTCU, which uses OpenAI's ChatGPT and data sourced from the Federal Court of Accounts system. It allows auditors to request summaries of case documents and to ask technical questions about the TCU and court decisions. It also provides administrative services.
Region: Latin America and the Caribbean
Sector: Public sector
Scenario: Open-ended exploration, Adaptation
Start Date: 2024
Location: Brazil
Citymeetings.nyc is an independent initiative that uses LLMs to synthesize information from New York City Council meetings. It uses data from Legistar, an online platform where the government posts meeting summaries and agendas.
Region: North America
Sector: Civic tech
Scenario: Inference and insight generation
Start Date: 2024
Location: United States
The Data Science Campus of the United Kingdom's Office for National Statistics has developed ClassifAI, an experimental tool that uses large language models to organize text into categories (e.g. industry). It aims to improve upon existing classification methods by offering greater flexibility and potentially higher accuracy for tasks such as categorizing labor market survey responses. The code has been released as open-source. The developers note that further assessment is needed before potential use in official statistics production.
Region: International
Sector: Public sector
Scenario: Inference and insight generation
Start Date: 2024
Location: United Kingdom
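A common way to map free-text answers onto a fixed set of categories is to compare a vector representation of the answer with a representation of each category description and pick the closest. The sketch below assumes this nearest-category approach. It uses a toy bag-of-words vector in place of LLM embeddings. The category descriptions and the `min_score` threshold are hypothetical, and the sketch represents neither ClassifAI's actual method nor an official classification scheme.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical category descriptions (not an official classification scheme).
CATEGORIES = {
    "Agriculture": "farm crops livestock harvesting farming",
    "Construction": "building houses roads construction site bricklaying",
    "Education": "teaching school pupils university lecturing",
}


def classify(text: str, categories: dict[str, str], min_score: float = 0.1) -> tuple[str, float]:
    """Return the most similar category, or 'Unclassified' if nothing is close enough."""
    vec = embed(text)
    scores = {label: cosine(vec, embed(desc)) for label, desc in categories.items()}
    best = max(scores, key=scores.get)
    if scores[best] < min_score:
        return "Unclassified", scores[best]
    return best, scores[best]


print(classify("I work on a farm harvesting crops", CATEGORIES)[0])
print(classify("I sell insurance", CATEGORIES)[0])
```

The threshold lets the classifier return "Unclassified" rather than forcing every answer into a category, so ambiguous responses can be routed to a human coder.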
Trained on satellite imagery and Earth observation data, Clay is an open-source generative AI foundation model designed to understand and analyze Earth's surface. It can generate mathematical representations of any location on Earth at any given time. These representations can be used for tasks such as creating land cover maps, detecting crop or burn scars, and tracking deforestation.
Region: International
Sector: Non-profit
Scenario: Open-ended exploration
Start Date: 2024
Location: United States
Region: EMEA
Sector: Academia, Private sector
Scenario: Inference and insight generation
Start Date: 2024
Location: France
Region: International
Sector: Private sector
Scenario: Inference and insight generation
Start Date: 2024
Location: United States
Region: North America
Sector: Academia
Scenario: Adaptation
Start Date: 2024
Location: United States
Developed by DS-I Africa (a research program funded by the United States National Institutes of Health) and the University of KwaZulu-Natal, DataLaw.Bot is a generative AI chatbot launched in October 2024. It helps researchers from several countries across the African continent assess data-sharing regulations for scientific research. The chatbot was adapted from ChatGPT using national-level data-sharing regulations, with the goal of increasing access to research data across the continent.
Region: EMEA
Sector: Public sector, Academia
Scenario: Adaptation
Start Date: 2024
Location: Botswana, Cameroon, Ghana, Kenya, Malawi, Nigeria, Rwanda, South Africa, Tanzania, The Gambia, Uganda, and Zimbabwe
The DC Compass AI assistant is a generative AI chat interface that answers user queries based on datasets from Open Data DC. It can summarize a dataset and provide supporting visualizations, graphs, and maps. The project is currently a pilot, with a beta test open to the public. The team notes that the quality of the output depends on the quality of the data from Open Data DC as well as the breadth of data included.
Region: North America
Sector: Public sector
Scenario: Inference and insight generation
Start Date: 2024
Location: United States
Region: North America
Sector: Public sector, Academia
Scenario: Adaptation
Start Date: 2024
Location: United States
Region: EMEA
Sector: Public sector
Scenario: Inference and insight generation
Start Date: 2024
Location: Germany
Generative AI Chatbot for Drilling and Production integrates large language models (LLMs) with the Volve dataset, a publicly available dataset in the oil and gas industry developed by Equinor, a Norwegian energy company. The chatbot is designed to analyze historical drilling and production reports, perform diagnostic analysis, generate Structured Query Language (SQL) queries, and recommend ways to improve operations. The dataset is used to identify non-productive time, compare well performance, and diagnose root causes of poorly performing wells. The chatbot uses machine learning to let users ask questions about the dataset and receive insights and analysis based on operational data.
Region: International
Sector: Private sector
Scenario: Inference and insight generation
Start Date: 2024
Location: United States
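To illustrate the SQL-generation step, the sketch below runs a query against an in-memory SQLite table. The query is the kind a text-to-SQL model might produce for the question "Which wells had the most non-productive time?" The `activities` schema and sample rows are fictitious and much simpler than the actual Volve data. The check that only `SELECT` statements are executed is an assumed safeguard for running model-generated SQL, not a documented feature of the chatbot.

```python
import sqlite3

# Hypothetical, simplified schema: one row per activity in a daily drilling report.
SCHEMA = """
CREATE TABLE activities (
    well TEXT,
    activity TEXT,
    hours REAL,
    productive INTEGER  -- 1 = productive time, 0 = non-productive time (NPT)
);
"""

# SQL of the kind a text-to-SQL model might return for
# "Which wells had the most non-productive time?"
GENERATED_SQL = """
SELECT well, SUM(hours) AS npt_hours
FROM activities
WHERE productive = 0
GROUP BY well
ORDER BY npt_hours DESC;
"""


def run_query(rows: list[tuple], sql: str) -> list[tuple]:
    """Load rows into an in-memory table and run a read-only query over them."""
    # Assumed safeguard: refuse anything that is not a SELECT statement.
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO activities VALUES (?, ?, ?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Fictitious sample rows for illustration only.
rows = [
    ("WELL-A", "drilling", 10.0, 1),
    ("WELL-A", "stuck pipe", 4.0, 0),
    ("WELL-B", "drilling", 12.0, 1),
    ("WELL-B", "waiting on weather", 6.5, 0),
    ("WELL-B", "equipment repair", 2.0, 0),
]
print(run_query(rows, GENERATED_SQL))
```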
Developed by researchers at the Indraprastha Institute of Information Technology-Delhi, GeneSilico Copilot is a tool that supports oncologists. It draws on data from DrugBank Open Data, FDA drug labels, RxList, the Therapeutic Target Database, Drugs.com, and Wikipedia to offer advice on treatment decisions based on observed facts about a given patient.
Region: APAC
Sector: Academia
Scenario: Inference and insight generation
Start Date: 2024
Location: India
GeoLLM-Engine, developed by researchers at CoStrategist R&D Group and Microsoft Corporation, is an interface for interacting with geospatial data. The system includes a set of tools for analyzing maps and conducting spatial research. The development team is currently focused on improving the quality of outputs and refining the user interface. GeoLLM-Engine aims to serve professionals in fields that utilize geospatial analysis, such as urban planning and environmental monitoring.
Region: North America
Sector: Private sector
Scenario: Pre-training, Open-ended exploration
Start Date: 2024
Location: United States
GoldCoin is a large language model developed for the legal domain by researchers at the Department of Computer Science and Engineering, HKUST, in Hong Kong SAR, China. It specializes in detecting violations of HIPAA privacy rules based on specific queries. The model was trained using legal data from Harvard University's Caselaw Access Project, which offers public access to United States legal decisions. The research team suggests that GoldCoin could potentially be adapted to address other privacy laws in the future.
Region: APAC
Sector: Academia
Scenario: Inference and insight generation
Start Date: 2024
Location: China
GovTech's Data Science and Artificial Intelligence Division (DSAID) has developed a system that uses artificial intelligence to help draft parliamentary replies, meaning the official answers that government ministers or representatives give to questions asked by members of parliament during legislative sessions. The project uses machine learning techniques to train language models on past parliamentary data, aiming to generate responses that match the style and accuracy of official replies. The tool is designed to help public servants in Singapore prepare answers to parliamentary questions more efficiently. The project is also exploring the broader potential of customized AI models for government applications.
Region: APAC
Sector: Public sector
Scenario: Inference and insight generation
Start Date: 2024
Location: Singapore
The I14Y Interoperability Platform is Switzerland's national data catalogue, designed to improve data access among authorities, businesses, and citizens. It provides a centralized repository for data collections, application interfaces, and government services from different levels of government. The platform offers services such as a searchable catalogue, concept definitions, news updates, and a handbook to help users navigate and use Switzerland's data infrastructure.
Region: EMEA
Sector: Public sector
Scenario: Data augmentation
Start Date: 2024
Location: Switzerland
The Indiana Office of Technology and Tyler Technologies (a technology firm) launched a beta version of an AI chatbot that aims to help the public navigate public services. The chatbot is trained on public information from several State government departments and is hosted on the Government of Indiana website. Before users open the chatbot, a clause states that the State will not be liable for any incorrect or misleading information it provides.
Region: North America
Sector: Public sector
Scenario: Inference and insight generation
Start Date: 2024
Location: United States
InkubaLM-0.4B is a language model developed by Lelapa AI to support AI applications for African languages with limited digital resources. The model was trained using open datasets published on Zenodo, a publicly accessible repository for scientific and research data. Specifically, InkubaLM used datasets such as Inkuba-Mono and Inkuba-Instruct, covering languages such as Hausa, Yoruba, Swahili, isiZulu, and isiXhosa. The model helps with tasks such as text translation, sentiment analysis, and keyword recognition, with the aim of making AI more inclusive and accessible for underrepresented language communities.
Region: EMEA
Sector: Private sector
Scenario: Pre-training
Start Date: 2024
Location: South Africa
Region: APAC
Sector: Academia
Scenario: Adaptation
Start Date: 2024
Location: China
With the support of the Indonesia Endowment Fund for Education (LPDP) of the Ministry of Finance of the Republic of Indonesia, researchers at the University of Nottingham developed KemenkeuGPT, a generative AI chatbot that aims to support policymakers within Indonesia's Ministry of Finance. The chatbot uses RAG and combines data from the Ministry of Finance, Statistics Indonesia, and the International Monetary Fund, among other sources.
Region: APAC
Sector: Public sector, Academia
Scenario: Adaptation
Start Date: 2024
Location: Indonesia
Region: EMEA
Sector: Public sector
Scenario: Adaptation
Start Date: 2024
Location: France
Region: North America
Sector: Academia
Scenario: Open-ended exploration
Start Date: 2024
Location: United States
Researchers at the University of Georgia and the State University of New York at Albany used LLMs to analyze transcripts of United States presidential debates. The team tested seven debates from the past 24 years using GPT-4o and Claude 3, aiming to demonstrate how LLMs can help minimize bias in judging debates.
Region: North America
Sector: Academia
Scenario: Inference and insight generation
Start Date: 2024
Location: United States
Region: EMEA
Sector: Public sector
Scenario: Adaptation
Start Date: 2024
Location: Germany
LuminLab is an online platform that employs generative AI to offer information on improving building energy efficiency. The model is trained using open data from the Energy Performance Certificate dataset provided by the Sustainable Energy Authority of Ireland. The developers are currently working on enhancements, including the integration of geospatial data to generate 3D images of various areas, aiming to expand the platform's capabilities and visual representations.
Region: EMEA
Sector: Academia
Scenario: Adaptation
Start Date: 2024
Location: Ireland
Microsoft and Planet collaborated with humanitarian organizations to analyze the impact of Hurricane Beryl in Grenada. This experimental tool uses the Microsoft AI for Good Damage Assessment Visualizer to analyze satellite images from Planet, estimating damage to buildings and structures on the island of Carriacou. The tool provides visual data to support frontline workers in disaster response and logistics.
Region: International
Sector: Private sector
Scenario: Inference and insight generation
Start Date: 2024
Location: Grenada
Region: North America
Sector: Public
Scenario: Data augmentation
Start Date: 2024
Location: United States
OLMo 2 is an open-source generative AI language model developed by the Allen Institute for AI (AI2). It was trained primarily on data sources such as Wikipedia and Wikibooks via Dolma 1.7, academic papers from arXiv, and additional datasets such as OpenWebMath and Algebraic Stack from ProofPile II. These sources, which include web pages, code repositories, and academic content, are available to the general public and to researchers worldwide. OLMo 2 aims to support tasks such as instruction following, conversational AI, and text generation.
Region: International
Sector: Academia, Non-profit
Scenario: Pre-training
Start Date: 2024
Location: United States
Researchers at the Wangxuan Institute of Computer Technology at Peking University and the Computer Science Department of the University of California, Los Angeles developed the Quantitative Reasoning with Data (QRData) benchmark to assess LLMs' ability to analyze statistical data. QRData includes data from open textbooks, research papers, and other sources, paired with 411 questions. Of the LLMs tested, GPT-4 performed best, but the researchers noted the need for improvement.
Region: International
Sector: Academia
Scenario: Pre-training
Start Date: 2024
Location: United States, China
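Benchmarks such as QRData score a model by comparing its answers with reference answers. A minimal scoring function might accept numeric answers within a relative tolerance and require an exact match otherwise. The 3% tolerance, the function names, and the sample answers below are illustrative assumptions, not QRData's published evaluation protocol.

```python
import math


def is_correct(predicted: str, gold: str, rel_tol: float = 0.03) -> bool:
    """Numeric answers match within a relative tolerance (3% here, an assumption);
    other answers, such as multiple-choice letters, must match exactly, ignoring case."""
    try:
        return math.isclose(float(predicted), float(gold), rel_tol=rel_tol)
    except ValueError:
        return predicted.strip().lower() == gold.strip().lower()


def accuracy(predictions: list[str], golds: list[str]) -> float:
    """Fraction of questions answered correctly."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and gold answers must have the same length")
    if not golds:
        return 0.0
    return sum(is_correct(p, g) for p, g in zip(predictions, golds)) / len(golds)


# Hypothetical model outputs and reference answers.
predictions = ["0.52", "B", "12"]
golds = ["0.51", "b", "15"]
print(accuracy(predictions, golds))
```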
Queried is a research tool developed by Climate Policy Radar, a not-for-profit organization focused on advancing climate policy through open data and AI tools. Queried uses generative AI to help users analyze climate law and policy documents: using large language models (LLMs), the tool lets users query specific documents and receive responses based on their content. The tool is built on data from the Climate Change Laws of the World database. This database includes laws and policies on energy, transport, land use, climate resilience, and low-carbon transitions, and it is continuously updated with data from official government websites and parliamentary records. Queried aims to help governments, researchers, and other stakeholders access and analyze climate policy documents and better understand climate-related laws and policies.
Region: International
Sector: Academia, Non-profit
Scenario: Inference and insight generation
Start Date: 2024
Location: United Kingdom
Researchers in Taiwan experimented with using RAG to improve LLMs' ability to answer queries about Taiwanese Hakka culture. The team combined data from the Ministry of Education's Cultural Knowledge Base and Hakka Dictionary with other data sources focused on language and geographic locations. Through this effort, the team aimed to demonstrate the value of integrating a translation function into LLMs to support generative AI technologies that reflect minority cultures.
Region: APAC
Sector: Private sector, Academia
Scenario: Adaptation
Start Date: 2024
Location: Taiwan
SatGPT is a conversational tool developed by the United Nations Economic and Social Commission for Asia and the Pacific (ESCAP) that integrates generative AI with Earth observation data. SatGPT generates readable descriptions from satellite imagery to make geospatial data more interpretable. Users can ask questions and receive insights for applications such as flood monitoring, agricultural assessments, and urban planning. SatGPT uses open data sources including ESA WorldCover 2020 for land cover classification, which identifies different types of surface coverage such as forests, croplands, and urban areas; the Humanitarian Data Exchange (HDX) for administrative and humanitarian data; and Google Earth Engine for accessing global geospatial datasets such as the Global Surface Water Mapping Layers from the European Union's Joint Research Centre (JRC).
Region: APAC
Sector: Multilateral sector
Scenario: Inference and insight generation
Start Date: 2024
Location: Thailand
Region: EMEA
Sector: Academia, Private sector
Scenario: Pre-training
Start Date: 2024
Location: France
ScholasticAI is a tool that uses retrieval-augmented generation (RAG) to help users extract and analyze information from documents such as Portable Document Format (PDF) files. Users can upload their own files and generate responses based on their content, as well as query external knowledge databases. ScholasticAI is powered by the open-source Pleias-Pico language model, which is trained on publicly available data that includes public domain books and newspapers in multiple languages. ScholasticAI is designed to support multiple languages and to improve accuracy by referencing and grounding its responses in the original sources.
Region: International
Sector: Academia
Scenario: Inference and insight generation
Start Date: 2024
Location: United States
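Before retrieval, RAG tools typically split uploaded documents into overlapping passages. Each retrieved chunk can then be traced back to its place in the source. The sketch below shows one simple word-window chunker. The window size, the overlap, the `chunk_words` name, and the `report.pdf` label are assumptions for illustration, not ScholasticAI's configuration.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words
    with the previous window so context is not lost at chunk boundaries."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end
            break
    return chunks


# Each chunk keeps a (hypothetical) source name and its position so answers can cite it.
document = " ".join(f"w{i}" for i in range(10))
for index, chunk in enumerate(chunk_words(document, size=4, overlap=1)):
    print(("report.pdf", index, chunk))
```

The overlap trades some duplicated text for a lower chance that a sentence relevant to a query is split across two chunks and missed by retrieval.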
Region: North America
Sector: Private sector
Scenario: Inference and insight generation
Start Date: 2024
Location: United States
Region: EMEA
Sector: Public sector, Academia
Scenario: Pre-training
Start Date: 2024
Location: Switzerland
Region: APAC
Sector: Public sector, Academia
Scenario: Data augmentation
Start Date: 2024
Location: Australia
Region: North America
Sector: Public sector
Scenario: Data augmentation
Start Date: 2024
Location: United States
The Virtual Intelligent Chat Assistant (VICA) is an online platform by Singapore's Government Technology Agency (GovTech) that public servants across the government can use to create their own generative AI chatbots. In a blog post published in Towards Data Science, representatives from GovTech discuss a proof of concept they built with VICA using data from the Department of Statistics. The team created a chatbot that could respond to queries about national statistics (such as GDP) in table format.
Region: APAC
Sector: Public sector
Scenario: Inference and insight generation
Start Date: 2024
Location: Singapore
Region: Latin America and the Caribbean
Sector: Non-profit
Scenario: Inference and insight generation
Start Date: 2024
Location: Mexico
USAFacts processes and standardizes open government data from federal, state, and local sources. It uses generative AI specifically to create written content, such as summaries and explanations, based on official government data.
Region: North America
Sector: Non-profit
Scenario: Inference and insight generation
Start Date: 2024
Location: United States
Dolma is an open dataset created by the Allen Institute for AI. It is made up of academic research along with other data sources such as books, website content, and code. The dataset currently contains 3 trillion tokens and is accompanied by a toolkit on how to source datasets for training purposes.
Region: International
Sector: Civic tech, Non-profit
Scenario: Pre-training
Start Date: 2023
Location: Global
Region: North America
Sector: Academia
Scenario: Adaptation
Start Date: 2023
Location: United States
The City of Helsinki has adapted general purpose LLMs to improve its civic services, including urban planning and public facilities. These generative AI tools are fine-tuned using open city data, such as zoning regulations and planning documents, to facilitate civic engagement. These tools aim to enable more efficient communication with residents while enhancing the accessibility of complex information.
Region: EMEA
Sector: Public sector
Scenario: Adaptation
Start Date: 2023
Location: Finland
ClimateQ&A is a generative AI chatbot built on the ChatGPT API to answer queries about climate change. The chatbot was created by Ekimetrics, a data and AI firm based in France. It uses data from reports by the Intergovernmental Panel on Climate Change (IPCC) and the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES). While its primary objective is to make climate change science more accessible, it also helps reveal the types of questions people have about climate change. The team uses NLP to analyze these questions and identify knowledge gaps.
Region: International
Sector: Private sector
Scenario: Inference and insight generation
Start Date: 2023
Location: France, Global
Region: North America
Sector: Academia
Scenario: Adaptation
Start Date: 2023
Location: United States
Region: North America
Sector: Non-profit, Private sector
Scenario: Open-ended exploration
Start Date: 2023
Location: Global
Dolma is a 3-trillion-token open dataset created by the Allen Institute for AI. It draws on sources such as web pages, academic publications, code, books, and encyclopedic materials. The dataset is accompanied by a toolkit on how to source datasets for training purposes and aims to support transparency, risk mitigation, and reproducibility in responsible AI development.
Region: International
Sector: Academia
Scenario: Pre-training
Start Date: 2023
Location: United States
Region: APAC
Sector: Private sector, Academia
Scenario: Adaptation
Start Date: 2023
Location: Hong Kong
Representatives from Digital Green India (an NGO) and Microsoft Research India developed a generative AI chatbot for agricultural services. The chatbot provides farmers with text, audio, and video responses to queries about agriculture. It uses RAG and draws on research papers and other data sources. It has been implemented in Kenya, India, Ethiopia, and Nigeria so far.
Region: International
Sector: Private sector
Scenario: Adaptation
Start Date: 2023
Location: Kenya, India, Ethiopia, and Nigeria
Region: North America
Sector: Public sector
Scenario: Data augmentation
Start Date: 2023
Location: United States
GenSpectrum is a generative AI chatbot for COVID-19 genomic sequencing data from the GISAID Data Science Initiative (an initiative focused on providing access to data related to pathogens through partnerships). The chatbot was developed by researchers at the Department of Biosystems Science and Engineering, ETH Zürich, and the Swiss Institute of Bioinformatics to support research in the medical domain. The chatbot is not yet available online.
Region: international
Sector: academia, non-profit
Scenario: inference_and_insight_generation
Start Date: 2023
Location: Switzerland
GPT-SW3 is an open-source generative AI model developed collaboratively by AI Sweden, RISE, and the Wallenberg AI, Autonomous Systems and Software Program (WASP WARA). It was trained on datasets including Wikipedia, Wikimedia, and the Norwegian Colossal Corpus, an open dataset comprising texts from government publications, parliamentary records, newspapers, literature, and public reports. GPT-SW3 is designed to perform natural language processing tasks in Nordic languages such as Swedish, Norwegian, Danish, and Icelandic, including content generation, translation, and digital assistant functions.
Region: emea
Sector: public_sector, academia, non-profit
Scenario: pre-training
Start Date: 2023
Location: Sweden
Jugalbandi is a generative AI-powered language translation tool that improves access to information about government programs and rights across India. It draws on open government data about various welfare schemes and services and uses generative AI models to translate this information into multiple local languages. The tool facilitates communication between citizens and the government, helping people understand and access services regardless of language barriers. The initiative broadens access to official data and government resources, promoting inclusion in public services.
Region: apac
Sector: public_sector
Scenario: inference_and_insight_generation
Start Date: 2023
Location: India
Region: north_america
Sector: academia, non-profit
Scenario: adaptation
Start Date: 2023
Location: United States
Region: north_america
Sector: private
Scenario: adaptation
Start Date: 2023
Location: United States
Region: apac
Sector: academia
Scenario: pre-training
Start Date: 2023
Location: India
Region: north_america
Sector: academia
Scenario: open-ended_exploration
Start Date: 2023
Location: United States
Region: emea
Sector: academia
Scenario: pre-training, adaptation
Start Date: 2023
Location: Germany
Region: emea
Sector: civic_tech
Scenario: open-ended_exploration
Start Date: 2023
Location: Germany
Region: north_america
Sector: private
Scenario: data_augmentation
Start Date: 2023
Location: United States
SEA-LION is a family of open-source large language models developed by AI Singapore as part of the National Multi-Modal Large Language Model project. Trained on multilingual datasets from Southeast Asia, SEA-LION supports low-resource languages like Thai, Vietnamese, and Bahasa Indonesia. The models aim to improve cultural representation in AI and enhance accessibility for multilingual natural language processing (NLP) tasks, including translation, summarization, and question answering.
Region: apac
Sector: academia
Scenario: pre-training
Start Date: 2023
Location: Singapore
Region: apac
Sector: public_sector
Scenario: inference_and_insight_generation
Start Date: 2023
Location: Singapore
To improve the accessibility and usability of its open data platform, the International Monetary Fund (IMF) is prototyping a generative AI tool called StatGPT. StatGPT will act as a user interface that processes natural language requests to find relevant datasets in the IMF's repository. It will help users find indicators, visualize data in tables and charts, and generate Python code for analysis. The team is currently developing interface features and then plans to integrate the tool into Excel.
Region: international
Sector: multilateral_sector
Scenario: inference_and_insight_generation
Start Date: 2023
Location: Europe and North America
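The StatGPT entry does not say how natural language requests are interpreted. A tool like this first has to turn a free-text request into a structured query. The hypothetical sketch below does that for one narrow sentence pattern, using a regular expression. The indicator phrases, the codes, and the `Query` fields are invented for illustration and do not reflect the IMF's catalogue. A generative AI tool would presumably use a language model for this step rather than a fixed pattern.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary mapping everyday phrases to made-up indicator codes.
INDICATORS = {
    "gdp growth": "GDP_GROWTH",
    "inflation": "INFLATION",
    "unemployment": "UNEMPLOYMENT",
}

# Matches requests shaped like "<indicator> for|in <country> from <year> to <year>".
PATTERN = re.compile(
    r"^(?P<indicator>.+?)\s+(?:for|in)\s+(?P<country>.+?)\s+"
    r"from\s+(?P<start>\d{4})\s+to\s+(?P<end>\d{4})\??$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Query:
    indicator_code: str
    country: str
    start_year: int
    end_year: int


def parse_request(text: str) -> Optional[Query]:
    """Turn one narrow class of natural-language requests into a structured query.

    Returns None if the request does not fit the pattern or names an
    indicator missing from the vocabulary.
    """
    match = PATTERN.match(text.strip())
    if match is None:
        return None
    code = INDICATORS.get(match["indicator"].lower())
    if code is None:
        return None
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        start, end = end, start  # tolerate a reversed year range
    return Query(code, match["country"].strip(), start, end)


print(parse_request("GDP growth for Kenya from 2015 to 2020"))
```

Returning `None` rather than guessing keeps unsupported requests visible. An interface could then ask the user to rephrase instead of silently fetching the wrong dataset.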
Region: north_america
Sector: public_sector
Scenario: data_augmentation
Start Date: 2023
Location: Canada
Region: international
Sector: non-profit
Scenario: open-ended_exploration
Start Date: 2023
Location: United States
Region: north_america
Sector: civic_tech
Scenario: inference_and_insight_generation
Start Date: 2023
Location: Canada
Region: emea
Sector: private_sector
Scenario: inference_and_insight_generation
Start Date: 2023
Location: Spain
Region: north_america
Sector: public_sector
Scenario: pre-training
Start Date: 2023
Location: United States
Wobby is a generative AI-powered interface that answers queries about specific open datasets and responds with summaries and visualizations of those datasets. The platform focuses primarily on democratizing access to open government data and currently hosts datasets from organizations such as Statbel (Belgium’s national statistical office), Statistics Netherlands, and Eurostat, as well as data from intergovernmental organizations such as the World Bank. Wobby's latest update adds automatic data updates and real-time analysis based on current information.
Region: emea
Sector: private
Scenario: inference_and_insight_generation
Start Date: 2023
Location: Belgium
Region: north_america
Sector: civic_tech
Scenario: adaptation
Start Date: 2022
Location: United States
Region: north_america
Sector: private_sector, academia
Scenario: inference_and_insight_generation
Start Date: 2022
Location: United States
BLOOM is an open-access, multilingual large language model (LLM) trained using a mix of publicly available datasets, including community-selected data and filtered web-crawled data. Its training corpus, known as ROOTS, includes open data from sources like Project Gutenberg, OpenSubtitles, and HAL (open-access scientific publications), as well as government data and open research repositories such as the Catalan Government Crawling and the United Nations Parallel Corpus. BLOOM is designed to generate human-like text in 46 languages and 13 programming languages, and it is available for use and further development by researchers and institutions worldwide.
Region: international
Sector: academia
Scenario: pre-training
Start Date: 2022
Location: Global
Region: emea
Sector: civic_tech
Scenario: inference_and_insight_generation
Start Date: 2022
Location: Europe and North America
Region: emea
Sector: public_sector
Scenario: open-ended_exploration
Start Date: 2022
Location: Europe and North America
Region: north_america
Sector: private_sector
Scenario: inference_and_insight_generation
Start Date: 2021
Location: United States
UrbanSim is an open-source platform that uses generative AI to model urban growth and simulate land use, transportation, and demographic shifts. The platform integrates various datasets, including open-source data on land use, population demographics, and transportation infrastructure, to generate development scenarios that help city planners and researchers make informed decisions about urban growth. UrbanSim aids in visualizing the impacts of policy changes, transportation development, and housing strategies, offering a dynamic tool for sustainable urban planning. The project emphasizes the use of open research data from official sources to simulate realistic and adaptive urban environments.
Region: international
Sector: private_sector
Scenario: adaptation
Start Date: 2021
Location: Global
Region: north_america
Sector: academia
Scenario: adaptation
Start Date: 2020
Location: United States
Region: emea
Sector: public_sector
Scenario: inference_and_insight_generation
Start Date: 2020
Location: Europe and North America
Region: latin_america_and_the_caribbean
Sector: public_sector
Scenario: inference_and_insight_generation
Start Date: 2020
Location: Mexico
Region: north_america
Sector: academia
Scenario: inference_and_insight_generation
Start Date: 2019
Location: United States
Region: latin_america_and_the_caribbean
Sector: public_sector
Scenario: inference_and_insight_generation
Start Date: 2019
Location: Argentina
Region: emea
Sector: academia
Scenario: adaptation
Start Date: 2019
Location: United States
Region: international
Sector: private
Scenario: data_augmentation
Start Date: 2019
Location: United States
Virtual Singapore is a dynamic 3D digital twin model that leverages generative AI to simulate and analyze urban development scenarios. The platform integrates various open data sources, including satellite imagery, sensor data, and social media inputs, to create a real-time representation of the city. Using generative AI, the system generates scenarios for urban planning, infrastructure development, and emergency response planning. Virtual Singapore helps city planners visualize the impact of policy decisions, environmental changes, and demographic trends. The platform is built on open research data and open data from various governmental and institutional sources, supporting data-driven decision-making for sustainable urban growth.
Region: apac
Sector: public_sector
Scenario: open-ended_exploration
Start Date: 2019
Location: Singapore
Region: north_america
Sector: private_sector
Scenario: data_augmentation
Start Date: 2017
Location: United States
Region: north_america
Sector: non-profit, academia
Scenario: pre-training
Start Date: 2014
Location: United States