Bielik 7B v0.1 is a generative AI model developed collaboratively by SpeakLeash and the ACK Cyfronet AGH computing center in Poland. It was trained on publicly available, official open datasets: primarily the SpeakLeash dataset, an open repository of verified Polish texts (including Wikipedia, Polish parliamentary records, and Polish literature), along with other publicly accessible multilingual repositories such as SlimPajama. The model performs natural language processing (NLP) tasks, including text generation, sentiment analysis, and question answering, in Polish and English.
A growing observatory of examples of how open data from official sources and generative artificial intelligence (AI) are intersecting across domains and geographies.
We seek to learn from generative AI initiatives that use open government and research data across a Spectrum of Scenarios. More information on each scenario can be found in our report: A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI.