InkubaLM-0.4B is a language model developed by Lelapa AI to support AI applications for African languages with limited digital resources. The model was trained on open datasets published on Zenodo, a publicly accessible repository for scientific and research data. These include datasets such as Inkuba-Mono and Inkuba-Instruct, which cover languages like Hausa, Yoruba, Swahili, isiZulu, and isiXhosa. The model supports tasks such as text translation, sentiment analysis, and keyword recognition, with the aim of making AI more inclusive and accessible for underrepresented language communities.
A growing observatory of examples of how open data from official sources and generative artificial intelligence (AI) are intersecting across domains and geographies.
Share your project for inclusion. We seek to learn from generative AI initiatives that use open government and research data across a Spectrum of Scenarios. More information on each scenario can be found in our report: A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI.