Google launches WAXAL voice dataset to support AI in African languages

Google has launched WAXAL, a large open-access voice dataset designed to accelerate the development of artificial intelligence tools for Sub-Saharan African languages and help preserve linguistic diversity online, the company said.

The dataset contains more than 11,000 hours of audio spanning 21 languages, including Yoruba, Acholi, Hausa, Luganda, Malagasy and Shona. It draws on nearly two million individual recordings, making it one of the largest publicly available speech resources focused on African languages.

Google said the project was built to reflect how people naturally speak. Participants were asked to describe images in their native languages, capturing everyday speech patterns, while professional voice actors were recorded in studios to generate high-quality material suitable for text-to-speech applications.

WAXAL includes around 1,250 hours of transcribed speech for automatic speech recognition systems, as well as more than 20 hours of studio-quality audio designed specifically for text-to-speech synthesis. The dataset is available under an open licence on the Hugging Face platform, allowing researchers, startups and developers free access.

The initiative was developed in collaboration with African institutions, including Makerere University in Uganda, the University of Ghana and Digital Umuganda in Rwanda. Google said working with local partners was essential to ensuring linguistic accuracy, cultural relevance and ethical data collection.

The launch comes as technology companies and researchers seek to address a major imbalance in global AI development. While Africa is home to an estimated 1,500 to 3,000 languages, according to UNESCO, most digital voice tools support only a small fraction of them. The lack of high-quality, locally sourced data has slowed the deployment of voice assistants, educational technologies and automated transcription services across much of the continent.

By making WAXAL openly available, Google said it aims to lower barriers to entry for innovators building voice-driven products for African users, from customer service bots and accessibility tools to educational and health applications. The company also framed the project as a contribution to safeguarding African languages in the digital space, where underrepresentation raises the risk of long-term erosion.

Local initiatives are also emerging to fill the data gap. In Benin, the “JaimeMaLangue” project encourages citizens to contribute voice samples to a national database. Other efforts, including African Voices in Nigeria and African Next Voices in Mali, are working to expand speech resources for languages that have historically been excluded from mainstream technology platforms.

Google said WAXAL is intended to complement these initiatives rather than replace them, and that broader collaboration will be key to ensuring Africa’s languages are not only preserved but actively shape the next generation of artificial intelligence systems.
