MOSEL: Collection of Open Source Speech Data for Speech Foundation Model Training on EU Languages
While existing speech datasets are heavily skewed towards English, many EU languages lack accessible, high-quality speech data. This lack of resources leads to AI models…