Leaderboard
Speech recognition
Description
This leaderboard is intended for evaluating automatic speech recognition for Slovenian. Its goal is to enable independent evaluation of language technology tools for the Slovenian language.
The evaluation data consists of recordings and speakers that, to our knowledge, are not present in the available speech databases for the Slovenian language. The data is structured as follows:
- 15 recordings with a total duration of 3 h 18 min 28 s (3:18:28)
- public speech with a total duration of 2:08:35 and private speech with a total duration of 1:09:53
- 9 recordings (2:03:04 in total) from the south-western part of Slovenia and 6 recordings (1:15:24 in total) from the north-eastern part of Slovenia
- 18 male speakers and 19 female speakers
- public speech covers the following topics: evolution, description of a settlement, science slam, description of a life, culture of speech, news, books, and energetics; private speech comprises 4 monologues and 3 dialogues between two people
- private speech was recorded from 10 speakers: 3 are up to 30 years old, 5 are between 30 and 49 years old, and 2 are over 50 years old
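As a quick consistency check, the durations listed above can be summed to confirm that each split (by speech type and by region) covers the full 3:18:28. The following is a minimal sketch; the helper names `parse_hms` and `total_duration` are illustrative and not part of any leaderboard tooling.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an 'H:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def total_duration(parts: dict[str, str]) -> timedelta:
    """Sum a mapping of label -> 'H:MM:SS' duration."""
    return sum((parse_hms(value) for value in parts.values()), timedelta())


# Durations copied from the list above.
TOTAL = parse_hms("3:18:28")
by_speech_type = {"public": "2:08:35", "private": "1:09:53"}
by_region = {"south-western": "2:03:04", "north-eastern": "1:15:24"}

for name, parts in (("speech type", by_speech_type), ("region", by_region)):
    # Prints the summed duration and whether it matches the stated total.
    print(name, total_duration(parts), total_duration(parts) == TOTAL)
```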
The corpus has already been pre-processed according to the rules of the RSDO project (2020-2023). Audio files have been converted to 16 kHz mono, with durations between 0.398 s and 64.065 s.
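Before running a recognizer on the test set, it can be useful to confirm that a file matches the stated format. The sketch below assumes the audio is distributed as PCM WAV files, which the description does not state explicitly. The function name `check_wav` is hypothetical. It uses only Python's standard `wave` module.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # 16 kHz, as stated in the description
EXPECTED_CHANNELS = 1      # mono


def check_wav(path: str) -> tuple[bool, float]:
    """Return (format_ok, duration_in_seconds) for a PCM WAV file.

    Assumption: the file is an uncompressed WAV readable by the wave module.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    format_ok = rate == EXPECTED_RATE_HZ and channels == EXPECTED_CHANNELS
    return format_ok, duration
```

The returned duration can then be compared against the stated range of 0.398 s to 64.065 s.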
NOTE: Because the (unlabeled) test dataset is large, it may take some time (approx. 60 seconds) before the file is downloaded. Please do not click download multiple times.
Related Leaderboards
Leaderboard Submissions
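The metric columns in the submissions table (1 - WER, CER, WER, MER, WIL, WIP) can be illustrated with a small sketch. It assumes the leaderboard follows the commonly used definitions based on hits (H), substitutions (S), deletions (D) and insertions (I) from a minimum edit-distance alignment. The exact scoring script and text normalization used here are not described, so the function names and the whitespace tokenization below are assumptions.

```python
def align_counts(ref: list[str], hyp: list[str]) -> tuple[int, int, int, int]:
    """Return (hits, substitutions, deletions, insertions) for one
    minimum edit-distance alignment of ref against hyp."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # hit or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )
    # Walk back from the bottom-right corner and count each operation.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def word_metrics(reference: str, hypothesis: str) -> dict[str, float]:
    """Word-level metrics; assumes a non-empty reference split on whitespace."""
    h, s, d, i = align_counts(reference.split(), hypothesis.split())
    n_ref, n_hyp = h + s + d, h + s + i
    wer = (s + d + i) / n_ref                 # word error rate
    mer = (s + d + i) / (h + s + d + i)       # match error rate
    wip = (h / n_ref) * (h / n_hyp) if n_hyp else 0.0  # word information preserved
    return {"WER": wer, "1 - WER": 1 - wer, "MER": mer, "WIP": wip, "WIL": 1 - wip}


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces count as characters in this sketch."""
    h, s, d, i = align_counts(list(reference), list(hypothesis))
    return (s + d + i) / (h + s + d)


scores = word_metrics("danes je lep dan", "danes je lep")
print(scores["WER"], scores["MER"], scores["WIP"], scores["WIL"])
```

In this example one of four reference words is deleted, so WER and MER are both 0.25, WIP is 0.75 and WIL is 0.25. When several alignments have the same edit distance, the individual counts (and therefore MER, WIP and WIL) can differ slightly between tools.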
Rank | Title | Authors and Affiliations | 1 - WER | CER | WER | MER | WIL | WIP | Tag |
---|---|---|---|---|---|---|---|---|---|