Semi-Supervised Learning with Limited Data for Automatic Speech Recognition

Mikołaj Pudo

supervisor: Artur Janicki



We analyze the performance of semi-supervised learning (SSL) methods for the automatic speech recognition (ASR) task, focusing on model adaptation using small unlabeled datasets. The basic SSL method that we apply uses pseudo-labels generated by the adapted model itself; we also propose and analyze a number of improvements to this approach. Furthermore, we investigate the possibility of applying these methods to datasets whose token distributions differ significantly from the one represented by the training data. We show that under certain conditions, even very small amounts of data can improve ASR model performance. Using the proposed SSL variant, we were able to reduce the word error rate (WER) by 12-22%, depending on the dataset.
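The core pseudo-labeling step can be sketched as follows. This is a minimal illustrative example, not the thesis implementation: the `transcribe` function, the lookup table standing in for an ASR model, and the confidence threshold are all hypothetical assumptions made here for demonstration.

```python
def transcribe(audio):
    # Stand-in for ASR decoding: returns a (transcript, confidence) pair.
    # A real system would run the adapted model; here we fake it with a
    # lookup table purely for illustration.
    fake_model = {
        "utt1.wav": ("turn the lights on", 0.92),
        "utt2.wav": ("play some music", 0.55),
        "utt3.wav": ("set a timer", 0.88),
    }
    return fake_model[audio]

def pseudo_label(unlabeled_audio, threshold=0.8):
    """Run the current model over unlabeled audio and keep only the
    hypotheses whose confidence exceeds the threshold; the surviving
    (audio, transcript) pairs can then be used for adaptation."""
    labeled = []
    for audio in unlabeled_audio:
        hypothesis, confidence = transcribe(audio)
        if confidence >= threshold:
            labeled.append((audio, hypothesis))
    return labeled

pseudo = pseudo_label(["utt1.wav", "utt2.wav", "utt3.wav"])
# Low-confidence utt2.wav is filtered out; the remaining pairs serve
# as pseudo-labeled training data for fine-tuning the model.
```

In this sketch, filtering by a confidence threshold is one common way to limit the noise that self-generated labels introduce into the adaptation data.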