Mikołaj Pudo
supervisor: Artur Janicki
This talk will present the main research areas of the author, which include audio signal processing and automatic speech recognition (ASR). Both tasks rely heavily on machine learning techniques, especially deep neural networks. The main goal is to optimize models used in these tasks for embedded systems such as mobile phones. This goal imposes multiple constraints, such as a limited amount of memory and computing power and the necessity to process data in an online manner.
Additionally, end-of-speech (EOS) detection will be presented as an example task. For this task, an improved method of model training is proposed. The method can be applied to any model type whose training relies on binary cross-entropy loss. The novel method was compared with the previously used loss function. Experiments performed on clean data as well as on data containing background noise showed that the proposed method is significantly more robust to noisy and far-field environments than the baseline solution.
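For context, the sketch below illustrates the kind of setup the improvement targets: a frame-level EOS classifier trained with binary cross-entropy. It is a minimal assumption-based example, not the author's model or proposed training method; the GRU architecture, feature dimensions, and labels are illustrative only.

```python
# Minimal sketch (illustrative, not the proposed method): a frame-level
# end-of-speech (EOS) detector trained with standard binary cross-entropy.
import torch
import torch.nn as nn


class EOSDetector(nn.Module):
    def __init__(self, n_features: int = 40, hidden: int = 64):
        super().__init__()
        # A small unidirectional GRU keeps the model streamable (on-line processing).
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        # x: (batch, time, n_features) acoustic features, e.g. log-mel filterbanks
        h, _ = self.rnn(x)
        return self.head(h).squeeze(-1)  # per-frame EOS logits: (batch, time)


model = EOSDetector()
criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy on logits (the baseline loss)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch: 8 utterances, 200 frames, 40-dim features; labels mark EOS frames.
features = torch.randn(8, 200, 40)
labels = torch.zeros(8, 200)
labels[:, 180:] = 1.0  # final frames labelled as end of speech

logits = model(features)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```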