Speech recognition in conditions of impaired acoustic signal transmission

Karolina Pondel-Sycz

Supervisor: Piotr Bilski



The research addresses speech recognition under impaired acoustic signal transmission, focusing mainly on telephone conversations, in which signal distortions and interference occur. The first step of the research is to identify the types of interference and distortion present and then to select appropriate signal restoration methods. The wavelet transform can be used for denoising, for analysis of the processed signal, and for a preliminary assessment of its quality. Further preprocessing steps include amplitude normalisation, pre-emphasis filtering, and time alignment. Once the signal has been properly prepared, feature extraction can proceed. In current ASR systems, deep neural networks operating on cepstral features obtained from MFCC analysis are of particular interest. Of special note is PNCC analysis, which, according to the literature, yields promising results for speech recognition in the presence of various types of additive noise and in reverberant or noisy environments. Convolutional neural networks, time-delay neural networks, and recurrent neural networks with Bidirectional Long Short-Term Memory layers, which are able to exploit temporal context, can be particularly useful as feature extractors and classifiers. The research is conducted on English speech and concerns keyword recognition.
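
Below is a minimal Python sketch of the preprocessing and feature-extraction stages outlined above. It is an illustration under assumed settings, not the implementation used in the research: the Daubechies-8 wavelet with four decomposition levels and soft thresholding, the 0.97 pre-emphasis coefficient, peak-amplitude normalisation, 13 MFCCs, and an 8 kHz synthetic test signal standing in for narrow-band telephone speech are all example choices. The pywt and librosa libraries are used for the wavelet and cepstral computations.

# Illustrative sketch (not the thesis implementation): wavelet denoising,
# pre-emphasis, amplitude normalisation and MFCC extraction for a noisy
# speech-like signal. Wavelet family, decomposition level, the 0.97
# pre-emphasis coefficient and the MFCC settings are assumed example values.
import numpy as np
import pywt
import librosa
from scipy.signal import lfilter


def wavelet_denoise(x, wavelet="db8", level=4):
    """Soft-threshold the detail coefficients (universal threshold)."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # Noise level estimated from the finest detail coefficients.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    uthresh = sigma * np.sqrt(2.0 * np.log(len(x)))
    denoised = [coeffs[0]] + [
        pywt.threshold(c, uthresh, mode="soft") for c in coeffs[1:]
    ]
    return pywt.waverec(denoised, wavelet)[: len(x)]


def preprocess(x, alpha=0.97):
    """Pre-emphasis filter followed by peak-amplitude normalisation."""
    emphasized = lfilter([1.0, -alpha], [1.0], x)
    return emphasized / (np.max(np.abs(emphasized)) + 1e-12)


def extract_mfcc(x, sr=8000, n_mfcc=13):
    """13 MFCCs per frame, a common baseline feature set for ASR."""
    return librosa.feature.mfcc(y=x.astype(np.float32), sr=sr, n_mfcc=n_mfcc)


if __name__ == "__main__":
    # Synthetic stand-in for a narrow-band (telephone-rate) speech segment.
    sr = 8000
    t = np.linspace(0.0, 1.0, sr, endpoint=False)
    clean = np.sin(2 * np.pi * 220 * t) * np.hanning(sr)
    noisy = clean + 0.2 * np.random.randn(sr)

    restored = wavelet_denoise(noisy)
    prepared = preprocess(restored)
    features = extract_mfcc(prepared, sr=sr)
    print("MFCC matrix shape (coefficients x frames):", features.shape)

In such a pipeline, the resulting MFCC (or, analogously, PNCC) feature matrix would then be fed to a convolutional, time-delay, or BLSTM-based network acting as the keyword classifier.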