Łukasz Bala
supervisor: Włodzimierz Kasprzak
Deep learning gained significant interest after the AlexNet architecture set a new record for image classification accuracy in the ImageNet competition. Since then it has been applied to many areas of computer vision, such as image segmentation, object detection and image inpainting. However, when it comes to video, the quality of predictions is much worse than for single images. This is usually attributed to the fact that tasks like classification require additional context to be taken into account (for example, interactions between humans and objects), whereas tasks like video quality enhancement require smooth transitions between frames. I’m going to present results from the NTIRE 2020 competition, where we used an architecture based on U-Net and EDVR, along with an explanation of the basic concepts used in our work. The training procedure, our results and those of the other participating teams are also going to be shown. Moreover, I’m going to talk about few-shot learning in action recognition, which is going to be the next step of my research during my PhD studies. I’m going to focus on existing architectures and my ideas for improving them, particularly neural architecture search.