A comprehensive multimodal recommender system for fashion ecommerce

Mikołaj Wieczorek

supervisor: dr hab. inż. Grzegorz Pastuszak



This Industrial PhD thesis, conducted in cooperation with Synerise S.A., focuses on creating a comprehensive multimodal recommender system for the fashion industry.

Such a system consists of three main modules: 1) Visual Recommendations, which finds products in the catalogue that are visually similar to the product being viewed; 2) Visual Search, which finds products in the catalogue that are visually similar to a photo taken or uploaded by the user; 3) Outfit Recommendation, which proposes an outfit of complementary and matching garments, based on the user’s purchase history and a general sense of ‘fashionability’.
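
The visual similarity lookup shared by the first two modules can be illustrated with a minimal sketch. It assumes that every catalogue image has already been encoded into a fixed-length embedding vector and that similarity is measured with cosine similarity; the function names, product identifiers and three-dimensional vectors below are hypothetical and are not taken from the thesis.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_embedding, catalogue, top_k=3, exclude_id=None):
    """Return the top_k (product_id, score) pairs ranked by similarity to the query."""
    scored = [
        (product_id, cosine_similarity(query_embedding, embedding))
        for product_id, embedding in catalogue.items()
        if product_id != exclude_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical); a real system would obtain them from an image encoder.
catalogue = {
    "red_dress": [0.9, 0.1, 0.0],
    "crimson_gown": [0.8, 0.2, 0.1],
    "blue_jeans": [0.0, 0.1, 0.9],
}

# Visual Recommendations: the query is a catalogue product, so it is excluded from the results.
recommendations = most_similar(
    catalogue["red_dress"], catalogue, top_k=2, exclude_id="red_dress"
)
```

For Visual Search, the same ranking would be applied to the embedding of the user's photo, with no product excluded.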

The system relies heavily on Computer Vision methods to encode and ‘understand’ images, as clothes are best assessed based on their look. However, the look alone may not suffice, so the system needs additional data about the product and the user, such as transaction history and clicked products. Together, images, text and behavioural data make the system multimodal and well suited to serving personalized recommendations to users.

There are two main research challenges: 1) designing a fusion mechanism that combines information from visual appearance, textual data and user behaviour; 2) preparing the data and the training scheme needed to teach a model a notion of ‘fashionability’ and ‘compatibility’.
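
To make the first challenge concrete, the sketch below shows one of the simplest possible fusion baselines: each modality embedding is L2-normalised, scaled by a weight and concatenated into a single vector. This is an illustrative assumption only and not the fusion mechanism developed in the thesis; the function names, weights and input vectors are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def fuse(visual, textual, behavioural, weights=(1.0, 1.0, 1.0)):
    """Concatenate weighted, L2-normalised modality embeddings into one vector."""
    fused = []
    for vector, weight in zip((visual, textual, behavioural), weights):
        fused.extend(weight * x for x in l2_normalize(vector))
    return fused


# Toy inputs (hypothetical); real embeddings would come from per-modality encoders.
fused = fuse([3.0, 4.0], [1.0, 0.0], [0.0, 2.0], weights=(1.0, 0.5, 0.5))
```

Normalising before concatenation keeps one modality from dominating purely because of its scale, while the weights leave room to express how much each modality should contribute.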

During the first year, two papers were published, reporting state-of-the-art results on a fashion retrieval task. Some of the models created in this first phase are currently being used by Synerise.