Jan Sawicki
supervisor: prof. dr hab. Maria Ganzha
The research “ARC Welding” (Analysis of Reddit Communities Welding) focuses on extracting topical similarities between subfora (subreddits) on Reddit.
The proposed method was designed following an extensive review of the Reddit and natural language processing (NLP) literature. It analyses real graph networks built from named entities found in Reddit posts; the entities are detected with neural network models that use text embeddings and the “transformer” architecture.
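As an illustration of the graph-based idea described above, the following is a minimal sketch only, not the method used in this research. It assumes a toy list of posts whose named entities have already been detected, builds a bipartite subreddit-entity graph, and computes entity degree centrality. The data, the function names and the use of Jaccard overlap as a similarity measure are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (subreddit, named entities detected in one post).
posts = [
    ("r/python", ["NumPy", "Guido van Rossum"]),
    ("r/datascience", ["NumPy", "Kaggle"]),
    ("r/machinelearning", ["Kaggle", "NumPy"]),
    ("r/cooking", ["Gordon Ramsay"]),
]


def build_bipartite_graph(posts):
    """Map each subreddit to its entity set and each entity to its subreddit set."""
    sub_to_ent = defaultdict(set)
    ent_to_sub = defaultdict(set)
    for subreddit, entities in posts:
        for entity in entities:
            sub_to_ent[subreddit].add(entity)
            ent_to_sub[entity].add(subreddit)
    return sub_to_ent, ent_to_sub


def entity_degree_centrality(ent_to_sub, n_subreddits):
    """Degree of each entity node, normalised by the number of subreddit nodes."""
    return {entity: len(subs) / n_subreddits for entity, subs in ent_to_sub.items()}


def jaccard_similarity(sub_to_ent):
    """Jaccard overlap of the entity sets for every pair of subreddits.

    Assumption: Jaccard overlap is used here only as a simple stand-in
    for a topical similarity measure.
    """
    sims = {}
    for a, b in combinations(sorted(sub_to_ent), 2):
        union = sub_to_ent[a] | sub_to_ent[b]
        shared = sub_to_ent[a] & sub_to_ent[b]
        sims[(a, b)] = len(shared) / len(union) if union else 0.0
    return sims


sub_to_ent, ent_to_sub = build_bipartite_graph(posts)
centrality = entity_degree_centrality(ent_to_sub, len(sub_to_ent))
similarity = jaccard_similarity(sub_to_ent)
```

In this toy example, an entity mentioned in many subreddits receives a high centrality, and two subreddits that mention the same entities receive a high similarity.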
The main recent highlights of the research are:
1) The most significant discovery concerns “crossposts” (posts from one subreddit posted to another one): they contain partial response variables and are key to evaluation.
2) The dataset was extended with an additional 200 subreddits that have the most crossposts, and it was migrated to 2021 to reduce the number of COVID-19-related posts (over 1200 subreddits in total, over a timespan of 12 months).
3) The reviewed named entity linking and disambiguation methods are inapplicable because they are inaccurate, extremely long-running, deprecated as software, or simply nonfunctional.
4) Modelling based on user profiling is currently impossible due to the extremely high granularity of post authors and the relatively small percentage of users who post compared with users who only watch or evaluate the content.
5) Experiments with time granularity emphasised the rarity of crossposts and did not improve the results.
6) The best results are obtained using entity post score and entity node degree centrality. Joining/merging methods are the current focus of the research.