ARC Welding

Jan Sawicki

supervisor: Maria Ganzha



The subject of the doctoral dissertation is “ARC Welding”: Analysis of Reddit Communities Welding. The primary goal of the research is to find similarities between online communities and to produce meaningful connections between them, with possible applications in marketing and public relations. To achieve this goal, the process follows four main steps: dataset gathering, natural language processing, network construction, and fusion (“welding”).

The research determined that Reddit is the most suitable data source because of its representation of online communities, its topical categorization, and its free and easy accessibility. Its posts and their metadata will be mined for key user interests using state-of-the-art NLP techniques (e.g. attention-based neural networks, i.e. transformer models such as BERT and GPT). Extracted artifacts, such as named entities, will be converted into networks. The final product of the process is the “fusion” of the networks, that is, finding the similarities between them. These similarities cover not only the named entities that the communities have in common but also the surroundings of those entities and the community attitude towards them. The project uses Python exclusively in order to maintain a homogeneous and natively compatible environment.
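The network construction and fusion steps can be illustrated with a minimal sketch. It is not the method of the dissertation: it assumes that named entities have already been extracted per post, it builds a simple co-occurrence network for each community, and it uses Jaccard similarity as a stand-in measure of overlap. The function names (build_network, nodes_of, jaccard, weld) and the toy data are hypothetical, and the sketch does not model entity surroundings beyond co-occurrence, nor community attitude.

```python
from collections import Counter
from itertools import combinations


def build_network(posts):
    """Build a co-occurrence network from per-post entity lists.

    Returns a Counter mapping an undirected edge (a sorted entity pair)
    to the number of posts in which both entities appear.
    Posts with a single entity produce no edge and are ignored here.
    """
    edges = Counter()
    for entities in posts:
        # Deduplicate and sort so that each pair has one canonical form.
        for pair in combinations(sorted(set(entities)), 2):
            edges[pair] += 1
    return edges


def nodes_of(network):
    """Return the set of entities that appear in at least one edge."""
    return {entity for edge in network for entity in edge}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weld(net_a, net_b):
    """Compare two networks by shared entities and shared edges.

    Assumption: Jaccard overlap is only an illustrative similarity measure.
    """
    nodes_a, nodes_b = nodes_of(net_a), nodes_of(net_b)
    return {
        "shared_entities": nodes_a & nodes_b,
        "node_similarity": jaccard(nodes_a, nodes_b),
        "edge_similarity": jaccard(set(net_a), set(net_b)),
    }


# Hypothetical toy data: entities already extracted from the posts
# of two communities (one inner list per post).
community_a = [["Python", "NumPy"], ["Python", "pandas"], ["NumPy", "pandas"]]
community_b = [["Python", "NumPy"], ["Python", "Django"]]

result = weld(build_network(community_a), build_network(community_b))
print(result)
```

In this toy case the two communities share two of four distinct entities and one of four distinct edges, so the node similarity is 0.5 and the edge similarity is 0.25. Edge weights are counted but not used in the comparison; a weighted measure would be a natural refinement.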

The project is currently at the stage of finalizing the literature review on Reddit and NLP and of gathering the dataset. In the upcoming steps, the dataset will be mined for its features and used to construct the networks. The research is proceeding according to schedule.