Mateusz Klimaszewski
supervisor: Tomasz Gambin
We present COMBO, a flexible and language-independent NLP system for accurate part-of-speech tagging, morphological analysis, lemmatisation, dependency parsing, and thematic role labelling. COMBO is an easy-to-install Python package built on top of the AllenNLP platform and the PyTorch library. An inherent feature of the system is its flexibility: it allows users to train model variants that differ in the range of input features, the prediction scope, and the type of vector representation of the input data, i.e. the optional use of pre-trained (contextualised) word embeddings. COMBO is language-agnostic and can be used to train a morphosyntactic prediction model on a dependency treebank in any language. An evaluation on nine selected treebanks indicates that its prediction quality is comparable to that of the state-of-the-art system Stanza. Training COMBO is, however, much faster than training pipeline-based systems, because it is an end-to-end system with jointly trained prediction modules. COMBO was designed and developed in cooperation with the Linguistic Engineering Group at ICS PAS. Its source code and some pre-trained models are available at https://github.com/ipipan/combo.