Marcin Sowański
supervisor: Artur Janicki
Natural language understanding systems, such as multilingual virtual assistants, are built from large amounts of text data in a given language. When another language is added to such a system, its language resources need to be either translated or localized. This usually involves hiring many language experts, which is expensive in terms of both time and money. Additionally, when new functionality is added to the system, consistency between languages has to be maintained, which generates additional costs related to communication and project management.
As part of my PhD thesis, I am conducting research to determine the methods and tools needed to create machine translation models for automatically translating the language resources of multilingual virtual assistants. The translated resources must preserve the characteristics of the assistant's language, such as its domain language and semantic annotations. The challenge is to produce translations that take into account the knowledge and domain vocabulary of the virtual assistant. Furthermore, in order to minimize costs, effective ways of automatically determining the correctness of translated sentences must also be developed.
During the seminar I will present the results of my work, including the resources created and described in the article 'Leyzer: A Dataset for Multilingual Virtual Assistants', as well as current research results and plans that focus on creating resources and methods for automatic evaluation of machine translation.