Contrastive Learning for Supervised Language Model Fine-Tuning

Witold Sosnowski

Supervisor: Piotr Gawrysiak



Natural language processing (NLP) is a rapidly growing area of machine learning, with applications wherever a computer needs to operate on text in a way that requires capturing its semantics. Such applications include text classification, translation, summarization, question answering, and dialogue systems. All of these are downstream tasks whose performance depends on the quality of the underlying text representation. Many models can produce such representations, from Bag-of-Words features and Word2Vec word embeddings to the state-of-the-art language representation model BERT, whose variants lead on most NLP tasks. The best performance is obtained when the model is first pre-trained on a general-knowledge corpus, so that it captures semantic relationships between words, and then fine-tuned on a domain corpus with a cross-entropy loss. Driven by the intuition that good generalization requires capturing the similarity between examples within a class and contrasting them with examples from other classes, we propose a supervised contrastive learning objective for the fine-tuning stage. It transforms the embedding space so that points from the same class form separable subspaces, which stabilizes fine-tuning and improves generalization. Our new loss function improves fine-tuning by making the model generalize better and by increasing its robustness to noise in the training data.
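
For concreteness, supervised contrastive fine-tuning terms of this kind are typically formulated along the lines of the SupCon loss (Khosla et al., 2020), adapted to pre-trained language model fine-tuning by Gunel et al. (2021). The sketch below illustrates that family of objectives under those papers' conventions; the encoder \(\Phi\), temperature \(\tau\), and mixing weight \(\lambda\) are notational assumptions of this illustration rather than the exact objective proposed here.

\[
\mathcal{L}_{\mathrm{SCL}}
  = \sum_{i=1}^{N} \frac{-1}{N_{y_i}-1}
    \sum_{\substack{j=1 \\ j \neq i}}^{N}
    \mathbf{1}[y_i = y_j]\,
    \log
    \frac{\exp\!\big(\Phi(x_i) \cdot \Phi(x_j)/\tau\big)}
         {\sum_{\substack{k=1 \\ k \neq i}}^{N} \exp\!\big(\Phi(x_i) \cdot \Phi(x_k)/\tau\big)}
\]

Here \(\Phi(x)\) is the L2-normalized encoder embedding of example \(x\) (e.g., the [CLS] representation), \(\tau > 0\) is a temperature, \(N\) is the batch size, and \(N_{y_i}\) is the number of examples in the batch sharing the label \(y_i\). During fine-tuning such a term is usually combined with the standard cross-entropy loss as a weighted sum, \(\mathcal{L} = (1-\lambda)\,\mathcal{L}_{\mathrm{CE}} + \lambda\,\mathcal{L}_{\mathrm{SCL}}\), so that same-class embeddings are pulled together while label supervision is preserved.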