SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized Sequence Representations

Hooman Sedghamiz; Shivam Raval; Enrico Santus; Tuka Alhanai; Mohammad Ghassemi

2021 EMNLP EMNLP 2021

SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized Sequence Representations

Abstract

AbstractWhile contrastive learning is proven to be an effective training strategy in computer vision, Natural Language Processing (NLP) is only recently adopting it as a self-supervised alternative to Masked Language Modeling (MLM) for improving sequence representations. This paper introduces SupCL-Seq, which extends the supervised contrastive learning from computer vision to the optimization of sequence representations in NLP. By altering the dropout mask probability in standard Transformer architectures (e.g. BERT-base), for every representation (anchor), we generate augmented altered views. A supervised contrastive loss is then utilized to maximize the system’s capability of pulling together similar samples (e.g., anchors and their altered views) and pushing apart the samples belonging to the other classes. Despite its simplicity, SupCL-Seq leads to large gains in many sequence classification tasks on the GLUE benchmark compared to a standard BERT-base, including 6% absolute improvement on CoLA, 5.4% on MRPC, 4.7% on RTE and 2.6% on STS-B. We also show consistent gains over self-supervised contrastively learned representations, especially in non-semantic tasks. Finally we show that these gains are not solely due to augmentation, but rather to a downstream optimized sequence representation.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — downstream optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Hooman Sedghamiz , Shivam Raval , Enrico Santus , Tuka Alhanai , Mohammad Ghassemi

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Learning Types > Contrastive Learning Natural Language Processing > Applications > Text Classification Deep Learning > Learning Types > Contrastive Learning Deep Learning > Models > Transformers

Keywords

text classification sequence representation supervised contrastive learning masked language modeling downstream optimization

Download PDF

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021