Shuffled-token Detection for Refining Pre-trained RoBERTa

Subhadarshi Panda; Anjali Agrawal; Jeewon Ha; Benjamin Bloch

2021 NAACL NAACL 2021

Shuffled-token Detection for Refining Pre-trained RoBERTa

Abstract

AbstractState-of-the-art transformer models have achieved robust performance on a variety of NLP tasks. Many of these approaches have employed domain agnostic pre-training tasks to train models that yield highly generalized sentence representations that can be fine-tuned for specific downstream tasks. We propose refining a pre-trained NLP model using the objective of detecting shuffled tokens. We use a sequential approach by starting with the pre-trained RoBERTa model and training it using our approach. Applying random shuffling strategy on the word-level, we found that our approach enables the RoBERTa model achieve better performance on 4 out of 7 GLUE tasks. Our results indicate that learning to detect shuffled tokens is a promising approach to learn more coherent sentence representations.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — shuffled-token detection

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Subhadarshi Panda , Anjali Agrawal , Jeewon Ha , Benjamin Bloch

Topics

Machine Learning > Learning Types > Self-Supervised Learning Natural Language Processing > Resources & Methods > Text Representation

Keywords

self-supervised learning transformer language model sentence representation learning shuffled-token detection pre-trained model refinement

Download PDF

Related papers

Knowledge Router: Learning Disentangled Representations for Knowledge Graphs 2021

Cross-Task Instance Representation Interactions and Label Dependencies for Joint Information Extraction with Graph Convolutional Networks 2021

Abstract Meaning Representation Guided Graph Encoding and Decoding for Joint Information Extraction 2021

Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing 2021

Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers 2021