BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks

Jong-Hoon Oh; Ryu Iida; Julien Kloetzer; Kentaro Torisawa

ACL 2021

Abstract

Transformer-based language models (TLMs), such as BERT, ALBERT and GPT-3, have shown strong performance in a wide range of NLP tasks and currently dominate the field of NLP. However, many researchers wonder whether these models can maintain their dominance forever. Of course, we do not have an answer yet, but, as an attempt to find better neural architectures and training schemes, we pretrain a simple CNN using a GAN-style learning scheme and Wikipedia data, and then integrate it with standard TLMs. We show that on the GLUE tasks, the combination of our pretrained CNN with ALBERT outperforms the original ALBERT and achieves performance similar to that of the state of the art (SOTA). Furthermore, on open-domain QA (Quasar-T and SearchQA), the combination of the CNN with ALBERT or RoBERTa achieves stronger performance than both SOTA and the original TLMs. We hope that this work provides a hint for developing novel, strong network architectures along with their training schemes. Our source code and models are available at https://github.com/nict-wisdom/bertac.
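
As a rough illustration of the two ingredients the abstract names (a simple convolutional encoder over token embeddings, and its combination with a TLM representation), the following minimal Python sketch applies 1D convolution filters with ReLU and max-pooling over positions, then concatenates the pooled features with a TLM sentence vector. It is a hypothetical toy under stated assumptions, not the authors' method. The abstract does not say how the CNN is integrated with the TLM, and the GAN-style pretraining is not shown here. The names `conv1d_maxpool` and `integrate`, the filter layout, and concatenation as the fusion step are all assumptions. The actual implementation is in the repository linked in the abstract.

```python
from typing import List

Vector = List[float]


def conv1d_maxpool(embeddings: List[Vector], filters: List[List[Vector]]) -> Vector:
    """Slide each filter (a window of weight vectors, one per token position)
    over the token embeddings, apply ReLU, and max-pool over positions.
    Returns one feature per filter."""
    features = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe initial maximum
        for start in range(len(embeddings) - width + 1):
            score = 0.0
            for offset, weights in enumerate(filt):
                token = embeddings[start + offset]
                score += sum(w * x for w, x in zip(weights, token))
            best = max(best, max(0.0, score))  # ReLU, then running max
        features.append(best)
    return features


def integrate(tlm_vector: Vector, cnn_features: Vector) -> Vector:
    """Hypothetical fusion step (an assumption, not BERTAC's mechanism):
    concatenate a TLM sentence vector with the CNN's pooled features."""
    return tlm_vector + cnn_features


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d embeddings
    filters = [
        [[1.0, 0.0], [0.0, 1.0]],  # width-2 filter
        [[-1.0, -1.0]],            # width-1 filter that ReLU always zeroes
    ]
    feats = conv1d_maxpool(tokens, filters)
    print(feats)                          # [2.0, 0.0]
    print(integrate([0.5, -0.5], feats))  # [0.5, -0.5, 2.0, 0.0]
```

Max-pooling makes the CNN output a fixed-size vector regardless of sentence length, which is what allows it to be combined with a fixed-size TLM vector in this toy setup. A real system would use learned filters in a deep learning framework rather than hand-written loops.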

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Adversarial Learning
Machine Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Architectures > Transformers
Deep Learning > Techniques > Model Architecture
Deep Learning > Techniques > Pretraining
Natural Language Processing > Resources & Methods > Large Language Models

Keywords

knowledge distillation, question answering, convolutional neural network, language model, generative adversarial network, transformer language model, model integration, language model pretraining, transformer-based language model, NLP task, adversarial pretraining, GAN-style learning
