EMNLP 2021
Coping with Noisy Training Data Labels in Paraphrase Detection
Abstract
We present new state-of-the-art benchmarks for paraphrase detection on all six languages in the Opusparcus sentential paraphrase corpus: English, Finnish, French, German, Russian, and Swedish. We obtain these results by fine-tuning BERT. The best results are achieved when training on smaller and cleaner subsets of the training sets than those found to work best in previous research. Additionally, we study a translation-based approach that is competitive for the languages with more limited and noisier training data.
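The idea of training on a smaller but cleaner subset can be illustrated with a minimal sketch. It assumes that each training pair carries a quality score (hypothetical here), so that "cleaner" means "higher-ranked". It also replaces BERT fine-tuning with a hypothetical `train_and_eval` callback that returns development-set accuracy. This is not the authors' implementation. It only shows the selection loop: rank the pairs, train on the top k for several values of k, and keep the k that scores best.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Pair:
    s1: str
    s2: str
    score: float  # hypothetical per-pair quality score (higher = cleaner)


def top_k_subset(pairs: Sequence[Pair], k: int) -> List[Pair]:
    """Return the k highest-scoring pairs, i.e. the 'cleanest' subset."""
    return sorted(pairs, key=lambda p: p.score, reverse=True)[:k]


def pick_subset_size(
    pairs: Sequence[Pair],
    sizes: Sequence[int],
    train_and_eval: Callable[[List[Pair]], float],
) -> Tuple[Optional[int], float]:
    """Try each subset size and return the one with the best dev accuracy.

    train_and_eval is a hypothetical stand-in for fine-tuning a model
    (e.g. BERT) on the subset and evaluating it on a development set.
    """
    best_size, best_acc = None, float("-inf")
    for k in sizes:
        acc = train_and_eval(top_k_subset(pairs, k))
        if acc > best_acc:
            best_size, best_acc = k, acc
    return best_size, best_acc


if __name__ == "__main__":
    data = [
        Pair("a", "b", 0.9), Pair("c", "d", 0.8), Pair("e", "f", 0.7),
        Pair("g", "h", 0.2), Pair("i", "j", 0.1),
    ]

    # Toy evaluator: rewards clean data but penalizes very small subsets.
    def toy_eval(subset: List[Pair]) -> float:
        mean = sum(p.score for p in subset) / len(subset)
        return mean * min(1.0, len(subset) / 3)

    print(pick_subset_size(data, [1, 2, 3, 4, 5], toy_eval))
```

With the toy evaluator, adding the two noisy pairs lowers the score, so a mid-sized subset wins over the full set. With real data, the trade-off between data quantity and label noise is what the search over `sizes` is meant to expose.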
Topics
Deep Learning > Architectures > Transformers
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Multilingual NLP
Machine Learning > Learning Types > Classification
Machine Learning > Learning Types > Fine-Tuning
Natural Language Processing > Applications > Semantic Analysis