Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging

Sajad Mirzababaei; Amir Hossein Kargaran; Hinrich Schütze; Ehsaneddin Asgari

2022 AACL AACL 2022

Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging

Abstract

AbstractMany NLP main tasks benefit from an accurate understanding of temporal expressions, e.g., text summarization, question answering, and information retrieval. This paper introduces Hengam, an adversarially trained transformer for Persian temporal tagging outperforming state-of-the-art approaches on a diverse and manually created dataset. We create Hengam in the following concrete steps: (1) we develop HengamTagger, an extensible rule-based tool that can extract temporal expressions from a set of diverse language-specific patterns for any language of interest. (2) We apply HengamTagger to annotate temporal tags in a large and diverse Persian text collection (covering both formal and informal contexts) to be used as weakly labeled data. (3) We introduce an adversarially trained transformer model on HengamCorpus that can generalize over the HengamTagger’s rules. We create HengamGold, the first high-quality gold standard for Persian temporal tagging. Our trained adversarial HengamTransformer not only achieves the best performance in terms of the F1-score (a type F1-Score of 95.42 and a partial F1-Score of 91.60) but also successfully deals with language ambiguities and incorrect spellings. Our code, data, and models are publicly available at https://github.com/kargaranamir/Hengam.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🐣 Hot Topic Early Bird — persian language

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Sajad Mirzababaei , Amir Hossein Kargaran , Hinrich Schütze , Ehsaneddin Asgari

Topics

Machine Learning > Learning Types > Adversarial Learning Natural Language Processing > Understanding > Named Entity Recognition Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

weakly supervised learning named entity recognition adversarial training temporal tagging persian language

Download PDF

Related papers

A Japanese Corpus of Many Specialized Domains for Word Segmentation and Part-of-Speech Tagging 2022

Enhancing Tabular Reasoning with Pattern Exploiting Training 2022

Re-contextualizing Fairness in NLP: The Case of India 2022

Adversarially Improving NMT Robustness to ASR Errors with Confusion Sets 2022

Promoting Pre-trained LM with Linguistic Features on Automatic Readability Assessment 2022