DRAGON at FIGNEWS 2024 Shared Task: a Dedicated RAG for October 7th conflict News

Sadegh Jafari; Mohsen Mahmoodzadeh; Vanooshe Nazari; Razieh Bahmanyar; Kathryn Burrows

ACL 2024

Abstract

In this study, we present a novel approach to annotating bias and propaganda in social media data that uses topic modeling to choose which samples to annotate. Using the BERTopic tool, we performed topic modeling on the FIGNEWS Shared Task dataset, which initially comprised 13,500 samples. We identified 35 distinct topics and selected approximately 50 representative samples from each, yielding a subset of 1,812 samples, which we annotated with bias and propaganda labels. We then applied several methods, including KNN, SVC, XGBoost, and RAG, to develop a classifier capable of detecting bias and propaganda in social media content. Our approach shows that topic modeling is an effective way to select a data subset efficiently and provides a foundation for improving the accuracy of bias and propaganda detection in large-scale social media datasets.
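The per-topic selection step can be illustrated with a minimal sketch. The abstract does not say how representative samples were chosen, so this example assumes each sample already has a topic id and a representativeness score (for instance, a topic-assignment probability) and keeps the highest-scoring samples in each topic. The function name `select_per_topic`, its parameters, and the choice to skip topic id -1 (the id BERTopic gives to outliers) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def select_per_topic(
    topic_ids: Sequence[int],
    scores: Sequence[float],
    per_topic: int = 50,
    outlier_topic: int = -1,
) -> Dict[int, List[int]]:
    """Return, for each topic, the indices of its highest-scoring samples.

    Hypothetical helper: the ranking criterion is an assumption, not the
    authors' documented selection rule.
    """
    if len(topic_ids) != len(scores):
        raise ValueError("topic_ids and scores must have the same length")

    # Group sample indices by topic, skipping samples marked as outliers.
    by_topic: Dict[int, List[int]] = defaultdict(list)
    for idx, topic in enumerate(topic_ids):
        if topic != outlier_topic:
            by_topic[topic].append(idx)

    selected: Dict[int, List[int]] = {}
    for topic, indices in by_topic.items():
        # Highest score first; ties broken by original position for determinism.
        ranked = sorted(indices, key=lambda i: (-scores[i], i))
        selected[topic] = ranked[:per_topic]
    return selected


if __name__ == "__main__":
    # Toy data: seven samples, two topics, one outlier.
    topics = [0, 0, 1, -1, 1, 0, 1]
    probs = [0.9, 0.4, 0.8, 0.1, 0.95, 0.7, 0.2]
    print(select_per_topic(topics, probs, per_topic=2))
```

With 35 topics and a cap of 50 samples per topic, a rule like this would return at most 1,750 samples, so the reported subset of 1,812 implies that some topics contributed slightly more than 50.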

Topics

Machine Learning > Core Methods > Classification; Machine Learning > Core Methods > Clustering; Natural Language Processing > Applications > Text Classification

Keywords

text classification; social media analysis; topic modeling; bias detection; retrieval-augmented generation; propaganda detection
