COLING 2024
Framed Multi30K: A Frame-Based Multimodal-Multilingual Dataset
Abstract
This paper presents Framed Multi30K (FM30K), a novel frame-based Brazilian Portuguese multimodal-multilingual dataset which (i) extends the Multi30K dataset (Elliott et al., 2016) with 158,915 original Brazilian Portuguese descriptions and 30,104 Brazilian Portuguese translations from original English descriptions; (ii) adds 2,677,613 frame evocation labels to the 158,915 English descriptions and to the ones created for Brazilian Portuguese; and (iii) extends the Flickr30k Entities dataset (Plummer et al., 2015) with 190,608 frames and Frame Elements correlations with the existing phrase-to-region correlations.
Authors
Marcelo Viridiano, Arthur Lorenzi, Tiago Timponi Torrent, Ely E. Matos, Adriana S. Pagano, Natália Sathler Sigiliano, Maucha Gamonal, Helen de Andrade Abreu, Lívia Vicente Dutra, Mairon Samagaio, Mariane Carvalho, Franciany Campos, Gabrielly Azalim, Bruna Mazzei, Mateus Fonseca de Oliveira, Ana Carolina Luz, Lívia Pádua Ruiz, Júlia Bellei, Amanda Pestana, Josiane Costa, Iasmin Rabelo, Anna Beatriz Silva, Raquel Roza, Mariana Souza Mota, Igor Oliveira, Márcio Henrique Pelegrino de Freitas