A Benchmark Suite of Japanese Natural Questions

Takuya Uematsu; Hao Wang; Daisuke Kawahara; Tomohide Shibata

2024 NAACL NAACL 2024

A Benchmark Suite of Japanese Natural Questions

Abstract

AbstractTo develop high-performance and robust natural language processing (NLP) models, it is important to have various question answering (QA) datasets to train, evaluate, and analyze them. Although there are various QA datasets available in English, there are only a few QA datasets in other languages. We focus on Japanese, a language with only a few basic QA datasets, and aim to build a Japanese version of Natural Questions (NQ) consisting of questions that naturally arise from human information needs. We collect natural questions from query logs of a Japanese search engine and build the dataset using crowdsourcing. We construct Japanese Natural Questions (JNQ) and a Japanese version of BoolQ (JBoolQ), which is derived from NQ and consists of yes/no questions. JNQ consists of 16,871 questions, and JBoolQ consists of 6,467 questions. We also define two tasks from JNQ and one from JBoolQ and establish baselines using competitive methods drawn from related literature. We hope that these datasets will facilitate research on QA and NLP models in Japanese. We are planning to release JNQ and JBoolQ.

🧭 Keyword Pioneer — japanese natural language processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Takuya Uematsu , Hao Wang , Daisuke Kawahara , Tomohide Shibata

Topics

Natural Language Processing > Applications > Question Answering Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

multilingual nlp question answering benchmark dataset dataset benchmark crowdsourced dataset yes-no question japanese natural language processing natural questions dataset natural question japanese question answering

Download PDF

Related papers

Working Alliance Transformer for Psychotherapy Dialogue Classification 2024

Named Entity Recognition Under Domain Shift via Metric Learning for Life Sciences 2024

Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study 2024

TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation 2024

Extractive Summarization with Text Generator 2024