EMNLP 2021
A Corpus for Multilingual Analysis of Online Terms of Service
Abstract
We present the first annotated corpus for multilingual analysis of potentially unfair clauses in online Terms of Service. The data set comprises a total of 100 contracts, obtained from 25 documents annotated in four different languages: English, German, Italian, and Polish. In each contract, clauses that are potentially unfair to the consumer are annotated according to nine unfairness categories. We show how a simple yet efficient annotation projection technique based on sentence embeddings can be used to automatically transfer annotations across languages.
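The projection idea can be illustrated with a small sketch. It is not the authors' implementation. It assumes that every sentence has already been embedded by some multilingual sentence encoder (not shown). It then copies each annotated source sentence's unfairness labels to the most similar target-language sentence by cosine similarity, and it accepts a match only when that similarity clears a threshold. The function names, the 0.5 threshold, the toy vectors, and the label names `cat_A`/`cat_B` are all hypothetical.

```python
import math
from typing import List, Sequence, Set

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def project_annotations(
    src_embs: List[Vector],
    src_labels: List[Set[str]],
    tgt_embs: List[Vector],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Set[str]]:
    """Copy each labelled source sentence's categories onto its nearest
    target sentence, provided the cosine similarity reaches `threshold`."""
    tgt_labels: List[Set[str]] = [set() for _ in tgt_embs]
    for emb, labels in zip(src_embs, src_labels):
        if not labels or not tgt_embs:
            continue  # unannotated (fair) sentences have nothing to project
        scores = [cosine(emb, t) for t in tgt_embs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            tgt_labels[best] |= labels
    return tgt_labels


# Toy 3-d vectors standing in for real sentence embeddings (hypothetical).
src_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
src_labels = [{"cat_A"}, set(), {"cat_B"}]
tgt_embs = [[0.0, 0.1, 0.9], [0.9, 0.1, 0.0], [0.0, 1.0, 0.1]]
print(project_annotations(src_embs, src_labels, tgt_embs))
# → [{'cat_B'}, {'cat_A'}, set()]
```

The sketch keeps only the nearest target sentence for each source clause. A clause that a translation splits across several sentences would therefore label only one of them. Handling that case would need a many-to-one alignment, which this sketch deliberately omits.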