2021
EACL
EACL 2021
Finite-state script normalization and processing utilities: The Nisaba Brahmic library
Abstract
AbstractThis paper presents an open-source library for efficient low-level processing of ten major South Asian Brahmic scripts. The library provides a flexible and extensible framework for supporting crucial operations on Brahmic scripts, such as NFC, visual normalization, reversible transliteration, and validity checks, implemented in Python within a finite-state transducer formalism. We survey some common Brahmic script issues that may adversely affect the performance of downstream NLP tasks, and provide the rationale for finite-state design and system implementation details.
🌉
Interdisciplinary Bridge
— Artificial Intelligence and Natural Language Processing
🧭
Keyword Pioneer
— script normalization
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Speech & Audio