2021
EMNLP
Machine Extraction of Tax Laws from Legislative Texts
Abstract
Using a corpus of compiled codes from U.S. states containing labeled tax law sections, we train text classifiers to automatically tag tax-law documents and, further, to identify the associated revenue source (e.g., income, property, or sales). After evaluating classifier performance on held-out test data, we apply the classifiers to a historical corpus of U.S. state legislation to extract the flow of relevant laws over the years 1910 through 2010. We document that the classifiers are effective in the historical corpus, for example by automatically detecting establishments of state personal income taxes. The trained models with replication code are published at https://github.com/luyang521/tax-classification.
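The two-stage setup described in the abstract (first tag a section as tax law or not, then assign a revenue source) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a model, so a tiny hand-written multinomial Naive Bayes stands in for it, and the `NaiveBayes` class, the `classify` helper, and the toy sentences in `SECTIONS` are all hypothetical, invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes over whitespace tokens (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)           # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: (section text, revenue source or None for non-tax law).
SECTIONS = [
    ("a tax is imposed on the net income of every resident", "income"),
    ("taxable income of individuals shall be computed annually", "income"),
    ("real property shall be assessed and taxed at its full value", "property"),
    ("the assessor shall list all taxable property in the county", "property"),
    ("a tax is levied on retail sales of tangible goods", "sales"),
    ("every retailer shall collect the sales tax from the purchaser", "sales"),
    ("the board of education shall appoint a superintendent", None),
    ("hunting licenses shall be issued by the game warden", None),
    ("the highway commission shall maintain all state roads", None),
]

# Stage 1: tax law vs. everything else. Stage 2: revenue source, tax sections only.
tax_tagger = NaiveBayes().fit(
    [text for text, _ in SECTIONS],
    [source is not None for _, source in SECTIONS],
)
source_tagger = NaiveBayes().fit(
    [text for text, source in SECTIONS if source],
    [source for _, source in SECTIONS if source],
)


def classify(text):
    """Return the predicted revenue source, or None if the text is not tagged as tax law."""
    if not tax_tagger.predict(text):
        return None
    return source_tagger.predict(text)
```

Calling `classify` on an unseen section returns either a revenue-source label or `None`. Running the second stage only on sections that pass the first keeps non-tax legislation from being forced into a revenue category.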
Topics
Natural Language Processing > Applications > Information Extraction
Natural Language Processing > Applications > Text Classification
Machine Learning > Learning Types > Transfer Learning
Artificial Intelligence > Core AI > Natural Language Processing
Machine Learning > Core Methods > Information Extraction