COLING 2018

Linguistic Resources for Phrasal Verb Identification

Abstract

This paper shows how a Lexicon-Grammar dictionary of English phrasal verbs (PV) can be transformed into an electronic dictionary and how, with the help of multiple grammars, dictionaries, and filters within the linguistic development environment NooJ, PV can be accurately identified in large corpora. The NooJ program is an alternative to the statistical methods commonly used in NLP: all PV are listed in a dictionary and then located by means of a PV grammar in both continuous and discontinuous format. Results are then refined with a series of dictionaries, disambiguating grammars, and other linguistic resources. The main advantage of such a program is that every PV listed in the dictionary can be identified in any corpus. The only drawback is that PV not listed in the dictionary (e.g., archaic forms, recent neologisms) are not identified; however, new PV can easily be added to the electronic dictionary, which is freely available to all.
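The dictionary-plus-grammar idea described above can be illustrated with a minimal Python sketch. This is not NooJ and not the authors' implementation: the lexicon `PV_LEXICON`, the inflection table `LEMMAS`, the function `find_phrasal_verbs`, and the `max_gap` window are all hypothetical stand-ins chosen for illustration. The sketch lists PV in a dictionary and then locates a verb followed by one of its particles, either adjacent (continuous) or separated by a few tokens (discontinuous).

```python
import re

# Hypothetical toy lexicon (verb lemma -> allowed particles).
# This is NOT the paper's electronic dictionary.
PV_LEXICON = {
    "give": {"up", "in"},
    "turn": {"off", "down"},
    "look": {"up"},
}

# Hypothetical inflection table standing in for a morphological dictionary.
LEMMAS = {
    "gave": "give", "gives": "give", "giving": "give", "given": "give",
    "turned": "turn", "turns": "turn", "turning": "turn",
    "looked": "look", "looks": "look", "looking": "look",
}


def find_phrasal_verbs(sentence, max_gap=3):
    """Return (lemma, particle, kind) tuples for candidate PV in a sentence.

    max_gap is an assumed limit on the number of tokens allowed between
    the verb and its particle in the discontinuous case.
    """
    tokens = re.findall(r"[A-Za-z']+", sentence.lower())
    matches = []
    for i, token in enumerate(tokens):
        lemma = LEMMAS.get(token, token)
        particles = PV_LEXICON.get(lemma)
        if not particles:
            continue
        # Look at the next token and up to max_gap tokens beyond it.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if tokens[j] in particles:
                kind = "continuous" if j == i + 1 else "discontinuous"
                matches.append((lemma, tokens[j], kind))
                break
    return matches


print(find_phrasal_verbs("She gave up smoking."))
print(find_phrasal_verbs("He turned the radio off."))
```

A lookup this naive over-generates: in "She looked up the hill", "up" is a preposition, yet the sketch would still report a candidate. That kind of false positive is what the refinement step with additional dictionaries and disambiguating grammars is meant to filter out.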


Authors