JMLR 2003

Learning Semantic Lexicons from a Part-of-Speech and Semantically Tagged Corpus Using Inductive Logic Programming

Abstract

This paper describes an inductive logic programming learning method designed to acquire specific Noun-Verb (N-V) pairs from a corpus in order to build semantic lexicons based on the principles of Pustejovsky's generative lexicon (GL) (Pustejovsky, 1995). Such pairs are relevant in information retrieval applications, where they are used to perform index expansion. In one of the components of this lexical model, called the qualia structure, words are described in terms of semantic roles. For example, the telic role indicates the purpose or function of an item (cut for knife), the agentive role its creation mode (build for house), etc. The qualia structure of a noun is mainly made up of verbal associations, encoding relational information. The learning method enables us to automatically extract, from a morpho-syntactically and semantically tagged corpus, N-V pairs whose elements are linked by one of the semantic relations defined in the qualia structure in GL. It also infers rules explaining what in the surrounding context distinguishes such pairs from other N-V pairs that also occur in sentences of the corpus but are not relevant. The emphasis here is on the learning efficiency required to handle all the available contextual information and to produce linguistically meaningful rules.
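To make the notion of a qualia structure concrete, the following is a minimal illustrative sketch in Python of how the roles of a noun and its associated verbs might be stored and queried. It is not the paper's method or representation: the class name `QualiaStructure`, its methods, and the role labels used as dictionary keys are assumptions introduced here for illustration. Only the two example pairs (cut for knife, build for house) come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class QualiaStructure:
    """Hypothetical container: maps qualia role names to verbs for one noun."""
    noun: str
    roles: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, role: str, verb: str) -> None:
        # Record that `verb` fills `role` (e.g. "telic") for this noun.
        self.roles.setdefault(role, set()).add(verb)

    def roles_of(self, verb: str) -> Set[str]:
        # Return every role under which `verb` is linked to this noun.
        return {role for role, verbs in self.roles.items() if verb in verbs}

    def is_qualia_pair(self, verb: str) -> bool:
        # An N-V pair counts as relevant if the verb fills at least one role.
        return bool(self.roles_of(verb))


# The two examples given in the abstract.
knife = QualiaStructure("knife")
knife.add("telic", "cut")        # purpose or function of the item

house = QualiaStructure("house")
house.add("agentive", "build")   # creation mode of the item
```

In this sketch, `knife.is_qualia_pair("cut")` holds while `knife.is_qualia_pair("build")` does not. The learning task described above is the harder problem of deciding such membership from the corpus context rather than from a hand-filled table.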
