You Shall Know a Word’s Difficulty by the Family It Keeps: Word Family Features in Personalised Word Difficulty Classifiers for L2 Spanish
Abstract
Designing vocabulary learning activities for foreign/second language (L2) learners depends heavily on the successful identification of difficult words. In this paper, we present a novel personalised word difficulty classifier for L2 Spanish, trained on the LexComSpaL2 corpus and built on a BiLSTM architecture. We train a base version of the classifier (using the original LexComSpaL2 data) and a word family version (adding word family knowledge as an extra feature). The base version obtains reasonably good performance (F1 = 0.53) and shows weak positive predictive power (φ = 0.32), underlining the potential of automated methods for determining vocabulary difficulty for individual L2 learners. The “word family classifier” further improves performance (F1 = 0.62 and φ = 0.45), highlighting the value of well-chosen linguistic features in developing word difficulty classifiers.
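Both reported metrics can be derived from a binary confusion matrix. The sketch below shows the standard definitions. It assumes that F1 is computed for the positive ("difficult word") class and that φ is the phi coefficient for two binary variables, which is equivalent to the Matthews correlation coefficient. The paper may instead report a macro-averaged F1. The function name `f1_and_phi` and the example counts are hypothetical and are not taken from the LexComSpaL2 experiments.

```python
import math


def f1_and_phi(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (F1, phi) for the positive class of a binary confusion matrix.

    Assumption: the positive class is "difficult word"; tp/fp/fn/tn are
    hypothetical counts, not results from the paper.
    """
    # Precision and recall for the positive class (0.0 if undefined).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Phi coefficient (= Matthews correlation coefficient for binary labels).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    phi = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, phi


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    f1, phi = f1_and_phi(tp=40, fp=20, fn=30, tn=110)
    print(f"F1 = {f1:.2f}, phi = {phi:.2f}")
```

Unlike F1, φ uses all four cells of the confusion matrix, including true negatives. This makes it a useful complement when "difficult" words are the minority class.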