Application of an Automatic Plagiarism Detection System in a Large-scale Assessment of English Speaking Proficiency

Xinhao Wang; Keelan Evanini; Matthew Mulholland; Yao Qian; James V. Bruno

ACL 2019

Abstract

This study aims to build an automatic system for the detection of plagiarized spoken responses in the context of an assessment of English speaking proficiency for non-native speakers. Classification models were trained to distinguish between plagiarized and non-plagiarized responses with two different types of features: text-to-text content similarity measures, which are commonly used in the task of plagiarism detection for written documents, and speaking proficiency measures, which were specifically designed for spontaneous speech and extracted using an automated speech scoring system. The experiments were first conducted on a large data set drawn from an operational English proficiency assessment across multiple years, and the best classifier on this heavily imbalanced data set yielded an F1-score of 0.761 on the plagiarized class. This system was then validated on operational responses collected from a single administration of the assessment and achieved a recall of 0.897. The results indicate that the proposed system can potentially be used to improve the validity of both human and automated assessment of non-native spoken English.
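As a rough illustration of the two ideas the abstract relies on, the sketch below computes one possible text-to-text content similarity feature and the F1-score restricted to the plagiarized class. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the specific similarity measures, so n-gram cosine similarity against a set of candidate source texts is an assumed stand-in, and every function name here (`ngram_counts`, `cosine_similarity`, `max_source_similarity`, `f1_positive`) is hypothetical.

```python
import math
from collections import Counter


def ngram_counts(text, n=1):
    """Count word n-grams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_source_similarity(response, sources, n=1):
    """Assumed feature: highest similarity between a transcribed response
    and any text in a collection of potential source materials."""
    resp = ngram_counts(response, n)
    return max(
        (cosine_similarity(resp, ngram_counts(s, n)) for s in sources),
        default=0.0,
    )


def f1_positive(y_true, y_pred, positive=1):
    """F1-score computed on the positive (here: plagiarized) class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Reporting F1 on the plagiarized class alone matters on heavily imbalanced data, because overall accuracy can stay high even when a classifier labels every response as non-plagiarized.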

Authors

Xinhao Wang, Keelan Evanini, Matthew Mulholland, Yao Qian, James V. Bruno

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Core Methods > Metric Learning
Natural Language Processing > Applications > Text Classification
Speech & Audio > Analysis > Speech Analysis

Keywords

text classification; imbalanced classification; spoken language; speech assessment; plagiarism detection; speech scoring; content similarity
