2018 IJCAI IJCAI 2018

Positive and Unlabeled Learning for Detecting Software Functional Clones with Adversarial Training

Abstract

Software clone detection is an important problem for software maintenance and evolution and it has attracted lots of attentions. However, existing approaches ignore a fact that people would label the pairs of code fragments as \emph{clone} only if they happen to discover the clones while a huge number of undiscovered clone pairs and non-clone pairs are left unlabeled. In this paper, we argue that the clone detection task in the real-world should be formalized as a Positive-Unlabeled (PU) learning problem, and address this problem by proposing a novel positive and unlabeled learning approach, namely CDPU, to effectively detect software functional clones, i.e., pieces of codes with similar functionality but differing in both syntactical and lexical level, where adversarial training is employed to improve the robustness of the learned model to those non-clone pairs that look extremely similar but behave differently. Experiments on software clone detection benchmarks indicate that the proposed approach together with adversarial training outperforms the state-of-the-art approaches for software functional clone detection.

🧭 Keyword Pioneer — code similarity
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors