Extracting PAC Decision Trees from Black Box Binary Classifiers: The Gender Bias Study Case on BERT-based Language Models

Ana Ozaki; Roberto Confalonieri; Ricardo Guimarães; Anders Imenes

AAAI 2025

Abstract

Decision trees are a popular machine learning method, valued for their inherent explainability. In Explainable AI, decision trees serve as surrogate models for complex black box AI models or as approximations of parts of such models. A key challenge of this approach is assessing how accurately the extracted decision tree represents the original model and determining the extent to which it can be trusted as an approximation of its behaviour. In this work, we investigate the use of the Probably Approximately Correct (PAC) framework to provide a theoretical guarantee of fidelity for decision trees extracted from AI models. Leveraging the theoretical foundations of the PAC framework, we adapt a decision tree algorithm to ensure a PAC guarantee under specific conditions. We focus on binary classification and conduct experiments where we extract decision trees from BERT-based language models with PAC guarantees. Our results indicate occupational gender bias in these models, which confirms previous results in the literature. Additionally, the decision tree format enhances the visualization of which occupations are most impacted by social bias.
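To make the notion of a PAC fidelity guarantee concrete, the sketch below uses the textbook sample-size bound for a consistent hypothesis drawn from a finite class, m ≥ (ln|H| + ln(1/δ)) / ε. It is not taken from the paper, whose adapted algorithm and "specific conditions" are not described in this abstract. The names `pac_sample_size`, `agrees_on_sample`, `black_box`, `surrogate`, and `sampler` are hypothetical and serve only as illustration.

```python
import math
import random


def pac_sample_size(epsilon, delta, num_hypotheses=1):
    """Number of i.i.d. samples sufficient for the textbook finite-class bound.

    If a hypothesis from a class of size `num_hypotheses` agrees with the
    target on this many samples, then with probability at least 1 - delta
    its disagreement rate with the target is at most epsilon.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    return math.ceil((math.log(num_hypotheses) + math.log(1 / delta)) / epsilon)


def agrees_on_sample(black_box, surrogate, sampler, m, rng):
    """Return True if the surrogate matches the black box on m sampled inputs."""
    return all(
        black_box(x) == surrogate(x)
        for x in (sampler(rng) for _ in range(m))
    )


# Toy stand-ins (assumptions): a threshold rule plays the role of the black
# box, and the surrogate is a one-split "tree" on the same feature.
def black_box(x):
    return x[0] > 0.5


def surrogate(x):
    return x[0] > 0.5


def sampler(rng):
    return (rng.random(), rng.random())


m = pac_sample_size(epsilon=0.1, delta=0.05)  # single fixed surrogate
rng = random.Random(0)
consistent = agrees_on_sample(black_box, surrogate, sampler, m, rng)
```

The guarantee holds only with respect to the distribution that `sampler` draws from; a surrogate that is faithful on sampled inputs may still disagree with the black box on inputs that distribution rarely produces.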

Topics

Artificial Intelligence > Core AI > Interpretability; Machine Learning > Core Methods > Classification; Machine Learning > Optimization & Theory > Theory

Keywords

PAC learning; explainable AI; BERT; language model; decision tree; gender bias
