
Related Resources

Welcome to our project page, dedicated to curating all relevant resources on interpretability in deep learning and explainable AI. AI systems and deep learning models now power applications ranging from autonomous driving and medical diagnosis to natural language processing and fraud detection. While these models have shown impressive performance, their black-box nature often raises concerns about their interpretability and transparency.

Our project aims to bridge this gap. The curated collection includes research papers, tutorials, code libraries, and datasets, and is regularly updated to reflect the latest developments in the field. Our project page provides a one-stop shop for researchers, developers, and practitioners interested in learning about or improving the interpretability and explainability of AI models.

Our team of experts is committed to maintaining the highest standards of quality in curating these resources. We carefully review each resource before including it in our collection to ensure that it is relevant, informative, and useful to our users. We also encourage our users to suggest new resources that they believe could be valuable additions to our collection.

In summary, our project page is a comprehensive and regularly updated resource hub for interpretability in deep learning and explainable AI. We aim to help the AI community build more transparent, interpretable, and trustworthy AI models that can be used for a wide range of applications.

by Ayush Somani, Alexander Horsch, Dilip K. Prasad

  • Presents full coverage of interpretability in deep learning.

  • Explains the fundamental concepts of interpretability and the state of the art on the topic.

  • Includes fuzzy deep learning architectures.

This book is a comprehensive curation, exposition, and illustrative discussion of recent research tools for interpretability of deep learning models, with a focus on neural network architectures. In addition, it includes several case studies from application-oriented articles in computer vision, optics, and related machine learning topics. The book can be used both as a monograph on interpretability in deep learning covering the most recent topics and as a textbook for graduate students. Scientists with research, development, and application responsibilities will benefit from its systematic exposition.



Github Repositories

These are toolkits and code bases curated from GitHub, with brief summaries to help visitors understand the context and background of interpretable deep learning (iDL) at large. Feel free to explore and suggest new projects.

An open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof
A model interpretability library containing general purpose implementations of integrated gradients, saliency maps, smoothgrad, vargrad and others for PyTorch models.
Analyzing the training processes of deep generative models.
A framework agnostic, scalable Python package for DL on graphs.
Perturbation and gradient-based attribution methods for Deep Neural Network interpretability
Progressive Visual Analytics for Designing DNNs.
A library for debugging/inspecting machine learning classifiers and explaining their predictions.
Using Interactive Visual Experimentation to Understand Complex Deep Generative Models.
Understanding the Adversarial Game Through Visual Analytics.
A collection of infrastructure and tools for research in neural network interpretability.
Framework-agnostic implementation for state-of-the-art saliency methods
Self-attention visualization tool for different families of vision transformers.
Method for person re-identification with matching directly in deep feature maps, making it more adaptable for handling various unseen scenarios.
SHapley Additive exPlanations
Closed-form factorization of latent space in GANs
Python Library for Model Interpretation/Explanations
Visual analysis and diagnostic tools to facilitate machine learning model selection.
A toolbox to iNNvestigate neural networks’ predictions



A summary of popular interpretability methods used for understanding Deep Learning models in recent years. With the increasing adoption of AI in various fields, it is crucial to understand how these models make decisions and the factors that influence their output.


Some popular methods include LIME (Local Interpretable Model-Agnostic Explanations), SHAP (SHapley Additive exPlanations), and Grad-CAM (Gradient-weighted Class Activation Mapping). These methods help to identify important features and patterns that contribute to the model's predictions.

However, it's important to note that interpretability methods are not a one-size-fits-all solution and should be chosen based on the specific context and use case. Nonetheless, these methods provide valuable insights into the decision-making process of DL models and can help build trust and accountability.
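To make the idea behind SHAP concrete, here is a minimal, self-contained sketch of exact Shapley value computation (the quantity SHAP approximates) by brute-force coalition enumeration. The model, weights, and baseline are illustrative assumptions for this sketch, not taken from any particular library; real libraries such as shap use sampling or model-specific shortcuts instead of this exponential-cost enumeration.

```python
import itertools
import math

import numpy as np

# Toy linear model: f(x) = w . x + b. For linear models the Shapley
# value of feature i has a closed form, w_i * (x_i - baseline_i),
# which lets us check the brute-force enumeration below.
w = np.array([2.0, -1.0, 0.5])
b = 0.1

def f(x):
    return float(w @ x + b)

def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.

    Features absent from a coalition are replaced by their baseline
    value (a common way to 'remove' a feature). Exponential cost, so
    only feasible for a handful of features.
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for coalition in itertools.combinations(others, size):
                weight = (math.factorial(size) * math.factorial(n - size - 1)
                          / math.factorial(n))
                with_i = baseline.copy()
                with_i[list(coalition) + [i]] = x[list(coalition) + [i]]
                without_i = baseline.copy()
                without_i[list(coalition)] = x[list(coalition)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = shapley_values(f, x, baseline)
print(phi)  # matches w * (x - baseline) for a linear model
print(np.allclose(phi.sum(), f(x) - f(baseline)))  # efficiency axiom holds
```

The efficiency check at the end illustrates the key property that makes Shapley values attractive for attribution: the per-feature contributions sum exactly to the difference between the model's prediction and the baseline prediction.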

Have you used any of these interpretability methods?



Decomposition Saliency

CAM (Class Activation Map) (Zhou et al. 2016)

Grad-CAM (Selvaraju et al. 2017)

Guided Grad-CAM and feature occlusion (Tang et al. 2019)

Score-weighted class activation mapping (Wang et al. 2019)

Smoothgrad (Smilkov et al. 2017)

Multi-layer CAM (Bahdanau et al. 2014)

LRP (Bach et al. 2015; Samek et al. 2016)

LRP on CNN and on BoW (bag of words)/SVM (Arras et al. 2017)

BiLRP (Eberle et al. 2020)

DeepLIFT: learning important features (Shrikumar et al. 2017)

Slot activation vectors (Jacovi et al. 2018)

PRM (Peak Response Mapping) (Zhou et al. 2018)

Sensitivity Saliency

Saliency maps (Simonyan et al. 2014)

LIME (Local Interpretable Model-agnostic Explanations) (Ribeiro et al. 2016)

Guideline-based Additive eXplanation, which optimizes complexity (Zhu and Ogino 2019)

Other Saliency

Attention map with autofocus convolutional layer (Qin et al. 2018)

Signal Inversion

Deconvolutional network (Noh et al. 2015; Zeiler and Fergus 2014)

Inverted image representations (Mahendran and Vedaldi 2015)

Inversion using CNN (Dosovitskiy et al. 2015)

Guided backpropagation (Springenberg et al. 2014; Izadyyazdanabadi et al. 2018)

Signal Optimization

Activation maximization (Olah et al. 2017)

Semantic dictionary (Olah et al. 2018)

Other Signals

Network dissection (Bau et al. 2017; Zhou et al. 2018)

Verbal Interpretability

Decision trees (Kotsiantis 2013)

Propositional logic, rule-based (Caruana et al. 2015)

Sparse decision list (Letham et al. 2015)

Rationalizing neural predictions (Lei et al. 2016)

MUSE (Model Understanding through Subspace Explanations) (Lakkaraju et al. 2019)
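As a concrete illustration of the simplest entry above, the vanilla saliency map of Simonyan et al. is just the magnitude of the gradient of the output with respect to the input. Below is a minimal NumPy sketch on a made-up two-layer ReLU network (all weights and the input are illustrative values, not a trained model), with a finite-difference sanity check:

```python
import numpy as np

# A tiny fixed two-layer network, y = w2 . relu(W1 x).
# Weights are hand-picked illustrative values.
W1 = np.array([[1.0, -2.0, 0.5],
               [0.3, 0.8, -1.0],
               [-0.7, 0.2, 0.4],
               [2.0, -0.5, 1.0]])
w2 = np.array([1.0, -1.5, 0.5, 2.0])

def forward(x):
    h = np.maximum(W1 @ x, 0.0)  # ReLU hidden layer
    return w2 @ h

def saliency(x):
    """Vanilla gradient saliency: |dy/dx|, computed by hand.

    d(relu(z))/dz is 1 where z > 0 and 0 otherwise, so the chain
    rule gives grad = W1^T (w2 * mask).
    """
    z = W1 @ x
    mask = (z > 0).astype(float)
    grad = W1.T @ (w2 * mask)
    return np.abs(grad)

x = np.array([0.5, -1.0, 2.0])
s = saliency(x)

# Sanity check against central finite differences.
eps = 1e-6
fd = np.array([(forward(x + eps * e) - forward(x - eps * e)) / (2 * eps)
               for e in np.eye(3)])
print(np.allclose(s, np.abs(fd), atol=1e-5))  # True
```

In a deep learning framework the hand-derived gradient would of course be replaced by automatic differentiation; for an image input, the resulting per-pixel magnitudes are what get rendered as a saliency heatmap.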



Over the past few years, there has been significant interest in interpretability in deep learning, and several survey papers have been published summarizing the state-of-the-art techniques and challenges in this field. Below is a brief summary of some of these survey papers.

  • "Interpretability in Deep Learning: A Survey" by Liu et al. (2019)

This paper provides a comprehensive overview of interpretability methods in deep learning, including model-agnostic and model-specific approaches. It discusses the importance of interpretability and its applications in various fields such as healthcare, finance, and security. The paper also highlights the challenges and future directions in this field.

  • "A Survey on Deep Learning Interpretability: Techniques, Evaluation Criteria, and Challenges" by Doshi-Velez and Kim (2017) 

This survey paper focuses on the challenges and evaluation criteria for deep learning interpretability methods. It discusses various techniques such as attribution methods, visualization, and post-hoc explanations. The paper also highlights the importance of benchmark datasets and evaluation metrics for interpretability methods.

  • "Interpretable Machine Learning: A Brief History, State-of-the-Art and Challenges" by Molnar et al. (2020)

 This paper provides a historical overview of interpretability in machine learning and its evolution to deep learning. It discusses various techniques such as decision trees, rule-based systems, and neural networks. The paper also highlights the challenges and future directions in interpretability for deep learning.

  • "Towards A Rigorous Science of Interpretable Machine Learning" by Doshi-Velez and Kim (2017)

 This paper focuses on the need for a rigorous science of interpretable machine learning. It discusses the challenges in evaluating interpretability methods and proposes a framework for developing and evaluating these methods. The paper also highlights the importance of understanding the trade-off between interpretability and performance.


In conclusion, interpretable deep learning (iDL) is an important field that has gained significant attention in recent years. Survey papers such as those mentioned above provide a valuable overview of the state-of-the-art techniques and challenges in this field, and highlight the importance of interpretability in various applications. Here, we provide a summary table of some interpretability survey papers from recent years.

Paper Link
Samek et al.
An extensive and timely review of this actively emerging field, putting interpretability algorithms to the test both theoretically and with extensive simulations.
Mishra et al.
Surveys works that analyse the robustness of two classes of local explanations (feature importance and counterfactual explanations) and discusses some interesting results.
Mohseni et al.
Presents a categorization mapping design goals for different XAI user groups to their evaluation methods, and provides summarized ready-to-use tables of evaluation methods and recommendations for different goals in XAI research.
Vilone & Longo
Clusters scientific studies via a hierarchical system that classifies theories and notions related to the concept of explainability, together with evaluation approaches for XAI methods, concluding with a critical discussion of gaps and limitations.
Zhang et al.
An interpretability taxonomy for NNs along three dimensions: passive vs. active approaches, type of explanation, and local vs. global interpretability.
Fan et al.
Classifies NNs by their interpretability, discusses medical uses, and draws connections to fuzzy logic and neuroscience.
Arrieta et al.
Detailed explanations of key ideas and taxonomies are provided, and existing barriers to the development of explainable artificial intelligence (XAI) are highlighted.
Tjoa et al.
Collects journal articles under perceptive and mathematical interpretability, and applies the same categorization to medical research.
Vilone & Longo
Extensively clusters the XAI concept into different taxonomies, theories, and evaluation approaches, covering 361 papers with elaborate tables classifying explainability in AI.
Huang et al.
Extensively discusses verification, testing, adversarial attacks on DL, and interpretability techniques, covering 202 papers, most published after 2017.
C. Molnar
A comprehensive overview of the different techniques and tools for interpreting machine learning models, including model-agnostic and model-specific methods.
Du et al.
Addresses 40 studies, broken down into categories such as global and local explanations, and coarse-grained post-hoc and ad-hoc explanations.
Holzinger et al.
Discusses the need for causability, a property of a person, in addition to explainability, a property of a system, to achieve a truly explainable medicine.
Mittelstadt et al.
Provides a short summary on explaining AI from the perspective of philosophy, sociology and human-computer interactions.
Verma et al.
A comprehensive overview of the different fairness definitions used in machine learning, including group fairness, individual fairness, and counterfactual fairness.
Giplin et al.
Classifies approaches to understanding the workflow and representations of NNs.
Melis et al.
A short survey of the shortcomings of popular methods, pointing towards self-explaining models based on explicitness, faithfulness, and stability.
Dhurandhar et al.
An information-theoretic framework for learning to explain the predictions of black box models.
Adadi & Berrada
Do not solely concentrate on NNs but instead cover existing black-box ML models.
Ching et al.
Surveys the application of DL methods to biomedical problems, assessing their potential to transform several areas of medicine and biology, and discusses the essence of interpretability.
Zhang et al.
Mainly on the visual interpretability of DL and evaluation metrics for network interpretability.
Z.C. Lipton
Discusses the myths around the concept of interpretability in ML.
Guidotti et al.
Covers existing black-box ML models instead of focusing on NNs.
Chakraborty et al.
Structured to offer in-depth perspectives on varying degrees of interpretability, but with only 49 references for support.
Doshi-Velez & Kim
A framework for evaluating and comparing different interpretability methods based on their utility, fidelity, and safety.
Lundberg & Lee
A unified framework for interpreting the predictions of any model, including both global and local feature importance measures.
Wachter et al.
A method for generating counterfactual explanations for automated decisions without revealing the internal workings of the black box model.
Ribeiro et al.
Proposes a model-agnostic method for explaining the predictions of any classifier using local interpretable model-agnostic explanations (LIME).
Zeiler & Fergus
Visualizing the learned features of convolutional neural networks (CNNs) to gain insights into their internal representations.

iDL for All

A curated selection of key survey articles on interpretable or explainable AI (XAI) from across several AI-related fields, published in the past five years. These publications include in-depth surveys and analyses of contemporary approaches in their respective fields.

Note: The list will be periodically refreshed and expanded to include relevant material across a wide range of subject areas.

These survey articles cover a wide range of AI fields and offer important insights into the development and implementation of interpretable and explainable AI techniques. Reading these articles will provide you with a solid grasp of XAI's current state and potential future possibilities.


Conference & Workshops

Welcome to the world of academia, where the pursuit of knowledge and innovation never sleeps. With so much happening in the research community, it can be hard to keep track of all the events, conferences, and workshops taking place around the globe. That's why we've put together a list of recent and upcoming talks, workshops, and conferences worth following, from AI and machine learning to the latest developments in neuroscience and biotechnology. With links to upcoming calls for papers, you'll have the information you need to stay up to date and get involved in the latest research.

Note: We will shortly be updating and expanding this effort to centralize relevant topics and publications.



Godfather of artificial intelligence talks impact and potential of AI

Geoffrey Hinton

March 2023

YouTube Link

A.I. is B.S.

Adam Conover

March 2023

YouTube Link

Building more robust machine learning models (MLNLP)

Jindong Wang

Sept 2022

Talk Link

Adversarial Machine Learning (ICLR 2019)

Ian Goodfellow

May 2019

YouTube Link


Reading Modules

Interpretability refers to the ability to understand and explain the behavior and decisions of a deep learning model. This is becoming increasingly important as DL models are used in critical applications such as healthcare and finance. Several modules are available for quick reading that cover various aspects of interpretability in deep learning, including techniques for model debugging, feature visualization, and model explanation.
