
Related Resources

Welcome to our project page, dedicated to curating all relevant resources on interpretability in deep learning and explainable AI. AI systems and deep learning models now power applications ranging from autonomous driving and medical diagnosis to natural language processing and fraud detection. While these models have shown impressive performance, their black-box nature often raises concerns about their interpretability and transparency.

Our project aims to bridge this gap. The curated collection includes research papers, tutorials, code libraries, and datasets, and is regularly updated to reflect the latest developments in the field. Our project page provides a one-stop shop for researchers, developers, and practitioners interested in learning about or improving the interpretability and explainability of AI models.

Our team of experts is committed to maintaining the highest standards of quality in curating these resources. We carefully review each resource before including it in our collection to ensure that it is relevant, informative, and useful to our users. We also encourage our users to suggest new resources that they believe could be valuable additions to our collection.

In summary, our project page is a comprehensive and regularly updated resource hub for interpretability in deep learning and explainable AI. We aim to help the AI community build more transparent, interpretable, and trustworthy AI models that can be used for a wide range of applications.

by Ayush Somani, Alexander Horsch, Dilip K. Prasad

  • Presents full coverage of interpretability in deep learning.

  • Explains the fundamental concepts of interpretability and the state of the art on the topic.

  • Includes fuzzy deep learning architectures.

This book is a comprehensive curation, exposition, and illustrative discussion of recent research tools for interpretability of deep learning models, with a focus on neural network architectures. In addition, it includes several case studies from application-oriented articles in computer vision, optics, and related machine learning topics. The book can be used both as a monograph on interpretability in deep learning covering the most recent topics and as a textbook for graduate students. Scientists with research, development, and application responsibilities will benefit from its systematic exposition.



Github Repositories

These are toolkits and code bases curated from GitHub, with brief summaries to help visitors understand the context and background of interpretable deep learning (iDL) at large. Feel free to explore and suggest new projects.

An open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof
A model interpretability library containing general purpose implementations of integrated gradients, saliency maps, smoothgrad, vargrad and others for PyTorch models.
Analyzing the training processes of deep generative models.
A framework agnostic, scalable Python package for DL on graphs.
Perturbation and gradient-based attribution methods for Deep Neural Network interpretability
Progressive Visual Analytics for Designing DNNs.
A library for debugging/inspecting machine learning classifiers and explaining their predictions.
Using Interactive Visual Experimentation to Understand Complex Deep Generative Models.
Understanding the Adversarial Game Through Visual Analytics.
A collection of infrastructure and tools for research in neural network interpretability.
Framework-agnostic implementation for state-of-the-art saliency methods
Self-attention visualization tool for different families of vision transformers.
Method for person re-identification with matching directly in deep feature maps, making it more adaptable for handling various unseen scenarios.
SHapley Additive exPlanations
Closed-form factorization of latent space in GANs
Python Library for Model Interpretation/Explanations
Visual analysis and diagnostic tools to facilitate machine learning model selection.
A toolbox to iNNvestigate neural networks’ predictions



A summary of popular interpretability methods used for understanding Deep Learning models in recent years. With the increasing adoption of AI in various fields, it is crucial to understand how these models make decisions and the factors that influence their output.


Some popular methods include LIME (Local Interpretable Model-Agnostic Explanations), SHAP (SHapley Additive exPlanations), and Grad-CAM (Gradient-weighted Class Activation Mapping). These methods help to identify important features and patterns that contribute to the model's predictions.

However, it's important to note that interpretability methods are not a one-size-fits-all solution and should be chosen based on the specific context and use case. Nonetheless, these methods provide valuable insights into the decision-making process of DL models and can help build trust and accountability.
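To make the idea behind SHAP concrete, here is a minimal, self-contained sketch of exact Shapley value computation (the quantity SHAP approximates) by brute-force coalition enumeration. The model, weights, and baseline are illustrative assumptions for this sketch, not taken from any particular library; real libraries such as shap use sampling or model-specific shortcuts instead of this exponential-cost enumeration.

```python
import itertools
import math

import numpy as np

# Toy linear model: f(x) = w . x + b. For linear models the Shapley
# value of feature i has a closed form, w_i * (x_i - baseline_i),
# which lets us check the brute-force enumeration below.
w = np.array([2.0, -1.0, 0.5])
b = 0.1

def f(x):
    return float(w @ x + b)

def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.

    Features absent from a coalition are replaced by their baseline
    value (a common way to 'remove' a feature). Exponential cost, so
    only feasible for a handful of features.
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for coalition in itertools.combinations(others, size):
                weight = (math.factorial(size) * math.factorial(n - size - 1)
                          / math.factorial(n))
                with_i = baseline.copy()
                with_i[list(coalition) + [i]] = x[list(coalition) + [i]]
                without_i = baseline.copy()
                without_i[list(coalition)] = x[list(coalition)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = shapley_values(f, x, baseline)
print(phi)  # matches w * (x - baseline) for a linear model
print(np.allclose(phi.sum(), f(x) - f(baseline)))  # efficiency axiom holds
```

The efficiency check at the end illustrates the key property that makes Shapley values attractive for attribution: the per-feature contributions sum exactly to the difference between the model's prediction and the baseline prediction.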

Have you used any of these interpretability methods?



Decomposition Saliency

CAM (Class Activation Map) (Zhou et al. 2016)

Grad-CAM (Selvaraju et al. 2017)

Guided Grad-CAM and feature occlusion (Tang et al. 2019)

Score-weighted class activation mapping (Wang et al. 2019)

Smoothgrad (Smilkov et al. 2017)

Multi-layer CAM (Bahdanau et al. 2014)

LRP (Bach et al. 2015; Samek et al. 2016)

LRP on CNN and on BoW (bag of words)/SVM (Arras et al. 2017)

BiLRP (Eberle et al. 2020)

DeepLIFT: learning important features (Shrikumar et al. 2017)

Slot activation vectors (Jacovi et al. 2018)

PRM (Peak Response Mapping) (Zhou et al. 2018)

Sensitivity Saliency

Saliency maps (Simonyan et al. 2014)

LIME (Local Interpretable Model-agnostic Explanations) (Ribeiro et al. 2016)

Guideline-based Additive eXplanation, which optimizes complexity (Zhu and Ogino 2019)

Other Saliency

Attention map with autofocus convolutional layer (Qin et al. 2018)

Signal Inversion

Deconvolutional network (Noh et al. 2015; Zeiler and Fergus 2014)

Inverted image representations (Mahendran and Vedaldi 2015)

Inversion using CNN (Dosovitskiy et al. 2015)

Guided backpropagation (Springenberg et al. 2014; Izadyyazdanabadi et al. 2018)

Signal Optimization

Activation maximization (Olah et al. 2017)

Semantic dictionary (Olah et al. 2018)

Other Signals

Network dissection (Bau et al. 2017; Zhou et al. 2018)

Verbal Interpretability

Decision trees (Kotsiantis 2013)

Propositional logic, rule-based (Caruana et al. 2015)

Sparse decision list (Letham et al. 2015)

Rationalizing neural predictions (Lei et al. 2016)

MUSE (Model Understanding through Subspace Explanations) (Lakkaraju et al. 2019)
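As a concrete illustration of the simplest entry above, the vanilla saliency map of Simonyan et al. is just the magnitude of the gradient of the output with respect to the input. Below is a minimal NumPy sketch on a made-up two-layer ReLU network (all weights and the input are illustrative values, not a trained model), with a finite-difference sanity check:

```python
import numpy as np

# A tiny fixed two-layer network, y = w2 . relu(W1 x).
# Weights are hand-picked illustrative values.
W1 = np.array([[1.0, -2.0, 0.5],
               [0.3, 0.8, -1.0],
               [-0.7, 0.2, 0.4],
               [2.0, -0.5, 1.0]])
w2 = np.array([1.0, -1.5, 0.5, 2.0])

def forward(x):
    h = np.maximum(W1 @ x, 0.0)  # ReLU hidden layer
    return w2 @ h

def saliency(x):
    """Vanilla gradient saliency: |dy/dx|, computed by hand.

    d(relu(z))/dz is 1 where z > 0 and 0 otherwise, so the chain
    rule gives grad = W1^T (w2 * mask).
    """
    z = W1 @ x
    mask = (z > 0).astype(float)
    grad = W1.T @ (w2 * mask)
    return np.abs(grad)

x = np.array([0.5, -1.0, 2.0])
s = saliency(x)

# Sanity check against central finite differences.
eps = 1e-6
fd = np.array([(forward(x + eps * e) - forward(x - eps * e)) / (2 * eps)
               for e in np.eye(3)])
print(np.allclose(s, np.abs(fd), atol=1e-5))  # True
```

In a deep learning framework the hand-derived gradient would of course be replaced by automatic differentiation; for an image input, the resulting per-pixel magnitudes are what get rendered as a saliency heatmap.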



Over the past few years, there has been significant interest in interpretability in deep learning, and several survey papers have been published summarizing the state-of-the-art techniques and challenges in this field. Below is a brief summary of some of these survey papers.

  • "Interpretability in Deep Learning: A Survey" by Liu et al. (2019)

This paper provides a comprehensive overview of interpretability methods in deep learning, including model-agnostic and model-specific approaches. It discusses the importance of interpretability and its applications in various fields such as healthcare, finance, and security. The paper also highlights the challenges and future directions in this field.

  • "A Survey on Deep Learning Interpretability: Techniques, Evaluation Criteria, and Challenges" by Doshi-Velez and Kim (2017) 

This survey paper focuses on the challenges and evaluation criteria for deep learning interpretability methods. It discusses various techniques such as attribution methods, visualization, and post-hoc explanations. The paper also highlights the importance of benchmark datasets and evaluation metrics for interpretability methods.

  • "Interpretable Machine Learning: A Brief History, State-of-the-Art and Challenges" by Molnar et al. (2020)

 This paper provides a historical overview of interpretability in machine learning and its evolution to deep learning. It discusses various techniques such as decision trees, rule-based systems, and neural networks. The paper also highlights the challenges and future directions in interpretability for deep learning.

  • "Towards A Rigorous Science of Interpretable Machine Learning" by Doshi-Velez and Kim (2017)

 This paper focuses on the need for a rigorous science of interpretable machine learning. It discusses the challenges in evaluating interpretability methods and proposes a framework for developing and evaluating these methods. The paper also highlights the importance of understanding the trade-off between interpretability and performance.


In conclusion, interpretable deep learning (iDL) is an important field that has gained significant attention in recent years. Survey papers such as those mentioned above provide a valuable overview of the state-of-the-art techniques and challenges in this field, and highlight the importance of interpretability in various applications. Here, we provide a summary table of some interpretability survey papers from recent years.

Paper Link
Samek et al.
An extensive and timely review of this actively emerging field, putting interpretability algorithms to the test both theoretically and with extensive simulations.
Mishra et al.
Surveys works that analyse the robustness of two classes of local explanations (feature importance and counterfactual explanations) and discusses some interesting results.
Mohseni et al.
Presents a categorization mapping design goals for different XAI user groups to their evaluation methods, and provides summarized ready-to-use tables of evaluation methods and recommendations for different goals in XAI research.
Vilone & Longo
Clusters scientific studies via a hierarchical system that classifies theories and notions related to the concept of explainability, together with evaluation approaches for XAI methods, concluding with a critical discussion of gaps and limitations.
Zhang et al.
An interpretability taxonomy for NNs along three dimensions: passive vs. active approaches, type of explanation, and local vs. global interpretability.
Fan et al.
Classifies NNs by their interpretability, discusses medical uses, and draws connections to fuzzy logic and neuroscience.
Arrieta et al.
Detailed explanations of key ideas and taxonomies are provided, and existing barriers to the development of explainable artificial intelligence (XAI) are highlighted.
Tjoa et al.
Collects journal articles under perceptive and mathematical interpretability, and applies the same categorization to medical research.
Vilone & Longo
Extensively clusters the XAI concept into different taxonomies, theories, and evaluation approaches, covering 361 papers with elaborate tables classifying explainability in AI.
Huang et al.
Extensively discusses verification, testing, adversarial attacks on DL, and interpretability techniques, covering 202 papers, most published after 2017.
C. Molnar
A comprehensive overview of the different techniques and tools for interpreting machine learning models, including model-agnostic and model-specific methods.
Du et al.
Addresses 40 studies, broken down into categories such as global and local explanations, and coarse-grained post-hoc and ad-hoc explanations.
Holzinger et al.
Discusses the need for causability, a property of a person, in addition to explainability, a property of a system, to achieve a truly explainable medicine.
Mittelstadt et al.
Provides a short summary on explaining AI from the perspective of philosophy, sociology and human-computer interactions.
Verma et al.
A comprehensive overview of the different fairness definitions used in machine learning, including group fairness, individual fairness, and counterfactual fairness.
Giplin et al.
Classifies approaches to understanding the workflow and representations of NNs.
Melis et al.
A short survey of the shortcomings of popular methods, pointing towards self-explaining models based on explicitness, faithfulness, and stability.
Dhurandhar et al.
An information-theoretic framework for learning to explain the predictions of black box models.
Adadi & Berrada
Do not solely concentrate on NNs but instead cover existing black-box ML models.
Ching et al.
Surveys the application of DL methods to biomedical problems, assessing their potential to transform several areas of medicine and biology, and discusses the essence of interpretability.
Zhang et al.
Mainly on the visual interpretability of DL and evaluation metrics for network interpretability.
Z.C. Lipton
Discusses the myths around the concept of interpretability in ML.
Guidotti et al.
Covers existing black-box ML models instead of focusing on NNs.
Chakraborty et al.
Structured to offer in-depth perspectives on varying degrees of interpretability, but with only 49 references for support.
Doshi-Velez & Kim
A framework for evaluating and comparing different interpretability methods based on their utility, fidelity, and safety.
Lundberg & Lee
A unified framework for interpreting the predictions of any model, including both global and local feature importance measures.
Wachter et al.
A method for generating counterfactual explanations for automated decisions without revealing the internal workings of the black box model.
Ribeiro et al.
Proposes a model-agnostic method for explaining the predictions of any classifier using local interpretable model-agnostic explanations (LIME).
Zeiler & Fergus
Visualizing the learned features of convolutional neural networks (CNNs) to gain insights into their internal representations.

iDL for All

A curated selection of key survey articles on interpretable or explainable AI (XAI) from across several AI-related fields, published in the past five years. These publications include in-depth surveys and analyses of contemporary approaches in their respective fields.

Note: The list will be periodically refreshed and expanded to include relevant material across a wide range of subject areas.

These survey articles cover a wide range of AI fields and offer important insights into the development and implementation of interpretable and explainable AI techniques. Reading these articles will provide you with a solid grasp of XAI's current state and potential future possibilities.


Conference & Workshops

Welcome to the world of academia, where the pursuit of knowledge and innovation never sleeps. With so much happening in the research community, it can be hard to keep track of all the events, conferences, and workshops taking place around the globe. That's why we've put together a list of recent and upcoming talks, workshops, and conferences worth following, from AI and machine learning to the latest developments in neuroscience and biotechnology. With links to upcoming calls for papers, you'll have the information you need to stay up to date and get involved in the latest research.

Note: We will shortly be updating and expanding this effort to centralize relevant topics and publications.



Godfather of artificial intelligence talks impact and potential of AI

Geoffrey Hinton

March 2023

YouTube Link

A.I. is B.S.

Adam Conover

March 2023

YouTube Link

Building more robust machine learning models (MLNLP)

Jindong Wang

Sept 2022

Talk Link

Adversarial Machine Learning (ICLR 2019)

Ian Goodfellow

May 2019

YouTube Link


Reading Modules

Interpretability refers to the ability to understand and explain the behavior and decisions of a deep learning model. This is becoming increasingly important as DL models are used in critical applications such as healthcare and finance. Several modules are available for quick reading that cover various aspects of interpretability in deep learning, including techniques for model debugging, feature visualization, and model explanation.
