Explaining Machine Learning Models through Counterfactuals

New Methods Seminar — Bank of England

Patrick Altmeyer

Blurb

Counterfactual Explanations explain how inputs into a model need to change for it to produce different outputs. Explanations that involve realistic and actionable changes can be used for the purpose of Algorithmic Recourse: they offer human stakeholders a principled approach to not only understand the model they are seeking to explain, but also react to it or adjust it.

The general setup lends itself naturally to Bank datasets that revolve around counterparty risk, for example. In this seminar I will introduce the topic and place it into the broader context of Explainable AI. Using my Julia package I will go through a worked example involving a publicly available credit data set. Finally, I will also briefly present some of our recent research that points to potential pitfalls of current state-of-the-art approaches and proposes mitigation strategies.

DISCLAIMER: Views presented in this presentation are my own.

Quick Intro

  • Currently 2nd year of PhD in Trustworthy Artificial Intelligence at Delft University of Technology.
  • Working on Counterfactual Explanations and Probabilistic Machine Learning with applications in Finance.
  • Previously, educational background in Economics and Finance and two years at the Bank of England (MPAT \(\subset\) MIAD).
  • Enthusiastic about free open-source software, in particular Julia and Quarto.

Trustworthy AI 🔮

The Problem with Today’s AI

From human to data-driven decision-making …

  • Black-box models like deep neural networks are being deployed virtually everywhere.
  • Includes safety-critical and public domains: health care, autonomous driving, finance, …
  • More likely than not that your loan or employment application is handled by an algorithm.

… where black boxes are a recipe for disaster.

  • We have no idea what exactly we’re cooking up …
    • Have you received an automated rejection email? Why didn’t you “mEet tHe sHoRtLisTiNg cRiTeRia”? 🙃
  • … but we do know that some of it is junk.
Figure 1: Adversarial attacks on deep neural networks. Source: Goodfellow, Shlens, and Szegedy (2014)

Towards Trustworthy AI

Ground Truthing

Probabilistic Models

Counterfactual Reasoning


Current Standard in ML

We typically want to maximize the likelihood of observing \(\mathcal{D}_n\) under given parameters (Murphy 2022):

\[ \theta^* = \arg \max_{\theta} p(\mathcal{D}_n|\theta) \qquad(1)\]

Compute an MLE (or MAP) point estimate \(\hat\theta = \theta^*\) and use the plug-in approximation for prediction:

\[ p(y|x,\mathcal{D}_n) \approx p(y|x,\hat\theta) \qquad(2)\]

  • In an ideal world we can just use parsimonious and interpretable models like GLMs (Rudin 2019), for which in many cases we can rely on asymptotic properties of \(\hat\theta\) to quantify uncertainty.
  • In practice these models often have performance limitations.
  • Black-box models like deep neural networks are popular, but they are also the very opposite of parsimonious.
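To make Equations 1 and 2 concrete, here is a minimal, self-contained Julia sketch: a hand-rolled logistic regression fitted by maximum likelihood on toy data, followed by a plug-in prediction. All names and the data-generating process are made up for illustration.

```julia
using Zygote

# Toy data: two features, binary labels (illustrative only).
X = randn(2, 100)
y = Int.(X[1, :] .+ X[2, :] .> 0)

# Logistic regression log-likelihood (numerically stable form).
sigmoid(z) = 1 / (1 + exp(-z))
function loglik(θ)
    z = X' * θ
    return sum(y .* z .- log1p.(exp.(z)))
end

# MLE via gradient ascent (Equation 1).
θ̂ = zeros(2)
for _ in 1:1000
    θ̂ .+= 0.01 .* gradient(loglik, θ̂)[1]
end

# Plug-in approximation for a new input (Equation 2).
x_factual = [0.5, -1.0]
p̂ = sigmoid(x_factual' * θ̂)   # p(y = 1 | x, θ̂)
```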

Towards Trustworthy AI

Ground Truthing

Probabilistic Models

Counterfactual Reasoning

Objective

. . .

[…] deep neural networks are typically very underspecified by the available data, and […] parameters [therefore] correspond to a diverse variety of compelling explanations for the data. (Wilson 2020)

In this setting it is often crucial to treat models probabilistically!

\[ p(y|x,\mathcal{D}_n) = \int p(y|x,\theta)p(\theta|\mathcal{D}_n)d\theta \qquad(3)\]
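Continuing the toy sketch above, Equation 3 can be approximated by Monte Carlo: average plug-in predictions over draws from the parameter posterior. The Gaussian noise around \(\hat\theta\) below is only a stand-in for proper posterior samples (e.g. from a Laplace approximation or MCMC).

```julia
using Statistics: mean

# Crude stand-in for posterior draws θ⁽ˢ⁾ ∼ p(θ | Dₙ); a Laplace approximation
# or MCMC would provide proper samples.
θ_samples = [θ̂ .+ 0.5 .* randn(2) for _ in 1:1000]

# Monte Carlo approximation of the posterior predictive (Equation 3).
p_bayes = mean(sigmoid(x_factual' * θ) for θ in θ_samples)
```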

Towards Trustworthy AI

Ground Truthing

Probabilistic Models

Counterfactual Reasoning

We can now make predictions – great! But do we know how the predictions are actually being made?

Objective

With the model trained for its task, we are interested in understanding how its predictions change in response to input changes.

\[ \nabla_x p(y|x,\mathcal{D}_n;\hat\theta) \qquad(4)\]

  • Counterfactual reasoning (in this context) boils down to a simple question: what if \(x\) (factual) \(\Rightarrow\) \(x^\prime\) (counterfactual)?
  • By strategically perturbing features and checking the model output, we can (begin to) understand how the model makes its decisions.
  • Counterfactual Explanations always have full fidelity by construction (as opposed to surrogate explanations, for example).

. . .

Important to realize that we are keeping \(\hat\theta\) constant!
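A quick sketch of Equation 4, continuing the toy model: Zygote differentiates the predicted probability with respect to the input while \(\hat\theta\) stays fixed.

```julia
# Sensitivity of the prediction to the input, holding θ̂ constant (Equation 4).
∇x = gradient(x -> sigmoid(x' * θ̂), x_factual)[1]

# A small "what if" step in the direction that increases p(y = 1 | x, θ̂):
x_perturbed = x_factual .+ 0.1 .* ∇x
sigmoid(x_perturbed' * θ̂)
```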

Today’s talk

  1. 🔮 Explaining Black-Box Models through Counterfactuals (\(\approx\) 10min)
    • What are they? What are they not?
    • Counterfactual Explanations in the broader XAI landscape
    • From Counterfactual Explanations to Algorithmic Recourse
  2. 🛠️ Hands-on examples — CounterfactualExplanations.jl in Julia (\(\approx\) 15min)
  3. 📊 Endogenous Macrodynamics in Algorithmic Recourse (\(\approx\) 10min)
  4. ❓ Q&A (\(\approx\) 10min)
  5. 🚀 Related Research Topics (\(\approx\) 10min)
    • Predictive Uncertainty Quantification

Explaining Black-Box Models through Counterfactuals 🔮

A Framework for Counterfactual Explanations

Even though […] interpretability is of great importance and should be pursued, explanations can, in principle, be offered without opening the “black box”. (Wachter, Mittelstadt, and Russell 2017)

Framework

. . .

The objective originally proposed by Wachter, Mittelstadt, and Russell (2017) is as follows:

\[ \min_{x^\prime \in \mathcal{X}} h(x^\prime) \ \ \ \mbox{s.t.} \ \ \ M(x^\prime) = t \qquad(5)\]

where \(h\) relates to the complexity of the counterfactual and \(M\) denotes the classifier.

. . .

Typically this is approximated through regularization:

\[ x^\prime = \arg \min_{x^\prime} \ell(M(x^\prime),t) + \lambda h(x^\prime) \qquad(6)\]
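Continuing the toy example, the sketch below runs plain gradient descent on Equation 6, using a cross-entropy-style loss towards target \(t=1\) and an L1 penalty as stand-ins for \(\ell\) and \(h\); the step size, penalty strength and iteration count are arbitrary.

```julia
# Wachter-style counterfactual search (Equation 6), sketched on the toy model above.
λ = 0.1                  # trade-off between validity and complexity
x′ = copy(x_factual)     # start the search at the factual

objective(x) = -log(sigmoid(θ̂' * x)) +          # ℓ(M(x′), t) for target t = 1
               λ * sum(abs, x .- x_factual)      # h(x′): L1 distance to the factual

for _ in 1:500
    x′ .-= 0.05 .* gradient(objective, x′)[1]
end

sigmoid(θ̂' * x′)   # prediction for the counterfactual, ideally close to the target
```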

Intuition

. . .

Figure 2: A cat performing gradient descent in the feature space à la Wachter, Mittelstadt, and Russell (2017).

Counterfactuals … as in Adversarial Examples?

Yes and no!

While both are methodologically very similar, adversarial examples are meant to go undetected while CEs ought to be meaningful.

Desiderata

  • closeness: the average distance between factual and counterfactual features should be small (Wachter, Mittelstadt, and Russell 2017)
  • actionability: the proposed feature perturbation should actually be actionable (Ustun, Spangher, and Liu 2019; Poyiadzi et al. 2020)
  • plausibility: the counterfactual explanation should be plausible to a human (Joshi et al. 2019)
  • unambiguity: a human should have no trouble assigning a label to the counterfactual (Schut et al. 2021)
  • sparsity: the counterfactual explanation should involve as few individual feature changes as possible (Schut et al. 2021)
  • robustness: the counterfactual explanation should be robust to domain and model shifts (Upadhyay, Joshi, and Lakkaraju 2021)
  • diversity: ideally multiple diverse counterfactual explanations should be provided (Mothilal, Sharma, and Tan 2020)
  • causality: counterfactual explanations should reflect the structural causal model underlying the data generating process (Karimi et al. 2020; Karimi, Schölkopf, and Valera 2021)

Counterfactuals … as in Causal Inference?

NO!

Causal inference: counterfactuals are thought of as unobserved states of the world that we would like to observe in order to establish causality.

  • The only way to do this is by actually interfering with the state of the world: \(p(y|do(x),\theta)\).
  • In practice we can only move some individuals to the counterfactual state of the world and compare their outcomes to a control group.
  • Provided we have controlled for confounders, properly randomized, … we can estimate an average treatment effect: \(\hat\theta\).

Counterfactual Explanations: involves perturbing features after some model has been trained.

  • We end up comparing modeled outcomes \(p(y|x,\hat\phi)\) and \(p(y|x^\prime,\hat\phi)\) for individuals.
  • We have not magically solved causality.

The XAI Landscape

A (highly) simplified and incomplete overview …

Figure 3: A (highly) simplified and incomplete overview of the XAI landscape, loosely based on Molnar's Interpretable Machine Learning (2020).

Surrogate Explainers

  • Ribeiro, Singh, and Guestrin (2016) propose Local Interpretable Model-Agnostic Explanations (LIME): the approach involves generating local perturbations in the input space, deriving predictions from the original classifier and then fitting a white-box model (e.g. linear regression) on this synthetic data set (see the sketch below).
  • Lundberg and Lee (2017) propose SHAP as a provably unified approach to additive feature attribution methods (including LIME) with certain desiderata. Contrary to LIME, this approach involves permuting through the feature space and checking how different features impact model predictions when they are included in the permutations.
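To illustrate the local-surrogate idea, here is a rough LIME-style sketch on the toy model from earlier (sampling width, weighting kernel and sample size are arbitrary choices, not those of the original LIME implementation):

```julia
using LinearAlgebra: Diagonal, norm

# Local perturbations around the instance we want to explain.
n_samples = 500
Z = x_factual .+ 0.3 .* randn(2, n_samples)

# Query the "black box" (here: the toy logistic model) on the perturbations.
ŷ = [sigmoid(Z[:, i]' * θ̂) for i in 1:n_samples]

# Proximity weights: perturbations closer to x_factual count more.
w = [exp(-norm(Z[:, i] .- x_factual)^2) for i in 1:n_samples]

# Weighted least-squares fit of an interpretable linear surrogate.
A = hcat(ones(n_samples), Z')                     # intercept + features
β = (A' * Diagonal(w) * A) \ (A' * (w .* ŷ))      # local feature attributions
```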

Counterfactual Explanations

  • Wachter, Mittelstadt, and Russell (2017) were among the first to propose counterfactual explanations that do not require knowledge about the inner workings of a black-box model.
  • Joshi et al. (2019) extend the framework of Ustun, Spangher, and Liu (2019). Their proposed REVISE method is applicable to a broader class of models including black box classifiers and structural causal models. For a summary see here and for a set of slides see here.
  • Schut et al. (2021) introduce Bayesian modeling to the context of CE: their approach implicitly minimizes aleatoric and epistemic uncertainty to generate a CE that is unambiguous and realistic, respectively.

Criticism (XAI)

“Explanatory models by definition do not produce 100% reliable explanations, because they are approximations. This means explanations can’t be fully trusted, and so neither can the original model.” – causaLens, 2021

  • Mittelstadt, Russell, and Wachter (2019) point out that there is a gap between how computer scientists and explanation scientists (social scientists, cognitive scientists, psychologists, …) understand explanations. Current methods produce at best locally reliable explanations. There needs to be a shift towards interactive explanations.
  • Rudin (2019) argues that instead of bothering with explanations for black box models we should focus on designing inherently interpretable models. In her view the trade-off between (intrinsic) explainability and performance is not as clear-cut as people claim.
  • Lakkaraju and Bastani (2020) show how misleading black box explanations can manipulate users into trusting an untrustworthy model.
  • Slack et al. (2020) demonstrate that both LIME and SHAP are not reliable: their reliance on feature perturbations makes them susceptible to adversarial attacks.
  • Slack et al. (2021) show that (gradient-based) Counterfactual Explanations are also vulnerable to manipulation, but that various simple mitigation strategies can be used to avoid this.

From Counterfactual Explanations to Algorithmic Recourse

“You cannot appeal to (algorithms). They do not listen. Nor do they bend.”

— Cathy O’Neil in Weapons of Math Destruction, 2016

Figure 4: Cathy O’Neil. Source: Cathy O’Neil a.k.a. mathbabe.

Algorithmic Recourse

. . .

  • O’Neil (2016) points to various real-world cases involving black-box models and affected individuals facing adverse outcomes.

. . .

  • These individuals generally have no way to challenge their outcome.

. . .

Counterfactual Explanations that involve actionable and realistic feature perturbations can be used for the purpose of Algorithmic Recourse.

CounterfactualExplanations.jl in Julia 🛠️

CounterfactualExplanations.jl 📦


  • A unifying framework for generating Counterfactual Explanations.
  • Fast, extensible and composable, allowing users and developers to add and combine different counterfactual generators.
  • Implements a number of SOTA generators.
  • Built in Julia, but can be used to explain models built in R and Python (still experimental).
  • Status 🔁: ready for research, not production. Thoughts/challenges/contributions welcome!

Photo by Denise Jans on Unsplash.

Julia has an edge with respect to Trustworthy AI: it’s open-source, uniquely transparent and interoperable 🔴🟢🟣

A simple example

  1. Load and prepare some toy data.
  2. Select a random sample.
  3. Generate counterfactuals using different approaches.
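For orientation, here is a hedged sketch of these three steps with CounterfactualExplanations.jl, paraphrased from the package documentation at the time of writing; constructor and function names (CounterfactualData, fit_model, select_factual, GenericGenerator, generate_counterfactual) and their exact signatures may differ across package versions, so treat this as illustrative rather than canonical.

```julia
using CounterfactualExplanations

# 1. Load and prepare some toy data (two Gaussian blobs built by hand).
X_toy = hcat(randn(2, 100) .- 2.0, randn(2, 100) .+ 2.0)
y_toy = vcat(zeros(Int, 100), ones(Int, 100))
counterfactual_data = CounterfactualData(X_toy, y_toy)   # signature per package docs; may vary by version

# 2. Fit a simple classifier and select a random factual sample from class 0.
M = fit_model(counterfactual_data, :Linear)
chosen = rand(findall(y_toy .== 0))
x = select_factual(counterfactual_data, chosen)

# 3. Generate a counterfactual for the opposite (target) class.
target = 1
generator = GenericGenerator()
ce = generate_counterfactual(x, target, counterfactual_data, M, generator)
```

In the slides that follow, the same call is repeated with different generators (e.g. a greedy or latent-space generator).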

Generic Generator

Code

. . .

We begin by instantiating the fitted model …

. . .

… then based on its prediction for \(x\) we choose the opposite label as our target …

. . .

… and finally generate the counterfactual.

Output

. . .

… et voilà!

Probabilistic Methods for Counterfactual Explanations

When people say that counterfactuals should look realistic or plausible, they really mean that counterfactuals should be generated by the same Data Generating Process (DGP) as the factuals:

\[ x^\prime \sim p(x) \]

But how do we estimate \(p(x)\)? Two probabilistic approaches …

Schut et al. (2021) note that by maximizing predictive probabilities \(\sigma(M(x^\prime))\) for probabilistic models \(M\in\mathcal{\widetilde{M}}\) one implicitly minimizes epistemic and aleatoric uncertainty.

\[ x^\prime = \arg \min_{x^\prime} \ell(M(x^\prime),t) \ \ \ , \ \ \ M\in\mathcal{\widetilde{M}} \qquad(7)\]

Figure 5: A cat performing gradient descent in the feature space à la Schut et al. (2021)
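A from-scratch sketch of the greedy idea on the toy model from earlier (not the package's GreedyGenerator, and using the point-estimate classifier for brevity where Schut et al. (2021) would use a Bayesian one): at each step only the single most influential feature is changed by a fixed step size.

```julia
# Greedy search in the spirit of Schut et al. (2021), sketched on the toy model:
# repeatedly perturb only the feature with the largest gradient by a fixed δ.
δ = 0.2
x′_greedy = copy(x_factual)
for _ in 1:30
    g = gradient(x -> -log(sigmoid(θ̂' * x)), x′_greedy)[1]
    i = argmax(abs.(g))                  # most influential feature
    x′_greedy[i] -= δ * sign(g[i])       # fixed-size step on that feature only
end
sigmoid(θ̂' * x′_greedy)
```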

Instead of perturbing samples directly, some have proposed to traverse a lower-dimensional latent embedding learned through a generative model (Joshi et al. 2019).

\[ z^\prime = \arg \min_{z^\prime} \ell(M(dec(z^\prime)),t) + \lambda h(x^\prime) \qquad(8)\]

and

\[x^\prime = dec(z^\prime)\]

where \(dec(\cdot)\) is the decoder function.

Figure 6: Counterfactual (yellow) generated through latent space search (right panel) following Joshi et al. (2019). The corresponding counterfactual path in the feature space is shown in the left panel.
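A toy sketch of latent-space search in the spirit of Equation 8 (again not the package's REVISEGenerator): a hand-made linear "decoder" stands in for a trained generative model (e.g. a VAE decoder), and the search runs over \(z\) rather than \(x\).

```julia
# Latent-space search in the spirit of REVISE, sketched on the toy model.
W = randn(2, 1)                   # stand-in decoder weights (a trained VAE would go here)
decode(z) = W * z                 # decoder: latent z ↦ feature space

latent_objective(z) = -log(sigmoid(θ̂' * decode(z))) +        # ℓ(M(dec(z′)), t)
                      λ * sum(abs, decode(z) .- x_factual)    # complexity penalty h

z′ = zeros(1)
for _ in 1:500
    z′ .-= 0.05 .* gradient(latent_objective, z′)[1]
end
x′_latent = decode(z′)            # map the latent counterfactual back to feature space
```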

Greedy Generator

Code

. . .

This time we use a Bayesian classifier …

. . .

… and once again choose our target label as before …

. . .

… to then finally use greedy search to find a counterfactual.

Output

. . .

In this case the Bayesian approach yields a similar outcome.

Latent Space Generator

Code

. . .

Using the same classifier as before we can either use the specific REVISEGenerator

. . .

… or realize that REVISE (Joshi et al. 2019) just boils down to generic search in a latent space:

Output

. . .

We have essentially combined latent search with a probabilistic classifier (as in Antorán et al. 2020).

Diverse Counterfactuals

Code

. . .

We can use the DiCEGenerator to produce multiple diverse counterfactuals:

Output

. . .

A Real-World Example - Credit Default

  • The Give Me Some Credit dataset is publicly available from Kaggle.

Improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial distress in the next two years.

  • We have \(y \in \{0=\text{no stress},1=\text{stress}\}\) and a number of demographic and credit-related features \(X\).

Ignoring Mutability

Using DiCE to generate counterfactuals for a single individual, ignoring actionability:

Respecting Mutability

Using the generic generator to generate counterfactuals for multiple individuals, respecting that age cannot be decreased (you might argue that age also cannot be easily increased …):

Endogenous Macrodynamics in Algorithmic Recourse 📊

Motivation

TL;DR: We find that standard implementations of various SOTA approaches to AR can induce substantial domain and model shifts. We argue that these dynamics indicate that individual recourse generates hidden external costs, and we propose mitigation strategies.

In this work we investigate what happens if Algorithmic Recourse is actually implemented by a large number of individuals.

Figure 7 illustrates what we mean by Endogenous Macrodynamics in Algorithmic Recourse:

  • Panel (a): we have a simple linear classifier trained for binary classification, where samples from the negative class (y=0) are marked in blue and samples from the positive class (y=1) are marked in orange.
  • Panel (b): the implementation of AR for a random subset of individuals leads to a noticeable domain shift.
  • Panel (c): as the classifier is retrained, we observe a corresponding model shift (Upadhyay, Joshi, and Lakkaraju 2021).
  • Panel (d): as this process is repeated, the decision boundary moves away from the target class.
Figure 7: Proof of concept: repeated implementation of AR leads to domain and model shifts.

We argue that these shifts should be considered as an expected external cost of individual recourse and call for a paradigm shift from individual to collective recourse in these types of situations.

Generalised Framework

From individual recourse …

We restate Equation 6 to encapsulate latent space search:

\[ \begin{aligned} \mathbf{s}^\prime &= \arg \min_{\mathbf{s}^\prime \in \mathcal{S}} \left\{ {\text{yloss}(M(f(\mathbf{s}^\prime)),y^*)}+ \lambda {\text{cost}(f(\mathbf{s}^\prime)) } \right\} \end{aligned} \qquad(9)\]

… towards collective recourse

We borrow the notion of negative externalities from Economics to formalise the idea that individual recourse fails to account for external costs:

\[ \begin{aligned} \mathbf{s}^\prime &= \arg \min_{\mathbf{s}^\prime \in \mathcal{S}} \{ {\text{yloss}(M(f(\mathbf{s}^\prime)),y^*)} \\ &+ \lambda_1 {\text{cost}(f(\mathbf{s}^\prime))} + \lambda_2 {\text{extcost}(f(\mathbf{s}^\prime))} \} \end{aligned} \qquad(10)\]

Findings

Results for synthetic data.

Results for real-world data.

Mitigation Strategies

  1. Choose more conservative decision thresholds.
  2. Classifier Preserving ROAR (ClaPROAR): penalise classifier loss.

\[ \begin{aligned} \text{extcost}(f(\mathbf{s}^\prime)) = l(M(f(\mathbf{s}^\prime)),y^\prime) \end{aligned} \qquad(11)\]

  3. Gravitational Counterfactual Explanations: penalise distance to some sensible point in the target domain.

\[ \begin{aligned} \text{extcost}(f(\mathbf{s}^\prime)) = \text{dist}(f(\mathbf{s}^\prime),\bar{x}) \end{aligned} \qquad(12)\]

Figure 8: Illustrative example demonstrating the properties of the various mitigation strategies. Samples from the negative class \((y = 0)\) are marked in blue while samples of the positive class \((y = 1)\) are marked in orange.
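Continuing the earlier toy sketches, here is a hedged illustration of the gravitational idea: the external cost in Equation 12 penalises distance to the target-class mean and is simply added to the Wachter-style objective, as in Equation 10. All weights and the choice of centre are illustrative.

```julia
using Statistics: mean

# "Gravitational centre": mean of the toy training samples in the target class.
x̄ = vec(mean(X[:, y .== 1]; dims=2))

λ₂ = 0.1
extcost(x) = sum(abs2, x .- x̄)                              # external cost (Equation 12)

collective_objective(x) = -log(sigmoid(θ̂' * x)) +            # yloss(M(x′), y*)
                          λ * sum(abs, x .- x_factual) +      # private cost
                          λ₂ * extcost(x)                     # external cost (Equation 10)

x′_collective = copy(x_factual)
for _ in 1:500
    x′_collective .-= 0.05 .* gradient(collective_objective, x′_collective)[1]
end
```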

Mitigation strategies applied to synthetic data.

Mitigation strategies applied to real-world data.

Questions & Answers ❓

Effortless Bayesian Deep Learning through Laplace Redux


LaplaceRedux.jl (formerly BayesLaplace.jl) is a small package that can be used for effortless Bayesian Deep Learning and Logistic Regression through Laplace Approximation. It is inspired by this Python library and its companion paper.

Plugin Approximation (left) and Laplace Posterior (right) for a simple artificial neural network.

Simulation of changing posterior predictive distribution. Image by author.

ConformalPrediction.jl


ConformalPrediction.jl is a package for Uncertainty Quantification (UQ) through Conformal Prediction (CP) in Julia. It is designed to work with supervised models trained in MLJ (Blaom et al. 2020). Conformal Prediction is distribution-free, easy-to-understand, easy-to-use and model-agnostic.

Conformal Prediction in action: Prediction sets for two different samples and changing coverage rates. As coverage grows, so does the size of the prediction sets.

More Resources 📚

Read on …

  • Blog post introducing CE: [TDS], [blog].
  • Blog post on Laplace Redux: [TDS], [blog].
  • Blog post on Conformal Prediction: [TDS], [blog].

… or get involved! 🤗

Image Sources

  • Crystal ball on beach: Nicole Avagliano on Unsplash
  • Colour gradient: A.Z on Unsplash
  • Elephant herd: Sergi Ferrete on Unsplash
  • Bank of England logo: Bank of England

References

Antorán, Javier, Umang Bhatt, Tameem Adel, Adrian Weller, and José Miguel Hernández-Lobato. 2020. “Getting a Clue: A Method for Explaining Uncertainty Estimates.” https://arxiv.org/abs/2006.06848.
Blaom, Anthony D., Franz Kiraly, Thibaut Lienart, Yiannis Simillides, Diego Arenas, and Sebastian J. Vollmer. 2020. “MLJ: A Julia Package for Composable Machine Learning.” Journal of Open Source Software 5 (55): 2704. https://doi.org/10.21105/joss.02704.
Goodfellow, Ian J, Jonathon Shlens, and Christian Szegedy. 2014. “Explaining and Harnessing Adversarial Examples.” https://arxiv.org/abs/1412.6572.
Joshi, Shalmali, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. 2019. “Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems.” https://arxiv.org/abs/1907.09615.
Karimi, Amir-Hossein, Bernhard Schölkopf, and Isabel Valera. 2021. “Algorithmic Recourse: From Counterfactual Explanations to Interventions.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 353–62.
Karimi, Amir-Hossein, Julius Von Kügelgen, Bernhard Schölkopf, and Isabel Valera. 2020. “Algorithmic Recourse Under Imperfect Causal Knowledge: A Probabilistic Approach.” https://arxiv.org/abs/2006.06831.
Lakkaraju, Himabindu, and Osbert Bastani. 2020. “‘How Do I Fool You?’ Manipulating User Trust via Misleading Black Box Explanations.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 79–85.
Lundberg, Scott M, and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Proceedings of the 31st International Conference on Neural Information Processing Systems, 4768–77.
Mittelstadt, Brent, Chris Russell, and Sandra Wachter. 2019. “Explaining Explanations in AI.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 279–88. https://doi.org/10.1145/3287560.3287574.
Mothilal, Ramaravind K, Amit Sharma, and Chenhao Tan. 2020. “Explaining Machine Learning Classifiers Through Diverse Counterfactual Explanations.” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 607–17. https://doi.org/10.1145/3351095.3372850.
Murphy, Kevin P. 2022. Probabilistic Machine Learning: An Introduction. MIT Press.
O’Neil, Cathy. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown.
Poyiadzi, Rafael, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. 2020. “FACE: Feasible and Actionable Counterfactual Explanations.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 344–50.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’ Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44.
Rudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.” Nature Machine Intelligence 1 (5): 206–15. https://doi.org/10.1038/s42256-019-0048-x.
Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. “Generating Interpretable Counterfactual Explanations By Implicit Minimisation of Epistemic and Aleatoric Uncertainties.” In International Conference on Artificial Intelligence and Statistics, 1756–64. PMLR.
Slack, Dylan, Anna Hilgard, Himabindu Lakkaraju, and Sameer Singh. 2021. “Counterfactual Explanations Can Be Manipulated.” Advances in Neural Information Processing Systems 34.
Slack, Dylan, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. 2020. “Fooling Lime and Shap: Adversarial Attacks on Post Hoc Explanation Methods.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 180–86.
Upadhyay, Sohini, Shalmali Joshi, and Himabindu Lakkaraju. 2021. “Towards Robust and Reliable Algorithmic Recourse.” Advances in Neural Information Processing Systems 34: 16926–37.
Ustun, Berk, Alexander Spangher, and Yang Liu. 2019. “Actionable Recourse in Linear Classification.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 10–19. https://doi.org/10.1145/3287560.3287566.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harv. JL & Tech. 31: 841. https://doi.org/10.2139/ssrn.3063289.
Wilson, Andrew Gordon. 2020. “The Case for Bayesian Deep Learning.” https://arxiv.org/abs/2001.10995.