Explaining Black-Box Models through Counterfactuals

A Gentle Introduction

Patrick Altmeyer
using Pkg
Pkg.activate("content/talks/posts/2022-dscc/")

House Rules

DISCLAIMER: Views presented in this presentation are my own.

Quick Intro

  • Currently 2nd year of PhD in Trustworthy Artificial Intelligence.
  • Working on Counterfactual Explanations and Probabilistic Machine Learning.
  • Previously, educational background in Economics and Finance and two years in monetary policy at the Bank of England.
  • Enthusiastic about free open-source software, in particular Julia and Quarto.

Trustworthy AI 🔮

The Problem with Today’s AI

From human to data-driven decision-making …

  • Black-box models like deep neural networks are being deployed virtually everywhere.
  • Includes safety-critical and public domains: health care, autonomous driving, finance, …
  • More likely than not, your loan or employment application is handled by an algorithm.

… where black boxes are a recipe for disaster.

  • We have no idea what exactly we’re cooking up …
    • Have you received an automated rejection email? Why didn’t you “mEet tHe sHoRtLisTiNg cRiTeRia”? 🙃
  • … but we do know that some of it is junk.
Figure 1: Adversarial attacks on deep neural networks. Source: Goodfellow, Shlens, and Szegedy (2014)

Towards Trustworthy AI

Ground Truthing

Probabilistic Models

Counterfactual Reasoning


Current Standard in ML

We typically want to maximize the likelihood of observing \(\mathcal{D}_n\) under given parameters (Murphy 2022):

\[ \theta^* = \arg \max_{\theta} p(\mathcal{D}_n|\theta) \qquad(1)\]

Compute an MLE (or MAP) point estimate \(\hat\theta\) and use the plugin approximation for prediction:

\[ p(y|x,\mathcal{D}_n) \approx p(y|x,\hat\theta) \qquad(2)\]

  • In an ideal world we can just use parsimonious and interpretable models like GLMs (Rudin 2019), for which in many cases we can rely on asymptotic properties of \(\hat\theta\) to quantify uncertainty.
  • In practice these models often have performance limitations.
  • Black-box models like deep neural networks are popular, but they are also the very opposite of parsimonious.
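To make Equations 1 and 2 concrete, here is a minimal sketch in plain Julia (toy data, hand-rolled gradient descent; nothing here relies on any package API): fit a logistic regression by maximum likelihood, then use the plugin approximation for prediction.

# Minimal sketch of Equations 1 and 2: fit θ̂ by maximum likelihood,
# then use the plugin approximation p(y|x,D) ≈ p(y|x,θ̂).
# Toy data and hyperparameters are hypothetical.
using Random
Random.seed!(2022)
n = 200
X = randn(n, 2)                          # features
θ_true = [1.5, -2.0]                     # "true" coefficients
σ(z) = 1 / (1 + exp(-z))
y = Float64.(rand(n) .< σ.(X * θ_true))  # binary labels

# Gradient of the negative log-likelihood for logistic regression:
∇nll(θ) = X' * (σ.(X * θ) .- y)

# Plain gradient descent to obtain the MLE θ̂ (Equation 1):
θ̂ = zeros(2)
for _ in 1:10_000
    θ̂ .-= 0.1 .* ∇nll(θ̂) ./ n
end

# Plugin prediction for a new input (Equation 2):
x_new = [0.5, -0.5]
p_plugin = σ(x_new' * θ̂)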


Objective

. . .

[…] deep neural networks are typically very underspecified by the available data, and […] parameters [therefore] correspond to a diverse variety of compelling explanations for the data. (Wilson 2020)

In this setting it is often crucial to treat models probabilistically!

\[ p(y|x,\mathcal{D}_n) = \int p(y|x,\theta)p(\theta|\mathcal{D}_n)d\theta \qquad(3)\]
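The integral in Equation 3 is rarely tractable. Assuming we already have (approximate) posterior samples, a simple Monte Carlo average of plugin predictions approximates the posterior predictive. The Gaussian “posterior” in the sketch below is purely hypothetical and only serves to illustrate the idea.

# Sketch: Monte Carlo approximation of the posterior predictive (Equation 3),
# assuming approximate posterior samples θ⁽ˢ⁾ ~ p(θ|Dₙ) are available.
# The Gaussian "posterior" below is hypothetical.
using Random, Statistics
Random.seed!(2022)
σ(z) = 1 / (1 + exp(-z))
θ̂ = [1.2, -1.8]     # posterior mean (e.g. MAP estimate)
s = 0.2              # hypothetical posterior standard deviation
S = 1_000
θ_samples = [θ̂ .+ s .* randn(2) for _ in 1:S]

# p(y=1|x,Dₙ) ≈ (1/S) ∑ₛ σ(x'θ⁽ˢ⁾):
x_new = [0.5, -0.5]
p_pred = mean(σ(x_new' * θs) for θs in θ_samples)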


We can now make predictions – great! But do we know how the predictions are actually being made?

Objective

With the model trained for its task, we are interested in understanding how its predictions change in response to input changes.

\[ \nabla_x p(y|x,\mathcal{D}_n;\hat\theta) \qquad(4)\]

  • Counterfactual reasoning (in this context) boils down to a simple question: what if \(x\) (factual) \(\Rightarrow\) \(x\prime\) (counterfactual)?
  • By strategically perturbing features and checking the model output, we can (begin to) understand how the model makes its decisions.
  • Counterfactual Explanations always have full fidelity by construction (as opposed to surrogate explanations, for example).

. . .

Important to realize that we are keeping \(\hat\theta\) constant!
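As a toy illustration of this idea (hypothetical logistic model and factual, not the package API), we can nudge each feature in turn and record how the prediction of the fixed model responds:

# Sketch: probing how a fixed model's prediction responds to feature perturbations.
# The model M and the factual x are hypothetical; θ̂ = (w, b) stays constant throughout.
σ(z) = 1 / (1 + exp(-z))
w, b = [1.0, 1.0], 0.0
M(x) = σ(w' * x + b)     # black box from the explainer's point of view

x = [-1.0, 0.5]          # factual input
ε = 0.1
for j in eachindex(x)
    x_perturbed = copy(x)
    x_perturbed[j] += ε
    # How much does the prediction move when we nudge feature j?
    println("Feature $j: Δp = ", M(x_perturbed) - M(x))
end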

Today’s talk

  1. 🔮 Explaining Black-Box Models through Counterfactuals (\(\approx\) 10min)
    • What are they? What are they not?
    • From Counterfactual Explanations to Algorithmic Recourse
  2. 🛠️ Hands-on examples — CounterfactualExplanations.jl in Julia (\(\approx\) 15min)
  3. 📊 Endogenous Macrodynamics in Algorithmic Recourse (\(\approx\) 10min)
  4. 🚀 The Road Ahead — Related Research Topics (\(\approx\) 10min)
    • Predictive Uncertainty Quantification
  5. ❓ Q&A (\(\approx\) 10min)

Explaining Black-Box Models through Counterfactuals 🔮

A Framework for Counterfactual Explanations

Even though […] interpretability is of great importance and should be pursued, explanations can, in principle, be offered without opening the “black box”. (Wachter, Mittelstadt, and Russell 2017)

Framework

. . .

The objective originally proposed by Wachter, Mittelstadt, and Russell (2017) is as follows:

\[ \min_{x\prime \in \mathcal{X}} h(x\prime) \ \ \ \mbox{s. t.} \ \ \ M(x\prime) = t \qquad(5)\]

where \(h\) relates to the complexity of the counterfactual and \(M\) denotes the classifier.

. . .

Typically this is approximated through regularization:

\[ x\prime = \arg \min_{x\prime} \ell(M(x\prime),t) + \lambda h(x\prime) \qquad(6)\]
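As a rough sketch of what Equation 6 amounts to in practice, the following plain-Julia loop runs gradient descent on the cross-entropy loss plus a squared-distance penalty for a hypothetical logistic classifier. The package's generators implement this far more generally; the model, step size and penalty below are assumptions for illustration only.

# Sketch of Equation 6: gradient descent on ℓ(M(x′), t) + λ‖x′ − x‖²
# for a hypothetical logistic classifier (not the package implementation).
σ(z) = 1 / (1 + exp(-z))
w, b = [1.0, 1.0], 0.0
M(x) = σ(w' * x + b)

x = [-2.0, -2.0]         # factual, predicted as class 0
t = 1.0                  # target class
λ, η = 0.1, 0.5          # penalty strength and step size

x′ = copy(x)
for _ in 1:100
    p = M(x′)
    ∇ℓ = (p - t) .* w        # gradient of the cross-entropy w.r.t. x′
    ∇h = 2 .* (x′ .- x)      # gradient of the penalty h(x′) = ‖x′ − x‖²
    x′ .-= η .* (∇ℓ .+ λ .* ∇h)
end
M(x′)                        # predicted probability of the target class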

Intuition

. . .

Figure 2: A cat performing gradient descent in the feature space à la Wachter, Mittelstadt, and Russell (2017).

Counterfactuals … as in Adversarial Examples?

Yes and no!

While both are methodologically very similar, adversarial examples are meant to go undetected while CEs ought to be meaningful.

Desiderata

  • closeness: the average distance between factual and counterfactual features should be small (Wachter, Mittelstadt, and Russell 2017)
  • actionability: the proposed feature perturbation should actually be actionable (Ustun, Spangher, and Liu 2019; Poyiadzi et al. 2020)
  • plausibility: the counterfactual explanation should be plausible to a human (Joshi et al. 2019)
  • unambiguity: a human should have no trouble assigning a label to the counterfactual (Schut et al. 2021)
  • sparsity: the counterfactual explanation should involve as few individual feature changes as possible (Schut et al. 2021)
  • robustness: the counterfactual explanation should be robust to domain and model shifts (Upadhyay, Joshi, and Lakkaraju 2021)
  • diversity: ideally multiple diverse counterfactual explanations should be provided (Mothilal, Sharma, and Tan 2020)
  • causality: counterfactual explanations should reflect the structural causal model underlying the data generating process (Karimi et al. 2020; Karimi, Schölkopf, and Valera 2021)

Counterfactuals … as in Causal Inference?

NO!

Causal inference: counterfactuals are thought of as unobserved states of the world that we would like to observe in order to establish causality.

  • The only way to do this is by actually intervening on the state of the world: \(p(y|do(x),\theta)\).
  • In practice we can only move some individuals to the counterfactual state of the world and compare their outcomes to a control group.
  • Provided we have controlled for confounders, properly randomized, … we can estimate an average treatment effect: \(\hat\theta\).

Counterfactual Explanations: involves perturbing features after some model has been trained.

  • We end up comparing modeled outcomes \(p(y|x,\hat\phi)\) and \(p(y|x\prime,\hat\phi)\) for individuals.
  • We have not magically solved causality.

From Counterfactual Explanations to Algorithmic Recourse

“You cannot appeal to (algorithms). They do not listen. Nor do they bend.”

— Cathy O’Neil in Weapons of Math Destruction, 2016

Figure 3: Cathy O’Neil. Source: Cathy O’Neil a.k.a. mathbabe.

Algorithmic Recourse

. . .

  • O’Neil (2016) points to various real-world cases involving black-box models and affected individuals facing adverse outcomes.

. . .

  • These individuals generally have no way to challenge their outcome.

. . .

Counterfactual Explanations that involve actionable and realistic feature perturbations can be used for the purpose of Algorithmic Recourse.

CounterfactualExplanations.jl in Julia 🛠️

Limited Software Availability

Work currently scattered across different GitHub repositories …

  • Only one unifying Python library: CARLA (Pawelczyk et al. 2021).
    • Comprehensive and (somewhat) extensible.
    • Not composable: each generator is treated as a different class/entity.
  • Both R and Julia lack any kind of implementation.
  • Enter: 👉 CounterfactualExplanations.jl (Altmeyer 2022)

CounterfactualExplanations.jl 📦


  • A unifying framework for generating Counterfactual Explanations.
  • Fast, extensible and composable, allowing users and developers to add and combine different counterfactual generators.
  • Implements a number of SOTA generators.
  • Built in Julia, but can be used to explain models built in R and Python (still experimental).

Photo by Denise Jans on Unsplash.

Julia has an edge with respect to Trustworthy AI: it’s open-source, uniquely transparent and interoperable 🔴🟢🟣

A simple example

  1. Load and prepare some toy data.
  2. Select a random sample.
  3. Generate counterfactuals using different approaches.
# Data:
using CounterfactualExplanations
using Random
Random.seed!(123)
N = 100
xs, ys = toy_data_linear(N)
X = hcat(xs...)
counterfactual_data = CounterfactualData(X, ys')

# Randomly selected factual:
x = select_factual(counterfactual_data,rand(1:size(X)[2]))

Generic Generator

Code

. . .

We begin by instantiating the fitted model …

# Model
w = [1.0 1.0] # estimated coefficients
b = 0 # estimated bias
M = LogisticModel(w, [b])

. . .

… then based on its prediction for \(x\) we choose the opposite label as our target …

# Select target class:
y = round(probs(M, x)[1])
target = ifelse(y==1.0,0.0,1.0) # opposite label as target

. . .

… and finally generate the counterfactual.

# Counterfactual search:
generator = GenericGenerator()
counterfactual = generate_counterfactual(
  x, target, counterfactual_data, M, generator
)
Convergence: ✅
 after 41 steps.

Output

. . .

… et voilà!

anim = animate_path(counterfactual; plot_proba=true, colorbar=false, size=(800,300), alpha_=0.7)
gif(anim, fps=5)
Figure 4: Counterfactual path (left) and predicted probability (right) for GenericGenerator. The contour (left) shows the predicted probabilities of the classifier (Logistic Regression).

Probabilistic Methods for Counterfactual Explanations

When people say that counterfactuals should look realistic or plausible, they really mean that counterfactuals should be generated by the same Data Generating Process (DGP) as the factuals:

\[ x\prime \sim p(x) \]

But how do we estimate \(p(x)\)? Two probabilistic approaches …

Schut et al. (2021) note that by maximizing predictive probabilities \(\sigma(M(x\prime))\) for probabilistic models \(M\in\mathcal{\widetilde{M}}\) one implicitly minimizes epistemic and aleatoric uncertainty.

\[ x\prime = \arg \min_{x\prime} \ell(M(x\prime),t) \ \ \ , \ \ \ M\in\mathcal{\widetilde{M}} \qquad(7)\]

Figure 5: A cat performing gradient descent in the feature space à la Schut et al. (2021)
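A rough sketch of the greedy idea in the spirit of Schut et al. (2021): at each step only the single most influential feature is changed, by a fixed amount \(\delta\), until the model is sufficiently confident about the target class. The logistic model below is hypothetical; the package's GreedyGenerator handles this generically for Bayesian models.

# Sketch of a greedy search in the spirit of Schut et al. (2021):
# change only the most influential feature by a fixed step δ per iteration.
# Hypothetical logistic model; not the package's GreedyGenerator implementation.
σ(z) = 1 / (1 + exp(-z))
w, b = [1.0, 1.0], 0.0
M(x) = σ(w' * x + b)

x = [-2.0, -2.0]; t = 1.0
x′ = copy(x)
δ = 0.25                      # fixed step size
for _ in 1:50
    M(x′) > 0.95 && break     # stop once confidently in the target class
    ∇ℓ = (M(x′) - t) .* w     # gradient of the cross-entropy w.r.t. x′
    j = argmax(abs.(∇ℓ))      # most influential feature
    x′[j] -= δ * sign(∇ℓ[j])
end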

Instead of perturbing samples directly, some have proposed to instead traverse a lower-dimensional latent embedding learned through a generative model (Joshi et al. 2019).

\[ z\prime = \arg \min_{z\prime} \ell(M(dec(z\prime)),t) + \lambda h(x\prime) \qquad(8)\]

and

\[x\prime = dec(z\prime)\]

where \(dec(\cdot)\) is the decoder function.

Figure 6: Counterfactual (yellow) generated through latent space search (right panel) following Joshi et al. (2019). The corresponding counterfactual path in the feature space is shown in the left panel.
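To make Equation 8 concrete, here is a toy sketch in which the “decoder” is just a fixed linear map from a one-dimensional latent space, standing in for a trained generative model (REVISE and the package's latent-space search use a proper VAE). Everything in the snippet is a hypothetical stand-in.

# Sketch of latent-space search (Equation 8) with a toy linear "decoder".
# Hypothetical throughout; not the REVISE or package implementation.
σ(z) = 1 / (1 + exp(-z))
w, b = [1.0, 1.0], 0.0
M(x) = σ(w' * x + b)

D = reshape([1.0, 0.5], 2, 1)        # toy decoder weights
dec(z) = D * z .+ [-2.0, -1.0]       # toy decoder: ℝ¹ → ℝ²

t = 1.0
z′ = [0.0]                           # latent counterfactual, starting at the factual's code
η = 0.5
for _ in 1:100
    x′ = dec(z′)
    ∇ℓ = (M(x′) - t) .* w            # gradient of the loss in feature space
    z′ .-= η .* (D' * ∇ℓ)            # chain rule through the decoder Jacobian D
end
x′ = dec(z′)                         # map the latent counterfactual back to feature space
M(x′)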

Greedy Generator

Code

. . .

This time we use a Bayesian classifier …

using LinearAlgebra
Σ = Symmetric(reshape(randn(9),3,3).*0.01 + UniformScaling(1)) # MAP covariance matrix
μ = hcat(b, w)
M = BayesianLogisticModel(μ, Σ)

. . .

… and once again choose our target label as before …

# Select target class:
y = round(probs(M, x)[1])
target = ifelse(y==1.0,0.0,1.0) # opposite label as target

. . .

… to then finally use greedy search to find a counterfactual.

# Counterfactual search:
generator = GreedyGenerator()
counterfactual = generate_counterfactual(
  x, target, counterfactual_data, M, generator
)
Convergence: ✅
 after 62 steps.

Output

. . .

In this case the Bayesian approach yields a similar outcome.

anim = animate_path(counterfactual; plot_proba=true, colorbar=false, size=(800,300), alpha_=0.7)
gif(anim, fps=15)
Figure 7: Counterfactual path (left) and predicted probability (right) for GreedyGenerator. The contour (left) shows the predicted probabilities of the classifier (Bayesian Logistic Regression).

Latent Space Generator

Code

. . .

Using the same classifier as before we can either use the specific REVISEGenerator

# Counterfactual search:
generator = REVISEGenerator()
counterfactual = generate_counterfactual(
  x, target, counterfactual_data, M, generator
)

. . .

… or realize that REVISE (Joshi et al. 2019) just boils down to generic search in a latent space:

# Counterfactual search:
generator = GenericGenerator()
counterfactual = generate_counterfactual(
  x, target, counterfactual_data, M, generator,
  latent_space=true
)
Convergence: ✅
 after 8 steps.

Output

. . .

We have essentially combined latent search with a probabilistic classifier (as in Antorán et al. 2020).

anim = animate_path(counterfactual; plot_proba=true, colorbar=false, size=(800,300), alpha_=0.7)
gif(anim, fps=2)
Figure 8: Counterfactual path (left) and predicted probability (right) for REVISEGenerator.

Diverse Counterfactuals

Code

. . .

We can use the DiCEGenerator to produce multiple diverse counterfactuals:

# Counterfactual search:
generator = DiCEGenerator(λ=[0.1, 5.0])
counterfactual = generate_counterfactual(
  x, target, counterfactual_data, M, generator;
  num_counterfactuals = 5
)
Convergence: ✅
 after 28 steps.

Output

. . .

anim = animate_path(counterfactual; plot_proba=true, colorbar=false, size=(800,300), alpha_=0.7)
gif(anim, fps=20)
Figure 9: Counterfactual path (left) and predicted probability (right) for DiCEGenerator.

Endogenous Macrodynamics in Algorithmic Recourse 📊

Motivation

TL;DR: We find that the standard implementation of various SOTA approaches to AR can induce substantial domain and model shifts. We argue that these dynamics indicate that individual recourse generates hidden external costs, and we provide mitigation strategies.

In this work we investigate what happens if Algorithmic Recourse is actually implemented by a large number of individuals.

Figure 10 illustrates what we mean by Endogenous Macrodynamics in Algorithmic Recourse:

  • Panel (a): we have a simple linear classifier trained for binary classification where samples from the negative class (y=0) are marked in blue and samples of the positive class (y=1) are marked in orange
  • Panel (b): the implementation of AR for a random subset of individuals leads to a noticeable domain shift
  • Panel (c): as the classifier is retrained we observe a corresponding model shift (Upadhyay, Joshi, and Lakkaraju 2021)
  • Panel (d): as this process is repeated, the decision boundary moves away from the target class.
Figure 10: Proof of concept: repeated implementation of AR leads to domain and model shifts.

We argue that these shifts should be considered as an expected external cost of individual recourse and call for a paradigm shift from individual to collective recourse in these types of situations.

Findings

Results for synthetic data.

Results for real-world data.

Mitigation Strategies - Intuition

  1. Choose more conservative decision thresholds.
  2. Classifier Preserving ROAR (ClaPROAR): penalise classifier loss.
  3. Gravitational Counterfactual Explanations: penalise distance to some sensible point in the target domain (see the sketch below).
Figure 11: Illustrative example demonstrating the properties of the various mitigation strategies. Samples from the negative class \((y = 0)\) are marked in blue while samples of the positive class \((y = 1)\) are marked in orange.
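To illustrate the intuition behind the third strategy, the sketch below adds a “gravitational” term to the counterfactual objective that pulls the counterfactual towards the centroid of the target class. The data, weights and penalty strengths are hypothetical; this is not the implementation used in the paper.

# Rough sketch of the "gravitational" penalty: besides the usual closeness term,
# penalise distance to a sensible reference point in the target domain
# (here the target-class centroid). Hypothetical; not the paper's code.
using Statistics
σ(z) = 1 / (1 + exp(-z))
w, b = [1.0, 1.0], 0.0
M(x) = σ(w' * x + b)

X_target = [1.0 2.0 1.5; 1.5 1.0 2.0]       # hypothetical target-class samples (columns)
x_star = vec(mean(X_target, dims=2))        # "sensible point" in the target domain

x = [-2.0, -2.0]; t = 1.0
λ₁, λ₂, η = 0.1, 0.5, 0.2
x′ = copy(x)
for _ in 1:200
    ∇ℓ = (M(x′) - t) .* w                   # loss term
    ∇close = 2 .* (x′ .- x)                 # closeness to the factual
    ∇grav = 2 .* (x′ .- x_star)             # "gravity" towards the target centroid
    x′ .-= η .* (∇ℓ .+ λ₁ .* ∇close .+ λ₂ .* ∇grav)
end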

Mitigation Strategies - Findings

Mitigation strategies applied to synthetic data.

Mitigation strategies applied to real-world data.

Effortless Bayesian Deep Learning through Laplace Redux


LaplaceRedux.jl (formerly BayesLaplace.jl) is a small package that can be used for effortless Bayesian Deep Learning and Logistic Regression through Laplace Approximation. It is inspired by this Python library and its companion paper.

Plugin Approximation (left) and Laplace Posterior (right) for simple artificial neural network.

Simulation of changing posterior predictive distribution. Image by author.
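Conceptually, the Laplace approximation fits a Gaussian to the posterior around the MAP estimate, with covariance equal to the inverse Hessian of the negative log posterior. The plain-Julia sketch below illustrates this for logistic regression on hypothetical data; LaplaceRedux.jl automates the same idea for deep networks.

# Plain-Julia sketch of the Laplace approximation for logistic regression:
# p(θ|D) ≈ N(θ̂_MAP, H⁻¹), with H the Hessian of the negative log posterior at θ̂_MAP.
# Hypothetical data and prior; not the LaplaceRedux.jl API.
using LinearAlgebra, Random
Random.seed!(2022)
σ(z) = 1 / (1 + exp(-z))
n = 200
X = randn(n, 2)
y = Float64.(rand(n) .< σ.(X * [1.5, -2.0]))
λ = 1.0                                  # precision of a zero-mean Gaussian prior

# MAP estimate via gradient descent on the negative log posterior:
∇nlp(θ) = X' * (σ.(X * θ) .- y) .+ λ .* θ
θ̂ = zeros(2)
for _ in 1:10_000
    θ̂ .-= 0.1 .* ∇nlp(θ̂) ./ n
end

# Hessian of the negative log posterior at the MAP estimate:
p = σ.(X * θ̂)
H = X' * Diagonal(p .* (1 .- p)) * X + λ * I
Σ̂ = inv(H)                               # Laplace posterior covariance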

ConformalPrediction.jl


ConformalPrediction.jl is a package for Uncertainty Quantification (UQ) through Conformal Prediction (CP) in Julia. It is designed to work with supervised models trained in MLJ (Blaom et al. 2020). Conformal Prediction is distribution-free, easy-to-understand, easy-to-use and model-agnostic.

Conformal Prediction in action: Prediction sets for two different samples and changing coverage rates. As coverage grows, so does the size of the prediction sets.
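For intuition, here is what split conformal prediction looks like for a simple regression problem in plain Julia. This is only a sketch of the general recipe (calibrate a quantile of residuals on held-out data, then use it as the margin of the prediction interval); it does not use the ConformalPrediction.jl API, and the data and model are hypothetical.

# Sketch of split conformal prediction for regression (not the package API).
using Random, Statistics
Random.seed!(2022)
n = 500
x = randn(n)
y = 2.0 .* x .+ 0.5 .* randn(n)

train, calib = 1:250, 251:500
β = sum(x[train] .* y[train]) / sum(x[train] .^ 2)   # simple least-squares fit

# Nonconformity scores on the calibration set:
scores = abs.(y[calib] .- β .* x[calib])
α = 0.1
q̂ = quantile(scores, ceil((length(calib) + 1) * (1 - α)) / length(calib))

# Marginal (1 − α) prediction interval for a new input:
x_new = 0.7
interval = (β * x_new - q̂, β * x_new + q̂)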

Questions & Answers ❓

More Resources 📚

Read on …

  • Blog post introducing CE: [TDS], [blog].
  • Blog post on Laplace Redux: [TDS], [blog].
  • Blog post on Conformal Prediction: [TDS], [blog].

… or get involved! 🤗

Image Sources

  • Crystal ball on beach: Nicole Avagliano on Unsplash
  • Colour gradient: A.Z on Unsplash
  • Elephant herd: Sergi Ferrete on Unsplash
  • DSCC 2022 logo: ING

References

Altmeyer, Patrick. 2022. “CounterfactualExplanations.jl - a Julia Package for Counterfactual Explanations and Algorithmic Recourse.” https://github.com/pat-alt/CounterfactualExplanations.jl.
Antorán, Javier, Umang Bhatt, Tameem Adel, Adrian Weller, and José Miguel Hernández-Lobato. 2020. “Getting a Clue: A Method for Explaining Uncertainty Estimates.” https://arxiv.org/abs/2006.06848.
Blaom, Anthony D., Franz Kiraly, Thibaut Lienart, Yiannis Simillides, Diego Arenas, and Sebastian J. Vollmer. 2020. “MLJ: A Julia Package for Composable Machine Learning.” Journal of Open Source Software 5 (55): 2704. https://doi.org/10.21105/joss.02704.
Goodfellow, Ian J, Jonathon Shlens, and Christian Szegedy. 2014. “Explaining and Harnessing Adversarial Examples.” https://arxiv.org/abs/1412.6572.
Joshi, Shalmali, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. 2019. “Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems.” https://arxiv.org/abs/1907.09615.
Karimi, Amir-Hossein, Bernhard Schölkopf, and Isabel Valera. 2021. “Algorithmic Recourse: From Counterfactual Explanations to Interventions.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 353–62.
Karimi, Amir-Hossein, Julius Von Kügelgen, Bernhard Schölkopf, and Isabel Valera. 2020. “Algorithmic Recourse Under Imperfect Causal Knowledge: A Probabilistic Approach.” https://arxiv.org/abs/2006.06831.
Mothilal, Ramaravind K, Amit Sharma, and Chenhao Tan. 2020. “Explaining Machine Learning Classifiers Through Diverse Counterfactual Explanations.” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 607–17. https://doi.org/10.1145/3351095.3372850.
Murphy, Kevin P. 2022. Probabilistic Machine Learning: An Introduction. MIT Press.
O’Neil, Cathy. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown.
Pawelczyk, Martin, Sascha Bielawski, Johannes van den Heuvel, Tobias Richter, and Gjergji Kasneci. 2021. “CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms.” https://arxiv.org/abs/2108.00783.
Poyiadzi, Rafael, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. 2020. “FACE: Feasible and Actionable Counterfactual Explanations.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 344–50.
Rudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.” Nature Machine Intelligence 1 (5): 206–15. https://doi.org/10.1038/s42256-019-0048-x.
Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. “Generating Interpretable Counterfactual Explanations By Implicit Minimisation of Epistemic and Aleatoric Uncertainties.” In International Conference on Artificial Intelligence and Statistics, 1756–64. PMLR.
Upadhyay, Sohini, Shalmali Joshi, and Himabindu Lakkaraju. 2021. “Towards Robust and Reliable Algorithmic Recourse.” Advances in Neural Information Processing Systems 34: 16926–37.
Ustun, Berk, Alexander Spangher, and Yang Liu. 2019. “Actionable Recourse in Linear Classification.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 10–19. https://doi.org/10.1145/3287560.3287566.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harv. JL & Tech. 31: 841. https://doi.org/10.2139/ssrn.3063289.
Wilson, Andrew Gordon. 2020. “The Case for Bayesian Deep Learning.” https://arxiv.org/abs/2001.10995.