Suppose we have a sample of cats and dogs with information about two features: height and tail length. Based on these two features we have trained two black-box classifiers to distinguish between cats and dogs: first, an artificial neural network with weight regularization and, second, that same neural network's Bayesian counterpart (Figure 1 below). One individual cat – let's call her Kitty 🐱 – is friends with a lot of cool dogs and wants to be part of that group. Let's see how we can generate counterfactual paths for her.
Counterfactual search happens in the feature space: we are interested in understanding how we need to change 🐱's attributes in order to change the output of the black-box classifier. We will start with the first model, which relies on simple plug-in estimates to produce its predictions. The model was built and pre-trained using Flux and can be loaded as follows:
```julia
using CounterfactualExplanations
using CounterfactualExplanations.Data: cats_dogs_model
model = cats_dogs_model()
```
The training data can be loaded and pre-processed as follows:
```julia
using CounterfactualExplanations.Data: cats_dogs_data
X, ys = cats_dogs_data()
counterfactual_data = CounterfactualData(X, ys')
```
In order to make the Flux.jl model compatible with CounterfactualExplanations.jl we need to run the following (more on this in the models tutorial):
```julia
using CounterfactualExplanations.Models
import CounterfactualExplanations.Models: logits, probs # import functions in order to extend
using Flux: σ # sigmoid activation

# Step 1) Subtype the abstract model type:
struct NeuralNetwork <: Models.AbstractFittedModel
    model::Any
end

# Step 2) Dispatch `logits` and `probs` for the new type:
logits(M::NeuralNetwork, X::AbstractArray) = M.model(X)
probs(M::NeuralNetwork, X::AbstractArray) = σ.(logits(M, X))

M = NeuralNetwork(model);
```
Let `x` be the 2D feature vector describing Kitty 🐱. Based on those features she is currently labelled as `y = 0.0`. We have set the target label to `target = 1.0` and the desired confidence in the prediction to `γ = 0.75`. Now we can use the `GenericGenerator` for our counterfactual search as follows:
```julia
# Define generator:
generator = GenericGenerator()
# Generate recourse:
counterfactual = generate_counterfactual(x, target, counterfactual_data, M, generator)
```
`GenericGenerator` implements the search algorithm first proposed by Wachter, Mittelstadt, and Russell (2017). The resulting counterfactual path is shown in Figure 2 below. We can see that 🐱 travels through the feature space until she reaches a destination where the black-box model predicts with a probability of more than 75% that she is actually a dog. Her counterfactual self is in the target class, so the algorithmic recourse objective is satisfied. We have also gained an intuitive understanding of how the black-box model arrives at its decisions: increasing height and decreasing tail length both raise the predicted probability that 🐱 is actually a dog.
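For intuition, the objective behind this kind of generic search can be sketched as follows (this paraphrases Wachter et al. (2017); the exact loss and distance functions used by `GenericGenerator` may differ):

$$
x^\prime = \arg\min_{x^\prime} \max_{\lambda} \; \lambda \left( f(x^\prime) - y^\prime \right)^2 + d(x, x^\prime)
$$

The first term pulls the model's prediction $f(x^\prime)$ towards the target $y^\prime$, while the distance term $d(x, x^\prime)$ keeps the counterfactual close to the factual $x$ – which is why 🐱 takes a fairly direct path through the feature space.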
The generic search above yielded a counterfactual sample that is still quite distinct from all other individuals in the target class. While we successfully fooled the black-box model, a human might look at 🐱's counterfactual self and get a little suspicious. One of the requirements for algorithmic recourse is that counterfactuals are realistic and unambiguous. A straightforward way to meet this requirement is to generate counterfactuals by implicitly minimizing predictive uncertainty (Schut et al. 2021). The simple neural network does not incorporate uncertainty, but its Bayesian counterpart does: note how in Figure 1 above the contours for the Bayesian neural network (Laplace) fan out away from the sample. As before, we will be using a pre-trained model. The Laplace approximation was implemented using BayesLaplace.jl (see here for an introduction).
BayesLaplace.jl is not yet registered and therefore not currently a dependency of this library. The example below will therefore only run if you install the BayesLaplace package from GitHub and then uncomment the first line below.
The pre-trained Bayesian model can be loaded as follows:
```julia
# using BayesLaplace, LinearAlgebra # uncomment this line to run code involving Laplace.
using CounterfactualExplanations.Data: cats_dogs_laplace
la = cats_dogs_laplace()
```
As before we need to make the model compatible with CounterfactualExplanations.jl:
```julia
# Step 1) Subtype the abstract model type:
struct LaplaceNeuralNetwork <: Models.AbstractFittedModel
    la::BayesLaplace.LaplaceRedux
end

# Step 2) Dispatch `logits` and `probs` for the new type:
logits(M::LaplaceNeuralNetwork, X::AbstractArray) = M.la.model(X)
probs(M::LaplaceNeuralNetwork, X::AbstractArray) = BayesLaplace.predict(M.la, X)

Mᴸ = LaplaceNeuralNetwork(la);
```
Using the same target and desired confidence `γ` as above, we finally use the `GreedyGenerator` for our counterfactual search:
```julia
generator = GreedyGenerator(Dict(:δ => 0.1, :n => 15))
# Note that we pass the Bayesian model Mᴸ here:
counterfactual = generate_counterfactual(x, target, counterfactual_data, Mᴸ, generator)
```
GreedyGenerator implements the approach proposed in Schut et al. (2021): by maximizing the predicted probability of the Bayesian model in Figure 3 below, we implicitly minimize the predictive uncertainty around the counterfactual. This way we end up generating a counterfactual that looks more like the individuals 🐶 in the target class and is therefore more realistic.
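For intuition, the greedy update can be sketched as follows (a paraphrase of the JSMA-style rule in Schut et al. (2021); the exact implementation in `GreedyGenerator` may differ). At each iteration, the single feature with the largest gradient magnitude of the loss $\ell$ is moved by a fixed step size $\delta$, and each feature is updated at most $n$ times:

$$
i^\ast = \arg\max_i \left| \frac{\partial \, \ell(M(x^\prime), y^\prime)}{\partial x^\prime_i} \right|, \qquad x^\prime_{i^\ast} \leftarrow x^\prime_{i^\ast} - \delta \, \mathrm{sign}\!\left( \frac{\partial \, \ell(M(x^\prime), y^\prime)}{\partial x^\prime_{i^\ast}} \right)
$$

Here $\delta$ and $n$ correspond to the hyperparameters passed to `GreedyGenerator` above.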
Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. “Generating Interpretable Counterfactual Explanations by Implicit Minimisation of Epistemic and Aleatoric Uncertainties.” In International Conference on Artificial Intelligence and Statistics, 1756–64. PMLR.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harv. JL & Tech. 31: 841.